All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration
@ 2015-02-18 14:00 Vladimir Sementsov-Ogievskiy
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 01/14] qmp: add query-block-dirty-bitmap Vladimir Sementsov-Ogievskiy
                   ` (14 more replies)
  0 siblings, 15 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-18 14:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, vsementsov, stefanha,
	den, amit.shah, pbonzini

v3:
 based on v13 of "block: incremental backup series" by John Snow.

 changes from v2:
 removed patch for adding dirty parameter (migration capablities used
 instead).
 
 0001: printf's dropped, qapi used
 0002: part0 -> zeroes
 0003: part0 -> zeroes
 0005: dirty_dirty -> meta
       add comments about meta bitmap
       
 0006: the format is changed, nodes used instead of devices.

 other patches are new.

 rfc: there are two tests. They are the same but using different
 interfaces: md5 checksum of the bitmap last layer in query-block or
 separate query-block-dirty-bitmap with dirty bitmap regions.
 The second form is more appropriate for debugging, the first is more
 appropriate for simple regression control. Which should go to
 upstream?

v2:
 1. bug-fixes, that are already in upstream, and renaming of function
 bdrv_reset_dirty_bitmap (which is already in Snow's series) are
 dropped
 2. bitmap store/restore: the concept renamed to serialization, added
 function hbitmap_deserialize_part0, to not transfer zero blocks
 3. migration dirty parameter: added description comment
 4. Other patches are new.

v2.rfc:
Actually, in this version of the series I'm trying not use
migration/block.c at all. Instead a separate migration unit is added
in the new file migration/dirty-bitmap.c. Now bitmaps are migrated
like blocks in block migration, they have their "dirty-dirty" bitmaps,
for tracking set/unset changes during migration.

The advantages are:
  - no complications of migration/block.c
  - separate dirty-dirty bitmaps provide handling of "unset's"
  - more effective meta-data/data ratio - no tiny bitmap-blocks.



v1:
These patches provide dirty bitmap migration feature. Only named dirty
bitmaps are to be migrated. Migration is made as a part of block
migration in block-migration.c.

Dirty bitmap migration may be enabled by "dirty" parameter for qmp migrate
command. If "blk" and "inc" parameters are false when "dirty" is true
block migration is actually skipped: no allocatoions, no bdrv_read's,
no bdrv_write's, only bitmaps are migrated.

The patch set includes two my previous bug fixes, which are necessary
for it. The patch set is based on Incremental backup series by John
Snow.

Vladimir Sementsov-Ogievskiy (14):
  qmp: add query-block-dirty-bitmap
  hbitmap: serialization
  block: BdrvDirtyBitmap serialization interface
  block: tiny refactoring: minimize hbitmap_(set/reset) usage
  block: add meta bitmaps
  block: add bdrv_next_dirty_bitmap()
  qapi: add dirty-bitmaps migration capability
  migration: add migration/block-dirty-bitmap.c
  iotests: maintain several vms in test
  iotests: add add_incoming_migration to VM class
  iotests: add dirty bitmap migration test
  qapi: add md5 checksum of last dirty bitmap level to query-block
  iotests: add dirty bitmap migration test
  migration/qemu-file: make functions qemu_(get/put)_string public

 block.c                        | 135 ++++++++-
 blockdev.c                     |  24 ++
 hmp-commands.hx                |   2 +
 hmp.c                          |  32 ++
 hmp.h                          |   1 +
 include/block/block.h          |  22 ++
 include/migration/block.h      |   1 +
 include/migration/migration.h  |   1 +
 include/migration/qemu-file.h  |  17 ++
 include/qemu/hbitmap.h         |  67 ++++
 migration/Makefile.objs        |   2 +-
 migration/block-dirty-bitmap.c | 673 +++++++++++++++++++++++++++++++++++++++++
 migration/migration.c          |   9 +
 migration/qemu-file.c          |  18 ++
 monitor.c                      |   7 +
 qapi-schema.json               |   5 +-
 qapi/block-core.json           |  79 ++++-
 qmp-commands.hx                |   5 +
 tests/qemu-iotests/117         |  88 ++++++
 tests/qemu-iotests/117.out     |   5 +
 tests/qemu-iotests/118         |  87 ++++++
 tests/qemu-iotests/118.out     |   5 +
 tests/qemu-iotests/group       |   2 +
 tests/qemu-iotests/iotests.py  |  12 +-
 util/hbitmap.c                 | 104 +++++++
 vl.c                           |   1 +
 26 files changed, 1395 insertions(+), 9 deletions(-)
 create mode 100644 migration/block-dirty-bitmap.c
 create mode 100755 tests/qemu-iotests/117
 create mode 100644 tests/qemu-iotests/117.out
 create mode 100755 tests/qemu-iotests/118
 create mode 100644 tests/qemu-iotests/118.out

-- 
1.9.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RFC v3 01/14] qmp: add query-block-dirty-bitmap
  2015-02-18 14:00 [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
@ 2015-02-18 14:00 ` Vladimir Sementsov-Ogievskiy
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 02/14] hbitmap: serialization Vladimir Sementsov-Ogievskiy
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-18 14:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, vsementsov, stefanha,
	den, amit.shah, pbonzini

Adds qmp and hmp commands to query/info dirty bitmap. This is needed only for
testing.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 block.c               | 41 ++++++++++++++++++++++++++++
 blockdev.c            | 24 +++++++++++++++++
 hmp-commands.hx       |  2 ++
 hmp.c                 | 32 ++++++++++++++++++++++
 hmp.h                 |  1 +
 include/block/block.h |  2 ++
 monitor.c             |  7 +++++
 qapi/block-core.json  | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++
 qmp-commands.hx       |  5 ++++
 9 files changed, 189 insertions(+)

diff --git a/block.c b/block.c
index f3a6dd4..e4547d7 100644
--- a/block.c
+++ b/block.c
@@ -5574,6 +5574,47 @@ BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs)
     return list;
 }
 
+BlockDirtyBitmapInfo *bdrv_query_dirty_bitmap(BlockDriverState *bs,
+                                              BdrvDirtyBitmap *bitmap)
+{
+    BlockDirtyBitmapInfo *info = g_new0(BlockDirtyBitmapInfo, 1);
+    BlockDirtyRegionList **plist = &info->dirty_regions;
+    uint64_t begin = 0, end = 0, size = bitmap->size;
+    HBitmap *hb = bitmap->bitmap;
+
+    info->dirty_count = bdrv_get_dirty_count(bitmap);
+    info->granularity = bdrv_dirty_bitmap_granularity(bitmap);
+    info->size = bitmap->size;
+    info->name = g_strdup(bitmap->name);
+    info->disabled = bitmap->disabled;
+    info->dirty_regions = NULL;
+
+    for (; begin < size; ++begin) {
+        BlockDirtyRegion *region;
+        BlockDirtyRegionList *entry;
+
+        if (!hbitmap_get(hb, begin)) {
+            continue;
+        }
+
+        for (end = begin + 1; end < size && hbitmap_get(hb, end); ++end) {
+            ;
+        }
+
+        region = g_new0(BlockDirtyRegion, 1);
+        entry = g_new0(BlockDirtyRegionList, 1);
+        region->start = begin;
+        region->count = end - begin;
+        entry->value = region;
+        *plist = entry;
+        plist = &entry->next;
+
+        begin = end;
+    }
+
+    return info;
+}
+
 int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, int64_t sector)
 {
     if (bitmap) {
diff --git a/blockdev.c b/blockdev.c
index 13068c7..ad08ea0 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2323,6 +2323,30 @@ void qmp_block_dirty_bitmap_clear(const char *node, const char *name,
     aio_context_release(aio_context);
 }
 
+BlockDirtyBitmapInfo *qmp_query_block_dirty_bitmap(const char *node,
+                                                   const char *name,
+                                                   Error **errp)
+{
+    AioContext *aio_context;
+    BdrvDirtyBitmap *bitmap;
+    BlockDriverState *bs;
+    BlockDirtyBitmapInfo *ret = NULL;
+
+    bitmap = block_dirty_bitmap_lookup(node, name, &bs, errp);
+    if (!bitmap) {
+        return NULL;
+    }
+
+    aio_context = bdrv_get_aio_context(bs);
+    aio_context_acquire(aio_context);
+
+    ret = bdrv_query_dirty_bitmap(bs, bitmap);
+
+    aio_context_release(aio_context);
+
+    return ret;
+}
+
 int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
 {
     const char *id = qdict_get_str(qdict, "id");
diff --git a/hmp-commands.hx b/hmp-commands.hx
index e37bc8b..8c5e580 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1778,6 +1778,8 @@ show roms
 show the TPM device
 @item info memory-devices
 show the memory devices
+@item info dirty-bitmap
+show dirty bitmap details and content
 @end table
 ETEXI
 
diff --git a/hmp.c b/hmp.c
index 015499f..2f7ce87 100644
--- a/hmp.c
+++ b/hmp.c
@@ -788,6 +788,38 @@ void hmp_info_tpm(Monitor *mon, const QDict *qdict)
     qapi_free_TPMInfoList(info_list);
 }
 
+void hmp_info_block_dirty_bitmap(Monitor *mon, const QDict *qdict)
+{
+    const char *device = qdict_get_str(qdict, "device");
+    const char *name = qdict_get_str(qdict, "bitmap");
+    BlockDirtyRegionList *region;
+
+    BlockDirtyBitmapInfo *info =
+        qmp_query_block_dirty_bitmap(device, name, NULL);
+
+    if (!info) {
+        monitor_printf(mon, "bitmap '%s' for device '%s' doesn't exis\n",
+                       name, device);
+        return;
+    }
+
+    monitor_printf(mon, "bitmap '%s'\n",
+                   info->name ? info->name : "no name");
+    monitor_printf(mon, "disabled: %s\n", info->disabled ? "true" : "false");
+    monitor_printf(mon, "size: %" PRId64 "\n", info->size);
+    monitor_printf(mon, "granularity: %" PRIu64 "\n", info->granularity);
+    monitor_printf(mon, "dirty regions begin\n");
+
+    for (region = info->dirty_regions; region; region = region->next) {
+        monitor_printf(mon, "%" PRIu64 ": %" PRIu64 "\n",
+                       region->value->start, region->value->count);
+    }
+
+    monitor_printf(mon, "dirty regions end\n");
+
+    qapi_free_BlockDirtyBitmapInfo(info);
+}
+
 void hmp_quit(Monitor *mon, const QDict *qdict)
 {
     monitor_suspend(mon);
diff --git a/hmp.h b/hmp.h
index 4bb5dca..fdfec15 100644
--- a/hmp.h
+++ b/hmp.h
@@ -19,6 +19,7 @@
 #include "qapi-types.h"
 #include "qapi/qmp/qdict.h"
 
+void hmp_info_block_dirty_bitmap(Monitor *mon, const QDict *qdict);
 void hmp_info_name(Monitor *mon, const QDict *qdict);
 void hmp_info_version(Monitor *mon, const QDict *qdict);
 void hmp_info_kvm(Monitor *mon, const QDict *qdict);
diff --git a/include/block/block.h b/include/block/block.h
index f6a50ae..c9d96a6 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -460,6 +460,8 @@ void bdrv_enable_dirty_bitmap(BdrvDirtyBitmap *bitmap);
 BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs);
 uint32_t bdrv_get_default_bitmap_granularity(BlockDriverState *bs);
 uint32_t bdrv_dirty_bitmap_granularity(BdrvDirtyBitmap *bitmap);
+BlockDirtyBitmapInfo *bdrv_query_dirty_bitmap(BlockDriverState *bs,
+                                              BdrvDirtyBitmap *bitmap);
 bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap);
 bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap);
 int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, int64_t sector);
diff --git a/monitor.c b/monitor.c
index c3cc060..2bcb390 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2936,6 +2936,13 @@ static mon_cmd_t info_cmds[] = {
         .mhandler.cmd = hmp_info_memory_devices,
     },
     {
+        .name       = "block-dirty-bitmap",
+        .args_type  = "device:B,bitmap:s",
+        .params     = "device bitmap",
+        .help       = "show dirty bitmap details and content",
+        .mhandler.cmd = hmp_info_block_dirty_bitmap,
+    },
+    {
         .name       = NULL,
     },
 };
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 02bf2b6..25dea80 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -996,6 +996,64 @@
   'data': { 'node': 'str', 'name': 'str', '*granularity': 'uint32' } }
 
 ##
+# @BlockDirtyRegion:
+#
+# Region in bytes.
+#
+# @start: first byte
+#
+# @count: number of bytes in the region
+#
+# Since: 2.3
+##
+{ 'type': 'BlockDirtyRegion',
+  'data': { 'start': 'int', 'count': 'int' } }
+
+##
+# @BlockDirtyBitmapInfo
+#
+# @name: the name of the dirty bitmap
+#
+# @size: size of the dirty bitmap in sectors
+#
+# @granularity: granularity of the dirty bitmap in bytes
+#
+# @disabled: whether the dirty bitmap is disabled
+#
+# @dirty-count: number of dirty bytes according to the dirty bitmap
+#
+# @dirty-regions: dirty regions of the bitmap
+#
+# Since 2.3
+##
+{ 'type': 'BlockDirtyBitmapInfo',
+  'data': { 'name': 'str',
+            'size': 'int',
+            'granularity': 'int',
+            'disabled': 'bool',
+            'dirty-count': 'int',
+            'dirty-regions': ['BlockDirtyRegion'] } }
+
+##
+# @DirtyBitmapInfo:
+#
+# Dirty bitmap information.
+#
+# @name: #optional the name of the dirty bitmap (Since 2.3)
+#
+# @count: number of dirty bytes according to the dirty bitmap
+#
+# @granularity: granularity of the dirty bitmap in bytes (since 1.4)
+#
+# @enabled: whether the dirty bitmap is enabled (Since 2.3)
+#
+# Since: 2.3
+##
+#{ 'type': 'DirtyBitmapInfo',
+#  'data': {'*name': 'str', 'count': 'int', 'granularity': 'int',
+#           'enabled': 'bool', 'dirty-regions': ['DirtyRegion']} }
+
+##
 # @block-dirty-bitmap-add
 #
 # Create a dirty bitmap with a name on the node
@@ -1069,6 +1127,23 @@
   'data': 'BlockDirtyBitmap' }
 
 ##
+# @query-block-dirty-bitmap
+#
+# Get a description for specified dirty bitmap including it's dirty regions.
+# This command is in general for testing purposes.
+#
+# Returns: @DirtyBitmapInfo
+#
+# Since: 2.3
+##
+{ 'type': 'Mega',
+'data': { '*name': 'str' }}
+
+{ 'command': 'query-block-dirty-bitmap',
+  'data': 'BlockDirtyBitmap',
+  'returns': 'BlockDirtyBitmapInfo' }
+
+##
 # @block_set_io_throttle:
 #
 # Change I/O throttle limits for a block drive.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 21416fa..b8812a8 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1269,6 +1269,11 @@ EQMP
         .args_type  = "node:B,name:s",
         .mhandler.cmd_new = qmp_marshal_input_block_dirty_bitmap_clear,
     },
+    {
+        .name       = "query-block-dirty-bitmap",
+        .args_type  = "node:B,name:s",
+        .mhandler.cmd_new = qmp_marshal_input_query_block_dirty_bitmap,
+    },
 
 SQMP
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RFC v3 02/14] hbitmap: serialization
  2015-02-18 14:00 [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 01/14] qmp: add query-block-dirty-bitmap Vladimir Sementsov-Ogievskiy
@ 2015-02-18 14:00 ` Vladimir Sementsov-Ogievskiy
  2015-02-18 23:42   ` John Snow
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 03/14] block: BdrvDirtyBitmap serialization interface Vladimir Sementsov-Ogievskiy
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-18 14:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, vsementsov, stefanha,
	den, amit.shah, pbonzini

Functions to serialize / deserialize(restore) HBitmap. HBitmap should be
saved to linear sequence of bits independently of endianness and bitmap
array element (unsigned long) size. Therefore Little Endian is chosen.

These functions are appropriate for dirty bitmap migration, restoring
the bitmap in several steps is available. To save performance, every
step writes only the last level of the bitmap. All other levels are
restored by hbitmap_deserialize_finish() as a last step of restoring.
So, HBitmap is inconsistent while restoring.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 include/qemu/hbitmap.h | 59 +++++++++++++++++++++++++++++++
 util/hbitmap.c         | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 155 insertions(+)

diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index c19c1cb..10ce05b 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -129,6 +129,65 @@ void hbitmap_reset(HBitmap *hb, uint64_t start, uint64_t count);
 bool hbitmap_get(const HBitmap *hb, uint64_t item);
 
 /**
+ * hbitmap_data_size:
+ * @hb: HBitmap to operate on.
+ * @count: Number of bits
+ *
+ * Return amount of bytes hbitmap_serialize_part needs
+ */
+uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count);
+
+/**
+ * hbitmap_serialize_part
+ * @hb: HBitmap to oprate on.
+ * @buf: Buffer to store serialized bitmap.
+ * @start: First bit to store.
+ * @count: Number of bits to store.
+ *
+ * Stores HBitmap data corresponding to given region. The format of saved data
+ * is linear sequence of bits, so it can be used by hbitmap_deserialize_part
+ * independently of endianness and size of HBitmap level array elements
+ */
+void hbitmap_serialize_part(const HBitmap *hb, uint8_t *buf,
+                            uint64_t start, uint64_t count);
+
+/**
+ * hbitmap_deserialize_part
+ * @hb: HBitmap to operate on.
+ * @buf: Buffer to restore bitmap data from.
+ * @start: First bit to restore.
+ * @count: Number of bits to restore.
+ *
+ * Retores HBitmap data corresponding to given region. The format is the same
+ * as for hbitmap_serialize_part.
+ *
+ * ! The bitmap becomes inconsistent after this operation.
+ * hbitmap_serialize_finish should be called before using the bitmap after
+ * data restoring.
+ */
+void hbitmap_deserialize_part(HBitmap *hb, uint8_t *buf,
+                              uint64_t start, uint64_t count);
+
+/**
+ * hbitmap_deserialize_zeroes
+ * @hb: HBitmap to operate on.
+ * @start: First bit to restore.
+ * @count: Number of bits to restore.
+ *
+ * Same as hbitmap_serialize_part, but fills the bitmap with zeroes.
+ */
+void hbitmap_deserialize_zeroes(HBitmap *hb, uint64_t start, uint64_t count);
+
+/**
+ * hbitmap_deserialize_finish
+ * @hb: HBitmap to operate on.
+ *
+ * Repair HBitmap after calling hbitmap_deserialize_data. Actually, all HBitmap
+ * layers are restored here.
+ */
+void hbitmap_deserialize_finish(HBitmap *hb);
+
+/**
  * hbitmap_free:
  * @hb: HBitmap to operate on.
  *
diff --git a/util/hbitmap.c b/util/hbitmap.c
index 962ff29..1a736e7 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -362,6 +362,102 @@ bool hbitmap_get(const HBitmap *hb, uint64_t item)
     return (hb->levels[HBITMAP_LEVELS - 1][pos >> BITS_PER_LEVEL] & bit) != 0;
 }
 
+uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count)
+{
+    uint64_t size, gran;
+
+    if (count == 0) {
+        return 0;
+    }
+
+    gran = 1ll << hb->granularity;
+    size = (((gran + count - 2) >> hb->granularity) >> BITS_PER_LEVEL) + 1;
+
+    return size * sizeof(unsigned long);
+}
+
+void hbitmap_serialize_part(const HBitmap *hb, uint8_t *buf,
+                            uint64_t start, uint64_t count)
+{
+    uint64_t i;
+    uint64_t last = start + count - 1;
+    unsigned long *out = (unsigned long *)buf;
+
+    if (count == 0) {
+        return;
+    }
+
+    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
+    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
+    count = last - start + 1;
+
+    for (i = start; i <= last; ++i) {
+        unsigned long el = hb->levels[HBITMAP_LEVELS - 1][i];
+        out[i] = (BITS_PER_LONG == 32 ? cpu_to_le32(el) : cpu_to_le64(el));
+    }
+}
+
+void hbitmap_deserialize_part(HBitmap *hb, uint8_t *buf,
+                              uint64_t start, uint64_t count)
+{
+    uint64_t i;
+    uint64_t last = start + count - 1;
+    unsigned long *in = (unsigned long *)buf;
+
+    if (count == 0) {
+        return;
+    }
+
+    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
+    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
+    count = last - start + 1;
+
+    for (i = start; i <= last; ++i) {
+        hb->levels[HBITMAP_LEVELS - 1][i] =
+            (BITS_PER_LONG == 32 ? le32_to_cpu(in[i]) : le64_to_cpu(in[i]));
+    }
+}
+
+void hbitmap_deserialize_zeroes(HBitmap *hb, uint64_t start, uint64_t count)
+{
+    uint64_t last = start + count - 1;
+
+    if (count == 0) {
+        return;
+    }
+
+    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
+    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
+    count = last - start + 1;
+
+    memset(hb->levels[HBITMAP_LEVELS - 1] + start, 0,
+           count * sizeof(unsigned long));
+}
+
+void hbitmap_deserialize_finish(HBitmap *bitmap)
+{
+    int64_t i, size, prev_size;
+    int lev;
+
+    /* restore levels starting from penultimate to zero level, assuming
+     * that the last level is ok */
+    size = MAX((bitmap->size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
+    for (lev = HBITMAP_LEVELS - 1; lev-- > 0; ) {
+        prev_size = size;
+        size = MAX((size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
+        memset(bitmap->levels[lev], 0, size * sizeof(unsigned long));
+
+        for (i = 0; i < prev_size; ++i) {
+            if (bitmap->levels[lev + 1][i]) {
+                bitmap->levels[lev][i >> BITS_PER_LEVEL] |=
+                    1 << (i & (BITS_PER_LONG - 1));
+            }
+        }
+    }
+
+    bitmap->levels[0][0] |= 1UL << (BITS_PER_LONG - 1);
+}
+
 void hbitmap_free(HBitmap *hb)
 {
     unsigned i;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RFC v3 03/14] block: BdrvDirtyBitmap serialization interface
  2015-02-18 14:00 [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 01/14] qmp: add query-block-dirty-bitmap Vladimir Sementsov-Ogievskiy
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 02/14] hbitmap: serialization Vladimir Sementsov-Ogievskiy
@ 2015-02-18 14:00 ` Vladimir Sementsov-Ogievskiy
  2015-02-18 23:43   ` John Snow
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 04/14] block: tiny refactoring: minimize hbitmap_(set/reset) usage Vladimir Sementsov-Ogievskiy
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-18 14:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, vsementsov, stefanha,
	den, amit.shah, pbonzini

Several functions to provide necessary access to BdrvDirtyBitmap for
block-migration.c

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 block.c               | 36 ++++++++++++++++++++++++++++++++++++
 include/block/block.h | 13 +++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/block.c b/block.c
index e4547d7..e95a5ae 100644
--- a/block.c
+++ b/block.c
@@ -5674,6 +5674,42 @@ void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap)
     hbitmap_reset(bitmap->bitmap, 0, bitmap->size);
 }
 
+const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap)
+{
+    return bitmap->name;
+}
+
+uint64_t bdrv_dirty_bitmap_data_size(const BdrvDirtyBitmap *bitmap,
+                                     uint64_t count)
+{
+    return hbitmap_data_size(bitmap->bitmap, count);
+}
+
+void bdrv_dirty_bitmap_serialize_part(const BdrvDirtyBitmap *bitmap,
+                                      uint8_t *buf, uint64_t start,
+                                      uint64_t count)
+{
+    hbitmap_serialize_part(bitmap->bitmap, buf, start, count);
+}
+
+void bdrv_dirty_bitmap_deserialize_part(BdrvDirtyBitmap *bitmap,
+                                        uint8_t *buf, uint64_t start,
+                                        uint64_t count)
+{
+    hbitmap_deserialize_part(bitmap->bitmap, buf, start, count);
+}
+
+void bdrv_dirty_bitmap_deserialize_zeroes(BdrvDirtyBitmap *bitmap,
+                                          uint64_t start, uint64_t count)
+{
+    hbitmap_deserialize_zeroes(bitmap->bitmap, start, count);
+}
+
+void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap)
+{
+    hbitmap_deserialize_finish(bitmap->bitmap);
+}
+
 static void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
                            int nr_sectors)
 {
diff --git a/include/block/block.h b/include/block/block.h
index c9d96a6..c6a928d 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -474,6 +474,19 @@ void bdrv_dirty_iter_init(BdrvDirtyBitmap *bitmap, struct HBitmapIter *hbi);
 void bdrv_set_dirty_iter(struct HBitmapIter *hbi, int64_t offset);
 int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap);
 
+const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap);
+uint64_t bdrv_dirty_bitmap_data_size(const BdrvDirtyBitmap *bitmap,
+                                     uint64_t count);
+void bdrv_dirty_bitmap_serialize_part(const BdrvDirtyBitmap *bitmap,
+                                      uint8_t *buf, uint64_t start,
+                                      uint64_t count);
+void bdrv_dirty_bitmap_deserialize_part(BdrvDirtyBitmap *bitmap,
+                                        uint8_t *buf, uint64_t start,
+                                        uint64_t count);
+void bdrv_dirty_bitmap_deserialize_zeroes(BdrvDirtyBitmap *bitmap,
+                                          uint64_t start, uint64_t count);
+void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap);
+
 void bdrv_enable_copy_on_read(BlockDriverState *bs);
 void bdrv_disable_copy_on_read(BlockDriverState *bs);
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RFC v3 04/14] block: tiny refactoring: minimize hbitmap_(set/reset) usage
  2015-02-18 14:00 [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 03/14] block: BdrvDirtyBitmap serialization interface Vladimir Sementsov-Ogievskiy
@ 2015-02-18 14:00 ` Vladimir Sementsov-Ogievskiy
  2015-02-18 23:44   ` John Snow
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 05/14] block: add meta bitmaps Vladimir Sementsov-Ogievskiy
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-18 14:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, vsementsov, stefanha,
	den, amit.shah, pbonzini

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 block.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index e95a5ae..a127fd2 100644
--- a/block.c
+++ b/block.c
@@ -5670,8 +5670,7 @@ void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
 
 void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap)
 {
-    assert(bdrv_dirty_bitmap_enabled(bitmap));
-    hbitmap_reset(bitmap->bitmap, 0, bitmap->size);
+    bdrv_reset_dirty_bitmap(bitmap, 0, bitmap->size);
 }
 
 const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap)
@@ -5718,7 +5717,7 @@ static void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
         if (!bdrv_dirty_bitmap_enabled(bitmap)) {
             continue;
         }
-        hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
+        bdrv_set_dirty_bitmap(bitmap, cur_sector, nr_sectors);
     }
 }
 
@@ -5730,7 +5729,7 @@ static void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
         if (!bdrv_dirty_bitmap_enabled(bitmap)) {
             continue;
         }
-        hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
+        bdrv_reset_dirty_bitmap(bitmap, cur_sector, nr_sectors);
     }
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RFC v3 05/14] block: add meta bitmaps
  2015-02-18 14:00 [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 04/14] block: tiny refactoring: minimize hbitmap_(set/reset) usage Vladimir Sementsov-Ogievskiy
@ 2015-02-18 14:00 ` Vladimir Sementsov-Ogievskiy
  2015-02-18 23:45   ` John Snow
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 06/14] block: add bdrv_next_dirty_bitmap() Vladimir Sementsov-Ogievskiy
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-18 14:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, vsementsov, stefanha,
	den, amit.shah, pbonzini

Meta bitmap is a 'dirty bitmap' for the BdrvDirtyBitmap. It tracks
changes (set/unset) of this BdrvDirtyBitmap. It is needed for live
migration of block dirty bitmaps.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 block.c               | 40 ++++++++++++++++++++++++++++++++++++++++
 include/block/block.h |  5 +++++
 2 files changed, 45 insertions(+)

diff --git a/block.c b/block.c
index a127fd2..aaa08b8 100644
--- a/block.c
+++ b/block.c
@@ -58,9 +58,15 @@
  * (3) successor is set: frozen mode.
  *     A frozen bitmap cannot be renamed, deleted, anonymized, cleared, set,
  *     or enabled. A frozen bitmap can only abdicate() or reclaim().
+ *
+ * Meta bitmap:
+ * Meta bitmap is a 'dirty bitmap' for the BdrvDirtyBitmap. It tracks changes
+ * (set/unset) of this BdrvDirtyBitmap. It is needed for live migration of
+ * block dirty bitmaps.
  */
 struct BdrvDirtyBitmap {
     HBitmap *bitmap;            /* Dirty sector bitmap implementation */
+    HBitmap *meta_bitmap;       /* Meta bitmap */
     BdrvDirtyBitmap *successor; /* Anonymous child; implies frozen status */
     char *name;                 /* Optional non-empty unique ID */
     int64_t size;               /* Size of the bitmap (Number of sectors) */
@@ -5398,6 +5404,31 @@ void bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap)
     bitmap->name = NULL;
 }
 
+HBitmap *bdrv_create_meta_bitmap(BdrvDirtyBitmap *bitmap,
+                                        uint64_t granularity)
+{
+    uint64_t sector_granularity;
+
+    assert((granularity & (granularity - 1)) == 0);
+
+    granularity *= 8 * bdrv_dirty_bitmap_granularity(bitmap);
+    sector_granularity = granularity >> BDRV_SECTOR_BITS;
+    assert(sector_granularity);
+
+    bitmap->meta_bitmap =
+        hbitmap_alloc(bitmap->size, ffsll(sector_granularity) - 1);
+
+    return bitmap->meta_bitmap;
+}
+
+void bdrv_release_meta_bitmap(BdrvDirtyBitmap *bitmap)
+{
+    if (bitmap->meta_bitmap) {
+        hbitmap_free(bitmap->meta_bitmap);
+        bitmap->meta_bitmap = NULL;
+    }
+}
+
 BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
                                           uint32_t granularity,
                                           const char *name,
@@ -5532,6 +5563,9 @@ void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
             assert(!bdrv_dirty_bitmap_frozen(bm));
             QLIST_REMOVE(bitmap, list);
             hbitmap_free(bitmap->bitmap);
+            if (bitmap->meta_bitmap) {
+                hbitmap_free(bitmap->meta_bitmap);
+            }
             g_free(bitmap->name);
             g_free(bitmap);
             return;
@@ -5659,6 +5693,9 @@ void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
 {
     assert(bdrv_dirty_bitmap_enabled(bitmap));
     hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
+    if (bitmap->meta_bitmap) {
+        hbitmap_set(bitmap->meta_bitmap, cur_sector, nr_sectors);
+    }
 }
 
 void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
@@ -5666,6 +5703,9 @@ void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
 {
     assert(bdrv_dirty_bitmap_enabled(bitmap));
     hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
+    if (bitmap->meta_bitmap) {
+        hbitmap_set(bitmap->meta_bitmap, cur_sector, nr_sectors);
+    }
 }
 
 void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap)
diff --git a/include/block/block.h b/include/block/block.h
index c6a928d..f2c62f6 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -4,6 +4,7 @@
 #include "block/aio.h"
 #include "qemu-common.h"
 #include "qemu/option.h"
+#include "qemu/hbitmap.h"
 #include "block/coroutine.h"
 #include "block/accounting.h"
 #include "qapi/qmp/qobject.h"
@@ -487,6 +488,10 @@ void bdrv_dirty_bitmap_deserialize_zeroes(BdrvDirtyBitmap *bitmap,
                                           uint64_t start, uint64_t count);
 void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap);
 
+HBitmap *bdrv_create_meta_bitmap(BdrvDirtyBitmap *bitmap,
+                                        uint64_t granularity);
+void bdrv_release_meta_bitmap(BdrvDirtyBitmap *bitmap);
+
 void bdrv_enable_copy_on_read(BlockDriverState *bs);
 void bdrv_disable_copy_on_read(BlockDriverState *bs);
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RFC v3 06/14] block: add bdrv_next_dirty_bitmap()
  2015-02-18 14:00 [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
                   ` (4 preceding siblings ...)
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 05/14] block: add meta bitmaps Vladimir Sementsov-Ogievskiy
@ 2015-02-18 14:00 ` Vladimir Sementsov-Ogievskiy
  2015-02-18 23:45   ` John Snow
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 07/14] qapi: add dirty-bitmaps migration capability Vladimir Sementsov-Ogievskiy
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-18 14:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, vsementsov, stefanha,
	den, amit.shah, pbonzini

Like bdrv_next()  - bdrv_next_dirty_bitmap() is a function to provide
access to private dirty bitmaps list.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 block.c               | 10 ++++++++++
 include/block/block.h |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/block.c b/block.c
index aaa08b8..4cca55d 100644
--- a/block.c
+++ b/block.c
@@ -5649,6 +5649,16 @@ BlockDirtyBitmapInfo *bdrv_query_dirty_bitmap(BlockDriverState *bs,
     return info;
 }
 
+BdrvDirtyBitmap *bdrv_next_dirty_bitmap(BlockDriverState *bs,
+                                        BdrvDirtyBitmap *bitmap)
+{
+    if (bitmap == NULL) {
+        return QLIST_FIRST(&bs->dirty_bitmaps);
+    }
+
+    return QLIST_NEXT(bitmap, list);
+}
+
 int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, int64_t sector)
 {
     if (bitmap) {
diff --git a/include/block/block.h b/include/block/block.h
index f2c62f6..d11d956 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -487,6 +487,8 @@ void bdrv_dirty_bitmap_deserialize_part(BdrvDirtyBitmap *bitmap,
 void bdrv_dirty_bitmap_deserialize_zeroes(BdrvDirtyBitmap *bitmap,
                                           uint64_t start, uint64_t count);
 void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap);
+BdrvDirtyBitmap *bdrv_next_dirty_bitmap(BlockDriverState *bs,
+                                        BdrvDirtyBitmap *bitmap);
 
 HBitmap *bdrv_create_meta_bitmap(BdrvDirtyBitmap *bitmap,
                                         uint64_t granularity);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RFC v3 07/14] qapi: add dirty-bitmaps migration capability
  2015-02-18 14:00 [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
                   ` (5 preceding siblings ...)
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 06/14] block: add bdrv_next_dirty_bitmap() Vladimir Sementsov-Ogievskiy
@ 2015-02-18 14:00 ` Vladimir Sementsov-Ogievskiy
  2015-02-18 23:45   ` John Snow
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 08/14] migration: add migration/block-dirty-bitmap.c Vladimir Sementsov-Ogievskiy
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-18 14:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, vsementsov, stefanha,
	den, amit.shah, pbonzini

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 include/migration/migration.h | 1 +
 migration/migration.c         | 9 +++++++++
 qapi-schema.json              | 5 ++++-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index f37348b..719875d 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -145,6 +145,7 @@ void migrate_del_blocker(Error *reason);
 
 bool migrate_rdma_pin_all(void);
 bool migrate_zero_blocks(void);
+bool migrate_dirty_bitmaps(void);
 
 bool migrate_auto_converge(void);
 
diff --git a/migration/migration.c b/migration/migration.c
index b3adbc6..68e7478 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -567,6 +567,15 @@ bool migrate_zero_blocks(void)
     return s->enabled_capabilities[MIGRATION_CAPABILITY_ZERO_BLOCKS];
 }
 
+bool migrate_dirty_bitmaps(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_DIRTY_BITMAPS];
+}
+
 int migrate_use_xbzrle(void)
 {
     MigrationState *s;
diff --git a/qapi-schema.json b/qapi-schema.json
index 0c31203..70b54ab 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -494,10 +494,13 @@
 # @auto-converge: If enabled, QEMU will automatically throttle down the guest
 #          to speed up convergence of RAM migration. (since 1.6)
 #
+# @dirty-bitmaps: If enabled, QEMU will migrate named dirty bitmaps. (since 2.3)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
-  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks'] }
+  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
+           'dirty-bitmaps'] }
 
 ##
 # @MigrationCapabilityStatus
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RFC v3 08/14] migration: add migration/block-dirty-bitmap.c
  2015-02-18 14:00 [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
                   ` (6 preceding siblings ...)
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 07/14] qapi: add dirty-bitmaps migration capability Vladimir Sementsov-Ogievskiy
@ 2015-02-18 14:00 ` Vladimir Sementsov-Ogievskiy
  2015-02-18 23:47   ` John Snow
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 09/14] iotests: maintain several vms in test Vladimir Sementsov-Ogievskiy
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-18 14:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, vsementsov, stefanha,
	den, amit.shah, pbonzini

Live migration of dirty bitmaps. Only named dirty bitmaps, associated with
root nodes and non-root named nodes are migrated.

If destination qemu is already containing a dirty bitmap with the same name
as a migrated bitmap (for the same node), than, if their granularities are
the same the migration will be done, otherwise the error will be generated.

If destination qemu doesn't contain such bitmap it will be created.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 include/migration/block.h      |   1 +
 migration/Makefile.objs        |   2 +-
 migration/block-dirty-bitmap.c | 708 +++++++++++++++++++++++++++++++++++++++++
 vl.c                           |   1 +
 4 files changed, 711 insertions(+), 1 deletion(-)
 create mode 100644 migration/block-dirty-bitmap.c

diff --git a/include/migration/block.h b/include/migration/block.h
index ffa8ac0..566bb9f 100644
--- a/include/migration/block.h
+++ b/include/migration/block.h
@@ -14,6 +14,7 @@
 #ifndef BLOCK_MIGRATION_H
 #define BLOCK_MIGRATION_H
 
+void dirty_bitmap_mig_init(void);
 void blk_mig_init(void);
 int blk_mig_active(void);
 uint64_t blk_mig_bytes_transferred(void);
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index d929e96..128612d 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -6,5 +6,5 @@ common-obj-y += xbzrle.o
 common-obj-$(CONFIG_RDMA) += rdma.o
 common-obj-$(CONFIG_POSIX) += exec.o unix.o fd.o
 
-common-obj-y += block.o
+common-obj-y += block.o block-dirty-bitmap.o
 
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
new file mode 100644
index 0000000..084ba22
--- /dev/null
+++ b/migration/block-dirty-bitmap.c
@@ -0,0 +1,708 @@
+/*
+ * QEMU dirty bitmap migration
+ *
+ * Live migration of dirty bitmaps. Only named dirty bitmaps, associated with
+ * root nodes and non-root named nodes are migrated.
+ *
+ * If destination qemu is already containing a dirty bitmap with the same name
+ * as a migrated bitmap (for the same node), than, if their granularities are
+ * the same the migration will be done, otherwise the error will be generated.
+ *
+ * If destination qemu doesn't contain such bitmap it will be created.
+ *
+ * format of migration:
+ *
+ * # Header (shared for different chunk types)
+ * 1 byte: flags
+ * [ 1 byte: node name size ] \  flags & DEVICE_NAME
+ * [ n bytes: node name     ] /
+ * [ 1 byte: bitmap name size ] \  flags & BITMAP_NAME
+ * [ n bytes: bitmap name     ] /
+ *
+ * # Start of bitmap migration (flags & START)
+ * header
+ * be64: granularity
+ *
+ * # Complete of bitmap migration (flags & COMPLETE)
+ * header
+ * 1 byte: bitmap enabled flag
+ *
+ * # Data chunk of bitmap migration
+ * header
+ * be64: start sector
+ * be32: number of sectors
+ * [ be64: buffer size  ] \ ! (flags & ZEROES)
+ * [ n bytes: buffer    ] /
+ *
+ * The last chunk in stream should contain flags & EOS. The chunk may skip
+ * device and/or bitmap names, assuming them to be the same with the previous
+ * chunk.
+ *
+ *
+ * This file is derived from migration/block.c
+ *
+ * Author:
+ * Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
+ *
+ * original copyright message:
+ * =====================================================================
+ * Copyright IBM, Corp. 2009
+ *
+ * Authors:
+ *  Liran Schour   <lirans@il.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Contributions after 2012-01-13 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
+ * =====================================================================
+ */
+
+#include "block/block.h"
+#include "block/block_int.h"
+#include "sysemu/block-backend.h"
+#include "qemu/main-loop.h"
+#include "qemu/error-report.h"
+#include "migration/block.h"
+#include "migration/migration.h"
+#include "qemu/hbitmap.h"
+#include <assert.h>
+
+#define CHUNK_SIZE                       (1 << 20)
+
+/* Flags occupy from one to four bytes. In all but one the 7-th (EXTRA_FLAGS)
+ * bit should be set. */
+#define DIRTY_BITMAP_MIG_FLAG_EOS           0x01
+#define DIRTY_BITMAP_MIG_FLAG_ZEROES        0x02
+#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME   0x04
+#define DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME   0x08
+#define DIRTY_BITMAP_MIG_FLAG_START         0x10
+#define DIRTY_BITMAP_MIG_FLAG_COMPLETE      0x20
+#define DIRTY_BITMAP_MIG_FLAG_BITS          0x40
+
+#define DIRTY_BITMAP_MIG_EXTRA_FLAGS        0x80
+#define DIRTY_BITMAP_MIG_FLAGS_SIZE_16      0x8000
+#define DIRTY_BITMAP_MIG_FLAGS_SIZE_32      0x8080
+
+#define DEBUG_DIRTY_BITMAP_MIGRATION
+
+#ifdef DEBUG_DIRTY_BITMAP_MIGRATION
+#define DPRINTF(fmt, ...) \
+    do { printf("dirty_migration: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
+typedef struct DirtyBitmapMigBitmapState {
+    /* Written during setup phase. */
+    BlockDriverState *bs;
+    const char *node_name;
+    BdrvDirtyBitmap *bitmap;
+    HBitmap *meta_bitmap;
+    uint64_t total_sectors;
+    uint64_t sectors_per_chunk;
+    QSIMPLEQ_ENTRY(DirtyBitmapMigBitmapState) entry;
+
+    /* For bulk phase. */
+    bool bulk_completed;
+    uint64_t cur_sector;
+
+    /* For dirty phase. */
+    HBitmapIter iter_dirty;
+} DirtyBitmapMigBitmapState;
+
+typedef struct DirtyBitmapMigState {
+    QSIMPLEQ_HEAD(dbms_list, DirtyBitmapMigBitmapState) dbms_list;
+
+    bool bulk_completed;
+
+    /* for send_bitmap() */
+    BlockDriverState *prev_bs;
+    BdrvDirtyBitmap *prev_bitmap;
+} DirtyBitmapMigState;
+
+typedef struct DirtyBitmapLoadState {
+    uint32_t flags;
+    char node_name[256];
+    char bitmap_name[256];
+    BlockDriverState *bs;
+    BdrvDirtyBitmap *bitmap;
+} DirtyBitmapLoadState;
+
+static DirtyBitmapMigState dirty_bitmap_mig_state;
+
+static uint32_t qemu_get_flags(QEMUFile *f)
+{
+    uint8_t flags = qemu_get_byte(f);
+    if (flags & DIRTY_BITMAP_MIG_EXTRA_FLAGS) {
+        flags = flags << 8 | qemu_get_byte(f);
+        if (flags & DIRTY_BITMAP_MIG_EXTRA_FLAGS) {
+            flags = flags << 16 | qemu_get_be16(f);
+        }
+    }
+
+    return flags;
+}
+
+static void qemu_put_flags(QEMUFile *f, uint32_t flags)
+{
+    if (!(flags & 0xffffff00)) {
+        qemu_put_byte(f, flags);
+        return;
+    }
+
+    if (!(flags & 0xffff0000)) {
+        qemu_put_be16(f, flags | DIRTY_BITMAP_MIG_FLAGS_SIZE_16);
+        return;
+    }
+
+    qemu_put_be32(f, flags | DIRTY_BITMAP_MIG_FLAGS_SIZE_32);
+}
+
+/* read name from qemu file:
+ * format:
+ * 1 byte : len = name length (<256)
+ * len bytes : name without last zero byte
+ *
+ * name should point to the buffer >= 256 bytes length
+ */
+static char *qemu_get_string(QEMUFile *f, char *name)
+{
+    int len = qemu_get_byte(f);
+    qemu_get_buffer(f, (uint8_t *)name, len);
+    name[len] = '\0';
+
+    DPRINTF("get name: %d %s\n", len, name);
+
+    return name;
+}
+
+/* write name to qemu file:
+ * format:
+ * same as for qemu_get_string
+ *
+ * maximum name length is 255
+ */
+static void qemu_put_string(QEMUFile *f, const char *name)
+{
+    int len = strlen(name);
+
+    DPRINTF("put name: %d %s\n", len, name);
+
+    assert(len < 256);
+    qemu_put_byte(f, len);
+    qemu_put_buffer(f, (const uint8_t *)name, len);
+}
+
+static void send_bitmap_header(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
+                               uint32_t additional_flags)
+{
+    BlockDriverState *bs = dbms->bs;
+    BdrvDirtyBitmap *bitmap = dbms->bitmap;
+    uint32_t flags = additional_flags;
+
+    if (bs != dirty_bitmap_mig_state.prev_bs) {
+        dirty_bitmap_mig_state.prev_bs = bs;
+        flags |= DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME;
+    }
+
+    if (bitmap != dirty_bitmap_mig_state.prev_bitmap) {
+        dirty_bitmap_mig_state.prev_bitmap = bitmap;
+        flags |= DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME;
+    }
+
+    qemu_put_flags(f, flags);
+
+    if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
+        qemu_put_string(f, dbms->node_name);
+    }
+
+    if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
+        qemu_put_string(f, bdrv_dirty_bitmap_name(bitmap));
+    }
+}
+
+static void send_bitmap_start(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
+{
+    send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_START);
+    qemu_put_be32(f, bdrv_dirty_bitmap_granularity(dbms->bitmap));
+}
+
+static void send_bitmap_complete(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
+{
+    send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_COMPLETE);
+    qemu_put_byte(f, bdrv_dirty_bitmap_enabled(dbms->bitmap));
+}
+
+static void send_bitmap_bits(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
+                             uint64_t start_sector, uint32_t nr_sectors)
+{
+    /* align for buffer_is_zero() */
+    uint64_t align = 4 * sizeof(long);
+    uint64_t buf_size =
+        (bdrv_dirty_bitmap_data_size(dbms->bitmap, nr_sectors) + align - 1) &
+        ~(align - 1);
+    uint8_t *buf = g_malloc0(buf_size);
+    uint32_t flags = DIRTY_BITMAP_MIG_FLAG_BITS;
+
+    bdrv_dirty_bitmap_serialize_part(dbms->bitmap, buf,
+                                     start_sector, nr_sectors);
+
+    if (buffer_is_zero(buf, buf_size)) {
+        g_free(buf);
+        buf = NULL;
+        flags |= DIRTY_BITMAP_MIG_FLAG_ZEROES;
+    }
+
+    DPRINTF("Enter send_bitmap"
+            "\n   flags:        %x"
+            "\n   start_sector: %" PRIu64
+            "\n   nr_sectors:   %" PRIu32
+            "\n   data_size:    %" PRIu64 "\n",
+            flags, start_sector, nr_sectors, buf_size);
+
+    send_bitmap_header(f, dbms, flags);
+
+    qemu_put_be64(f, start_sector);
+    qemu_put_be32(f, nr_sectors);
+
+    /* if a block is zero we need to flush here since the network
+     * bandwidth is now a lot higher than the storage device bandwidth.
+     * thus if we queue zero blocks we slow down the migration.
+     * also, skip writing block when migrate only dirty bitmaps. */
+    if (flags & DIRTY_BITMAP_MIG_FLAG_ZEROES) {
+        qemu_fflush(f);
+        return;
+    }
+
+    qemu_put_be64(f, buf_size);
+    qemu_put_buffer(f, buf, buf_size);
+    g_free(buf);
+}
+
+
+/* Called with iothread lock taken.  */
+
+static void set_dirty_tracking(void)
+{
+    DirtyBitmapMigBitmapState *dbms;
+
+    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+        dbms->meta_bitmap =
+            bdrv_create_meta_bitmap(dbms->bitmap, CHUNK_SIZE);
+    }
+}
+
+static void unset_dirty_tracking(void)
+{
+    DirtyBitmapMigBitmapState *dbms;
+
+    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+        bdrv_release_meta_bitmap(dbms->bitmap);
+    }
+}
+
+static void init_dirty_bitmap_migration(QEMUFile *f)
+{
+    BlockDriverState *bs;
+    BdrvDirtyBitmap *bitmap;
+    DirtyBitmapMigBitmapState *dbms;
+
+    dirty_bitmap_mig_state.bulk_completed = false;
+    dirty_bitmap_mig_state.prev_bs = NULL;
+    dirty_bitmap_mig_state.prev_bitmap = NULL;
+
+    for (bs = bdrv_next(NULL); bs; bs = bdrv_next(bs)) {
+        for (bitmap = bdrv_next_dirty_bitmap(bs, NULL); bitmap;
+             bitmap = bdrv_next_dirty_bitmap(bs, bitmap)) {
+            if (!bdrv_dirty_bitmap_name(bitmap)) {
+                continue;
+            }
+
+            if (!bdrv_get_node_name(bs) && blk_bs(bs->blk) != bs) {
+                /* not named non-root node */
+                continue;
+            }
+
+            dbms = g_new0(DirtyBitmapMigBitmapState, 1);
+            dbms->bs = bs;
+            dbms->node_name = bdrv_get_node_name(bs);
+            if (!dbms->node_name || dbms->node_name[0] == '\0') {
+                dbms->node_name = bdrv_get_device_name(bs);
+            }
+            dbms->bitmap = bitmap;
+            dbms->total_sectors = bdrv_nb_sectors(bs);
+            dbms->sectors_per_chunk = (uint64_t)CHUNK_SIZE * 8 *
+                bdrv_dirty_bitmap_granularity(dbms->bitmap) >> BDRV_SECTOR_BITS;
+
+            QSIMPLEQ_INSERT_TAIL(&dirty_bitmap_mig_state.dbms_list,
+                                 dbms, entry);
+        }
+    }
+}
+
+/* Called with no lock taken.  */
+static void bulk_phase_send_chunk(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
+{
+    uint32_t nr_sectors = MIN(dbms->total_sectors - dbms->cur_sector,
+                             dbms->sectors_per_chunk);
+
+    send_bitmap_bits(f, dbms, dbms->cur_sector, nr_sectors);
+
+    dbms->cur_sector += nr_sectors;
+    if (dbms->cur_sector >= dbms->total_sectors) {
+        dbms->bulk_completed = true;
+    }
+}
+
+/* Called with no lock taken.  */
+static void bulk_phase(QEMUFile *f, bool limit)
+{
+    DirtyBitmapMigBitmapState *dbms;
+
+    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+        while (!dbms->bulk_completed) {
+            bulk_phase_send_chunk(f, dbms);
+            if (limit && qemu_file_rate_limit(f)) {
+                return;
+            }
+        }
+    }
+
+    dirty_bitmap_mig_state.bulk_completed = true;
+}
+
+static void blk_mig_reset_dirty_cursor(void)
+{
+    DirtyBitmapMigBitmapState *dbms;
+
+    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+        hbitmap_iter_init(&dbms->iter_dirty, dbms->meta_bitmap, 0);
+    }
+}
+
+/* Called with iothread lock taken. */
+static bool dirty_phase_send_chunk(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
+{
+    uint32_t nr_sectors;
+    size_t old_pos = dbms->iter_dirty.pos;
+    int64_t cur = hbitmap_iter_next(&dbms->iter_dirty);
+
+    /* restart search from the beginning */
+    if (old_pos && cur == -1) {
+        hbitmap_iter_init(&dbms->iter_dirty, dbms->meta_bitmap, 0);
+        cur = hbitmap_iter_next(&dbms->iter_dirty);
+    }
+
+    if (cur == -1) {
+        hbitmap_iter_init(&dbms->iter_dirty, dbms->meta_bitmap, 0);
+        return false;
+    }
+
+    nr_sectors = MIN(dbms->total_sectors - cur, dbms->sectors_per_chunk);
+    send_bitmap_bits(f, dbms, cur, nr_sectors);
+    hbitmap_reset(dbms->meta_bitmap, cur, dbms->sectors_per_chunk);
+    cur += nr_sectors;
+
+    return true;
+}
+
+/* Called with iothread lock taken. */
+static void dirty_phase(QEMUFile *f, bool limit)
+{
+    DirtyBitmapMigBitmapState *dbms;
+
+    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+        while (dirty_phase_send_chunk(f, dbms)) {
+            if (limit && qemu_file_rate_limit(f)) {
+                return;
+            }
+        }
+    }
+}
+
+
+/* Called with iothread lock taken.  */
+static void dirty_bitmap_mig_cleanup(void)
+{
+    DirtyBitmapMigBitmapState *dbms;
+
+    unset_dirty_tracking();
+
+    while ((dbms = QSIMPLEQ_FIRST(&dirty_bitmap_mig_state.dbms_list)) != NULL) {
+        QSIMPLEQ_REMOVE_HEAD(&dirty_bitmap_mig_state.dbms_list, entry);
+        g_free(dbms);
+    }
+}
+
+static void dirty_bitmap_migration_cancel(void *opaque)
+{
+    dirty_bitmap_mig_cleanup();
+}
+
+static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
+{
+    DPRINTF("Enter save live iterate\n");
+
+    if (dirty_bitmap_mig_state.bulk_completed) {
+        qemu_mutex_lock_iothread();
+        dirty_phase(f, true);
+        qemu_mutex_unlock_iothread();
+    } else {
+        bulk_phase(f, true);
+    }
+
+    qemu_put_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS);
+
+    return dirty_bitmap_mig_state.bulk_completed;
+}
+
+/* Called with iothread lock taken.  */
+
+static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
+{
+    DirtyBitmapMigBitmapState *dbms;
+    DPRINTF("Enter save live complete\n");
+
+    if (!dirty_bitmap_mig_state.bulk_completed) {
+        bulk_phase(f, false);
+    }
+
+    blk_mig_reset_dirty_cursor();
+    dirty_phase(f, false);
+
+    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+        send_bitmap_complete(f, dbms);
+    }
+
+    qemu_put_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS);
+
+    DPRINTF("Dirty bitmaps migration completed\n");
+
+    dirty_bitmap_mig_cleanup();
+    return 0;
+}
+
+static uint64_t dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
+                                          uint64_t max_size)
+{
+    DirtyBitmapMigBitmapState *dbms;
+    uint64_t pending = 0;
+
+    qemu_mutex_lock_iothread();
+
+    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+        uint64_t sectors = hbitmap_count(dbms->meta_bitmap);
+        if (!dbms->bulk_completed) {
+            sectors += dbms->total_sectors - dbms->cur_sector;
+        }
+        pending += bdrv_dirty_bitmap_data_size(dbms->bitmap, sectors);
+    }
+
+    qemu_mutex_unlock_iothread();
+
+    DPRINTF("Enter save live pending %" PRIu64 ", max: %" PRIu64 "\n",
+            pending, max_size);
+    return pending;
+}
+
+/* First occurrence of this bitmap. It should be created if doesn't exist */
+static int dirty_bitmap_load_start(QEMUFile *f, DirtyBitmapLoadState *s)
+{
+    uint32_t granularity = qemu_get_be32(f);
+    if (!s->bitmap) {
+        Error *local_err = NULL;
+        s->bitmap = bdrv_create_dirty_bitmap(s->bs, granularity,
+                                             s->bitmap_name, &local_err);
+        if (!s->bitmap) {
+            error_report("%s", error_get_pretty(local_err));
+            error_free(local_err);
+            return -EINVAL;
+        }
+    } else {
+        uint32_t dest_granularity =
+            bdrv_dirty_bitmap_granularity(s->bitmap);
+        if (dest_granularity != granularity) {
+            fprintf(stderr,
+                    "Error: "
+                    "Migrated bitmap granularity (%" PRIu32 ") "
+                    "doesn't match the destination bitmap '%s' "
+                    "granularity (%" PRIu32 ")\n",
+                    granularity,
+                    bdrv_dirty_bitmap_name(s->bitmap),
+                    dest_granularity);
+            return -EINVAL;
+        }
+    }
+
+    bdrv_disable_dirty_bitmap(s->bitmap);
+
+    return 0;
+}
+
+static void dirty_bitmap_load_complete(QEMUFile *f, DirtyBitmapLoadState *s)
+{
+    bool enabled;
+
+    bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
+    DPRINTF("enab\n");
+
+    enabled = qemu_get_byte(f);
+    if (enabled) {
+        bdrv_enable_dirty_bitmap(s->bitmap);
+    }
+}
+
+static int dirty_bitmap_load_bits(QEMUFile *f, DirtyBitmapLoadState *s)
+{
+    uint64_t first_sector = qemu_get_be64(f);
+    uint32_t nr_sectors = qemu_get_be32(f);
+    DPRINTF("chunk: %lu %u\n", first_sector, nr_sectors);
+
+
+    if (s->flags & DIRTY_BITMAP_MIG_FLAG_ZEROES) {
+        DPRINTF("   - zeroes\n");
+        bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_sector,
+                                             nr_sectors);
+    } else {
+        uint8_t *buf;
+        uint64_t buf_size = qemu_get_be64(f);
+        uint64_t needed_size =
+            bdrv_dirty_bitmap_data_size(s->bitmap, nr_sectors);
+
+        if (needed_size > buf_size) {
+            fprintf(stderr,
+                    "Error: Migrated bitmap granularity doesn't "
+                    "match the destination bitmap '%s' granularity\n",
+                    bdrv_dirty_bitmap_name(s->bitmap));
+            return -EINVAL;
+        }
+
+        buf = g_malloc(buf_size);
+        qemu_get_buffer(f, buf, buf_size);
+        bdrv_dirty_bitmap_deserialize_part(s->bitmap, buf,
+                                           first_sector,
+                                           nr_sectors);
+        g_free(buf);
+    }
+
+    return 0;
+}
+
+static int dirty_bitmap_load_header(QEMUFile *f, DirtyBitmapLoadState *s)
+{
+    Error *local_err = NULL;
+    s->flags = qemu_get_flags(f);
+    DPRINTF("flags: %x\n", s->flags);
+
+    if (s->flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
+        qemu_get_string(f, s->node_name);
+        s->bs = bdrv_lookup_bs(s->node_name, s->node_name, &local_err);
+        if (!s->bs) {
+            error_report("%s", error_get_pretty(local_err));
+            error_free(local_err);
+            return -EINVAL;
+        }
+    } else if (!s->bs) {
+        fprintf(stderr, "Error: block device name is not set\n");
+        return -EINVAL;
+    }
+
+    if (s->flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
+        qemu_get_string(f, s->bitmap_name);
+        s->bitmap = bdrv_find_dirty_bitmap(s->bs, s->bitmap_name);
+
+        /* bitmap may be NULL here, it wouldn't be an error if it is the
+         * first occurrence of the bitmap */
+        if (!s->bitmap && !(s->flags & DIRTY_BITMAP_MIG_FLAG_START)) {
+            fprintf(stderr, "Error: unknown dirty bitmap "
+                    "'%s' for block device '%s'\n",
+                    s->bitmap_name, s->node_name);
+            return -EINVAL;
+        }
+    } else if (!s->bitmap) {
+        fprintf(stderr, "Error: block device name is not set\n");
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
+{
+    static DirtyBitmapLoadState s;
+
+    int ret = 0;
+
+    DPRINTF("load start\n");
+
+    do {
+        dirty_bitmap_load_header(f, &s);
+
+        if (s.flags & DIRTY_BITMAP_MIG_FLAG_START) {
+            ret = dirty_bitmap_load_start(f, &s);
+        } else if (s.flags & DIRTY_BITMAP_MIG_FLAG_COMPLETE) {
+            dirty_bitmap_load_complete(f, &s);
+        } else if (s.flags & DIRTY_BITMAP_MIG_FLAG_BITS) {
+            ret = dirty_bitmap_load_bits(f, &s);
+        }
+
+        DPRINTF("ret: %d\n", ret);
+        if (!ret) {
+            ret = qemu_file_get_error(f);
+        }
+
+        DPRINTF("ret: %d\n", ret);
+        if (ret) {
+            return ret;
+        }
+    } while (!(s.flags & DIRTY_BITMAP_MIG_FLAG_EOS));
+
+    DPRINTF("load finish\n");
+    return 0;
+}
+
+static bool dirty_bitmap_is_active(void *opaque)
+{
+    return migrate_dirty_bitmaps();
+}
+
+static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
+{
+    DirtyBitmapMigBitmapState *dbms = NULL;
+    init_dirty_bitmap_migration(f);
+
+    qemu_mutex_lock_iothread();
+    /* start track dirtyness of dirty bitmaps */
+    set_dirty_tracking();
+    qemu_mutex_unlock_iothread();
+
+    blk_mig_reset_dirty_cursor();
+
+    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+        send_bitmap_start(f, dbms);
+    }
+    qemu_put_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS);
+
+    return 0;
+}
+
+static SaveVMHandlers savevm_block_handlers = {
+    .save_live_setup = dirty_bitmap_save_setup,
+    .save_live_iterate = dirty_bitmap_save_iterate,
+    .save_live_complete = dirty_bitmap_save_complete,
+    .save_live_pending = dirty_bitmap_save_pending,
+    .load_state = dirty_bitmap_load,
+    .cancel = dirty_bitmap_migration_cancel,
+    .is_active = dirty_bitmap_is_active,
+};
+
+void dirty_bitmap_mig_init(void)
+{
+    QSIMPLEQ_INIT(&dirty_bitmap_mig_state.dbms_list);
+
+    register_savevm_live(NULL, "dirty-bitmap", 0, 1, &savevm_block_handlers,
+                         &dirty_bitmap_mig_state);
+}
diff --git a/vl.c b/vl.c
index 8c8f142..723a362 100644
--- a/vl.c
+++ b/vl.c
@@ -4147,6 +4147,7 @@ int main(int argc, char **argv, char **envp)
 
     blk_mig_init();
     ram_mig_init();
+    dirty_bitmap_mig_init();
 
     /* If the currently selected machine wishes to override the units-per-bus
      * property of its default HBA interface type, do so now. */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RFC v3 09/14] iotests: maintain several vms in test
  2015-02-18 14:00 [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
                   ` (7 preceding siblings ...)
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 08/14] migration: add migration/block-dirty-bitmap.c Vladimir Sementsov-Ogievskiy
@ 2015-02-18 14:00 ` Vladimir Sementsov-Ogievskiy
  2015-02-18 23:48   ` John Snow
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 10/14] iotests: add add_incoming_migration to VM class Vladimir Sementsov-Ogievskiy
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-18 14:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, vsementsov, stefanha,
	den, amit.shah, pbonzini

The only problem with it is the same qmp socket name (which is
vm._monitor_path) for all vms. And because of this second vm couldn't be
lauched (vm.launch() fails because of socket is already in use).
This patch adds a number of vm into vm._monitor_path

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 tests/qemu-iotests/iotests.py | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index cf5faac..fa756b4 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -77,16 +77,18 @@ def create_image(name, size):
 
 class VM(object):
     '''A QEMU VM'''
+    nb_vms = 0
 
     def __init__(self):
-        self._monitor_path = os.path.join(test_dir, 'qemu-mon.%d' % os.getpid())
-        self._qemu_log_path = os.path.join(test_dir, 'qemu-log.%d' % os.getpid())
+        self._monitor_path = os.path.join(test_dir, 'qemu-mon.%d.%d' % (os.getpid(), VM.nb_vms))
+        self._qemu_log_path = os.path.join(test_dir, 'qemu-log.%d.%d' % (os.getpid(), VM.nb_vms))
         self._args = qemu_args + ['-chardev',
                      'socket,id=mon,path=' + self._monitor_path,
                      '-mon', 'chardev=mon,mode=control',
                      '-qtest', 'stdio', '-machine', 'accel=qtest',
                      '-display', 'none', '-vga', 'none']
         self._num_drives = 0
+        VM.nb_vms += 1
 
     # This can be used to add an unused monitor instance.
     def add_monitor_telnet(self, ip, port):
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RFC v3 10/14] iotests: add add_incoming_migration to VM class
  2015-02-18 14:00 [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
                   ` (8 preceding siblings ...)
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 09/14] iotests: maintain several vms in test Vladimir Sementsov-Ogievskiy
@ 2015-02-18 14:00 ` Vladimir Sementsov-Ogievskiy
  2015-02-18 23:48   ` John Snow
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 11/14] iotests: add dirty bitmap migration test Vladimir Sementsov-Ogievskiy
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-18 14:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, vsementsov, stefanha,
	den, amit.shah, pbonzini

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 tests/qemu-iotests/iotests.py | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index fa756b4..75640b2 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -111,6 +111,12 @@ class VM(object):
         self._num_drives += 1
         return self
 
+    def add_incoming_migration(self, desc):
+        '''Add an incoming migration to the VM'''
+        self._args.append('-incoming')
+        self._args.append(desc)
+        return self
+
     def pause_drive(self, drive, event=None):
         '''Pause drive r/w operations'''
         if not event:
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RFC v3 11/14] iotests: add dirty bitmap migration test
  2015-02-18 14:00 [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
                   ` (9 preceding siblings ...)
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 10/14] iotests: add add_incoming_migration to VM class Vladimir Sementsov-Ogievskiy
@ 2015-02-18 14:00 ` Vladimir Sementsov-Ogievskiy
  2015-02-19 18:47   ` John Snow
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 12/14] qapi: add md5 checksum of last dirty bitmap level to query-block Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-18 14:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, vsementsov, stefanha,
	den, amit.shah, pbonzini

The test starts two vms (vm_a, vm_b), create dirty bitmap in the first one, do
several writes to corresponding device and then migrate vm_a to vm_b
with dirty bitmaps.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 tests/qemu-iotests/117     | 88 ++++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/117.out |  5 +++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 94 insertions(+)
 create mode 100755 tests/qemu-iotests/117
 create mode 100644 tests/qemu-iotests/117.out

diff --git a/tests/qemu-iotests/117 b/tests/qemu-iotests/117
new file mode 100755
index 0000000..61538cf
--- /dev/null
+++ b/tests/qemu-iotests/117
@@ -0,0 +1,88 @@
+#!/usr/bin/env python
+#
+# Tests for dirty bitmaps migration.
+#
+# (C) Vladimir Sementsov-Ogievskiy 2015
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os
+import iotests
+import time
+from iotests import qemu_img
+
+disk_a = os.path.join(iotests.test_dir, 'disk_a')
+disk_b = os.path.join(iotests.test_dir, 'disk_b')
+fifo   = os.path.join(iotests.test_dir, 'fifo')
+
+size   = 0x40000000 # 1G
+sector_size = 512
+granularity = 0x10000
+regions = [
+    { 'start': 0,          'count': 0x100000 },
+    { 'start': 0x10000000, 'count': 0x20000  },
+    { 'start': 0x39990000, 'count': 0x10000  }
+    ]
+
+regions_in_sectors = [
+    { key: val / sector_size for (key, val) in el.items() } for el in regions]
+
+class TestDirtyBitmapMigration(iotests.QMPTestCase):
+
+    def setUp(self):
+        os.mkfifo(fifo)
+        qemu_img('create', '-f', iotests.imgfmt, disk_a, str(size))
+        qemu_img('create', '-f', iotests.imgfmt, disk_b, str(size))
+        self.vm_a = iotests.VM().add_drive(disk_a)
+        self.vm_b = iotests.VM().add_drive(disk_b)
+        self.vm_b.add_incoming_migration("exec: cat " + fifo)
+        self.vm_a.launch()
+        self.vm_b.launch()
+
+    def tearDown(self):
+        self.vm_a.shutdown()
+        self.vm_b.shutdown()
+        os.remove(disk_a)
+        os.remove(disk_b)
+        os.remove(fifo)
+
+    def test_migration(self):
+        result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
+                               name='bitmap', granularity=granularity)
+        self.assert_qmp(result, 'return', {});
+
+        for r in regions:
+          self.vm_a.hmp_qemu_io('drive0',
+                                'write %d %d' % (r['start'], r['count']))
+
+        result = self.vm_a.qmp('query-block-dirty-bitmap', node='drive0',
+                               name='bitmap')
+        self.assert_qmp(result, 'return/dirty-regions', regions_in_sectors)
+
+        result = self.vm_a.qmp('migrate-set-capabilities',
+                               capabilities=[{'capability': 'dirty-bitmaps',
+                                              'state': True}])
+        self.assert_qmp(result, 'return', {})
+        result = self.vm_a.qmp('migrate', uri='exec:cat>' + fifo)
+        while self.vm_a.qmp('query-migrate')['return']['status'] != 'completed':
+          time.sleep(1)
+
+        result = self.vm_b.qmp('query-block-dirty-bitmap', node='drive0',
+                               name='bitmap')
+        self.assert_qmp(result, 'return/dirty-regions', regions_in_sectors);
+
+
+if __name__ == '__main__':
+    iotests.main()
diff --git a/tests/qemu-iotests/117.out b/tests/qemu-iotests/117.out
new file mode 100644
index 0000000..ae1213e
--- /dev/null
+++ b/tests/qemu-iotests/117.out
@@ -0,0 +1,5 @@
+.
+----------------------------------------------------------------------
+Ran 1 tests
+
+OK
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index b4ddf1b..6ad5b55 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -118,3 +118,4 @@
 113 rw auto quick
 114 rw auto quick
 116 rw auto quick
+117 rw auto quick
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RFC v3 12/14] qapi: add md5 checksum of last dirty bitmap level to query-block
  2015-02-18 14:00 [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
                   ` (10 preceding siblings ...)
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 11/14] iotests: add dirty bitmap migration test Vladimir Sementsov-Ogievskiy
@ 2015-02-18 14:00 ` Vladimir Sementsov-Ogievskiy
  2015-02-19 18:53   ` John Snow
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 13/14] iotests: add dirty bitmap migration test Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-18 14:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, vsementsov, stefanha,
	den, amit.shah, pbonzini

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 block.c                | 1 +
 include/qemu/hbitmap.h | 8 ++++++++
 qapi/block-core.json   | 4 +++-
 util/hbitmap.c         | 8 ++++++++
 4 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 4cca55d..9532ccc 100644
--- a/block.c
+++ b/block.c
@@ -5600,6 +5600,7 @@ BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs)
         info->name = g_strdup(bm->name);
         info->disabled = bm->disabled;
         info->frozen = bdrv_dirty_bitmap_frozen(bm);
+        info->md5 = hbitmap_md5(bm->bitmap);
         entry->value = info;
         *plist = entry;
         plist = &entry->next;
diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index 10ce05b..2fb748a 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -188,6 +188,14 @@ void hbitmap_deserialize_zeroes(HBitmap *hb, uint64_t start, uint64_t count);
 void hbitmap_deserialize_finish(HBitmap *hb);
 
 /**
+ * hbitmap_md5:
+ * @bitmap: HBitmap to operate on.
+ *
+ * Returns md5 checksum of the last level.
+ */
+char *hbitmap_md5(const HBitmap *bitmap);
+
+/**
  * hbitmap_free:
  * @hb: HBitmap to operate on.
  *
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 25dea80..2028d37 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -336,11 +336,13 @@
 #
 # @frozen: whether the dirty bitmap is frozen (Since 2.3)
 #
+# @md5: md5 checksum of the last bitmap level (since 2.3)
+#
 # Since: 1.3
 ##
 { 'type': 'BlockDirtyInfo',
   'data': {'*name': 'str', 'count': 'int', 'granularity': 'uint32',
-           'disabled': 'bool', 'frozen': 'bool'} }
+           'disabled': 'bool', 'frozen': 'bool', 'md5': 'str'} }
 
 ##
 # @BlockInfo:
diff --git a/util/hbitmap.c b/util/hbitmap.c
index 1a736e7..8063dce 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -523,3 +523,11 @@ bool hbitmap_merge(HBitmap *a, const HBitmap *b)
 
     return true;
 }
+
+char *hbitmap_md5(const HBitmap *bitmap)
+{
+    uint64_t size =
+        MAX((bitmap->size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
+    const guchar *data = (const guchar *)bitmap->levels[HBITMAP_LEVELS - 1];
+    return g_compute_checksum_for_data(G_CHECKSUM_MD5, data, size);
+}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RFC v3 13/14] iotests: add dirty bitmap migration test
  2015-02-18 14:00 [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
                   ` (11 preceding siblings ...)
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 12/14] qapi: add md5 checksum of last dirty bitmap level to query-block Vladimir Sementsov-Ogievskiy
@ 2015-02-18 14:00 ` Vladimir Sementsov-Ogievskiy
  2015-02-19 19:30   ` John Snow
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 14/14] migration/qemu-file: make functions qemu_(get/put)_string public Vladimir Sementsov-Ogievskiy
  2015-02-19  0:11 ` [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration John Snow
  14 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-18 14:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, vsementsov, stefanha,
	den, amit.shah, pbonzini

[the same test as 117, but not using qmp: query-block-dirty-bitmap.
only one test from {117, 118} will be in the next patch set version]

The test starts two vms (vm_a, vm_b), create dirty bitmap in
the first one, do several writes to corresponding device and
then migrate vm_a to vm_b with dirty bitmaps.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 tests/qemu-iotests/118     | 87 ++++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/118.out |  5 +++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 93 insertions(+)
 create mode 100755 tests/qemu-iotests/118
 create mode 100644 tests/qemu-iotests/118.out

diff --git a/tests/qemu-iotests/118 b/tests/qemu-iotests/118
new file mode 100755
index 0000000..a85ea56
--- /dev/null
+++ b/tests/qemu-iotests/118
@@ -0,0 +1,87 @@
+#!/usr/bin/env python
+#
+# Tests for dirty bitmaps migration.
+#
+# (C) Vladimir Sementsov-Ogievskiy 2015
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os
+import iotests
+import time
+from iotests import qemu_img
+
+disk_a = os.path.join(iotests.test_dir, 'disk_a')
+disk_b = os.path.join(iotests.test_dir, 'disk_b')
+fifo   = os.path.join(iotests.test_dir, 'fifo')
+
+size   = 0x40000000 # 1G
+sector_size = 512
+granularity = 0x10000
+regions = [
+    { 'start': 0,          'count': 0x100000 },
+    { 'start': 0x10000000, 'count': 0x20000  },
+    { 'start': 0x39990000, 'count': 0x10000  }
+    ]
+
+regions_in_sectors = [
+    { key: val / sector_size for (key, val) in el.items() } for el in regions]
+
+class TestDirtyBitmapMigration(iotests.QMPTestCase):
+
+    def setUp(self):
+        os.mkfifo(fifo)
+        qemu_img('create', '-f', iotests.imgfmt, disk_a, str(size))
+        qemu_img('create', '-f', iotests.imgfmt, disk_b, str(size))
+        self.vm_a = iotests.VM().add_drive(disk_a)
+        self.vm_b = iotests.VM().add_drive(disk_b)
+        self.vm_b.add_incoming_migration("exec: cat " + fifo)
+        self.vm_a.launch()
+        self.vm_b.launch()
+
+    def tearDown(self):
+        self.vm_a.shutdown()
+        self.vm_b.shutdown()
+        os.remove(disk_a)
+        os.remove(disk_b)
+        os.remove(fifo)
+
+    def test_migration(self):
+        result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
+                               name='bitmap', granularity=granularity)
+        self.assert_qmp(result, 'return', {});
+
+        for r in regions:
+          self.vm_a.hmp_qemu_io('drive0',
+                                'write %d %d' % (r['start'], r['count']))
+
+        result = self.vm_a.qmp('query-block');
+        md5 = result['return'][0]['dirty-bitmaps'][0]['md5']
+
+        result = self.vm_a.qmp('migrate-set-capabilities',
+                               capabilities=[{'capability': 'dirty-bitmaps',
+                                              'state': True}])
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm_a.qmp('migrate', uri='exec:cat>' + fifo)
+        while self.vm_a.qmp('query-migrate')['return']['status'] != 'completed':
+          time.sleep(1)
+
+        result = self.vm_b.qmp('query-block');
+        self.assert_qmp(result, 'return[0]/dirty-bitmaps[0]/md5', md5);
+
+
+if __name__ == '__main__':
+    iotests.main()
diff --git a/tests/qemu-iotests/118.out b/tests/qemu-iotests/118.out
new file mode 100644
index 0000000..ae1213e
--- /dev/null
+++ b/tests/qemu-iotests/118.out
@@ -0,0 +1,5 @@
+.
+----------------------------------------------------------------------
+Ran 1 tests
+
+OK
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 6ad5b55..1fd1983 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -119,3 +119,4 @@
 114 rw auto quick
 116 rw auto quick
 117 rw auto quick
+118 rw auto quick
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RFC v3 14/14] migration/qemu-file: make functions qemu_(get/put)_string public
  2015-02-18 14:00 [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
                   ` (12 preceding siblings ...)
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 13/14] iotests: add dirty bitmap migration test Vladimir Sementsov-Ogievskiy
@ 2015-02-18 14:00 ` Vladimir Sementsov-Ogievskiy
  2015-02-19  0:00   ` John Snow
  2015-02-19  0:11 ` [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration John Snow
  14 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-18 14:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, vsementsov, stefanha,
	den, amit.shah, pbonzini

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 include/migration/qemu-file.h  | 17 +++++++++++++++++
 migration/block-dirty-bitmap.c | 35 -----------------------------------
 migration/qemu-file.c          | 18 ++++++++++++++++++
 3 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index a923cec..178ae63 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -310,4 +310,21 @@ static inline void qemu_get_sbe64s(QEMUFile *f, int64_t *pv)
 {
     qemu_get_be64s(f, (uint64_t *)pv);
 }
+
+/* read name from qemu file:
+ * format:
+ * 1 byte : len = name length (<256)
+ * len bytes : name without last zero byte
+ *
+ * name should point to the buffer >= 256 bytes length
+ */
+char *qemu_get_string(QEMUFile *f, char *name);
+
+/* write name to qemu file:
+ * format:
+ * same as for qemu_get_string
+ *
+ * maximum name length is 255
+ */
+void qemu_put_string(QEMUFile *f, const char *name);
 #endif
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 084ba22..8ebf7e5 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -161,41 +161,6 @@ static void qemu_put_flags(QEMUFile *f, uint32_t flags)
     qemu_put_be32(f, flags | DIRTY_BITMAP_MIG_FLAGS_SIZE_32);
 }
 
-/* read name from qemu file:
- * format:
- * 1 byte : len = name length (<256)
- * len bytes : name without last zero byte
- *
- * name should point to the buffer >= 256 bytes length
- */
-static char *qemu_get_string(QEMUFile *f, char *name)
-{
-    int len = qemu_get_byte(f);
-    qemu_get_buffer(f, (uint8_t *)name, len);
-    name[len] = '\0';
-
-    DPRINTF("get name: %d %s\n", len, name);
-
-    return name;
-}
-
-/* write name to qemu file:
- * format:
- * same as for qemu_get_string
- *
- * maximum name length is 255
- */
-static void qemu_put_string(QEMUFile *f, const char *name)
-{
-    int len = strlen(name);
-
-    DPRINTF("put name: %d %s\n", len, name);
-
-    assert(len < 256);
-    qemu_put_byte(f, len);
-    qemu_put_buffer(f, (const uint8_t *)name, len);
-}
-
 static void send_bitmap_header(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
                                uint32_t additional_flags)
 {
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index e66e557..5439f84 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -545,3 +545,21 @@ uint64_t qemu_get_be64(QEMUFile *f)
     v |= qemu_get_be32(f);
     return v;
 }
+
+char *qemu_get_string(QEMUFile *f, char *name)
+{
+    int len = qemu_get_byte(f);
+    qemu_get_buffer(f, (uint8_t *)name, len);
+    name[len] = '\0';
+
+    return name;
+}
+
+void qemu_put_string(QEMUFile *f, const char *name)
+{
+    int len = strlen(name);
+
+    assert(len < 256);
+    qemu_put_byte(f, len);
+    qemu_put_buffer(f, (const uint8_t *)name, len);
+}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 02/14] hbitmap: serialization
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 02/14] hbitmap: serialization Vladimir Sementsov-Ogievskiy
@ 2015-02-18 23:42   ` John Snow
  0 siblings, 0 replies; 32+ messages in thread
From: John Snow @ 2015-02-18 23:42 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, stefanha, pbonzini,
	amit.shah, den, jsnow



On 02/18/2015 09:00 AM, Vladimir Sementsov-Ogievskiy wrote:
> Functions to serialize / deserialize(restore) HBitmap. HBitmap should be
> saved to linear sequence of bits independently of endianness and bitmap
> array element (unsigned long) size. Therefore Little Endian is chosen.
>
> These functions are appropriate for dirty bitmap migration, restoring
> the bitmap in several steps is available. To save performance, every
> step writes only the last level of the bitmap. All other levels are
> restored by hbitmap_deserialize_finish() as a last step of restoring.
> So, HBitmap is inconsistent while restoring.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   include/qemu/hbitmap.h | 59 +++++++++++++++++++++++++++++++
>   util/hbitmap.c         | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 155 insertions(+)
>
> diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
> index c19c1cb..10ce05b 100644
> --- a/include/qemu/hbitmap.h
> +++ b/include/qemu/hbitmap.h
> @@ -129,6 +129,65 @@ void hbitmap_reset(HBitmap *hb, uint64_t start, uint64_t count);
>   bool hbitmap_get(const HBitmap *hb, uint64_t item);
>
>   /**
> + * hbitmap_data_size:
> + * @hb: HBitmap to operate on.
> + * @count: Number of bits
> + *
> + * Return amount of bytes hbitmap_serialize_part needs
> + */
> +uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count);
> +
> +/**
> + * hbitmap_serialize_part
> + * @hb: HBitmap to oprate on.
> + * @buf: Buffer to store serialized bitmap.
> + * @start: First bit to store.
> + * @count: Number of bits to store.
> + *
> + * Stores HBitmap data corresponding to given region. The format of saved data
> + * is linear sequence of bits, so it can be used by hbitmap_deserialize_part
> + * independently of endianness and size of HBitmap level array elements
> + */
> +void hbitmap_serialize_part(const HBitmap *hb, uint8_t *buf,
> +                            uint64_t start, uint64_t count);
> +
> +/**
> + * hbitmap_deserialize_part
> + * @hb: HBitmap to operate on.
> + * @buf: Buffer to restore bitmap data from.
> + * @start: First bit to restore.
> + * @count: Number of bits to restore.
> + *
> + * Retores HBitmap data corresponding to given region. The format is the same
> + * as for hbitmap_serialize_part.
> + *
> + * ! The bitmap becomes inconsistent after this operation.
> + * hbitmap_serialize_finish should be called before using the bitmap after
> + * data restoring.
> + */
> +void hbitmap_deserialize_part(HBitmap *hb, uint8_t *buf,
> +                              uint64_t start, uint64_t count);
> +
> +/**
> + * hbitmap_deserialize_zeroes
> + * @hb: HBitmap to operate on.
> + * @start: First bit to restore.
> + * @count: Number of bits to restore.
> + *
> + * Same as hbitmap_serialize_part, but fills the bitmap with zeroes.
> + */
> +void hbitmap_deserialize_zeroes(HBitmap *hb, uint64_t start, uint64_t count);
> +
> +/**
> + * hbitmap_deserialize_finish
> + * @hb: HBitmap to operate on.
> + *
> + * Repair HBitmap after calling hbitmap_deserialize_data. Actually, all HBitmap
> + * layers are restored here.
> + */
> +void hbitmap_deserialize_finish(HBitmap *hb);
> +
> +/**
>    * hbitmap_free:
>    * @hb: HBitmap to operate on.
>    *
> diff --git a/util/hbitmap.c b/util/hbitmap.c
> index 962ff29..1a736e7 100644
> --- a/util/hbitmap.c
> +++ b/util/hbitmap.c
> @@ -362,6 +362,102 @@ bool hbitmap_get(const HBitmap *hb, uint64_t item)
>       return (hb->levels[HBITMAP_LEVELS - 1][pos >> BITS_PER_LEVEL] & bit) != 0;
>   }
>
> +uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count)
> +{
> +    uint64_t size, gran;
> +
> +    if (count == 0) {
> +        return 0;
> +    }
> +
> +    gran = 1ll << hb->granularity;
> +    size = (((gran + count - 2) >> hb->granularity) >> BITS_PER_LEVEL) + 1;
> +
> +    return size * sizeof(unsigned long);
> +}
> +
> +void hbitmap_serialize_part(const HBitmap *hb, uint8_t *buf,
> +                            uint64_t start, uint64_t count)
> +{
> +    uint64_t i;
> +    uint64_t last = start + count - 1;
> +    unsigned long *out = (unsigned long *)buf;
> +
> +    if (count == 0) {
> +        return;
> +    }
> +
> +    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
> +    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
> +    count = last - start + 1;
> +
> +    for (i = start; i <= last; ++i) {
> +        unsigned long el = hb->levels[HBITMAP_LEVELS - 1][i];
> +        out[i] = (BITS_PER_LONG == 32 ? cpu_to_le32(el) : cpu_to_le64(el));
> +    }
> +}
> +
> +void hbitmap_deserialize_part(HBitmap *hb, uint8_t *buf,
> +                              uint64_t start, uint64_t count)
> +{
> +    uint64_t i;
> +    uint64_t last = start + count - 1;
> +    unsigned long *in = (unsigned long *)buf;
> +
> +    if (count == 0) {
> +        return;
> +    }
> +
> +    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
> +    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
> +    count = last - start + 1;
> +
> +    for (i = start; i <= last; ++i) {
> +        hb->levels[HBITMAP_LEVELS - 1][i] =
> +            (BITS_PER_LONG == 32 ? le32_to_cpu(in[i]) : le64_to_cpu(in[i]));
> +    }
> +}
> +
> +void hbitmap_deserialize_zeroes(HBitmap *hb, uint64_t start, uint64_t count)
> +{
> +    uint64_t last = start + count - 1;
> +
> +    if (count == 0) {
> +        return;
> +    }
> +
> +    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
> +    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
> +    count = last - start + 1;
> +
> +    memset(hb->levels[HBITMAP_LEVELS - 1] + start, 0,
> +           count * sizeof(unsigned long));
> +}
> +
> +void hbitmap_deserialize_finish(HBitmap *bitmap)
> +{
> +    int64_t i, size, prev_size;
> +    int lev;
> +
> +    /* restore levels starting from penultimate to zero level, assuming
> +     * that the last level is ok */
> +    size = MAX((bitmap->size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
> +    for (lev = HBITMAP_LEVELS - 1; lev-- > 0; ) {
> +        prev_size = size;
> +        size = MAX((size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
> +        memset(bitmap->levels[lev], 0, size * sizeof(unsigned long));
> +
> +        for (i = 0; i < prev_size; ++i) {
> +            if (bitmap->levels[lev + 1][i]) {
> +                bitmap->levels[lev][i >> BITS_PER_LEVEL] |=
> +                    1 << (i & (BITS_PER_LONG - 1));
> +            }
> +        }
> +    }
> +
> +    bitmap->levels[0][0] |= 1UL << (BITS_PER_LONG - 1);
> +}
> +
>   void hbitmap_free(HBitmap *hb)
>   {
>       unsigned i;
>

Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 03/14] block: BdrvDirtyBitmap serialization interface
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 03/14] block: BdrvDirtyBitmap serialization interface Vladimir Sementsov-Ogievskiy
@ 2015-02-18 23:43   ` John Snow
  0 siblings, 0 replies; 32+ messages in thread
From: John Snow @ 2015-02-18 23:43 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, stefanha, pbonzini,
	amit.shah, den



On 02/18/2015 09:00 AM, Vladimir Sementsov-Ogievskiy wrote:
> Several functions to provide necessary access to BdrvDirtyBitmap for
> block-migration.c
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   block.c               | 36 ++++++++++++++++++++++++++++++++++++
>   include/block/block.h | 13 +++++++++++++
>   2 files changed, 49 insertions(+)
>
> diff --git a/block.c b/block.c
> index e4547d7..e95a5ae 100644
> --- a/block.c
> +++ b/block.c
> @@ -5674,6 +5674,42 @@ void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap)
>       hbitmap_reset(bitmap->bitmap, 0, bitmap->size);
>   }
>
> +const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap)
> +{
> +    return bitmap->name;
> +}
> +
> +uint64_t bdrv_dirty_bitmap_data_size(const BdrvDirtyBitmap *bitmap,
> +                                     uint64_t count)
> +{
> +    return hbitmap_data_size(bitmap->bitmap, count);
> +}
> +
> +void bdrv_dirty_bitmap_serialize_part(const BdrvDirtyBitmap *bitmap,
> +                                      uint8_t *buf, uint64_t start,
> +                                      uint64_t count)
> +{
> +    hbitmap_serialize_part(bitmap->bitmap, buf, start, count);
> +}
> +
> +void bdrv_dirty_bitmap_deserialize_part(BdrvDirtyBitmap *bitmap,
> +                                        uint8_t *buf, uint64_t start,
> +                                        uint64_t count)
> +{
> +    hbitmap_deserialize_part(bitmap->bitmap, buf, start, count);
> +}
> +
> +void bdrv_dirty_bitmap_deserialize_zeroes(BdrvDirtyBitmap *bitmap,
> +                                          uint64_t start, uint64_t count)
> +{
> +    hbitmap_deserialize_zeroes(bitmap->bitmap, start, count);
> +}
> +
> +void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap)
> +{
> +    hbitmap_deserialize_finish(bitmap->bitmap);
> +}
> +
>   static void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
>                              int nr_sectors)
>   {
> diff --git a/include/block/block.h b/include/block/block.h
> index c9d96a6..c6a928d 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -474,6 +474,19 @@ void bdrv_dirty_iter_init(BdrvDirtyBitmap *bitmap, struct HBitmapIter *hbi);
>   void bdrv_set_dirty_iter(struct HBitmapIter *hbi, int64_t offset);
>   int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap);
>
> +const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap);
> +uint64_t bdrv_dirty_bitmap_data_size(const BdrvDirtyBitmap *bitmap,
> +                                     uint64_t count);
> +void bdrv_dirty_bitmap_serialize_part(const BdrvDirtyBitmap *bitmap,
> +                                      uint8_t *buf, uint64_t start,
> +                                      uint64_t count);
> +void bdrv_dirty_bitmap_deserialize_part(BdrvDirtyBitmap *bitmap,
> +                                        uint8_t *buf, uint64_t start,
> +                                        uint64_t count);
> +void bdrv_dirty_bitmap_deserialize_zeroes(BdrvDirtyBitmap *bitmap,
> +                                          uint64_t start, uint64_t count);
> +void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap);
> +
>   void bdrv_enable_copy_on_read(BlockDriverState *bs);
>   void bdrv_disable_copy_on_read(BlockDriverState *bs);
>
>

Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 04/14] block: tiny refactoring: minimize hbitmap_(set/reset) usage
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 04/14] block: tiny refactoring: minimize hbitmap_(set/reset) usage Vladimir Sementsov-Ogievskiy
@ 2015-02-18 23:44   ` John Snow
  0 siblings, 0 replies; 32+ messages in thread
From: John Snow @ 2015-02-18 23:44 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, stefanha, pbonzini,
	amit.shah, den



On 02/18/2015 09:00 AM, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   block.c | 7 +++----
>   1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/block.c b/block.c
> index e95a5ae..a127fd2 100644
> --- a/block.c
> +++ b/block.c
> @@ -5670,8 +5670,7 @@ void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
>
>   void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap)
>   {
> -    assert(bdrv_dirty_bitmap_enabled(bitmap));
> -    hbitmap_reset(bitmap->bitmap, 0, bitmap->size);
> +    bdrv_reset_dirty_bitmap(bitmap, 0, bitmap->size);
>   }
>
>   const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap)
> @@ -5718,7 +5717,7 @@ static void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
>           if (!bdrv_dirty_bitmap_enabled(bitmap)) {
>               continue;
>           }
> -        hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
> +        bdrv_set_dirty_bitmap(bitmap, cur_sector, nr_sectors);
>       }
>   }
>
> @@ -5730,7 +5729,7 @@ static void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
>           if (!bdrv_dirty_bitmap_enabled(bitmap)) {
>               continue;
>           }
> -        hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
> +        bdrv_reset_dirty_bitmap(bitmap, cur_sector, nr_sectors);
>       }
>   }
>
>

Thanks for this.

Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 05/14] block: add meta bitmaps
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 05/14] block: add meta bitmaps Vladimir Sementsov-Ogievskiy
@ 2015-02-18 23:45   ` John Snow
  2015-02-19 11:43     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 32+ messages in thread
From: John Snow @ 2015-02-18 23:45 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, stefanha, pbonzini,
	amit.shah, den



On 02/18/2015 09:00 AM, Vladimir Sementsov-Ogievskiy wrote:
> Meta bitmap is a 'dirty bitmap' for the BdrvDirtyBitmap. It tracks
> changes (set/unset) of this BdrvDirtyBitmap. It is needed for live
> migration of block dirty bitmaps.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   block.c               | 40 ++++++++++++++++++++++++++++++++++++++++
>   include/block/block.h |  5 +++++
>   2 files changed, 45 insertions(+)
>
> diff --git a/block.c b/block.c
> index a127fd2..aaa08b8 100644
> --- a/block.c
> +++ b/block.c
> @@ -58,9 +58,15 @@
>    * (3) successor is set: frozen mode.
>    *     A frozen bitmap cannot be renamed, deleted, anonymized, cleared, set,
>    *     or enabled. A frozen bitmap can only abdicate() or reclaim().
> + *
> + * Meta bitmap:
> + * Meta bitmap is a 'dirty bitmap' for the BdrvDirtyBitmap. It tracks changes
> + * (set/unset) of this BdrvDirtyBitmap. It is needed for live migration of
> + * block dirty bitmaps.
>    */
>   struct BdrvDirtyBitmap {
>       HBitmap *bitmap;            /* Dirty sector bitmap implementation */
> +    HBitmap *meta_bitmap;       /* Meta bitmap */
>       BdrvDirtyBitmap *successor; /* Anonymous child; implies frozen status */
>       char *name;                 /* Optional non-empty unique ID */
>       int64_t size;               /* Size of the bitmap (Number of sectors) */
> @@ -5398,6 +5404,31 @@ void bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap)
>       bitmap->name = NULL;
>   }
>
> +HBitmap *bdrv_create_meta_bitmap(BdrvDirtyBitmap *bitmap,
> +                                        uint64_t granularity)
> +{
> +    uint64_t sector_granularity;
> +
> +    assert((granularity & (granularity - 1)) == 0);
> +
> +    granularity *= 8 * bdrv_dirty_bitmap_granularity(bitmap);
> +    sector_granularity = granularity >> BDRV_SECTOR_BITS;
> +    assert(sector_granularity);
> +

The maths here could use a comment, I think.

the "granularity" field here actually describes the desired 
serialization buffer size; e.g. CHUNK_SIZE (1 << 20) or 1 MiB. This 
parameter should be renamed to explain what it's actually for. Something 
like "chunk_size" and a comment explaining that it is in bytes.

...

That said, let's talk about the default chunk size you're using in 
correlation with this function.

a CHUNK_SIZE of 1MiB here is going to lead us to, if we have a bitmap 
with the default granularity of 128 sectors\64KiB bytes, a granularity 
for the meta_bitmap of one billion sectors (1 << 30) or 512GiB.

That's going to be bigger than most drives entirely, which will 
generally lead us to only using a single "chunk" per drive. Which means 
we won't really get a lot of mileage out of the bulk/dirty phases most 
of the time.

It's wild to think about that the first 1,000,000,000 sectors or 
512,000,000,000 bytes will all be represented by the first single bit in 
this bitmap. If a single hair on the drive changes, we resend the 
_entire_ bitmap, possibly over and over again. Will we ever make 
progress? Should we investigate a smaller chunk size?

Here's some quick mappings of chunk size (bytes) to effective 
meta_bitmap byte granularities, assuming the meta_bitmap is tracking a 
bitmap with the default granularity of 64KiB:

(1 << 20) 1MiB   -- 512GiB  // This is too high of a granularity
(1 << 17) 128KiB --  64GiB
(1 << 15) 32KiB  --  16GiB
(1 << 11) 2KiB   --   1GiB
(1 << 10) 1KiB   -- 512MiB
(1 << 9)  512B   -- 256MiB
(1 << 8)  256B   -- 128MiB
(1 << 5)  32 B   --  16MiB  // This is too small of a chunk size.
(1 << 1)   1 B   --   1MiB

We want to make the chunk sends efficient, but we also want to make sure 
that the dirty phase doesn't resend more data than it needs to, so we 
need to strike a balance here, no?

I think arguments could be made for most granularities between 128MiB 
through 1GiB. Anything outside of that is too lopsided, IMO.

What are your thoughts on this?

> +    bitmap->meta_bitmap =
> +        hbitmap_alloc(bitmap->size, ffsll(sector_granularity) - 1);
> +
> +    return bitmap->meta_bitmap;
> +}
> +
> +void bdrv_release_meta_bitmap(BdrvDirtyBitmap *bitmap)
> +{
> +    if (bitmap->meta_bitmap) {
> +        hbitmap_free(bitmap->meta_bitmap);
> +        bitmap->meta_bitmap = NULL;
> +    }
> +}
> +
>   BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
>                                             uint32_t granularity,
>                                             const char *name,
> @@ -5532,6 +5563,9 @@ void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
>               assert(!bdrv_dirty_bitmap_frozen(bm));
>               QLIST_REMOVE(bitmap, list);
>               hbitmap_free(bitmap->bitmap);
> +            if (bitmap->meta_bitmap) {
> +                hbitmap_free(bitmap->meta_bitmap);
> +            }
>               g_free(bitmap->name);
>               g_free(bitmap);
>               return;
> @@ -5659,6 +5693,9 @@ void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
>   {
>       assert(bdrv_dirty_bitmap_enabled(bitmap));
>       hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
> +    if (bitmap->meta_bitmap) {
> +        hbitmap_set(bitmap->meta_bitmap, cur_sector, nr_sectors);
> +    }
>   }
>
>   void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
> @@ -5666,6 +5703,9 @@ void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
>   {
>       assert(bdrv_dirty_bitmap_enabled(bitmap));
>       hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
> +    if (bitmap->meta_bitmap) {
> +        hbitmap_set(bitmap->meta_bitmap, cur_sector, nr_sectors);
> +    }
>   }
>
>   void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap)
> diff --git a/include/block/block.h b/include/block/block.h
> index c6a928d..f2c62f6 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -4,6 +4,7 @@
>   #include "block/aio.h"
>   #include "qemu-common.h"
>   #include "qemu/option.h"
> +#include "qemu/hbitmap.h"
>   #include "block/coroutine.h"
>   #include "block/accounting.h"
>   #include "qapi/qmp/qobject.h"
> @@ -487,6 +488,10 @@ void bdrv_dirty_bitmap_deserialize_zeroes(BdrvDirtyBitmap *bitmap,
>                                             uint64_t start, uint64_t count);
>   void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap);
>
> +HBitmap *bdrv_create_meta_bitmap(BdrvDirtyBitmap *bitmap,
> +                                        uint64_t granularity);
> +void bdrv_release_meta_bitmap(BdrvDirtyBitmap *bitmap);
> +
>   void bdrv_enable_copy_on_read(BlockDriverState *bs);
>   void bdrv_disable_copy_on_read(BlockDriverState *bs);
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 06/14] block: add bdrv_next_dirty_bitmap()
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 06/14] block: add bdrv_next_dirty_bitmap() Vladimir Sementsov-Ogievskiy
@ 2015-02-18 23:45   ` John Snow
  0 siblings, 0 replies; 32+ messages in thread
From: John Snow @ 2015-02-18 23:45 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, stefanha, pbonzini,
	amit.shah, den



On 02/18/2015 09:00 AM, Vladimir Sementsov-Ogievskiy wrote:
> Like bdrv_next()  - bdrv_next_dirty_bitmap() is a function to provide
> access to private dirty bitmaps list.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   block.c               | 10 ++++++++++
>   include/block/block.h |  2 ++
>   2 files changed, 12 insertions(+)
>
> diff --git a/block.c b/block.c
> index aaa08b8..4cca55d 100644
> --- a/block.c
> +++ b/block.c
> @@ -5649,6 +5649,16 @@ BlockDirtyBitmapInfo *bdrv_query_dirty_bitmap(BlockDriverState *bs,
>       return info;
>   }
>
> +BdrvDirtyBitmap *bdrv_next_dirty_bitmap(BlockDriverState *bs,
> +                                        BdrvDirtyBitmap *bitmap)
> +{
> +    if (bitmap == NULL) {
> +        return QLIST_FIRST(&bs->dirty_bitmaps);
> +    }
> +
> +    return QLIST_NEXT(bitmap, list);
> +}
> +
>   int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, int64_t sector)
>   {
>       if (bitmap) {
> diff --git a/include/block/block.h b/include/block/block.h
> index f2c62f6..d11d956 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -487,6 +487,8 @@ void bdrv_dirty_bitmap_deserialize_part(BdrvDirtyBitmap *bitmap,
>   void bdrv_dirty_bitmap_deserialize_zeroes(BdrvDirtyBitmap *bitmap,
>                                             uint64_t start, uint64_t count);
>   void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap);
> +BdrvDirtyBitmap *bdrv_next_dirty_bitmap(BlockDriverState *bs,
> +                                        BdrvDirtyBitmap *bitmap);
>
>   HBitmap *bdrv_create_meta_bitmap(BdrvDirtyBitmap *bitmap,
>                                           uint64_t granularity);
>

Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 07/14] qapi: add dirty-bitmaps migration capability
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 07/14] qapi: add dirty-bitmaps migration capability Vladimir Sementsov-Ogievskiy
@ 2015-02-18 23:45   ` John Snow
  0 siblings, 0 replies; 32+ messages in thread
From: John Snow @ 2015-02-18 23:45 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, stefanha, pbonzini,
	amit.shah, den



On 02/18/2015 09:00 AM, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   include/migration/migration.h | 1 +
>   migration/migration.c         | 9 +++++++++
>   qapi-schema.json              | 5 ++++-
>   3 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index f37348b..719875d 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -145,6 +145,7 @@ void migrate_del_blocker(Error *reason);
>
>   bool migrate_rdma_pin_all(void);
>   bool migrate_zero_blocks(void);
> +bool migrate_dirty_bitmaps(void);
>
>   bool migrate_auto_converge(void);
>
> diff --git a/migration/migration.c b/migration/migration.c
> index b3adbc6..68e7478 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -567,6 +567,15 @@ bool migrate_zero_blocks(void)
>       return s->enabled_capabilities[MIGRATION_CAPABILITY_ZERO_BLOCKS];
>   }
>
> +bool migrate_dirty_bitmaps(void)
> +{
> +    MigrationState *s;
> +
> +    s = migrate_get_current();
> +
> +    return s->enabled_capabilities[MIGRATION_CAPABILITY_DIRTY_BITMAPS];
> +}
> +
>   int migrate_use_xbzrle(void)
>   {
>       MigrationState *s;
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 0c31203..70b54ab 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -494,10 +494,13 @@
>   # @auto-converge: If enabled, QEMU will automatically throttle down the guest
>   #          to speed up convergence of RAM migration. (since 1.6)
>   #
> +# @dirty-bitmaps: If enabled, QEMU will migrate named dirty bitmaps. (since 2.3)
> +#
>   # Since: 1.2
>   ##
>   { 'enum': 'MigrationCapability',
> -  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks'] }
> +  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
> +           'dirty-bitmaps'] }
>
>   ##
>   # @MigrationCapabilityStatus
>

Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 08/14] migration: add migration/block-dirty-bitmap.c
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 08/14] migration: add migration/block-dirty-bitmap.c Vladimir Sementsov-Ogievskiy
@ 2015-02-18 23:47   ` John Snow
  2015-02-19 13:48     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 32+ messages in thread
From: John Snow @ 2015-02-18 23:47 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, stefanha, pbonzini,
	amit.shah, den



On 02/18/2015 09:00 AM, Vladimir Sementsov-Ogievskiy wrote:
> Live migration of dirty bitmaps. Only named dirty bitmaps, associated with
> root nodes and non-root named nodes are migrated.
>
> If destination qemu is already containing a dirty bitmap with the same name
> as a migrated bitmap (for the same node), than, if their granularities are
> the same the migration will be done, otherwise the error will be generated.
>
> If destination qemu doesn't contain such bitmap it will be created.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   include/migration/block.h      |   1 +
>   migration/Makefile.objs        |   2 +-
>   migration/block-dirty-bitmap.c | 708 +++++++++++++++++++++++++++++++++++++++++
>   vl.c                           |   1 +
>   4 files changed, 711 insertions(+), 1 deletion(-)
>   create mode 100644 migration/block-dirty-bitmap.c
>
> diff --git a/include/migration/block.h b/include/migration/block.h
> index ffa8ac0..566bb9f 100644
> --- a/include/migration/block.h
> +++ b/include/migration/block.h
> @@ -14,6 +14,7 @@
>   #ifndef BLOCK_MIGRATION_H
>   #define BLOCK_MIGRATION_H
>
> +void dirty_bitmap_mig_init(void);
>   void blk_mig_init(void);
>   int blk_mig_active(void);
>   uint64_t blk_mig_bytes_transferred(void);
> diff --git a/migration/Makefile.objs b/migration/Makefile.objs
> index d929e96..128612d 100644
> --- a/migration/Makefile.objs
> +++ b/migration/Makefile.objs
> @@ -6,5 +6,5 @@ common-obj-y += xbzrle.o
>   common-obj-$(CONFIG_RDMA) += rdma.o
>   common-obj-$(CONFIG_POSIX) += exec.o unix.o fd.o
>
> -common-obj-y += block.o
> +common-obj-y += block.o block-dirty-bitmap.o
>
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> new file mode 100644
> index 0000000..084ba22
> --- /dev/null
> +++ b/migration/block-dirty-bitmap.c
> @@ -0,0 +1,708 @@
> +/*
> + * QEMU dirty bitmap migration
> + *
> + * Live migration of dirty bitmaps. Only named dirty bitmaps, associated with
> + * root nodes and non-root named nodes are migrated.
> + *
> + * If destination qemu is already containing a dirty bitmap with the same name
> + * as a migrated bitmap (for the same node), than, if their granularities are
> + * the same the migration will be done, otherwise the error will be generated.
> + *
> + * If destination qemu doesn't contain such bitmap it will be created.
> + *
> + * format of migration:
> + *
> + * # Header (shared for different chunk types)
> + * 1 byte: flags

1, 2, or 4 bytes.

> + * [ 1 byte: node name size ] \  flags & DEVICE_NAME
> + * [ n bytes: node name     ] /
> + * [ 1 byte: bitmap name size ] \  flags & BITMAP_NAME
> + * [ n bytes: bitmap name     ] /
> + *
> + * # Start of bitmap migration (flags & START)
> + * header
> + * be64: granularity
> + *
> + * # Complete of bitmap migration (flags & COMPLETE)
> + * header
> + * 1 byte: bitmap enabled flag
> + *
> + * # Data chunk of bitmap migration
> + * header
> + * be64: start sector
> + * be32: number of sectors
> + * [ be64: buffer size  ] \ ! (flags & ZEROES)
> + * [ n bytes: buffer    ] /
> + *
> + * The last chunk in stream should contain flags & EOS. The chunk may skip
> + * device and/or bitmap names, assuming them to be the same with the previous
> + * chunk.
> + *
> + *
> + * This file is derived from migration/block.c
> + *
> + * Author:
> + * Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> + *
> + * original copyright message:
> + * =====================================================================
> + * Copyright IBM, Corp. 2009
> + *
> + * Authors:
> + *  Liran Schour   <lirans@il.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + * Contributions after 2012-01-13 are licensed under the terms of the
> + * GNU GPL, version 2 or (at your option) any later version.
> + * =====================================================================
> + */
> +
> +#include "block/block.h"
> +#include "block/block_int.h"
> +#include "sysemu/block-backend.h"
> +#include "qemu/main-loop.h"
> +#include "qemu/error-report.h"
> +#include "migration/block.h"
> +#include "migration/migration.h"
> +#include "qemu/hbitmap.h"
> +#include <assert.h>
> +
> +#define CHUNK_SIZE                       (1 << 20)
> +
> +/* Flags occupy from one to four bytes. In all but one the 7-th (EXTRA_FLAGS)
> + * bit should be set. */
> +#define DIRTY_BITMAP_MIG_FLAG_EOS           0x01
> +#define DIRTY_BITMAP_MIG_FLAG_ZEROES        0x02
> +#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME   0x04
> +#define DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME   0x08
> +#define DIRTY_BITMAP_MIG_FLAG_START         0x10
> +#define DIRTY_BITMAP_MIG_FLAG_COMPLETE      0x20
> +#define DIRTY_BITMAP_MIG_FLAG_BITS          0x40
> +
> +#define DIRTY_BITMAP_MIG_EXTRA_FLAGS        0x80
> +#define DIRTY_BITMAP_MIG_FLAGS_SIZE_16      0x8000
> +#define DIRTY_BITMAP_MIG_FLAGS_SIZE_32      0x8080
> +
> +#define DEBUG_DIRTY_BITMAP_MIGRATION
> +
> +#ifdef DEBUG_DIRTY_BITMAP_MIGRATION
> +#define DPRINTF(fmt, ...) \
> +    do { printf("dirty_migration: " fmt, ## __VA_ARGS__); } while (0)
> +#else
> +#define DPRINTF(fmt, ...) \
> +    do { } while (0)
> +#endif
> +

Take a look at hw/ide/ahci.c, which has a DPRINTF macro defined so that 
the debugging can be turned off, but the print statements etc. will 
still be compiled and typechecked.

The blurb looks like this:

#define DEBUG_AHCI 0

#define DPRINTF(port, fmt, ...) \
do { \
     if (DEBUG_AHCI) { \
         fprintf(stderr, "ahci: %s: [%d] ", __func__, port); \
         fprintf(stderr, fmt, ## __VA_ARGS__); \
     } \
} while (0)

When it comes time to submit this as non-RFC, leave 
DEBUG_DIRTY_BITMAP_MIGRATION defined to 0.

> +typedef struct DirtyBitmapMigBitmapState {
> +    /* Written during setup phase. */
> +    BlockDriverState *bs;
> +    const char *node_name;
> +    BdrvDirtyBitmap *bitmap;
> +    HBitmap *meta_bitmap;
> +    uint64_t total_sectors;
> +    uint64_t sectors_per_chunk;
> +    QSIMPLEQ_ENTRY(DirtyBitmapMigBitmapState) entry;
> +
> +    /* For bulk phase. */
> +    bool bulk_completed;
> +    uint64_t cur_sector;
> +
> +    /* For dirty phase. */
> +    HBitmapIter iter_dirty;
> +} DirtyBitmapMigBitmapState;
> +
> +typedef struct DirtyBitmapMigState {
> +    QSIMPLEQ_HEAD(dbms_list, DirtyBitmapMigBitmapState) dbms_list;
> +
> +    bool bulk_completed;
> +
> +    /* for send_bitmap() */

send_bitmap_bits, now.

> +    BlockDriverState *prev_bs;
> +    BdrvDirtyBitmap *prev_bitmap;
> +} DirtyBitmapMigState;
> +
> +typedef struct DirtyBitmapLoadState {
> +    uint32_t flags;
> +    char node_name[256];
> +    char bitmap_name[256];
> +    BlockDriverState *bs;
> +    BdrvDirtyBitmap *bitmap;
> +} DirtyBitmapLoadState;
> +
> +static DirtyBitmapMigState dirty_bitmap_mig_state;
> +
> +static uint32_t qemu_get_flags(QEMUFile *f)
> +{
> +    uint8_t flags = qemu_get_byte(f);
> +    if (flags & DIRTY_BITMAP_MIG_EXTRA_FLAGS) {
> +        flags = flags << 8 | qemu_get_byte(f);
> +        if (flags & DIRTY_BITMAP_MIG_EXTRA_FLAGS) {
> +            flags = flags << 16 | qemu_get_be16(f);
> +        }
> +    }
> +
> +    return flags;
> +}
> +
> +static void qemu_put_flags(QEMUFile *f, uint32_t flags)
> +{
> +    if (!(flags & 0xffffff00)) {
> +        qemu_put_byte(f, flags);
> +        return;
> +    }
> +
> +    if (!(flags & 0xffff0000)) {
> +        qemu_put_be16(f, flags | DIRTY_BITMAP_MIG_FLAGS_SIZE_16);
> +        return;
> +    }
> +
> +    qemu_put_be32(f, flags | DIRTY_BITMAP_MIG_FLAGS_SIZE_32);
> +}
> +

This will give us breathing room for sure :)

> +/* read name from qemu file:
> + * format:
> + * 1 byte : len = name length (<256)
> + * len bytes : name without last zero byte
> + *
> + * name should point to the buffer >= 256 bytes length
> + */
> +static char *qemu_get_string(QEMUFile *f, char *name)
> +{
> +    int len = qemu_get_byte(f);
> +    qemu_get_buffer(f, (uint8_t *)name, len);
> +    name[len] = '\0';
> +
> +    DPRINTF("get name: %d %s\n", len, name);
> +
> +    return name;
> +}
> +
> +/* write name to qemu file:
> + * format:
> + * same as for qemu_get_string
> + *
> + * maximum name length is 255
> + */
> +static void qemu_put_string(QEMUFile *f, const char *name)
> +{
> +    int len = strlen(name);
> +
> +    DPRINTF("put name: %d %s\n", len, name);
> +
> +    assert(len < 256);
> +    qemu_put_byte(f, len);
> +    qemu_put_buffer(f, (const uint8_t *)name, len);
> +}
> +

Thanks, sorry for the nitpicking.

> +static void send_bitmap_header(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
> +                               uint32_t additional_flags)
> +{
> +    BlockDriverState *bs = dbms->bs;
> +    BdrvDirtyBitmap *bitmap = dbms->bitmap;
> +    uint32_t flags = additional_flags;
> +
> +    if (bs != dirty_bitmap_mig_state.prev_bs) {
> +        dirty_bitmap_mig_state.prev_bs = bs;
> +        flags |= DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME;
> +    }
> +
> +    if (bitmap != dirty_bitmap_mig_state.prev_bitmap) {
> +        dirty_bitmap_mig_state.prev_bitmap = bitmap;
> +        flags |= DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME;
> +    }
> +
> +    qemu_put_flags(f, flags);
> +
> +    if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
> +        qemu_put_string(f, dbms->node_name);
> +    }
> +
> +    if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
> +        qemu_put_string(f, bdrv_dirty_bitmap_name(bitmap));
> +    }
> +}
> +
> +static void send_bitmap_start(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
> +{
> +    send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_START);
> +    qemu_put_be32(f, bdrv_dirty_bitmap_granularity(dbms->bitmap));
> +}
> +
> +static void send_bitmap_complete(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
> +{
> +    send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_COMPLETE);
> +    qemu_put_byte(f, bdrv_dirty_bitmap_enabled(dbms->bitmap));
> +}
> +
> +static void send_bitmap_bits(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
> +                             uint64_t start_sector, uint32_t nr_sectors)
> +{
> +    /* align for buffer_is_zero() */
> +    uint64_t align = 4 * sizeof(long);
> +    uint64_t buf_size =
> +        (bdrv_dirty_bitmap_data_size(dbms->bitmap, nr_sectors) + align - 1) &
> +        ~(align - 1);
> +    uint8_t *buf = g_malloc0(buf_size);
> +    uint32_t flags = DIRTY_BITMAP_MIG_FLAG_BITS;
> +
> +    bdrv_dirty_bitmap_serialize_part(dbms->bitmap, buf,
> +                                     start_sector, nr_sectors);
> +
> +    if (buffer_is_zero(buf, buf_size)) {
> +        g_free(buf);
> +        buf = NULL;
> +        flags |= DIRTY_BITMAP_MIG_FLAG_ZEROES;
> +    }
> +
> +    DPRINTF("Enter send_bitmap"
> +            "\n   flags:        %x"
> +            "\n   start_sector: %" PRIu64
> +            "\n   nr_sectors:   %" PRIu32
> +            "\n   data_size:    %" PRIu64 "\n",
> +            flags, start_sector, nr_sectors, buf_size);
> +
> +    send_bitmap_header(f, dbms, flags);
> +
> +    qemu_put_be64(f, start_sector);
> +    qemu_put_be32(f, nr_sectors);
> +
> +    /* if a block is zero we need to flush here since the network
> +     * bandwidth is now a lot higher than the storage device bandwidth.
> +     * thus if we queue zero blocks we slow down the migration.
> +     * also, skip writing block when migrate only dirty bitmaps. */
> +    if (flags & DIRTY_BITMAP_MIG_FLAG_ZEROES) {
> +        qemu_fflush(f);
> +        return;
> +    }
> +
> +    qemu_put_be64(f, buf_size);
> +    qemu_put_buffer(f, buf, buf_size);
> +    g_free(buf);
> +}
> +
> +
> +/* Called with iothread lock taken.  */
> +
> +static void set_dirty_tracking(void)
> +{
> +    DirtyBitmapMigBitmapState *dbms;
> +
> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
> +        dbms->meta_bitmap =
> +            bdrv_create_meta_bitmap(dbms->bitmap, CHUNK_SIZE);
> +    }
> +}
> +
> +static void unset_dirty_tracking(void)
> +{
> +    DirtyBitmapMigBitmapState *dbms;
> +
> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
> +        bdrv_release_meta_bitmap(dbms->bitmap);
> +    }
> +}
> +
> +static void init_dirty_bitmap_migration(QEMUFile *f)
> +{
> +    BlockDriverState *bs;
> +    BdrvDirtyBitmap *bitmap;
> +    DirtyBitmapMigBitmapState *dbms;
> +
> +    dirty_bitmap_mig_state.bulk_completed = false;
> +    dirty_bitmap_mig_state.prev_bs = NULL;
> +    dirty_bitmap_mig_state.prev_bitmap = NULL;
> +
> +    for (bs = bdrv_next(NULL); bs; bs = bdrv_next(bs)) {
> +        for (bitmap = bdrv_next_dirty_bitmap(bs, NULL); bitmap;
> +             bitmap = bdrv_next_dirty_bitmap(bs, bitmap)) {
> +            if (!bdrv_dirty_bitmap_name(bitmap)) {
> +                continue;
> +            }
> +
> +            if (!bdrv_get_node_name(bs) && blk_bs(bs->blk) != bs) {
> +                /* not named non-root node */
> +                continue;
> +            }
> +
> +            dbms = g_new0(DirtyBitmapMigBitmapState, 1);
> +            dbms->bs = bs;
> +            dbms->node_name = bdrv_get_node_name(bs);
> +            if (!dbms->node_name || dbms->node_name[0] == '\0') {
> +                dbms->node_name = bdrv_get_device_name(bs);
> +            }
> +            dbms->bitmap = bitmap;
> +            dbms->total_sectors = bdrv_nb_sectors(bs);
> +            dbms->sectors_per_chunk = (uint64_t)CHUNK_SIZE * 8 *
> +                bdrv_dirty_bitmap_granularity(dbms->bitmap) >> BDRV_SECTOR_BITS;

This calculation is re-used from the meta_bitmap granularity 
calculation. Might be worth factoring out so that if we tweak the values 
for performance reasons, we don't have to remember all the places we 
need to change it.

If your answer to my earlier question about CHUNK_SIZE is "We don't need 
to change it," then disregard.

> +
> +            QSIMPLEQ_INSERT_TAIL(&dirty_bitmap_mig_state.dbms_list,
> +                                 dbms, entry);
> +        }
> +    }
> +}
> +
> +/* Called with no lock taken.  */
> +static void bulk_phase_send_chunk(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
> +{
> +    uint32_t nr_sectors = MIN(dbms->total_sectors - dbms->cur_sector,
> +                             dbms->sectors_per_chunk);
> +
> +    send_bitmap_bits(f, dbms, dbms->cur_sector, nr_sectors);
> +
> +    dbms->cur_sector += nr_sectors;
> +    if (dbms->cur_sector >= dbms->total_sectors) {
> +        dbms->bulk_completed = true;
> +    }
> +}
> +
> +/* Called with no lock taken.  */
> +static void bulk_phase(QEMUFile *f, bool limit)
> +{
> +    DirtyBitmapMigBitmapState *dbms;
> +
> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
> +        while (!dbms->bulk_completed) {
> +            bulk_phase_send_chunk(f, dbms);
> +            if (limit && qemu_file_rate_limit(f)) {
> +                return;
> +            }
> +        }
> +    }
> +
> +    dirty_bitmap_mig_state.bulk_completed = true;
> +}
> +
> +static void blk_mig_reset_dirty_cursor(void)
> +{
> +    DirtyBitmapMigBitmapState *dbms;
> +
> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
> +        hbitmap_iter_init(&dbms->iter_dirty, dbms->meta_bitmap, 0);
> +    }
> +}
> +
> +/* Called with iothread lock taken. */
> +static bool dirty_phase_send_chunk(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
> +{
> +    uint32_t nr_sectors;
> +    size_t old_pos = dbms->iter_dirty.pos;
> +    int64_t cur = hbitmap_iter_next(&dbms->iter_dirty);
> +
> +    /* restart search from the beginning */
> +    if (old_pos && cur == -1) {
> +        hbitmap_iter_init(&dbms->iter_dirty, dbms->meta_bitmap, 0);
> +        cur = hbitmap_iter_next(&dbms->iter_dirty);
> +    }
> +
> +    if (cur == -1) {
> +        hbitmap_iter_init(&dbms->iter_dirty, dbms->meta_bitmap, 0);
> +        return false;
> +    }
> +
> +    nr_sectors = MIN(dbms->total_sectors - cur, dbms->sectors_per_chunk);
> +    send_bitmap_bits(f, dbms, cur, nr_sectors);
> +    hbitmap_reset(dbms->meta_bitmap, cur, dbms->sectors_per_chunk);
> +    cur += nr_sectors;

Dead assignment to cur, function is fine otherwise.

> +
> +    return true;
> +}
> +
> +/* Called with iothread lock taken. */
> +static void dirty_phase(QEMUFile *f, bool limit)
> +{
> +    DirtyBitmapMigBitmapState *dbms;
> +
> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
> +        while (dirty_phase_send_chunk(f, dbms)) {
> +            if (limit && qemu_file_rate_limit(f)) {
> +                return;
> +            }
> +        }
> +    }
> +}
> +
> +
> +/* Called with iothread lock taken.  */
> +static void dirty_bitmap_mig_cleanup(void)
> +{
> +    DirtyBitmapMigBitmapState *dbms;
> +
> +    unset_dirty_tracking();
> +
> +    while ((dbms = QSIMPLEQ_FIRST(&dirty_bitmap_mig_state.dbms_list)) != NULL) {
> +        QSIMPLEQ_REMOVE_HEAD(&dirty_bitmap_mig_state.dbms_list, entry);
> +        g_free(dbms);
> +    }
> +}
> +
> +static void dirty_bitmap_migration_cancel(void *opaque)
> +{
> +    dirty_bitmap_mig_cleanup();
> +}
> +
> +static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
> +{
> +    DPRINTF("Enter save live iterate\n");
> +
> +    if (dirty_bitmap_mig_state.bulk_completed) {
> +        qemu_mutex_lock_iothread();
> +        dirty_phase(f, true);
> +        qemu_mutex_unlock_iothread();
> +    } else {
> +        bulk_phase(f, true);
> +    }
> +
> +    qemu_put_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS);
> +
> +    return dirty_bitmap_mig_state.bulk_completed;
> +}
> +
> +/* Called with iothread lock taken.  */
> +
> +static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
> +{
> +    DirtyBitmapMigBitmapState *dbms;
> +    DPRINTF("Enter save live complete\n");
> +
> +    if (!dirty_bitmap_mig_state.bulk_completed) {
> +        bulk_phase(f, false);
> +    }
> +
> +    blk_mig_reset_dirty_cursor();
> +    dirty_phase(f, false);
> +
> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
> +        send_bitmap_complete(f, dbms);
> +    }
> +
> +    qemu_put_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS);
> +
> +    DPRINTF("Dirty bitmaps migration completed\n");
> +
> +    dirty_bitmap_mig_cleanup();
> +    return 0;
> +}
> +
> +static uint64_t dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
> +                                          uint64_t max_size)
> +{
> +    DirtyBitmapMigBitmapState *dbms;
> +    uint64_t pending = 0;
> +
> +    qemu_mutex_lock_iothread();
> +
> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
> +        uint64_t sectors = hbitmap_count(dbms->meta_bitmap);
> +        if (!dbms->bulk_completed) {
> +            sectors += dbms->total_sectors - dbms->cur_sector;
> +        }
> +        pending += bdrv_dirty_bitmap_data_size(dbms->bitmap, sectors);
> +    }
> +
> +    qemu_mutex_unlock_iothread();
> +
> +    DPRINTF("Enter save live pending %" PRIu64 ", max: %" PRIu64 "\n",
> +            pending, max_size);
> +    return pending;
> +}
> +
> +/* First occurrence of this bitmap. It should be created if doesn't exist */
> +static int dirty_bitmap_load_start(QEMUFile *f, DirtyBitmapLoadState *s)
> +{
> +    uint32_t granularity = qemu_get_be32(f);
> +    if (!s->bitmap) {
> +        Error *local_err = NULL;
> +        s->bitmap = bdrv_create_dirty_bitmap(s->bs, granularity,
> +                                             s->bitmap_name, &local_err);
> +        if (!s->bitmap) {
> +            error_report("%s", error_get_pretty(local_err));
> +            error_free(local_err);
> +            return -EINVAL;
> +        }
> +    } else {
> +        uint32_t dest_granularity =
> +            bdrv_dirty_bitmap_granularity(s->bitmap);
> +        if (dest_granularity != granularity) {
> +            fprintf(stderr,
> +                    "Error: "
> +                    "Migrated bitmap granularity (%" PRIu32 ") "
> +                    "doesn't match the destination bitmap '%s' "
> +                    "granularity (%" PRIu32 ")\n",
> +                    granularity,
> +                    bdrv_dirty_bitmap_name(s->bitmap),
> +                    dest_granularity);
> +            return -EINVAL;
> +        }
> +    }
> +
> +    bdrv_disable_dirty_bitmap(s->bitmap);

This will definitely keep people from using it while it's in an 
inconsistent state.

> +
> +    return 0;
> +}
> +
> +static void dirty_bitmap_load_complete(QEMUFile *f, DirtyBitmapLoadState *s)
> +{
> +    bool enabled;
> +
> +    bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
> +    DPRINTF("enab\n");
> +
> +    enabled = qemu_get_byte(f);
> +    if (enabled) {
> +        bdrv_enable_dirty_bitmap(s->bitmap);
> +    }
> +}
> +
> +static int dirty_bitmap_load_bits(QEMUFile *f, DirtyBitmapLoadState *s)
> +{
> +    uint64_t first_sector = qemu_get_be64(f);
> +    uint32_t nr_sectors = qemu_get_be32(f);
> +    DPRINTF("chunk: %lu %u\n", first_sector, nr_sectors);
> +
> +
> +    if (s->flags & DIRTY_BITMAP_MIG_FLAG_ZEROES) {
> +        DPRINTF("   - zeroes\n");
> +        bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_sector,
> +                                             nr_sectors);
> +    } else {
> +        uint8_t *buf;
> +        uint64_t buf_size = qemu_get_be64(f);
> +        uint64_t needed_size =
> +            bdrv_dirty_bitmap_data_size(s->bitmap, nr_sectors);
> +
> +        if (needed_size > buf_size) {
> +            fprintf(stderr,
> +                    "Error: Migrated bitmap granularity doesn't "
> +                    "match the destination bitmap '%s' granularity\n",
> +                    bdrv_dirty_bitmap_name(s->bitmap));
> +            return -EINVAL;
> +        }
> +
> +        buf = g_malloc(buf_size);
> +        qemu_get_buffer(f, buf, buf_size);
> +        bdrv_dirty_bitmap_deserialize_part(s->bitmap, buf,
> +                                           first_sector,
> +                                           nr_sectors);
> +        g_free(buf);
> +    }
> +
> +    return 0;
> +}
> +
> +static int dirty_bitmap_load_header(QEMUFile *f, DirtyBitmapLoadState *s)
> +{
> +    Error *local_err = NULL;
> +    s->flags = qemu_get_flags(f);
> +    DPRINTF("flags: %x\n", s->flags);
> +
> +    if (s->flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
> +        qemu_get_string(f, s->node_name);
> +        s->bs = bdrv_lookup_bs(s->node_name, s->node_name, &local_err);
> +        if (!s->bs) {
> +            error_report("%s", error_get_pretty(local_err));
> +            error_free(local_err);
> +            return -EINVAL;
> +        }
> +    } else if (!s->bs) {
> +        fprintf(stderr, "Error: block device name is not set\n");
> +        return -EINVAL;
> +    }
> +
> +    if (s->flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
> +        qemu_get_string(f, s->bitmap_name);
> +        s->bitmap = bdrv_find_dirty_bitmap(s->bs, s->bitmap_name);
> +
> +        /* bitmap may be NULL here, it wouldn't be an error if it is the
> +         * first occurrence of the bitmap */
> +        if (!s->bitmap && !(s->flags & DIRTY_BITMAP_MIG_FLAG_START)) {
> +            fprintf(stderr, "Error: unknown dirty bitmap "
> +                    "'%s' for block device '%s'\n",
> +                    s->bitmap_name, s->node_name);
> +            return -EINVAL;
> +        }
> +    } else if (!s->bitmap) {
> +        fprintf(stderr, "Error: block device name is not set\n");
> +        return -EINVAL;
> +    }
> +
> +    return 0;
> +}
> +
> +static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
> +{
> +    static DirtyBitmapLoadState s;
> +
> +    int ret = 0;
> +
> +    DPRINTF("load start\n");
> +
> +    do {
> +        dirty_bitmap_load_header(f, &s);
> +
> +        if (s.flags & DIRTY_BITMAP_MIG_FLAG_START) {
> +            ret = dirty_bitmap_load_start(f, &s);
> +        } else if (s.flags & DIRTY_BITMAP_MIG_FLAG_COMPLETE) {
> +            dirty_bitmap_load_complete(f, &s);
> +        } else if (s.flags & DIRTY_BITMAP_MIG_FLAG_BITS) {
> +            ret = dirty_bitmap_load_bits(f, &s);
> +        }
> +
> +        DPRINTF("ret: %d\n", ret);
> +        if (!ret) {
> +            ret = qemu_file_get_error(f);
> +        }
> +
> +        DPRINTF("ret: %d\n", ret);
> +        if (ret) {
> +            return ret;
> +        }
> +    } while (!(s.flags & DIRTY_BITMAP_MIG_FLAG_EOS));
> +
> +    DPRINTF("load finish\n");
> +    return 0;
> +}
> +

Great, thanks. These functions read way nicer now.

> +static bool dirty_bitmap_is_active(void *opaque)
> +{
> +    return migrate_dirty_bitmaps();
> +}
> +
> +static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
> +{
> +    DirtyBitmapMigBitmapState *dbms = NULL;
> +    init_dirty_bitmap_migration(f);
> +
> +    qemu_mutex_lock_iothread();
> +    /* start track dirtyness of dirty bitmaps */

'dirtiness'

> +    set_dirty_tracking();
> +    qemu_mutex_unlock_iothread();
> +
> +    blk_mig_reset_dirty_cursor();
> +
> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
> +        send_bitmap_start(f, dbms);
> +    }
> +    qemu_put_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS);
> +
> +    return 0;
> +}
> +
> +static SaveVMHandlers savevm_block_handlers = {
> +    .save_live_setup = dirty_bitmap_save_setup,
> +    .save_live_iterate = dirty_bitmap_save_iterate,
> +    .save_live_complete = dirty_bitmap_save_complete,
> +    .save_live_pending = dirty_bitmap_save_pending,
> +    .load_state = dirty_bitmap_load,
> +    .cancel = dirty_bitmap_migration_cancel,
> +    .is_active = dirty_bitmap_is_active,
> +};
> +
> +void dirty_bitmap_mig_init(void)
> +{
> +    QSIMPLEQ_INIT(&dirty_bitmap_mig_state.dbms_list);
> +
> +    register_savevm_live(NULL, "dirty-bitmap", 0, 1, &savevm_block_handlers,
> +                         &dirty_bitmap_mig_state);
> +}
> diff --git a/vl.c b/vl.c
> index 8c8f142..723a362 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -4147,6 +4147,7 @@ int main(int argc, char **argv, char **envp)
>
>       blk_mig_init();
>       ram_mig_init();
> +    dirty_bitmap_mig_init();
>
>       /* If the currently selected machine wishes to override the units-per-bus
>        * property of its default HBA interface type, do so now. */
>

Looks good. Holding the R-B pending discussion of the implications of 
chunk sizes.

--js

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 09/14] iotests: maintain several vms in test
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 09/14] iotests: maintain several vms in test Vladimir Sementsov-Ogievskiy
@ 2015-02-18 23:48   ` John Snow
  0 siblings, 0 replies; 32+ messages in thread
From: John Snow @ 2015-02-18 23:48 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, stefanha, pbonzini,
	amit.shah, den



On 02/18/2015 09:00 AM, Vladimir Sementsov-Ogievskiy wrote:
> The only problem with it is the same qmp socket name (which is
> vm._monitor_path) for all vms. And because of this second vm couldn't be
> lauched (vm.launch() fails because of socket is already in use).
> This patch adds a number of vm into vm._monitor_path
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   tests/qemu-iotests/iotests.py | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index cf5faac..fa756b4 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -77,16 +77,18 @@ def create_image(name, size):
>
>   class VM(object):
>       '''A QEMU VM'''
> +    nb_vms = 0
>
>       def __init__(self):
> -        self._monitor_path = os.path.join(test_dir, 'qemu-mon.%d' % os.getpid())
> -        self._qemu_log_path = os.path.join(test_dir, 'qemu-log.%d' % os.getpid())
> +        self._monitor_path = os.path.join(test_dir, 'qemu-mon.%d.%d' % (os.getpid(), VM.nb_vms))
> +        self._qemu_log_path = os.path.join(test_dir, 'qemu-log.%d.%d' % (os.getpid(), VM.nb_vms))
>           self._args = qemu_args + ['-chardev',
>                        'socket,id=mon,path=' + self._monitor_path,
>                        '-mon', 'chardev=mon,mode=control',
>                        '-qtest', 'stdio', '-machine', 'accel=qtest',
>                        '-display', 'none', '-vga', 'none']
>           self._num_drives = 0
> +        VM.nb_vms += 1
>
>       # This can be used to add an unused monitor instance.
>       def add_monitor_telnet(self, ip, port):
>

Simple enough, thanks!

Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 10/14] iotests: add add_incoming_migration to VM class
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 10/14] iotests: add add_incoming_migration to VM class Vladimir Sementsov-Ogievskiy
@ 2015-02-18 23:48   ` John Snow
  0 siblings, 0 replies; 32+ messages in thread
From: John Snow @ 2015-02-18 23:48 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, stefanha, pbonzini,
	amit.shah, den



On 02/18/2015 09:00 AM, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   tests/qemu-iotests/iotests.py | 6 ++++++
>   1 file changed, 6 insertions(+)
>
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index fa756b4..75640b2 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -111,6 +111,12 @@ class VM(object):
>           self._num_drives += 1
>           return self
>
> +    def add_incoming_migration(self, desc):
> +        '''Add an incoming migration to the VM'''
> +        self._args.append('-incoming')
> +        self._args.append(desc)
> +        return self
> +
>       def pause_drive(self, drive, event=None):
>           '''Pause drive r/w operations'''
>           if not event:
>

Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 14/14] migration/qemu-file: make functions qemu_(get/put)_string public
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 14/14] migration/qemu-file: make functions qemu_(get/put)_string public Vladimir Sementsov-Ogievskiy
@ 2015-02-19  0:00   ` John Snow
  0 siblings, 0 replies; 32+ messages in thread
From: John Snow @ 2015-02-19  0:00 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, stefanha, pbonzini,
	amit.shah, den



On 02/18/2015 09:00 AM, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   include/migration/qemu-file.h  | 17 +++++++++++++++++
>   migration/block-dirty-bitmap.c | 35 -----------------------------------
>   migration/qemu-file.c          | 18 ++++++++++++++++++
>   3 files changed, 35 insertions(+), 35 deletions(-)
>
> diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
> index a923cec..178ae63 100644
> --- a/include/migration/qemu-file.h
> +++ b/include/migration/qemu-file.h
> @@ -310,4 +310,21 @@ static inline void qemu_get_sbe64s(QEMUFile *f, int64_t *pv)
>   {
>       qemu_get_be64s(f, (uint64_t *)pv);
>   }
> +
> +/* read name from qemu file:
> + * format:
> + * 1 byte : len = name length (<256)
> + * len bytes : name without last zero byte
> + *
> + * name should point to the buffer >= 256 bytes length
> + */
> +char *qemu_get_string(QEMUFile *f, char *name);
> +
> +/* write name to qemu file:
> + * format:
> + * same as for qemu_get_string
> + *
> + * maximum name length is 255
> + */
> +void qemu_put_string(QEMUFile *f, const char *name);
>   #endif
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index 084ba22..8ebf7e5 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -161,41 +161,6 @@ static void qemu_put_flags(QEMUFile *f, uint32_t flags)
>       qemu_put_be32(f, flags | DIRTY_BITMAP_MIG_FLAGS_SIZE_32);
>   }
>
> -/* read name from qemu file:
> - * format:
> - * 1 byte : len = name length (<256)
> - * len bytes : name without last zero byte
> - *
> - * name should point to the buffer >= 256 bytes length
> - */
> -static char *qemu_get_string(QEMUFile *f, char *name)
> -{
> -    int len = qemu_get_byte(f);
> -    qemu_get_buffer(f, (uint8_t *)name, len);
> -    name[len] = '\0';
> -
> -    DPRINTF("get name: %d %s\n", len, name);
> -
> -    return name;
> -}
> -
> -/* write name to qemu file:
> - * format:
> - * same as for qemu_get_string
> - *
> - * maximum name length is 255
> - */
> -static void qemu_put_string(QEMUFile *f, const char *name)
> -{
> -    int len = strlen(name);
> -
> -    DPRINTF("put name: %d %s\n", len, name);
> -
> -    assert(len < 256);
> -    qemu_put_byte(f, len);
> -    qemu_put_buffer(f, (const uint8_t *)name, len);
> -}
> -
>   static void send_bitmap_header(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
>                                  uint32_t additional_flags)
>   {
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index e66e557..5439f84 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -545,3 +545,21 @@ uint64_t qemu_get_be64(QEMUFile *f)
>       v |= qemu_get_be32(f);
>       return v;
>   }
> +
> +char *qemu_get_string(QEMUFile *f, char *name)
> +{
> +    int len = qemu_get_byte(f);
> +    qemu_get_buffer(f, (uint8_t *)name, len);
> +    name[len] = '\0';
> +
> +    return name;
> +}
> +
> +void qemu_put_string(QEMUFile *f, const char *name)
> +{
> +    int len = strlen(name);
> +
> +    assert(len < 256);
> +    qemu_put_byte(f, len);
> +    qemu_put_buffer(f, (const uint8_t *)name, len);
> +}
>

Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration
  2015-02-18 14:00 [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
                   ` (13 preceding siblings ...)
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 14/14] migration/qemu-file: make functions qemu_(get/put)_string public Vladimir Sementsov-Ogievskiy
@ 2015-02-19  0:11 ` John Snow
  14 siblings, 0 replies; 32+ messages in thread
From: John Snow @ 2015-02-19  0:11 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, quintela, dgilbert, stefanha, pbonzini, amit.shah, den



On 02/18/2015 09:00 AM, Vladimir Sementsov-Ogievskiy wrote:
>
>   rfc: there are two tests. They are the same but using different
>   interfaces: md5 checksum of the bitmap last layer in query-block or
>   separate query-block-dirty-bitmap with dirty bitmap regions.
>   The second form is more appropriate for debugging, the first is more
>   appropriate for simple regression control. Which should go to
>   upstream?

Long term: The checksum for regression control. I think the region 
output by query-block-dirty-bitmap may be somewhat excessive during 
production use (a large bitmap may have many, many dirty bits) so its 
usage may be limited there.

It's definitely very useful as a debug command; but the checksum is 
likely more useful for the regression tests as you say.

We'll likely need to debug the bitmaps and the migration thereof for 
quite a bit of time in the near future, though, so it'd be a shame to 
discard the nice debugging interface. I wonder if we can conditionally 
compile in QMP commands exposed only in debug builds?

I don't think we have any like that yet, but I could see a use for them.

Thanks,
--js

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 05/14] block: add meta bitmaps
  2015-02-18 23:45   ` John Snow
@ 2015-02-19 11:43     ` Vladimir Sementsov-Ogievskiy
  2015-02-21  0:53       ` John Snow
  0 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-19 11:43 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, stefanha, pbonzini,
	amit.shah, den

On 19.02.2015 02:45, John Snow wrote:
>
>
> On 02/18/2015 09:00 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Meta bitmap is a 'dirty bitmap' for the BdrvDirtyBitmap. It tracks
>> changes (set/unset) of this BdrvDirtyBitmap. It is needed for live
>> migration of block dirty bitmaps.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>> ---
>>   block.c               | 40 ++++++++++++++++++++++++++++++++++++++++
>>   include/block/block.h |  5 +++++
>>   2 files changed, 45 insertions(+)
>>
>> diff --git a/block.c b/block.c
>> index a127fd2..aaa08b8 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -58,9 +58,15 @@
>>    * (3) successor is set: frozen mode.
>>    *     A frozen bitmap cannot be renamed, deleted, anonymized, 
>> cleared, set,
>>    *     or enabled. A frozen bitmap can only abdicate() or reclaim().
>> + *
>> + * Meta bitmap:
>> + * Meta bitmap is a 'dirty bitmap' for the BdrvDirtyBitmap. It 
>> tracks changes
>> + * (set/unset) of this BdrvDirtyBitmap. It is needed for live 
>> migration of
>> + * block dirty bitmaps.
>>    */
>>   struct BdrvDirtyBitmap {
>>       HBitmap *bitmap;            /* Dirty sector bitmap 
>> implementation */
>> +    HBitmap *meta_bitmap;       /* Meta bitmap */
>>       BdrvDirtyBitmap *successor; /* Anonymous child; implies frozen 
>> status */
>>       char *name;                 /* Optional non-empty unique ID */
>>       int64_t size;               /* Size of the bitmap (Number of 
>> sectors) */
>> @@ -5398,6 +5404,31 @@ void 
>> bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap)
>>       bitmap->name = NULL;
>>   }
>>
>> +HBitmap *bdrv_create_meta_bitmap(BdrvDirtyBitmap *bitmap,
>> +                                        uint64_t granularity)
>> +{
>> +    uint64_t sector_granularity;
>> +
>> +    assert((granularity & (granularity - 1)) == 0);
>> +
>> +    granularity *= 8 * bdrv_dirty_bitmap_granularity(bitmap);
>> +    sector_granularity = granularity >> BDRV_SECTOR_BITS;
>> +    assert(sector_granularity);
>> +
>
> The maths here could use a comment, I think.
>
> the "granularity" field here actually describes the desired 
> serialization buffer size; e.g. CHUNK_SIZE (1 << 20) or 1 MiB. This 
> parameter should be renamed to explain what it's actually for. 
> Something like "chunk_size" and a comment explaining that it is in bytes.
>
> ...
>
> That said, let's talk about the default chunk size you're using in 
> correlation with this function.
>
> a CHUNK_SIZE of 1MiB here is going to lead us to, if we have a bitmap 
> with the default granularity of 128 sectors\64KiB bytes, a granularity 
> for the meta_bitmap of one billion sectors (1 << 30) or 512GiB.
>
> That's going to be bigger than most drives entirely, which will 
> generally lead us to only using a single "chunk" per drive. Which 
> means we won't really get a lot of mileage out of the bulk/dirty 
> phases most of the time.
>
> It's wild to think about that the first 1,000,000,000 sectors or 
> 512,000,000,000 bytes will all be represented by the first single bit 
> in this bitmap. If a single hair on the drive changes, we resend the 
> _entire_ bitmap, possibly over and over again. Will we ever make 
> progress? Should we investigate a smaller chunk size?
>
> Here's some quick mappings of chunk size (bytes) to effective 
> meta_bitmap byte granularities, assuming the meta_bitmap is tracking a 
> bitmap with the default granularity of 64KiB:
>
> (1 << 20) 1MiB   -- 512GiB  // This is too high of a granularity
> (1 << 17) 128KiB --  64GiB
> (1 << 15) 32KiB  --  16GiB
> (1 << 11) 2KiB   --   1GiB
> (1 << 10) 1KiB   -- 512MiB
> (1 << 9)  512B   -- 256MiB
> (1 << 8)  256B   -- 128MiB
> (1 << 5)  32 B   --  16MiB  // This is too small of a chunk size.
> (1 << 1)   1 B   --   1MiB
>
> We want to make the chunk sends efficient, but we also want to make 
> sure that the dirty phase doesn't resend more data than it needs to, 
> so we need to strike a balance here, no?
>
> I think arguments could be made for most granularities between 128MiB 
> through 1GiB. Anything outside of that is too lopsided, IMO.
>
> What are your thoughts on this?
Ok, interesting thing to discuss.

My thoughts:
* the chunk size for block-migration is 1mb, than the bitmap (64kb 
granularity) for this chunk is 16bit=2bytes long. It's an intuitive 
reason for choosing the chunk size about 2 bytes. But in this case the 
data/metadata ratio is very bad (about 20bytes for the header of the 
chunk). So, taking the nearest value with adequate ratio gives (IMHO) 
'1kb -- 512mb': 20b/1k ~ 2%. Or 512b => 4%.

* for ndb+mirror migration scheme the default chunk is 64kb instead of 
1mb. So the bitmap is more smaller. But the same reason of data/metadata 
ratio leads to 1kb chunk for dirty bitmap migration.

So, what about default to 1kb and additional parameter for migration 
(migration capabilities) to give the user a possibility of chose?

* Yes, in most of user cases the bitmap (64kb granularity) will be small 
(< 1mb). In these cases, I think, it would be better to send the data 
only in complete step, only once. (for exmaple, if pending <= 1mb, 
dosn't send anything in incremental phase).
Live migration is actually needed only for migration of bitmaps for 
disks of several TBs size.

>
>> +    bitmap->meta_bitmap =
>> +        hbitmap_alloc(bitmap->size, ffsll(sector_granularity) - 1);
>> +
>> +    return bitmap->meta_bitmap;
>> +}
>> +
>> +void bdrv_release_meta_bitmap(BdrvDirtyBitmap *bitmap)
>> +{
>> +    if (bitmap->meta_bitmap) {
>> +        hbitmap_free(bitmap->meta_bitmap);
>> +        bitmap->meta_bitmap = NULL;
>> +    }
>> +}
>> +
>>   BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
>>                                             uint32_t granularity,
>>                                             const char *name,
>> @@ -5532,6 +5563,9 @@ void bdrv_release_dirty_bitmap(BlockDriverState 
>> *bs, BdrvDirtyBitmap *bitmap)
>>               assert(!bdrv_dirty_bitmap_frozen(bm));
>>               QLIST_REMOVE(bitmap, list);
>>               hbitmap_free(bitmap->bitmap);
>> +            if (bitmap->meta_bitmap) {
>> +                hbitmap_free(bitmap->meta_bitmap);
>> +            }
>>               g_free(bitmap->name);
>>               g_free(bitmap);
>>               return;
>> @@ -5659,6 +5693,9 @@ void bdrv_set_dirty_bitmap(BdrvDirtyBitmap 
>> *bitmap,
>>   {
>>       assert(bdrv_dirty_bitmap_enabled(bitmap));
>>       hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
>> +    if (bitmap->meta_bitmap) {
>> +        hbitmap_set(bitmap->meta_bitmap, cur_sector, nr_sectors);
>> +    }
>>   }
>>
>>   void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
>> @@ -5666,6 +5703,9 @@ void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap 
>> *bitmap,
>>   {
>>       assert(bdrv_dirty_bitmap_enabled(bitmap));
>>       hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
>> +    if (bitmap->meta_bitmap) {
>> +        hbitmap_set(bitmap->meta_bitmap, cur_sector, nr_sectors);
>> +    }
>>   }
>>
>>   void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap)
>> diff --git a/include/block/block.h b/include/block/block.h
>> index c6a928d..f2c62f6 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -4,6 +4,7 @@
>>   #include "block/aio.h"
>>   #include "qemu-common.h"
>>   #include "qemu/option.h"
>> +#include "qemu/hbitmap.h"
>>   #include "block/coroutine.h"
>>   #include "block/accounting.h"
>>   #include "qapi/qmp/qobject.h"
>> @@ -487,6 +488,10 @@ void 
>> bdrv_dirty_bitmap_deserialize_zeroes(BdrvDirtyBitmap *bitmap,
>>                                             uint64_t start, uint64_t 
>> count);
>>   void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap);
>>
>> +HBitmap *bdrv_create_meta_bitmap(BdrvDirtyBitmap *bitmap,
>> +                                        uint64_t granularity);
>> +void bdrv_release_meta_bitmap(BdrvDirtyBitmap *bitmap);
>> +
>>   void bdrv_enable_copy_on_read(BlockDriverState *bs);
>>   void bdrv_disable_copy_on_read(BlockDriverState *bs);
>>
>>


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 08/14] migration: add migration/block-dirty-bitmap.c
  2015-02-18 23:47   ` John Snow
@ 2015-02-19 13:48     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-19 13:48 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, stefanha, pbonzini,
	amit.shah, den

On 19.02.2015 02:47, John Snow wrote:
>
>
> On 02/18/2015 09:00 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Live migration of dirty bitmaps. Only named dirty bitmaps, associated 
>> with
>> root nodes and non-root named nodes are migrated.
>>
>> If destination qemu is already containing a dirty bitmap with the 
>> same name
>> as a migrated bitmap (for the same node), than, if their 
>> granularities are
>> the same the migration will be done, otherwise the error will be 
>> generated.
>>
>> If destination qemu doesn't contain such bitmap it will be created.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>> ---
>>   include/migration/block.h      |   1 +
>>   migration/Makefile.objs        |   2 +-
>>   migration/block-dirty-bitmap.c | 708 
>> +++++++++++++++++++++++++++++++++++++++++
>>   vl.c                           |   1 +
>>   4 files changed, 711 insertions(+), 1 deletion(-)
>>   create mode 100644 migration/block-dirty-bitmap.c
>>
>> diff --git a/include/migration/block.h b/include/migration/block.h
>> index ffa8ac0..566bb9f 100644
>> --- a/include/migration/block.h
>> +++ b/include/migration/block.h
>> @@ -14,6 +14,7 @@
>>   #ifndef BLOCK_MIGRATION_H
>>   #define BLOCK_MIGRATION_H
>>
>> +void dirty_bitmap_mig_init(void);
>>   void blk_mig_init(void);
>>   int blk_mig_active(void);
>>   uint64_t blk_mig_bytes_transferred(void);
>> diff --git a/migration/Makefile.objs b/migration/Makefile.objs
>> index d929e96..128612d 100644
>> --- a/migration/Makefile.objs
>> +++ b/migration/Makefile.objs
>> @@ -6,5 +6,5 @@ common-obj-y += xbzrle.o
>>   common-obj-$(CONFIG_RDMA) += rdma.o
>>   common-obj-$(CONFIG_POSIX) += exec.o unix.o fd.o
>>
>> -common-obj-y += block.o
>> +common-obj-y += block.o block-dirty-bitmap.o
>>
>> diff --git a/migration/block-dirty-bitmap.c 
>> b/migration/block-dirty-bitmap.c
>> new file mode 100644
>> index 0000000..084ba22
>> --- /dev/null
>> +++ b/migration/block-dirty-bitmap.c
>> @@ -0,0 +1,708 @@
>> +/*
>> + * QEMU dirty bitmap migration
>> + *
>> + * Live migration of dirty bitmaps. Only named dirty bitmaps, 
>> associated with
>> + * root nodes and non-root named nodes are migrated.
>> + *
>> + * If destination qemu is already containing a dirty bitmap with the 
>> same name
>> + * as a migrated bitmap (for the same node), than, if their 
>> granularities are
>> + * the same the migration will be done, otherwise the error will be 
>> generated.
>> + *
>> + * If destination qemu doesn't contain such bitmap it will be created.
>> + *
>> + * format of migration:
>> + *
>> + * # Header (shared for different chunk types)
>> + * 1 byte: flags
>
> 1, 2, or 4 bytes.
>
>> + * [ 1 byte: node name size ] \  flags & DEVICE_NAME
>> + * [ n bytes: node name     ] /
>> + * [ 1 byte: bitmap name size ] \  flags & BITMAP_NAME
>> + * [ n bytes: bitmap name     ] /
>> + *
>> + * # Start of bitmap migration (flags & START)
>> + * header
>> + * be64: granularity
>> + *
>> + * # Complete of bitmap migration (flags & COMPLETE)
>> + * header
>> + * 1 byte: bitmap enabled flag
>> + *
>> + * # Data chunk of bitmap migration
>> + * header
>> + * be64: start sector
>> + * be32: number of sectors
>> + * [ be64: buffer size  ] \ ! (flags & ZEROES)
>> + * [ n bytes: buffer    ] /
>> + *
>> + * The last chunk in stream should contain flags & EOS. The chunk 
>> may skip
>> + * device and/or bitmap names, assuming them to be the same with the 
>> previous
>> + * chunk.
>> + *
>> + *
>> + * This file is derived from migration/block.c
>> + *
>> + * Author:
>> + * Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>> + *
>> + * original copyright message:
>> + * 
>> =====================================================================
>> + * Copyright IBM, Corp. 2009
>> + *
>> + * Authors:
>> + *  Liran Schour   <lirans@il.ibm.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.  
>> See
>> + * the COPYING file in the top-level directory.
>> + *
>> + * Contributions after 2012-01-13 are licensed under the terms of the
>> + * GNU GPL, version 2 or (at your option) any later version.
>> + * 
>> =====================================================================
>> + */
>> +
>> +#include "block/block.h"
>> +#include "block/block_int.h"
>> +#include "sysemu/block-backend.h"
>> +#include "qemu/main-loop.h"
>> +#include "qemu/error-report.h"
>> +#include "migration/block.h"
>> +#include "migration/migration.h"
>> +#include "qemu/hbitmap.h"
>> +#include <assert.h>
>> +
>> +#define CHUNK_SIZE                       (1 << 20)
>> +
>> +/* Flags occupy from one to four bytes. In all but one the 7-th 
>> (EXTRA_FLAGS)
>> + * bit should be set. */
>> +#define DIRTY_BITMAP_MIG_FLAG_EOS           0x01
>> +#define DIRTY_BITMAP_MIG_FLAG_ZEROES        0x02
>> +#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME   0x04
>> +#define DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME   0x08
>> +#define DIRTY_BITMAP_MIG_FLAG_START         0x10
>> +#define DIRTY_BITMAP_MIG_FLAG_COMPLETE      0x20
>> +#define DIRTY_BITMAP_MIG_FLAG_BITS          0x40
>> +
>> +#define DIRTY_BITMAP_MIG_EXTRA_FLAGS        0x80
>> +#define DIRTY_BITMAP_MIG_FLAGS_SIZE_16      0x8000
>> +#define DIRTY_BITMAP_MIG_FLAGS_SIZE_32      0x8080
>> +
>> +#define DEBUG_DIRTY_BITMAP_MIGRATION
>> +
>> +#ifdef DEBUG_DIRTY_BITMAP_MIGRATION
>> +#define DPRINTF(fmt, ...) \
>> +    do { printf("dirty_migration: " fmt, ## __VA_ARGS__); } while (0)
>> +#else
>> +#define DPRINTF(fmt, ...) \
>> +    do { } while (0)
>> +#endif
>> +
>
> Take a look at hw/ide/ahci.c, which has a DPRINTF macro defined so 
> that the debugging can be turned off, but the print statements etc. 
> will still be compiled and typechecked.
>
> The blurb looks like this:
>
> #define DEBUG_AHCI 0
>
> #define DPRINTF(port, fmt, ...) \
> do { \
>     if (DEBUG_AHCI) { \
>         fprintf(stderr, "ahci: %s: [%d] ", __func__, port); \
>         fprintf(stderr, fmt, ## __VA_ARGS__); \
>     } \
> } while (0)
>
> When it comes time to submit this as non-RFC, leave 
> DEBUG_DIRTY_BITMAP_MIGRATION defined to 0.
>
Ok.
>> +typedef struct DirtyBitmapMigBitmapState {
>> +    /* Written during setup phase. */
>> +    BlockDriverState *bs;
>> +    const char *node_name;
>> +    BdrvDirtyBitmap *bitmap;
>> +    HBitmap *meta_bitmap;
>> +    uint64_t total_sectors;
>> +    uint64_t sectors_per_chunk;
>> +    QSIMPLEQ_ENTRY(DirtyBitmapMigBitmapState) entry;
>> +
>> +    /* For bulk phase. */
>> +    bool bulk_completed;
>> +    uint64_t cur_sector;
>> +
>> +    /* For dirty phase. */
>> +    HBitmapIter iter_dirty;
>> +} DirtyBitmapMigBitmapState;
>> +
>> +typedef struct DirtyBitmapMigState {
>> +    QSIMPLEQ_HEAD(dbms_list, DirtyBitmapMigBitmapState) dbms_list;
>> +
>> +    bool bulk_completed;
>> +
>> +    /* for send_bitmap() */
>
> send_bitmap_bits, now.
>
>> +    BlockDriverState *prev_bs;
>> +    BdrvDirtyBitmap *prev_bitmap;
>> +} DirtyBitmapMigState;
>> +
>> +typedef struct DirtyBitmapLoadState {
>> +    uint32_t flags;
>> +    char node_name[256];
>> +    char bitmap_name[256];
>> +    BlockDriverState *bs;
>> +    BdrvDirtyBitmap *bitmap;
>> +} DirtyBitmapLoadState;
>> +
>> +static DirtyBitmapMigState dirty_bitmap_mig_state;
>> +
>> +static uint32_t qemu_get_flags(QEMUFile *f)
>> +{
>> +    uint8_t flags = qemu_get_byte(f);
>> +    if (flags & DIRTY_BITMAP_MIG_EXTRA_FLAGS) {
>> +        flags = flags << 8 | qemu_get_byte(f);
>> +        if (flags & DIRTY_BITMAP_MIG_EXTRA_FLAGS) {
>> +            flags = flags << 16 | qemu_get_be16(f);
>> +        }
>> +    }
>> +
>> +    return flags;
>> +}
>> +
>> +static void qemu_put_flags(QEMUFile *f, uint32_t flags)
>> +{
>> +    if (!(flags & 0xffffff00)) {
>> +        qemu_put_byte(f, flags);
>> +        return;
>> +    }
>> +
>> +    if (!(flags & 0xffff0000)) {
>> +        qemu_put_be16(f, flags | DIRTY_BITMAP_MIG_FLAGS_SIZE_16);
>> +        return;
>> +    }
>> +
>> +    qemu_put_be32(f, flags | DIRTY_BITMAP_MIG_FLAGS_SIZE_32);
>> +}
>> +
>
> This will give us breathing room for sure :)
>
>> +/* read name from qemu file:
>> + * format:
>> + * 1 byte : len = name length (<256)
>> + * len bytes : name without last zero byte
>> + *
>> + * name should point to the buffer >= 256 bytes length
>> + */
>> +static char *qemu_get_string(QEMUFile *f, char *name)
>> +{
>> +    int len = qemu_get_byte(f);
>> +    qemu_get_buffer(f, (uint8_t *)name, len);
>> +    name[len] = '\0';
>> +
>> +    DPRINTF("get name: %d %s\n", len, name);
>> +
>> +    return name;
>> +}
>> +
>> +/* write name to qemu file:
>> + * format:
>> + * same as for qemu_get_string
>> + *
>> + * maximum name length is 255
>> + */
>> +static void qemu_put_string(QEMUFile *f, const char *name)
>> +{
>> +    int len = strlen(name);
>> +
>> +    DPRINTF("put name: %d %s\n", len, name);
>> +
>> +    assert(len < 256);
>> +    qemu_put_byte(f, len);
>> +    qemu_put_buffer(f, (const uint8_t *)name, len);
>> +}
>> +
>
> Thanks, sorry for the nitpicking.
No problem, I like good naming too, but true idea doesn't come every time.
>
>> +static void send_bitmap_header(QEMUFile *f, 
>> DirtyBitmapMigBitmapState *dbms,
>> +                               uint32_t additional_flags)
>> +{
>> +    BlockDriverState *bs = dbms->bs;
>> +    BdrvDirtyBitmap *bitmap = dbms->bitmap;
>> +    uint32_t flags = additional_flags;
>> +
>> +    if (bs != dirty_bitmap_mig_state.prev_bs) {
>> +        dirty_bitmap_mig_state.prev_bs = bs;
>> +        flags |= DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME;
>> +    }
>> +
>> +    if (bitmap != dirty_bitmap_mig_state.prev_bitmap) {
>> +        dirty_bitmap_mig_state.prev_bitmap = bitmap;
>> +        flags |= DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME;
>> +    }
>> +
>> +    qemu_put_flags(f, flags);
>> +
>> +    if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
>> +        qemu_put_string(f, dbms->node_name);
>> +    }
>> +
>> +    if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
>> +        qemu_put_string(f, bdrv_dirty_bitmap_name(bitmap));
>> +    }
>> +}
>> +
>> +static void send_bitmap_start(QEMUFile *f, DirtyBitmapMigBitmapState 
>> *dbms)
>> +{
>> +    send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_START);
>> +    qemu_put_be32(f, bdrv_dirty_bitmap_granularity(dbms->bitmap));
>> +}
>> +
>> +static void send_bitmap_complete(QEMUFile *f, 
>> DirtyBitmapMigBitmapState *dbms)
>> +{
>> +    send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_COMPLETE);
>> +    qemu_put_byte(f, bdrv_dirty_bitmap_enabled(dbms->bitmap));
>> +}
>> +
>> +static void send_bitmap_bits(QEMUFile *f, DirtyBitmapMigBitmapState 
>> *dbms,
>> +                             uint64_t start_sector, uint32_t 
>> nr_sectors)
>> +{
>> +    /* align for buffer_is_zero() */
>> +    uint64_t align = 4 * sizeof(long);
>> +    uint64_t buf_size =
>> +        (bdrv_dirty_bitmap_data_size(dbms->bitmap, nr_sectors) + 
>> align - 1) &
>> +        ~(align - 1);
>> +    uint8_t *buf = g_malloc0(buf_size);
>> +    uint32_t flags = DIRTY_BITMAP_MIG_FLAG_BITS;
>> +
>> +    bdrv_dirty_bitmap_serialize_part(dbms->bitmap, buf,
>> +                                     start_sector, nr_sectors);
>> +
>> +    if (buffer_is_zero(buf, buf_size)) {
>> +        g_free(buf);
>> +        buf = NULL;
>> +        flags |= DIRTY_BITMAP_MIG_FLAG_ZEROES;
>> +    }
>> +
>> +    DPRINTF("Enter send_bitmap"
>> +            "\n   flags:        %x"
>> +            "\n   start_sector: %" PRIu64
>> +            "\n   nr_sectors:   %" PRIu32
>> +            "\n   data_size:    %" PRIu64 "\n",
>> +            flags, start_sector, nr_sectors, buf_size);
>> +
>> +    send_bitmap_header(f, dbms, flags);
>> +
>> +    qemu_put_be64(f, start_sector);
>> +    qemu_put_be32(f, nr_sectors);
>> +
>> +    /* if a block is zero we need to flush here since the network
>> +     * bandwidth is now a lot higher than the storage device bandwidth.
>> +     * thus if we queue zero blocks we slow down the migration.
>> +     * also, skip writing block when migrate only dirty bitmaps. */
>> +    if (flags & DIRTY_BITMAP_MIG_FLAG_ZEROES) {
>> +        qemu_fflush(f);
>> +        return;
>> +    }
>> +
>> +    qemu_put_be64(f, buf_size);
>> +    qemu_put_buffer(f, buf, buf_size);
>> +    g_free(buf);
>> +}
>> +
>> +
>> +/* Called with iothread lock taken.  */
>> +
>> +static void set_dirty_tracking(void)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms;
>> +
>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +        dbms->meta_bitmap =
>> +            bdrv_create_meta_bitmap(dbms->bitmap, CHUNK_SIZE);
>> +    }
>> +}
>> +
>> +static void unset_dirty_tracking(void)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms;
>> +
>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +        bdrv_release_meta_bitmap(dbms->bitmap);
>> +    }
>> +}
>> +
>> +static void init_dirty_bitmap_migration(QEMUFile *f)
>> +{
>> +    BlockDriverState *bs;
>> +    BdrvDirtyBitmap *bitmap;
>> +    DirtyBitmapMigBitmapState *dbms;
>> +
>> +    dirty_bitmap_mig_state.bulk_completed = false;
>> +    dirty_bitmap_mig_state.prev_bs = NULL;
>> +    dirty_bitmap_mig_state.prev_bitmap = NULL;
>> +
>> +    for (bs = bdrv_next(NULL); bs; bs = bdrv_next(bs)) {
>> +        for (bitmap = bdrv_next_dirty_bitmap(bs, NULL); bitmap;
>> +             bitmap = bdrv_next_dirty_bitmap(bs, bitmap)) {
>> +            if (!bdrv_dirty_bitmap_name(bitmap)) {
>> +                continue;
>> +            }
>> +
>> +            if (!bdrv_get_node_name(bs) && blk_bs(bs->blk) != bs) {
>> +                /* not named non-root node */
>> +                continue;
>> +            }
>> +
>> +            dbms = g_new0(DirtyBitmapMigBitmapState, 1);
>> +            dbms->bs = bs;
>> +            dbms->node_name = bdrv_get_node_name(bs);
>> +            if (!dbms->node_name || dbms->node_name[0] == '\0') {
>> +                dbms->node_name = bdrv_get_device_name(bs);
>> +            }
>> +            dbms->bitmap = bitmap;
>> +            dbms->total_sectors = bdrv_nb_sectors(bs);
>> +            dbms->sectors_per_chunk = (uint64_t)CHUNK_SIZE * 8 *
>> +                bdrv_dirty_bitmap_granularity(dbms->bitmap) >> 
>> BDRV_SECTOR_BITS;
>
> This calculation is re-used from the meta_bitmap granularity 
> calculation. Might be worth factoring out so that if we tweak the 
> values for performance reasons, we don't have to remember all the 
> places we need to change it.
>
> If your answer to my earlier question about CHUNK_SIZE is "We don't 
> need to change it," then disregard.
ok. I'll move 'dbms->sectors_per_chunk =' to set_dirty_tracking and use 
hbitmap_granularity
>
>> +
>> + QSIMPLEQ_INSERT_TAIL(&dirty_bitmap_mig_state.dbms_list,
>> +                                 dbms, entry);
>> +        }
>> +    }
>> +}
>> +
>> +/* Called with no lock taken.  */
>> +static void bulk_phase_send_chunk(QEMUFile *f, 
>> DirtyBitmapMigBitmapState *dbms)
>> +{
>> +    uint32_t nr_sectors = MIN(dbms->total_sectors - dbms->cur_sector,
>> +                             dbms->sectors_per_chunk);
>> +
>> +    send_bitmap_bits(f, dbms, dbms->cur_sector, nr_sectors);
>> +
>> +    dbms->cur_sector += nr_sectors;
>> +    if (dbms->cur_sector >= dbms->total_sectors) {
>> +        dbms->bulk_completed = true;
>> +    }
>> +}
>> +
>> +/* Called with no lock taken.  */
>> +static void bulk_phase(QEMUFile *f, bool limit)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms;
>> +
>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +        while (!dbms->bulk_completed) {
>> +            bulk_phase_send_chunk(f, dbms);
>> +            if (limit && qemu_file_rate_limit(f)) {
>> +                return;
>> +            }
>> +        }
>> +    }
>> +
>> +    dirty_bitmap_mig_state.bulk_completed = true;
>> +}
>> +
>> +static void blk_mig_reset_dirty_cursor(void)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms;
>> +
>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +        hbitmap_iter_init(&dbms->iter_dirty, dbms->meta_bitmap, 0);
>> +    }
>> +}
>> +
>> +/* Called with iothread lock taken. */
>> +static bool dirty_phase_send_chunk(QEMUFile *f, 
>> DirtyBitmapMigBitmapState *dbms)
>> +{
>> +    uint32_t nr_sectors;
>> +    size_t old_pos = dbms->iter_dirty.pos;
>> +    int64_t cur = hbitmap_iter_next(&dbms->iter_dirty);
>> +
>> +    /* restart search from the beginning */
>> +    if (old_pos && cur == -1) {
>> +        hbitmap_iter_init(&dbms->iter_dirty, dbms->meta_bitmap, 0);
>> +        cur = hbitmap_iter_next(&dbms->iter_dirty);
>> +    }
>> +
>> +    if (cur == -1) {
>> +        hbitmap_iter_init(&dbms->iter_dirty, dbms->meta_bitmap, 0);
>> +        return false;
>> +    }
>> +
>> +    nr_sectors = MIN(dbms->total_sectors - cur, 
>> dbms->sectors_per_chunk);
>> +    send_bitmap_bits(f, dbms, cur, nr_sectors);
>> +    hbitmap_reset(dbms->meta_bitmap, cur, dbms->sectors_per_chunk);
>> +    cur += nr_sectors;
>
> Dead assignment to cur, function is fine otherwise.
>
>> +
>> +    return true;
>> +}
>> +
>> +/* Called with iothread lock taken. */
>> +static void dirty_phase(QEMUFile *f, bool limit)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms;
>> +
>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +        while (dirty_phase_send_chunk(f, dbms)) {
>> +            if (limit && qemu_file_rate_limit(f)) {
>> +                return;
>> +            }
>> +        }
>> +    }
>> +}
>> +
>> +
>> +/* Called with iothread lock taken.  */
>> +static void dirty_bitmap_mig_cleanup(void)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms;
>> +
>> +    unset_dirty_tracking();
>> +
>> +    while ((dbms = 
>> QSIMPLEQ_FIRST(&dirty_bitmap_mig_state.dbms_list)) != NULL) {
>> + QSIMPLEQ_REMOVE_HEAD(&dirty_bitmap_mig_state.dbms_list, entry);
>> +        g_free(dbms);
>> +    }
>> +}
>> +
>> +static void dirty_bitmap_migration_cancel(void *opaque)
>> +{
>> +    dirty_bitmap_mig_cleanup();
>> +}
>> +
>> +static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
>> +{
>> +    DPRINTF("Enter save live iterate\n");
>> +
>> +    if (dirty_bitmap_mig_state.bulk_completed) {
>> +        qemu_mutex_lock_iothread();
>> +        dirty_phase(f, true);
>> +        qemu_mutex_unlock_iothread();
>> +    } else {
>> +        bulk_phase(f, true);
>> +    }
>> +
>> +    qemu_put_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>> +
>> +    return dirty_bitmap_mig_state.bulk_completed;
>> +}
>> +
>> +/* Called with iothread lock taken.  */
>> +
>> +static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms;
>> +    DPRINTF("Enter save live complete\n");
>> +
>> +    if (!dirty_bitmap_mig_state.bulk_completed) {
>> +        bulk_phase(f, false);
>> +    }
>> +
>> +    blk_mig_reset_dirty_cursor();
>> +    dirty_phase(f, false);
>> +
>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +        send_bitmap_complete(f, dbms);
>> +    }
>> +
>> +    qemu_put_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>> +
>> +    DPRINTF("Dirty bitmaps migration completed\n");
>> +
>> +    dirty_bitmap_mig_cleanup();
>> +    return 0;
>> +}
>> +
>> +static uint64_t dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
>> +                                          uint64_t max_size)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms;
>> +    uint64_t pending = 0;
>> +
>> +    qemu_mutex_lock_iothread();
>> +
>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +        uint64_t sectors = hbitmap_count(dbms->meta_bitmap);
>> +        if (!dbms->bulk_completed) {
>> +            sectors += dbms->total_sectors - dbms->cur_sector;
>> +        }
>> +        pending += bdrv_dirty_bitmap_data_size(dbms->bitmap, sectors);
>> +    }
>> +
>> +    qemu_mutex_unlock_iothread();
>> +
>> +    DPRINTF("Enter save live pending %" PRIu64 ", max: %" PRIu64 "\n",
>> +            pending, max_size);
>> +    return pending;
>> +}
>> +
>> +/* First occurrence of this bitmap. It should be created if doesn't 
>> exist */
>> +static int dirty_bitmap_load_start(QEMUFile *f, DirtyBitmapLoadState 
>> *s)
>> +{
>> +    uint32_t granularity = qemu_get_be32(f);
>> +    if (!s->bitmap) {
>> +        Error *local_err = NULL;
>> +        s->bitmap = bdrv_create_dirty_bitmap(s->bs, granularity,
>> +                                             s->bitmap_name, 
>> &local_err);
>> +        if (!s->bitmap) {
>> +            error_report("%s", error_get_pretty(local_err));
>> +            error_free(local_err);
>> +            return -EINVAL;
>> +        }
>> +    } else {
>> +        uint32_t dest_granularity =
>> +            bdrv_dirty_bitmap_granularity(s->bitmap);
>> +        if (dest_granularity != granularity) {
>> +            fprintf(stderr,
>> +                    "Error: "
>> +                    "Migrated bitmap granularity (%" PRIu32 ") "
>> +                    "doesn't match the destination bitmap '%s' "
>> +                    "granularity (%" PRIu32 ")\n",
>> +                    granularity,
>> +                    bdrv_dirty_bitmap_name(s->bitmap),
>> +                    dest_granularity);
>> +            return -EINVAL;
>> +        }
>> +    }
>> +
>> +    bdrv_disable_dirty_bitmap(s->bitmap);
>
> This will definitely keep people from using it while it's in an 
> inconsistent state.
Ho, it's more interesting. Not people. Qemu! If we are migrating disk in 
parallel with dirty bitmap, every write of migrated disk block will 
spoil our dirty bitmap (if not disabled).
>
>> +
>> +    return 0;
>> +}
>> +
>> +static void dirty_bitmap_load_complete(QEMUFile *f, 
>> DirtyBitmapLoadState *s)
>> +{
>> +    bool enabled;
>> +
>> +    bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
>> +    DPRINTF("enab\n");
>> +
>> +    enabled = qemu_get_byte(f);
>> +    if (enabled) {
>> +        bdrv_enable_dirty_bitmap(s->bitmap);
>> +    }
>> +}
>> +
>> +static int dirty_bitmap_load_bits(QEMUFile *f, DirtyBitmapLoadState *s)
>> +{
>> +    uint64_t first_sector = qemu_get_be64(f);
>> +    uint32_t nr_sectors = qemu_get_be32(f);
>> +    DPRINTF("chunk: %lu %u\n", first_sector, nr_sectors);
>> +
>> +
>> +    if (s->flags & DIRTY_BITMAP_MIG_FLAG_ZEROES) {
>> +        DPRINTF("   - zeroes\n");
>> +        bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_sector,
>> +                                             nr_sectors);
>> +    } else {
>> +        uint8_t *buf;
>> +        uint64_t buf_size = qemu_get_be64(f);
>> +        uint64_t needed_size =
>> +            bdrv_dirty_bitmap_data_size(s->bitmap, nr_sectors);
>> +
>> +        if (needed_size > buf_size) {
>> +            fprintf(stderr,
>> +                    "Error: Migrated bitmap granularity doesn't "
>> +                    "match the destination bitmap '%s' granularity\n",
>> +                    bdrv_dirty_bitmap_name(s->bitmap));
>> +            return -EINVAL;
>> +        }
>> +
>> +        buf = g_malloc(buf_size);
>> +        qemu_get_buffer(f, buf, buf_size);
>> +        bdrv_dirty_bitmap_deserialize_part(s->bitmap, buf,
>> +                                           first_sector,
>> +                                           nr_sectors);
>> +        g_free(buf);
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int dirty_bitmap_load_header(QEMUFile *f, 
>> DirtyBitmapLoadState *s)
>> +{
>> +    Error *local_err = NULL;
>> +    s->flags = qemu_get_flags(f);
>> +    DPRINTF("flags: %x\n", s->flags);
>> +
>> +    if (s->flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
>> +        qemu_get_string(f, s->node_name);
>> +        s->bs = bdrv_lookup_bs(s->node_name, s->node_name, &local_err);
>> +        if (!s->bs) {
>> +            error_report("%s", error_get_pretty(local_err));
>> +            error_free(local_err);
>> +            return -EINVAL;
>> +        }
>> +    } else if (!s->bs) {
>> +        fprintf(stderr, "Error: block device name is not set\n");
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (s->flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
>> +        qemu_get_string(f, s->bitmap_name);
>> +        s->bitmap = bdrv_find_dirty_bitmap(s->bs, s->bitmap_name);
>> +
>> +        /* bitmap may be NULL here, it wouldn't be an error if it is 
>> the
>> +         * first occurrence of the bitmap */
>> +        if (!s->bitmap && !(s->flags & DIRTY_BITMAP_MIG_FLAG_START)) {
>> +            fprintf(stderr, "Error: unknown dirty bitmap "
>> +                    "'%s' for block device '%s'\n",
>> +                    s->bitmap_name, s->node_name);
>> +            return -EINVAL;
>> +        }
>> +    } else if (!s->bitmap) {
>> +        fprintf(stderr, "Error: block device name is not set\n");
>> +        return -EINVAL;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>> +{
>> +    static DirtyBitmapLoadState s;
>> +
>> +    int ret = 0;
>> +
>> +    DPRINTF("load start\n");
>> +
>> +    do {
>> +        dirty_bitmap_load_header(f, &s);
>> +
>> +        if (s.flags & DIRTY_BITMAP_MIG_FLAG_START) {
>> +            ret = dirty_bitmap_load_start(f, &s);
>> +        } else if (s.flags & DIRTY_BITMAP_MIG_FLAG_COMPLETE) {
>> +            dirty_bitmap_load_complete(f, &s);
>> +        } else if (s.flags & DIRTY_BITMAP_MIG_FLAG_BITS) {
>> +            ret = dirty_bitmap_load_bits(f, &s);
>> +        }
>> +
>> +        DPRINTF("ret: %d\n", ret);
>> +        if (!ret) {
>> +            ret = qemu_file_get_error(f);
>> +        }
>> +
>> +        DPRINTF("ret: %d\n", ret);
>> +        if (ret) {
>> +            return ret;
>> +        }
>> +    } while (!(s.flags & DIRTY_BITMAP_MIG_FLAG_EOS));
>> +
>> +    DPRINTF("load finish\n");
>> +    return 0;
>> +}
>> +
>
> Great, thanks. These functions read way nicer now.
>
>> +static bool dirty_bitmap_is_active(void *opaque)
>> +{
>> +    return migrate_dirty_bitmaps();
>> +}
>> +
>> +static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms = NULL;
>> +    init_dirty_bitmap_migration(f);
>> +
>> +    qemu_mutex_lock_iothread();
>> +    /* start track dirtyness of dirty bitmaps */
>
> 'dirtiness'
codespell missed it =(
>
>> +    set_dirty_tracking();
>> +    qemu_mutex_unlock_iothread();
>> +
>> +    blk_mig_reset_dirty_cursor();
>> +
>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +        send_bitmap_start(f, dbms);
>> +    }
>> +    qemu_put_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>> +
>> +    return 0;
>> +}
>> +
>> +static SaveVMHandlers savevm_block_handlers = {
>> +    .save_live_setup = dirty_bitmap_save_setup,
>> +    .save_live_iterate = dirty_bitmap_save_iterate,
>> +    .save_live_complete = dirty_bitmap_save_complete,
>> +    .save_live_pending = dirty_bitmap_save_pending,
>> +    .load_state = dirty_bitmap_load,
>> +    .cancel = dirty_bitmap_migration_cancel,
>> +    .is_active = dirty_bitmap_is_active,
>> +};
>> +
>> +void dirty_bitmap_mig_init(void)
>> +{
>> +    QSIMPLEQ_INIT(&dirty_bitmap_mig_state.dbms_list);
>> +
>> +    register_savevm_live(NULL, "dirty-bitmap", 0, 1, 
>> &savevm_block_handlers,
>> +                         &dirty_bitmap_mig_state);
>> +}
>> diff --git a/vl.c b/vl.c
>> index 8c8f142..723a362 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -4147,6 +4147,7 @@ int main(int argc, char **argv, char **envp)
>>
>>       blk_mig_init();
>>       ram_mig_init();
>> +    dirty_bitmap_mig_init();
>>
>>       /* If the currently selected machine wishes to override the 
>> units-per-bus
>>        * property of its default HBA interface type, do so now. */
>>
>
> Looks good. Holding the R-B pending discussion of the implications of 
> chunk sizes.
>
> --js

Thanks,

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 11/14] iotests: add dirty bitmap migration test
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 11/14] iotests: add dirty bitmap migration test Vladimir Sementsov-Ogievskiy
@ 2015-02-19 18:47   ` John Snow
  0 siblings, 0 replies; 32+ messages in thread
From: John Snow @ 2015-02-19 18:47 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, stefanha, pbonzini,
	amit.shah, den



On 02/18/2015 09:00 AM, Vladimir Sementsov-Ogievskiy wrote:
> The test starts two vms (vm_a, vm_b), create dirty bitmap in the first one, do
> several writes to corresponding device and then migrate vm_a to vm_b
> with dirty bitmaps.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   tests/qemu-iotests/117     | 88 ++++++++++++++++++++++++++++++++++++++++++++++
>   tests/qemu-iotests/117.out |  5 +++
>   tests/qemu-iotests/group   |  1 +
>   3 files changed, 94 insertions(+)
>   create mode 100755 tests/qemu-iotests/117
>   create mode 100644 tests/qemu-iotests/117.out
>
> diff --git a/tests/qemu-iotests/117 b/tests/qemu-iotests/117
> new file mode 100755
> index 0000000..61538cf
> --- /dev/null
> +++ b/tests/qemu-iotests/117
> @@ -0,0 +1,88 @@
> +#!/usr/bin/env python
> +#
> +# Tests for dirty bitmaps migration.
> +#
> +# (C) Vladimir Sementsov-Ogievskiy 2015
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +#
> +
> +import os
> +import iotests
> +import time
> +from iotests import qemu_img
> +
> +disk_a = os.path.join(iotests.test_dir, 'disk_a')
> +disk_b = os.path.join(iotests.test_dir, 'disk_b')
> +fifo   = os.path.join(iotests.test_dir, 'fifo')
> +
> +size   = 0x40000000 # 1G
> +sector_size = 512
> +granularity = 0x10000
> +regions = [
> +    { 'start': 0,          'count': 0x100000 },
> +    { 'start': 0x10000000, 'count': 0x20000  },
> +    { 'start': 0x39990000, 'count': 0x10000  }
> +    ]
> +
> +regions_in_sectors = [
> +    { key: val / sector_size for (key, val) in el.items() } for el in regions]
> +
> +class TestDirtyBitmapMigration(iotests.QMPTestCase):
> +
> +    def setUp(self):
> +        os.mkfifo(fifo)
> +        qemu_img('create', '-f', iotests.imgfmt, disk_a, str(size))
> +        qemu_img('create', '-f', iotests.imgfmt, disk_b, str(size))
> +        self.vm_a = iotests.VM().add_drive(disk_a)
> +        self.vm_b = iotests.VM().add_drive(disk_b)
> +        self.vm_b.add_incoming_migration("exec: cat " + fifo)
> +        self.vm_a.launch()
> +        self.vm_b.launch()
> +
> +    def tearDown(self):
> +        self.vm_a.shutdown()
> +        self.vm_b.shutdown()
> +        os.remove(disk_a)
> +        os.remove(disk_b)
> +        os.remove(fifo)
> +
> +    def test_migration(self):
> +        result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
> +                               name='bitmap', granularity=granularity)
> +        self.assert_qmp(result, 'return', {});
> +
> +        for r in regions:
> +          self.vm_a.hmp_qemu_io('drive0',
> +                                'write %d %d' % (r['start'], r['count']))
> +
> +        result = self.vm_a.qmp('query-block-dirty-bitmap', node='drive0',
> +                               name='bitmap')
> +        self.assert_qmp(result, 'return/dirty-regions', regions_in_sectors)
> +
> +        result = self.vm_a.qmp('migrate-set-capabilities',
> +                               capabilities=[{'capability': 'dirty-bitmaps',
> +                                              'state': True}])
> +        self.assert_qmp(result, 'return', {})
> +        result = self.vm_a.qmp('migrate', uri='exec:cat>' + fifo)
> +        while self.vm_a.qmp('query-migrate')['return']['status'] != 'completed':
> +          time.sleep(1)
> +
> +        result = self.vm_b.qmp('query-block-dirty-bitmap', node='drive0',
> +                               name='bitmap')
> +        self.assert_qmp(result, 'return/dirty-regions', regions_in_sectors);
> +
> +
> +if __name__ == '__main__':
> +    iotests.main()
> diff --git a/tests/qemu-iotests/117.out b/tests/qemu-iotests/117.out
> new file mode 100644
> index 0000000..ae1213e
> --- /dev/null
> +++ b/tests/qemu-iotests/117.out
> @@ -0,0 +1,5 @@
> +.
> +----------------------------------------------------------------------
> +Ran 1 tests
> +
> +OK
> diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
> index b4ddf1b..6ad5b55 100644
> --- a/tests/qemu-iotests/group
> +++ b/tests/qemu-iotests/group
> @@ -118,3 +118,4 @@
>   113 rw auto quick
>   114 rw auto quick
>   116 rw auto quick
> +117 rw auto quick
>

Spoke with Stefan earlier today and I think my initial hunch of keeping 
the exhaustive QMP debug commands out of the upstream repo makes more 
sense right now.

Our view was that even if we check in code "to be used for debug only," 
there's no telling how people or other programs might come to rely on 
the data, so it's best not to introduce the interface to begin with.

I may pull these patches into one of my github repositories, though, 
since we'll probably need these patches for debugging in the near future.

So consider this my NACK for 01 and 11.

--js

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 12/14] qapi: add md5 checksum of last dirty bitmap level to query-block
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 12/14] qapi: add md5 checksum of last dirty bitmap level to query-block Vladimir Sementsov-Ogievskiy
@ 2015-02-19 18:53   ` John Snow
  0 siblings, 0 replies; 32+ messages in thread
From: John Snow @ 2015-02-19 18:53 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, stefanha, pbonzini,
	amit.shah, den



On 02/18/2015 09:00 AM, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   block.c                | 1 +
>   include/qemu/hbitmap.h | 8 ++++++++
>   qapi/block-core.json   | 4 +++-
>   util/hbitmap.c         | 8 ++++++++
>   4 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/block.c b/block.c
> index 4cca55d..9532ccc 100644
> --- a/block.c
> +++ b/block.c
> @@ -5600,6 +5600,7 @@ BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs)
>           info->name = g_strdup(bm->name);
>           info->disabled = bm->disabled;
>           info->frozen = bdrv_dirty_bitmap_frozen(bm);
> +        info->md5 = hbitmap_md5(bm->bitmap);
>           entry->value = info;
>           *plist = entry;
>           plist = &entry->next;
> diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
> index 10ce05b..2fb748a 100644
> --- a/include/qemu/hbitmap.h
> +++ b/include/qemu/hbitmap.h
> @@ -188,6 +188,14 @@ void hbitmap_deserialize_zeroes(HBitmap *hb, uint64_t start, uint64_t count);
>   void hbitmap_deserialize_finish(HBitmap *hb);
>
>   /**
> + * hbitmap_md5:
> + * @bitmap: HBitmap to operate on.
> + *
> + * Returns md5 checksum of the last level.
> + */
> +char *hbitmap_md5(const HBitmap *bitmap);
> +
> +/**
>    * hbitmap_free:
>    * @hb: HBitmap to operate on.
>    *
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 25dea80..2028d37 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -336,11 +336,13 @@
>   #
>   # @frozen: whether the dirty bitmap is frozen (Since 2.3)
>   #
> +# @md5: md5 checksum of the last bitmap level (since 2.3)
> +#
>   # Since: 1.3
>   ##
>   { 'type': 'BlockDirtyInfo',
>     'data': {'*name': 'str', 'count': 'int', 'granularity': 'uint32',
> -           'disabled': 'bool', 'frozen': 'bool'} }
> +           'disabled': 'bool', 'frozen': 'bool', 'md5': 'str'} }
>
>   ##
>   # @BlockInfo:
> diff --git a/util/hbitmap.c b/util/hbitmap.c
> index 1a736e7..8063dce 100644
> --- a/util/hbitmap.c
> +++ b/util/hbitmap.c
> @@ -523,3 +523,11 @@ bool hbitmap_merge(HBitmap *a, const HBitmap *b)
>
>       return true;
>   }
> +
> +char *hbitmap_md5(const HBitmap *bitmap)
> +{
> +    uint64_t size =
> +        MAX((bitmap->size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
> +    const guchar *data = (const guchar *)bitmap->levels[HBITMAP_LEVELS - 1];
> +    return g_compute_checksum_for_data(G_CHECKSUM_MD5, data, size);
> +}
>

It strikes me as somewhat odd to introduce a feature for the explicit 
purpose of regression testing, but I can't think of how else we'd do it 
simply, so this makes the most sense to me right now.

Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 13/14] iotests: add dirty bitmap migration test
  2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 13/14] iotests: add dirty bitmap migration test Vladimir Sementsov-Ogievskiy
@ 2015-02-19 19:30   ` John Snow
  0 siblings, 0 replies; 32+ messages in thread
From: John Snow @ 2015-02-19 19:30 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, stefanha, pbonzini,
	amit.shah, den



On 02/18/2015 09:00 AM, Vladimir Sementsov-Ogievskiy wrote:
> [the same test as 117, but not using qmp: query-block-dirty-bitmap.
> only one test from {117, 118} will be in the next patch set version]
>
> The test starts two vms (vm_a, vm_b), create dirty bitmap in
> the first one, do several writes to corresponding device and
> then migrate vm_a to vm_b with dirty bitmaps.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   tests/qemu-iotests/118     | 87 ++++++++++++++++++++++++++++++++++++++++++++++
>   tests/qemu-iotests/118.out |  5 +++
>   tests/qemu-iotests/group   |  1 +
>   3 files changed, 93 insertions(+)
>   create mode 100755 tests/qemu-iotests/118
>   create mode 100644 tests/qemu-iotests/118.out
>
> diff --git a/tests/qemu-iotests/118 b/tests/qemu-iotests/118
> new file mode 100755
> index 0000000..a85ea56
> --- /dev/null
> +++ b/tests/qemu-iotests/118
> @@ -0,0 +1,87 @@
> +#!/usr/bin/env python
> +#
> +# Tests for dirty bitmaps migration.
> +#
> +# (C) Vladimir Sementsov-Ogievskiy 2015
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +#
> +
> +import os
> +import iotests
> +import time
> +from iotests import qemu_img
> +
> +disk_a = os.path.join(iotests.test_dir, 'disk_a')
> +disk_b = os.path.join(iotests.test_dir, 'disk_b')
> +fifo   = os.path.join(iotests.test_dir, 'fifo')
> +
> +size   = 0x40000000 # 1G
> +sector_size = 512
> +granularity = 0x10000
> +regions = [
> +    { 'start': 0,          'count': 0x100000 },
> +    { 'start': 0x10000000, 'count': 0x20000  },
> +    { 'start': 0x39990000, 'count': 0x10000  }
> +    ]
> +
> +regions_in_sectors = [
> +    { key: val / sector_size for (key, val) in el.items() } for el in regions]
> +

Not used in this test.

> +class TestDirtyBitmapMigration(iotests.QMPTestCase):
> +
> +    def setUp(self):
> +        os.mkfifo(fifo)
> +        qemu_img('create', '-f', iotests.imgfmt, disk_a, str(size))
> +        qemu_img('create', '-f', iotests.imgfmt, disk_b, str(size))
> +        self.vm_a = iotests.VM().add_drive(disk_a)
> +        self.vm_b = iotests.VM().add_drive(disk_b)
> +        self.vm_b.add_incoming_migration("exec: cat " + fifo)
> +        self.vm_a.launch()
> +        self.vm_b.launch()
> +
> +    def tearDown(self):
> +        self.vm_a.shutdown()
> +        self.vm_b.shutdown()
> +        os.remove(disk_a)
> +        os.remove(disk_b)
> +        os.remove(fifo)
> +
> +    def test_migration(self):
> +        result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
> +                               name='bitmap', granularity=granularity)
> +        self.assert_qmp(result, 'return', {});
> +
> +        for r in regions:
> +          self.vm_a.hmp_qemu_io('drive0',
> +                                'write %d %d' % (r['start'], r['count']))
> +
> +        result = self.vm_a.qmp('query-block');
> +        md5 = result['return'][0]['dirty-bitmaps'][0]['md5']
> +
> +        result = self.vm_a.qmp('migrate-set-capabilities',
> +                               capabilities=[{'capability': 'dirty-bitmaps',
> +                                              'state': True}])
> +        self.assert_qmp(result, 'return', {})
> +
> +        result = self.vm_a.qmp('migrate', uri='exec:cat>' + fifo)
> +        while self.vm_a.qmp('query-migrate')['return']['status'] != 'completed':
> +          time.sleep(1)
> +

Might be possible to wait on a status completion here. Adding a method 
to VM:

def event_wait(self, name='BLOCK_JOB_COMPLETED', maxwait=10):
     for _ in range(maxwait):
         for event in self.get_qmp_events(wait=True):
             if event['event'] == name:
                 return event
     return None


And replacing the sleep and poll loop with:

result = self.vm_a.qmp('migrate', uri='exec:cat>' + fifo)
self.assertIsNotNone(self.vm_a.event_wait("STOP"))
self.assertIsNotNone(self.vm_b.event_wait("RESUME"))

works just as well without the poll/sleep.

> +        result = self.vm_b.qmp('query-block');
> +        self.assert_qmp(result, 'return[0]/dirty-bitmaps[0]/md5', md5);
> +
> +
> +if __name__ == '__main__':
> +    iotests.main()
> diff --git a/tests/qemu-iotests/118.out b/tests/qemu-iotests/118.out
> new file mode 100644
> index 0000000..ae1213e
> --- /dev/null
> +++ b/tests/qemu-iotests/118.out
> @@ -0,0 +1,5 @@
> +.
> +----------------------------------------------------------------------
> +Ran 1 tests
> +
> +OK
> diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
> index 6ad5b55..1fd1983 100644
> --- a/tests/qemu-iotests/group
> +++ b/tests/qemu-iotests/group
> @@ -119,3 +119,4 @@
>   114 rw auto quick
>   116 rw auto quick
>   117 rw auto quick
> +118 rw auto quick
>

Thanks,
--js

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 05/14] block: add meta bitmaps
  2015-02-19 11:43     ` Vladimir Sementsov-Ogievskiy
@ 2015-02-21  0:53       ` John Snow
  0 siblings, 0 replies; 32+ messages in thread
From: John Snow @ 2015-02-21  0:53 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, peter.maydell, quintela, dgilbert, stefanha, pbonzini,
	amit.shah, den



On 02/19/2015 06:43 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 19.02.2015 02:45, John Snow wrote:
>>
>>
>> On 02/18/2015 09:00 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Meta bitmap is a 'dirty bitmap' for the BdrvDirtyBitmap. It tracks
>>> changes (set/unset) of this BdrvDirtyBitmap. It is needed for live
>>> migration of block dirty bitmaps.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>>> ---
>>>   block.c               | 40 ++++++++++++++++++++++++++++++++++++++++
>>>   include/block/block.h |  5 +++++
>>>   2 files changed, 45 insertions(+)
>>>
>>> diff --git a/block.c b/block.c
>>> index a127fd2..aaa08b8 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -58,9 +58,15 @@
>>>    * (3) successor is set: frozen mode.
>>>    *     A frozen bitmap cannot be renamed, deleted, anonymized,
>>> cleared, set,
>>>    *     or enabled. A frozen bitmap can only abdicate() or reclaim().
>>> + *
>>> + * Meta bitmap:
>>> + * Meta bitmap is a 'dirty bitmap' for the BdrvDirtyBitmap. It
>>> tracks changes
>>> + * (set/unset) of this BdrvDirtyBitmap. It is needed for live
>>> migration of
>>> + * block dirty bitmaps.
>>>    */
>>>   struct BdrvDirtyBitmap {
>>>       HBitmap *bitmap;            /* Dirty sector bitmap
>>> implementation */
>>> +    HBitmap *meta_bitmap;       /* Meta bitmap */
>>>       BdrvDirtyBitmap *successor; /* Anonymous child; implies frozen
>>> status */
>>>       char *name;                 /* Optional non-empty unique ID */
>>>       int64_t size;               /* Size of the bitmap (Number of
>>> sectors) */
>>> @@ -5398,6 +5404,31 @@ void
>>> bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap)
>>>       bitmap->name = NULL;
>>>   }
>>>
>>> +HBitmap *bdrv_create_meta_bitmap(BdrvDirtyBitmap *bitmap,
>>> +                                        uint64_t granularity)
>>> +{
>>> +    uint64_t sector_granularity;
>>> +
>>> +    assert((granularity & (granularity - 1)) == 0);
>>> +
>>> +    granularity *= 8 * bdrv_dirty_bitmap_granularity(bitmap);
>>> +    sector_granularity = granularity >> BDRV_SECTOR_BITS;
>>> +    assert(sector_granularity);
>>> +
>>
>> The maths here could use a comment, I think.
>>
>> the "granularity" field here actually describes the desired
>> serialization buffer size; e.g. CHUNK_SIZE (1 << 20) or 1 MiB. This
>> parameter should be renamed to explain what it's actually for.
>> Something like "chunk_size" and a comment explaining that it is in bytes.
>>
>> ...
>>
>> That said, let's talk about the default chunk size you're using in
>> correlation with this function.
>>
>> a CHUNK_SIZE of 1MiB here is going to lead us to, if we have a bitmap
>> with the default granularity of 128 sectors\64KiB bytes, a granularity
>> for the meta_bitmap of one billion sectors (1 << 30) or 512GiB.
>>
>> That's going to be bigger than most drives entirely, which will
>> generally lead us to only using a single "chunk" per drive. Which
>> means we won't really get a lot of mileage out of the bulk/dirty
>> phases most of the time.
>>
>> It's wild to think about that the first 1,000,000,000 sectors or
>> 512,000,000,000 bytes will all be represented by the first single bit
>> in this bitmap. If a single hair on the drive changes, we resend the
>> _entire_ bitmap, possibly over and over again. Will we ever make
>> progress? Should we investigate a smaller chunk size?
>>
>> Here's some quick mappings of chunk size (bytes) to effective
>> meta_bitmap byte granularities, assuming the meta_bitmap is tracking a
>> bitmap with the default granularity of 64KiB:
>>
>> (1 << 20) 1MiB   -- 512GiB  // This is too high of a granularity
>> (1 << 17) 128KiB --  64GiB
>> (1 << 15) 32KiB  --  16GiB
>> (1 << 11) 2KiB   --   1GiB
>> (1 << 10) 1KiB   -- 512MiB
>> (1 << 9)  512B   -- 256MiB
>> (1 << 8)  256B   -- 128MiB
>> (1 << 5)  32 B   --  16MiB  // This is too small of a chunk size.
>> (1 << 1)   1 B   --   1MiB
>>
>> We want to make the chunk sends efficient, but we also want to make
>> sure that the dirty phase doesn't resend more data than it needs to,
>> so we need to strike a balance here, no?
>>
>> I think arguments could be made for most granularities between 128MiB
>> through 1GiB. Anything outside of that is too lopsided, IMO.
>>
>> What are your thoughts on this?
> Ok, interesting thing to discuss.
>
> My thoughts:
> * the chunk size for block-migration is 1mb, than the bitmap (64kb
> granularity) for this chunk is 16bit=2bytes long. It's an intuitive
> reason for choosing the chunk size about 2 bytes. But in this case the
> data/metadata ratio is very bad (about 20bytes for the header of the
> chunk). So, taking the nearest value with adequate ratio gives (IMHO)
> '1kb -- 512mb': 20b/1k ~ 2%. Or 512b => 4%.
>
> * for ndb+mirror migration scheme the default chunk is 64kb instead of
> 1mb. So the bitmap is more smaller. But the same reason of data/metadata
> ratio leads to 1kb chunk for dirty bitmap migration.
>
> So, what about default to 1kb and additional parameter for migration
> (migration capabilities) to give the user a possibility of chose?
>

That sounds good to me. I'm less sure about the parameter, though. In 
practice I doubt most people will be attempting a live migration of data 
sets big enough to need to alter the default. For cases with such large 
data sets, it almost certainly makes more sense to perform the migration 
via some other method.

> * Yes, in most of user cases the bitmap (64kb granularity) will be small
> (< 1mb). In these cases, I think, it would be better to send the data
> only in complete step, only once. (for exmaple, if pending <= 1mb,
> dosn't send anything in incremental phase).
> Live migration is actually needed only for migration of bitmaps for
> disks of several TBs size.
>

You are right; sending only the data upon completion after live 
migration makes a lot of sense. If we acknowledge that we only really 
need dual-phase live migration of dirty bitmaps for extremely huge 
drives, then a chunk size of 1MiB isn't a problem anymore.

Would this be hard to do? We could just skip the live migration phase 
for "small" drives (less than however much we describe with one chunk -- 
for 1MiB, that's a whopping 512GiB) and just wait for the completion 
phase to do a bulk-send of the entire drive.

For large drives, we can do the live migration as "normal."

Is this sane?

>>
>>> +    bitmap->meta_bitmap =
>>> +        hbitmap_alloc(bitmap->size, ffsll(sector_granularity) - 1);
>>> +
>>> +    return bitmap->meta_bitmap;
>>> +}
>>> +
>>> +void bdrv_release_meta_bitmap(BdrvDirtyBitmap *bitmap)
>>> +{
>>> +    if (bitmap->meta_bitmap) {
>>> +        hbitmap_free(bitmap->meta_bitmap);
>>> +        bitmap->meta_bitmap = NULL;
>>> +    }
>>> +}
>>> +
>>>   BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
>>>                                             uint32_t granularity,
>>>                                             const char *name,
>>> @@ -5532,6 +5563,9 @@ void bdrv_release_dirty_bitmap(BlockDriverState
>>> *bs, BdrvDirtyBitmap *bitmap)
>>>               assert(!bdrv_dirty_bitmap_frozen(bm));
>>>               QLIST_REMOVE(bitmap, list);
>>>               hbitmap_free(bitmap->bitmap);
>>> +            if (bitmap->meta_bitmap) {
>>> +                hbitmap_free(bitmap->meta_bitmap);
>>> +            }
>>>               g_free(bitmap->name);
>>>               g_free(bitmap);
>>>               return;
>>> @@ -5659,6 +5693,9 @@ void bdrv_set_dirty_bitmap(BdrvDirtyBitmap
>>> *bitmap,
>>>   {
>>>       assert(bdrv_dirty_bitmap_enabled(bitmap));
>>>       hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
>>> +    if (bitmap->meta_bitmap) {
>>> +        hbitmap_set(bitmap->meta_bitmap, cur_sector, nr_sectors);
>>> +    }
>>>   }
>>>
>>>   void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
>>> @@ -5666,6 +5703,9 @@ void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap
>>> *bitmap,
>>>   {
>>>       assert(bdrv_dirty_bitmap_enabled(bitmap));
>>>       hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
>>> +    if (bitmap->meta_bitmap) {
>>> +        hbitmap_set(bitmap->meta_bitmap, cur_sector, nr_sectors);
>>> +    }
>>>   }
>>>
>>>   void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap)
>>> diff --git a/include/block/block.h b/include/block/block.h
>>> index c6a928d..f2c62f6 100644
>>> --- a/include/block/block.h
>>> +++ b/include/block/block.h
>>> @@ -4,6 +4,7 @@
>>>   #include "block/aio.h"
>>>   #include "qemu-common.h"
>>>   #include "qemu/option.h"
>>> +#include "qemu/hbitmap.h"
>>>   #include "block/coroutine.h"
>>>   #include "block/accounting.h"
>>>   #include "qapi/qmp/qobject.h"
>>> @@ -487,6 +488,10 @@ void
>>> bdrv_dirty_bitmap_deserialize_zeroes(BdrvDirtyBitmap *bitmap,
>>>                                             uint64_t start, uint64_t
>>> count);
>>>   void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap);
>>>
>>> +HBitmap *bdrv_create_meta_bitmap(BdrvDirtyBitmap *bitmap,
>>> +                                        uint64_t granularity);
>>> +void bdrv_release_meta_bitmap(BdrvDirtyBitmap *bitmap);
>>> +
>>>   void bdrv_enable_copy_on_read(BlockDriverState *bs);
>>>   void bdrv_disable_copy_on_read(BlockDriverState *bs);
>>>
>>>
>
>

-- 
—js

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2015-02-21  0:53 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-18 14:00 [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 01/14] qmp: add query-block-dirty-bitmap Vladimir Sementsov-Ogievskiy
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 02/14] hbitmap: serialization Vladimir Sementsov-Ogievskiy
2015-02-18 23:42   ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 03/14] block: BdrvDirtyBitmap serialization interface Vladimir Sementsov-Ogievskiy
2015-02-18 23:43   ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 04/14] block: tiny refactoring: minimize hbitmap_(set/reset) usage Vladimir Sementsov-Ogievskiy
2015-02-18 23:44   ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 05/14] block: add meta bitmaps Vladimir Sementsov-Ogievskiy
2015-02-18 23:45   ` John Snow
2015-02-19 11:43     ` Vladimir Sementsov-Ogievskiy
2015-02-21  0:53       ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 06/14] block: add bdrv_next_dirty_bitmap() Vladimir Sementsov-Ogievskiy
2015-02-18 23:45   ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 07/14] qapi: add dirty-bitmaps migration capability Vladimir Sementsov-Ogievskiy
2015-02-18 23:45   ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 08/14] migration: add migration/block-dirty-bitmap.c Vladimir Sementsov-Ogievskiy
2015-02-18 23:47   ` John Snow
2015-02-19 13:48     ` Vladimir Sementsov-Ogievskiy
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 09/14] iotests: maintain several vms in test Vladimir Sementsov-Ogievskiy
2015-02-18 23:48   ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 10/14] iotests: add add_incoming_migration to VM class Vladimir Sementsov-Ogievskiy
2015-02-18 23:48   ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 11/14] iotests: add dirty bitmap migration test Vladimir Sementsov-Ogievskiy
2015-02-19 18:47   ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 12/14] qapi: add md5 checksum of last dirty bitmap level to query-block Vladimir Sementsov-Ogievskiy
2015-02-19 18:53   ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 13/14] iotests: add dirty bitmap migration test Vladimir Sementsov-Ogievskiy
2015-02-19 19:30   ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 14/14] migration/qemu-file: make functions qemu_(get/put)_string public Vladimir Sementsov-Ogievskiy
2015-02-19  0:00   ` John Snow
2015-02-19  0:11 ` [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration John Snow

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.