* [Qemu-devel] [PATCH RFC v2 0/8] Dirty bitmaps migration
@ 2015-01-27 10:56 Vladimir Sementsov-Ogievskiy
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 1/8] qmp: print dirty bitmap Vladimir Sementsov-Ogievskiy
                   ` (7 more replies)
  0 siblings, 8 replies; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-01-27 10:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, vsementsov, stefanha, pbonzini, den, jsnow

v2:
 1. bug fixes that are already upstream, and the renaming of
 bdrv_reset_dirty_bitmap (which is already in Snow's series), are
 dropped
 2. bitmap store/restore: the concept is renamed to serialization;
 added hbitmap_deserialize_part0() so that zero blocks need not be
 transferred
 3. migration "dirty" parameter: added a description comment
 4. Other patches are new.
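
To illustrate why a zero-filling restore such as hbitmap_deserialize_part0 saves bandwidth, here is a minimal sketch of chunk-level zero elision. The flag values, the one-byte header, and the names encode_chunk/decode_chunk are hypothetical. They are not the wire format used by migration/dirty-bitmap.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-chunk flags, not the series' actual wire format. */
enum { CHUNK_ZEROES = 1, CHUNK_DATA = 2 };

static bool chunk_is_zero(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i]) {
            return false;
        }
    }
    return true;
}

/* Encode one serialized chunk into out; returns the bytes written. */
static size_t encode_chunk(const uint8_t *chunk, size_t len, uint8_t *out)
{
    if (chunk_is_zero(chunk, len)) {
        out[0] = CHUNK_ZEROES;      /* flag only, no payload */
        return 1;
    }
    out[0] = CHUNK_DATA;
    memcpy(out + 1, chunk, len);
    return 1 + len;
}

/* Decode one chunk into dst (len bytes); returns the bytes consumed. */
static size_t decode_chunk(const uint8_t *in, uint8_t *dst, size_t len)
{
    if (in[0] == CHUNK_ZEROES) {
        memset(dst, 0, len);        /* analogous to a "part0" restore */
        return 1;
    }
    memcpy(dst, in + 1, len);
    return 1 + len;
}
```

An all-zero chunk costs one byte on the wire instead of the full chunk. For sparse bitmaps, that is where most of the savings would come from.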

v2.rfc:
In this version of the series I try not to use migration/block.c at
all. Instead, a separate migration unit is added in the new file
migration/dirty-bitmap.c. Bitmaps are now migrated like blocks in
block migration: each has its own "dirty-dirty" bitmap for tracking
set/unset changes during migration.

The advantages are:
  - no further complication of migration/block.c
  - separate dirty-dirty bitmaps make it possible to handle "unsets"
  - a better metadata/data ratio: no tiny bitmap blocks
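
The "unset" handling can be shown with a toy model, assuming a 64-bit data bitmap and one dirty-dirty bit per 8 data bits. ToyBitmap, toy_set and toy_reset are hypothetical names, not the series' API. The point is that a reset dirties the meta bitmap just like a set does.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: 64 data bits, one "dirty-dirty" bit per 8 data bits. */
#define CHUNK_BITS 8

typedef struct {
    uint64_t data;   /* the dirty bitmap itself */
    uint8_t meta;    /* dirty-dirty bitmap: which chunks changed */
} ToyBitmap;

/* Mark every chunk touched by [start, start + count); needs start + count <= 64. */
static void mark_meta(ToyBitmap *bm, unsigned start, unsigned count)
{
    for (unsigned i = start; i < start + count; i++) {
        bm->meta |= (uint8_t)(1u << (i / CHUNK_BITS));
    }
}

static void toy_set(ToyBitmap *bm, unsigned start, unsigned count)
{
    for (unsigned i = start; i < start + count; i++) {
        bm->data |= UINT64_C(1) << i;
    }
    mark_meta(bm, start, count);
}

/* A reset is a change too, so it must also dirty the meta bitmap. */
static void toy_reset(ToyBitmap *bm, unsigned start, unsigned count)
{
    for (unsigned i = start; i < start + count; i++) {
        bm->data &= ~(UINT64_C(1) << i);
    }
    mark_meta(bm, start, count);
}
```

If only sets were tracked, a chunk that had already been sent and later cleared would leave stale 1 bits on the destination.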



v1:
These patches provide a dirty bitmap migration feature. Only named
dirty bitmaps are migrated. Migration is done as part of block
migration in block-migration.c.

Dirty bitmap migration may be enabled by the "dirty" parameter of the
qmp migrate command. If the "blk" and "inc" parameters are false when
"dirty" is true, block migration is actually skipped: no allocations,
no bdrv_read's, no bdrv_write's; only bitmaps are migrated.
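
The flag combinations described above reduce to a small truth table. The following sketch is hypothetical: BlockMigMode and block_mig_mode are illustrative names only, not part of QEMU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the work implied by the migrate flags. */
typedef enum {
    MIG_NONE,              /* neither disk data nor bitmaps */
    MIG_BITMAPS_ONLY,      /* "dirty" alone: no disk reads or writes */
    MIG_DISK,              /* "blk" and/or "inc" */
    MIG_DISK_AND_BITMAPS,
} BlockMigMode;

static BlockMigMode block_mig_mode(bool blk, bool inc, bool dirty)
{
    bool disk = blk || inc;

    if (disk) {
        return dirty ? MIG_DISK_AND_BITMAPS : MIG_DISK;
    }
    return dirty ? MIG_BITMAPS_ONLY : MIG_NONE;
}
```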

The patch set includes two of my previous bug fixes, which are
necessary for it. The patch set is based on the Incremental backup
series by John Snow.

Vladimir Sementsov-Ogievskiy (8):
  qmp: print dirty bitmap
  hbitmap: serialization
  block: BdrvDirtyBitmap serialization interface
  block: add dirty-dirty bitmaps
  block: add bdrv_dirty_bitmap_enabled()
  block: add bdrv_next_dirty_bitmap()
  migration: add dirty parameter
  migration: add migration/dirty-bitmap.c

 block.c                       | 128 +++++++++
 blockdev.c                    |  13 +
 hmp-commands.hx               |  27 +-
 hmp.c                         |  12 +-
 hmp.h                         |   1 +
 include/block/block.h         |  22 ++
 include/migration/block.h     |   1 +
 include/migration/migration.h |   1 +
 include/qemu/hbitmap.h        |  59 ++++
 migration/Makefile.objs       |   2 +-
 migration/dirty-bitmap.c      | 606 ++++++++++++++++++++++++++++++++++++++++++
 migration/migration.c         |   4 +-
 qapi-schema.json              |  10 +-
 qapi/block-core.json          |   3 +
 qmp-commands.hx               |  10 +-
 savevm.c                      |   3 +-
 util/hbitmap.c                |  96 +++++++
 vl.c                          |   1 +
 18 files changed, 987 insertions(+), 12 deletions(-)
 create mode 100644 migration/dirty-bitmap.c

-- 
1.9.1

* [Qemu-devel] [PATCH RFC v2 1/8] qmp: print dirty bitmap
  2015-01-27 10:56 [Qemu-devel] [PATCH RFC v2 0/8] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
@ 2015-01-27 10:56 ` Vladimir Sementsov-Ogievskiy
  2015-01-27 16:17   ` Eric Blake
  2015-02-10 21:28   ` John Snow
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 2/8] hbitmap: serialization Vladimir Sementsov-Ogievskiy
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-01-27 10:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, vsementsov, stefanha, pbonzini, den, jsnow

Add qmp and hmp commands to print a dirty bitmap. They are needed only
for testing.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 block.c               | 33 +++++++++++++++++++++++++++++++++
 blockdev.c            | 13 +++++++++++++
 hmp-commands.hx       | 15 +++++++++++++++
 hmp.c                 |  8 ++++++++
 hmp.h                 |  1 +
 include/block/block.h |  2 ++
 qapi-schema.json      |  3 ++-
 qapi/block-core.json  |  3 +++
 qmp-commands.hx       |  5 +++++
 9 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 2466ba8..6d3f0b2 100644
--- a/block.c
+++ b/block.c
@@ -5374,6 +5374,39 @@ BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
 }
 
 
+void bdrv_print_dirty_bitmap(BdrvDirtyBitmap *bitmap)
+{
+    unsigned long a = 0, b = 0;
+
+    printf("bitmap '%s'\n", bitmap->name ? bitmap->name : "no name");
+    printf("enabled: %s\n", bitmap->enabled ? "true" : "false");
+    printf("size: %" PRId64 "\n", bitmap->size);
+    printf("granularity: %" PRId64 "\n", bitmap->granularity);
+    printf("dirty regions begin:\n");
+
+    while (true) {
+        for (a = b; a < bitmap->size && !hbitmap_get(bitmap->bitmap, a); ++a) {
+            ;
+        }
+        if (a >= bitmap->size) {
+            break;
+        }
+
+        for (b = a + 1;
+             b < bitmap->size && hbitmap_get(bitmap->bitmap, b);
+             ++b) {
+            ;
+        }
+
+        printf("%lu -> %lu\n", a, b - 1);
+        if (b >= bitmap->size) {
+            break;
+        }
+    }
+
+    printf("dirty regions end\n");
+}
+
 BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
                                           int granularity,
                                           const char *name,
diff --git a/blockdev.c b/blockdev.c
index 209fedd..66f0437 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2074,6 +2074,19 @@ void qmp_block_dirty_bitmap_add(const char *node_ref, const char *name,
     aio_context_release(aio_context);
 }
 
+void qmp_block_dirty_bitmap_print(const char *node_ref, const char *name,
+                                  Error **errp)
+{
+    BdrvDirtyBitmap *bitmap;
+
+    bitmap = block_dirty_bitmap_lookup(node_ref, name, NULL, errp);
+    if (!bitmap) {
+        return;
+    }
+
+    bdrv_print_dirty_bitmap(bitmap);
+}
+
 void qmp_block_dirty_bitmap_remove(const char *node_ref, const char *name,
                                    Error **errp)
 {
diff --git a/hmp-commands.hx b/hmp-commands.hx
index e37bc8b..a9be506 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -58,6 +58,21 @@ Quit the emulator.
 ETEXI
 
     {
+        .name       = "print_dirty_bitmap",
+        .args_type  = "device:B,bitmap:s",
+        .params     = "device bitmap",
+        .help       = "print dirty bitmap",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd = hmp_print_dirty_bitmap,
+    },
+
+STEXI
+@item print_dirty_bitmap device_id bitmap_name
+@findex print_dirty_bitmap
+Print dirty bitmap meta information and dirty regions.
+ETEXI
+
+    {
         .name       = "block_resize",
         .args_type  = "device:B,size:o",
         .params     = "device size",
diff --git a/hmp.c b/hmp.c
index 63b19c7..a269145 100644
--- a/hmp.c
+++ b/hmp.c
@@ -782,6 +782,14 @@ void hmp_info_tpm(Monitor *mon, const QDict *qdict)
     qapi_free_TPMInfoList(info_list);
 }
 
+void hmp_print_dirty_bitmap(Monitor *mon, const QDict *qdict)
+{
+    const char *device = qdict_get_str(qdict, "device");
+    const char *name = qdict_get_str(qdict, "bitmap");
+
+    qmp_block_dirty_bitmap_print(device, name, NULL);
+}
+
 void hmp_quit(Monitor *mon, const QDict *qdict)
 {
     monitor_suspend(mon);
diff --git a/hmp.h b/hmp.h
index 4bb5dca..6bbbc33 100644
--- a/hmp.h
+++ b/hmp.h
@@ -19,6 +19,7 @@
 #include "qapi-types.h"
 #include "qapi/qmp/qdict.h"
 
+void hmp_print_dirty_bitmap(Monitor *mon, const QDict *qdict);
 void hmp_info_name(Monitor *mon, const QDict *qdict);
 void hmp_info_version(Monitor *mon, const QDict *qdict);
 void hmp_info_kvm(Monitor *mon, const QDict *qdict);
diff --git a/include/block/block.h b/include/block/block.h
index cb1f28d..6cf067f 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -459,6 +459,8 @@ void bdrv_dirty_iter_init(BlockDriverState *bs,
 void bdrv_set_dirty_iter(struct HBitmapIter *hbi, int64_t offset);
 int64_t bdrv_get_dirty_count(BlockDriverState *bs, BdrvDirtyBitmap *bitmap);
 
+void bdrv_print_dirty_bitmap(BdrvDirtyBitmap *bitmap);
+
 void bdrv_enable_copy_on_read(BlockDriverState *bs);
 void bdrv_disable_copy_on_read(BlockDriverState *bs);
 
diff --git a/qapi-schema.json b/qapi-schema.json
index 85f55d9..1475f69 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1263,7 +1263,8 @@
        'blockdev-snapshot-internal-sync': 'BlockdevSnapshotInternal',
        'block-dirty-bitmap-add': 'BlockDirtyBitmapAdd',
        'block-dirty-bitmap-enable': 'BlockDirtyBitmap',
-       'block-dirty-bitmap-disable': 'BlockDirtyBitmap'
+       'block-dirty-bitmap-disable': 'BlockDirtyBitmap',
+       'block-dirty-bitmap-print': 'BlockDirtyBitmap'
    } }
 
 ##
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 1892b50..3e1edb1 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -982,6 +982,9 @@
 {'command': 'block-dirty-bitmap-disable',
   'data': 'BlockDirtyBitmap' }
 
+{'command': 'block-dirty-bitmap-print',
+  'data': 'BlockDirtyBitmap' }
+
 ##
 # @block_set_io_throttle:
 #
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 479d4f5..8065715 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1222,6 +1222,11 @@ EQMP
         .args_type  = "node-ref:B,name:s",
         .mhandler.cmd_new = qmp_marshal_input_block_dirty_bitmap_disable,
     },
+    {
+        .name       = "block-dirty-bitmap-print",
+        .args_type  = "node-ref:B,name:s",
+        .mhandler.cmd_new = qmp_marshal_input_block_dirty_bitmap_print,
+    },
 
 SQMP
 
-- 
1.9.1
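
The region scan in bdrv_print_dirty_bitmap() is a plain run-length walk. Below is a standalone sketch of the same two-loop scan. It works over a hypothetical array of 64-bit words standing in for hbitmap_get(), and collects inclusive ranges instead of printing them. dirty_regions and test_bit are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read bit i of a plain word array (stand-in for hbitmap_get()). */
static bool test_bit(const uint64_t *words, uint64_t i)
{
    return (words[i / 64] >> (i % 64)) & 1;
}

/*
 * Collect maximal runs of set bits in [0, size) as inclusive
 * [start, end] pairs, the ranges printed as "start -> end".
 * Returns the number of runs stored (at most max).
 */
static size_t dirty_regions(const uint64_t *words, uint64_t size,
                            uint64_t *starts, uint64_t *ends, size_t max)
{
    size_t n = 0;
    uint64_t a = 0, b;

    while (n < max) {
        /* skip clean bits */
        while (a < size && !test_bit(words, a)) {
            a++;
        }
        if (a >= size) {
            break;
        }
        /* extend over dirty bits */
        for (b = a + 1; b < size && test_bit(words, b); b++) {
            /* nothing */
        }
        starts[n] = a;
        ends[n] = b - 1;
        n++;
        a = b;
    }
    return n;
}
```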

* [Qemu-devel] [PATCH RFC v2 2/8] hbitmap: serialization
  2015-01-27 10:56 [Qemu-devel] [PATCH RFC v2 0/8] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 1/8] qmp: print dirty bitmap Vladimir Sementsov-Ogievskiy
@ 2015-01-27 10:56 ` Vladimir Sementsov-Ogievskiy
  2015-02-10 21:29   ` John Snow
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 3/8] block: BdrvDirtyBitmap serialization interface Vladimir Sementsov-Ogievskiy
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-01-27 10:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, vsementsov, stefanha, pbonzini, den, jsnow

Functions to serialize/deserialize (restore) an HBitmap. The HBitmap
should be saved as a linear sequence of bits, independent of endianness
and of the bitmap array element (unsigned long) size. Therefore little
endian is chosen.

These functions are suitable for dirty bitmap migration: the bitmap can
be restored in several steps. For performance, each step writes only the
last level of the bitmap. All other levels are restored by
hbitmap_deserialize_finish() as the final step of restoring, so the
HBitmap is inconsistent while it is being restored.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 include/qemu/hbitmap.h | 59 +++++++++++++++++++++++++++++++
 util/hbitmap.c         | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 155 insertions(+)

diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index c48c50a..1d37582 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -137,6 +137,65 @@ void hbitmap_reset(HBitmap *hb, uint64_t start, uint64_t count);
 bool hbitmap_get(const HBitmap *hb, uint64_t item);
 
 /**
+ * hbitmap_data_size:
+ * @hb: HBitmap to operate on.
+ * @count: Number of bits
+ *
+ * Return the number of bytes hbitmap_serialize_part needs.
+ */
+uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count);
+
+/**
+ * hbitmap_serialize_part
+ * @hb: HBitmap to operate on.
+ * @buf: Buffer to store serialized bitmap.
+ * @start: First bit to store.
+ * @count: Number of bits to store.
+ *
+ * Stores HBitmap data corresponding to the given region. The saved data
+ * is a linear sequence of bits, so hbitmap_deserialize_part can use it
+ * independently of endianness and of the HBitmap level array element size.
+ */
+void hbitmap_serialize_part(const HBitmap *hb, uint8_t *buf,
+                            uint64_t start, uint64_t count);
+
+/**
+ * hbitmap_deserialize_part
+ * @hb: HBitmap to operate on.
+ * @buf: Buffer to restore bitmap data from.
+ * @start: First bit to restore.
+ * @count: Number of bits to restore.
+ *
+ * Restores HBitmap data corresponding to the given region. The format is
+ * the same as for hbitmap_serialize_part.
+ *
+ * ! The bitmap becomes inconsistent after this operation.
+ * hbitmap_deserialize_finish should be called before using the bitmap after
+ * data restoring.
+ */
+void hbitmap_deserialize_part(HBitmap *hb, uint8_t *buf,
+                              uint64_t start, uint64_t count);
+
+/**
+ * hbitmap_deserialize_part0
+ * @hb: HBitmap to operate on.
+ * @start: First bit to restore.
+ * @count: Number of bits to restore.
+ *
+ * Same as hbitmap_deserialize_part, but fills the bitmap with zeroes.
+ */
+void hbitmap_deserialize_part0(HBitmap *hb, uint64_t start, uint64_t count);
+
+/**
+ * hbitmap_deserialize_finish
+ * @hb: HBitmap to operate on.
+ *
+ * Repair HBitmap after calling hbitmap_deserialize_part: all upper HBitmap
+ * levels are rebuilt here from the last level.
+ */
+void hbitmap_deserialize_finish(HBitmap *hb);
+
+/**
  * hbitmap_free:
  * @hb: HBitmap to operate on.
  *
diff --git a/util/hbitmap.c b/util/hbitmap.c
index f400dcb..55226a0 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -366,6 +366,102 @@ bool hbitmap_get(const HBitmap *hb, uint64_t item)
     return (hb->levels[HBITMAP_LEVELS - 1][pos >> BITS_PER_LEVEL] & bit) != 0;
 }
 
+uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count)
+{
+    uint64_t size, gran;
+
+    if (count == 0) {
+        return 0;
+    }
+
+    gran = 1ll << hb->granularity;
+    size = (((gran + count - 2) >> hb->granularity) >> BITS_PER_LEVEL) + 1;
+
+    return size * sizeof(unsigned long);
+}
+
+void hbitmap_serialize_part(const HBitmap *hb, uint8_t *buf,
+                            uint64_t start, uint64_t count)
+{
+    uint64_t i;
+    uint64_t last = start + count - 1;
+    unsigned long *out = (unsigned long *)buf;
+
+    if (count == 0) {
+        return;
+    }
+
+    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
+    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
+    count = last - start + 1;
+
+    for (i = start; i <= last; ++i) {
+        unsigned long el = hb->levels[HBITMAP_LEVELS - 1][i];
+        out[i - start] = (BITS_PER_LONG == 32 ? cpu_to_le32(el) : cpu_to_le64(el));
+    }
+}
+
+void hbitmap_deserialize_part(HBitmap *hb, uint8_t *buf,
+                              uint64_t start, uint64_t count)
+{
+    uint64_t i;
+    uint64_t last = start + count - 1;
+    unsigned long *in = (unsigned long *)buf;
+
+    if (count == 0) {
+        return;
+    }
+
+    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
+    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
+    count = last - start + 1;
+
+    for (i = start; i <= last; ++i) {
+        hb->levels[HBITMAP_LEVELS - 1][i] = (BITS_PER_LONG == 32 ?
+            le32_to_cpu(in[i - start]) : le64_to_cpu(in[i - start]));
+    }
+}
+
+void hbitmap_deserialize_part0(HBitmap *hb, uint64_t start, uint64_t count)
+{
+    uint64_t last = start + count - 1;
+
+    if (count == 0) {
+        return;
+    }
+
+    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
+    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
+    count = last - start + 1;
+
+    memset(hb->levels[HBITMAP_LEVELS - 1] + start, 0,
+           count * sizeof(unsigned long));
+}
+
+void hbitmap_deserialize_finish(HBitmap *bitmap)
+{
+    int64_t i, size, prev_size;
+    int lev;
+
+    /* restore levels starting from penultimate to zero level, assuming
+     * that the last level is ok */
+    size = MAX((bitmap->size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
+    for (lev = HBITMAP_LEVELS - 1; lev-- > 0; ) {
+        prev_size = size;
+        size = MAX((size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
+        memset(bitmap->levels[lev], 0, size * sizeof(unsigned long));
+
+        for (i = 0; i < prev_size; ++i) {
+            if (bitmap->levels[lev + 1][i]) {
+                bitmap->levels[lev][i >> BITS_PER_LEVEL] |=
+                    1UL << (i & (BITS_PER_LONG - 1));
+            }
+        }
+    }
+
+    bitmap->levels[0][0] |= 1UL << (BITS_PER_LONG - 1);
+}
+
 void hbitmap_free(HBitmap *hb)
 {
     unsigned i;
-- 
1.9.1
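
Below is a toy model of the serialize / deserialize / finish sequence described in this patch. It assumes 64-bit leaf words and a single summary level, whereas real HBitmaps have HBITMAP_LEVELS levels. Two choices in this sketch are assumptions rather than what the patch states. First, the buffer is indexed relative to the first word of the requested range. Second, bytes are packed by hand so no host byte-order helpers are needed. ToyHBitmap and the toy_* functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy two-level bitmap: up to 64 leaf words summarized by one top word. */
typedef struct {
    unsigned granularity;   /* log2 of items per bit, as in HBitmap */
    uint64_t leaf[64];      /* last level: the actual bits */
    uint64_t top;           /* bit j set iff leaf[j] != 0 */
} ToyHBitmap;

/* Leaf words covering items [start, start + count); count > 0. */
static void word_range(const ToyHBitmap *hb, uint64_t start, uint64_t count,
                       uint64_t *first, uint64_t *last)
{
    *first = (start >> hb->granularity) >> 6;
    *last = ((start + count - 1) >> hb->granularity) >> 6;
}

/* Write leaf words as little-endian bytes; buf[0] holds word *first. */
static void toy_serialize(const ToyHBitmap *hb, uint8_t *buf,
                          uint64_t start, uint64_t count)
{
    uint64_t first, last;

    word_range(hb, start, count, &first, &last);
    for (uint64_t i = first; i <= last; i++) {
        for (int k = 0; k < 8; k++) {
            buf[(i - first) * 8 + k] = (uint8_t)(hb->leaf[i] >> (8 * k));
        }
    }
}

/* Restore only the last level; the top word is left stale. */
static void toy_deserialize(ToyHBitmap *hb, const uint8_t *buf,
                            uint64_t start, uint64_t count)
{
    uint64_t first, last;

    word_range(hb, start, count, &first, &last);
    for (uint64_t i = first; i <= last; i++) {
        uint64_t w = 0;
        for (int k = 0; k < 8; k++) {
            w |= (uint64_t)buf[(i - first) * 8 + k] << (8 * k);
        }
        hb->leaf[i] = w;
    }
}

/* Rebuild the upper level once all leaf words have been restored. */
static void toy_deserialize_finish(ToyHBitmap *hb)
{
    hb->top = 0;
    for (unsigned j = 0; j < 64; j++) {
        if (hb->leaf[j]) {
            hb->top |= UINT64_C(1) << j;   /* 64-bit shift, not int */
        }
    }
}
```

Restoring in several calls and rebuilding the summary once at the end avoids recomputing upper levels per chunk. The cost is that the bitmap must not be queried until toy_deserialize_finish() has run.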

* [Qemu-devel] [PATCH RFC v2 3/8] block: BdrvDirtyBitmap serialization interface
  2015-01-27 10:56 [Qemu-devel] [PATCH RFC v2 0/8] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 1/8] qmp: print dirty bitmap Vladimir Sementsov-Ogievskiy
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 2/8] hbitmap: serialization Vladimir Sementsov-Ogievskiy
@ 2015-01-27 10:56 ` Vladimir Sementsov-Ogievskiy
  2015-02-10 21:29   ` John Snow
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 4/8] block: add dirty-dirty bitmaps Vladimir Sementsov-Ogievskiy
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-01-27 10:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, vsementsov, stefanha, pbonzini, den, jsnow

Several functions that provide the necessary access to BdrvDirtyBitmap
for block-migration.c.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 block.c               | 36 ++++++++++++++++++++++++++++++++++++
 include/block/block.h | 12 ++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/block.c b/block.c
index 6d3f0b2..9860fc1 100644
--- a/block.c
+++ b/block.c
@@ -5552,6 +5552,42 @@ void bdrv_clear_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
     hbitmap_reset(bitmap->bitmap, 0, bitmap->size);
 }
 
+const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap)
+{
+    return bitmap->name;
+}
+
+uint64_t bdrv_dirty_bitmap_data_size(const BdrvDirtyBitmap *bitmap,
+                                     uint64_t count)
+{
+    return hbitmap_data_size(bitmap->bitmap, count);
+}
+
+void bdrv_dirty_bitmap_serialize_part(const BdrvDirtyBitmap *bitmap,
+                                      uint8_t *buf, uint64_t start,
+                                      uint64_t count)
+{
+    hbitmap_serialize_part(bitmap->bitmap, buf, start, count);
+}
+
+void bdrv_dirty_bitmap_deserialize_part(BdrvDirtyBitmap *bitmap,
+                                        uint8_t *buf, uint64_t start,
+                                        uint64_t count)
+{
+    hbitmap_deserialize_part(bitmap->bitmap, buf, start, count);
+}
+
+void bdrv_dirty_bitmap_deserialize_part0(BdrvDirtyBitmap *bitmap,
+                                         uint64_t start, uint64_t count)
+{
+    hbitmap_deserialize_part0(bitmap->bitmap, start, count);
+}
+
+void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap)
+{
+    hbitmap_deserialize_finish(bitmap->bitmap);
+}
+
 static void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
                            int nr_sectors)
 {
diff --git a/include/block/block.h b/include/block/block.h
index 6cf067f..0890cd2 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -460,6 +460,18 @@ void bdrv_set_dirty_iter(struct HBitmapIter *hbi, int64_t offset);
 int64_t bdrv_get_dirty_count(BlockDriverState *bs, BdrvDirtyBitmap *bitmap);
 
 void bdrv_print_dirty_bitmap(BdrvDirtyBitmap *bitmap);
+const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap);
+uint64_t bdrv_dirty_bitmap_data_size(const BdrvDirtyBitmap *bitmap,
+                                     uint64_t count);
+void bdrv_dirty_bitmap_serialize_part(const BdrvDirtyBitmap *bitmap,
+                                      uint8_t *buf, uint64_t start,
+                                      uint64_t count);
+void bdrv_dirty_bitmap_deserialize_part(BdrvDirtyBitmap *bitmap,
+                                        uint8_t *buf, uint64_t start,
+                                        uint64_t count);
+void bdrv_dirty_bitmap_deserialize_part0(BdrvDirtyBitmap *bitmap,
+                                         uint64_t start, uint64_t count);
+void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap);
 
 void bdrv_enable_copy_on_read(BlockDriverState *bs);
 void bdrv_disable_copy_on_read(BlockDriverState *bs);
-- 
1.9.1

* [Qemu-devel] [PATCH RFC v2 4/8] block: add dirty-dirty bitmaps
  2015-01-27 10:56 [Qemu-devel] [PATCH RFC v2 0/8] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 3/8] block: BdrvDirtyBitmap serialization interface Vladimir Sementsov-Ogievskiy
@ 2015-01-27 10:56 ` Vladimir Sementsov-Ogievskiy
  2015-02-10 21:30   ` John Snow
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 5/8] block: add bdrv_dirty_bitmap_enabled() Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-01-27 10:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, vsementsov, stefanha, pbonzini, den, jsnow

A dirty-dirty bitmap is a dirty bitmap for a BdrvDirtyBitmap. It tracks
set/unset changes of that BdrvDirtyBitmap.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 block.c               | 44 ++++++++++++++++++++++++++++++++++++++++++++
 include/block/block.h |  5 +++++
 2 files changed, 49 insertions(+)

diff --git a/block.c b/block.c
index 9860fc1..8ab724b 100644
--- a/block.c
+++ b/block.c
@@ -53,6 +53,7 @@
 
 struct BdrvDirtyBitmap {
     HBitmap *bitmap;
+    HBitmap *dirty_dirty_bitmap;
     BdrvDirtyBitmap *originator;
     int64_t size;
     int64_t granularity;
@@ -5373,6 +5374,30 @@ BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
     return originator;
 }
 
+HBitmap *bdrv_create_dirty_dirty_bitmap(BdrvDirtyBitmap *bitmap,
+                                        uint64_t granularity)
+{
+    uint64_t sector_granularity;
+
+    assert((granularity & (granularity - 1)) == 0);
+
+    granularity *= 8 * bitmap->granularity;
+    sector_granularity = granularity >> BDRV_SECTOR_BITS;
+    assert(sector_granularity);
+
+    bitmap->dirty_dirty_bitmap =
+        hbitmap_alloc(bitmap->size, ffsll(sector_granularity) - 1);
+
+    return bitmap->dirty_dirty_bitmap;
+}
+
+void bdrv_release_dirty_dirty_bitmap(BdrvDirtyBitmap *bitmap)
+{
+    if (bitmap->dirty_dirty_bitmap) {
+        hbitmap_free(bitmap->dirty_dirty_bitmap);
+        bitmap->dirty_dirty_bitmap = NULL;
+    }
+}
 
 void bdrv_print_dirty_bitmap(BdrvDirtyBitmap *bitmap)
 {
@@ -5447,6 +5472,9 @@ void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
         if (bm == bitmap) {
             QLIST_REMOVE(bitmap, list);
             hbitmap_free(bitmap->bitmap);
+            if (bitmap->dirty_dirty_bitmap) {
+                hbitmap_free(bitmap->dirty_dirty_bitmap);
+            }
             g_free(bitmap->name);
             g_free(bitmap);
             return;
@@ -5534,6 +5562,10 @@ void bdrv_set_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
 {
     if (bitmap->enabled) {
         hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
+
+        if (bitmap->dirty_dirty_bitmap) {
+            hbitmap_set(bitmap->dirty_dirty_bitmap, cur_sector, nr_sectors);
+        }
     }
 }
 
@@ -5541,6 +5573,9 @@ void bdrv_reset_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
                              int64_t cur_sector, int nr_sectors)
 {
     hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
+    if (bitmap->dirty_dirty_bitmap) {
+        hbitmap_set(bitmap->dirty_dirty_bitmap, cur_sector, nr_sectors);
+    }
 }
 
 /**
@@ -5550,6 +5585,9 @@ void bdrv_reset_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
 void bdrv_clear_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
 {
     hbitmap_reset(bitmap->bitmap, 0, bitmap->size);
+    if (bitmap->dirty_dirty_bitmap) {
+        hbitmap_set(bitmap->dirty_dirty_bitmap, 0, bitmap->size);
+    }
 }
 
 const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap)
@@ -5597,6 +5635,9 @@ static void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
             continue;
         }
         hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
+        if (bitmap->dirty_dirty_bitmap) {
+            hbitmap_set(bitmap->dirty_dirty_bitmap, cur_sector, nr_sectors);
+        }
     }
 }
 
@@ -5606,6 +5647,9 @@ static void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
     BdrvDirtyBitmap *bitmap;
     QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
         hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
+        if (bitmap->dirty_dirty_bitmap) {
+            hbitmap_set(bitmap->dirty_dirty_bitmap, cur_sector, nr_sectors);
+        }
     }
 }
 
diff --git a/include/block/block.h b/include/block/block.h
index 0890cd2..648b0a9 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -4,6 +4,7 @@
 #include "block/aio.h"
 #include "qemu-common.h"
 #include "qemu/option.h"
+#include "qemu/hbitmap.h"
 #include "block/coroutine.h"
 #include "block/accounting.h"
 #include "qapi/qmp/qobject.h"
@@ -473,6 +474,10 @@ void bdrv_dirty_bitmap_deserialize_part0(BdrvDirtyBitmap *bitmap,
                                          uint64_t start, uint64_t count);
 void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap);
 
+HBitmap *bdrv_create_dirty_dirty_bitmap(BdrvDirtyBitmap *bitmap,
+                                        uint64_t granularity);
+void bdrv_release_dirty_dirty_bitmap(BdrvDirtyBitmap *bitmap);
+
 void bdrv_enable_copy_on_read(BlockDriverState *bs);
 void bdrv_disable_copy_on_read(BlockDriverState *bs);
 
-- 
1.9.1
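
The granularity computation in bdrv_create_dirty_dirty_bitmap() chains several unit conversions. It goes from bytes of serialized bitmap to bytes of disk, then to 512-byte sectors, and finally to a log2 shift. Below is a worked sketch. It assumes bitmap->granularity is measured in bytes, which the `>> BDRV_SECTOR_BITS` step suggests but the patch does not state. It also uses a portable lowest-set-bit loop in place of ffsll(). dirty_dirty_granularity and log2_pow2 are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BITS 9   /* 512-byte sectors, like BDRV_SECTOR_BITS */

/* Index of the lowest set bit, like ffsll(v) - 1; v must be non-zero. */
static int log2_pow2(uint64_t v)
{
    int n = 0;

    while (!(v & 1)) {
        v >>= 1;
        n++;
    }
    return n;
}

/*
 * chunk_bytes: bytes of serialized bitmap covered by one dirty-dirty bit.
 * bitmap_gran: bytes of disk covered by one dirty bitmap bit (assumed).
 * Returns log2 of sectors per dirty-dirty bit (the HBitmap granularity).
 */
static int dirty_dirty_granularity(uint64_t chunk_bytes, uint64_t bitmap_gran)
{
    uint64_t disk_bytes, sectors;

    assert(chunk_bytes && (chunk_bytes & (chunk_bytes - 1)) == 0);
    disk_bytes = chunk_bytes * 8 * bitmap_gran;   /* 8 bits per byte */
    sectors = disk_bytes >> SECTOR_BITS;
    assert(sectors != 0);
    return log2_pow2(sectors);
}
```

For example, 512 bytes of bitmap at 64 KiB per bit cover 512 * 8 * 65536 = 2^28 bytes of disk. That equals 2^19 sectors, so the dirty-dirty HBitmap granularity is 19.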

* [Qemu-devel] [PATCH RFC v2 5/8] block: add bdrv_dirty_bitmap_enabled()
  2015-01-27 10:56 [Qemu-devel] [PATCH RFC v2 0/8] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 4/8] block: add dirty-dirty bitmaps Vladimir Sementsov-Ogievskiy
@ 2015-01-27 10:56 ` Vladimir Sementsov-Ogievskiy
  2015-02-10 21:30   ` John Snow
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 6/8] block: add bdrv_next_dirty_bitmap() Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-01-27 10:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, vsementsov, stefanha, pbonzini, den, jsnow

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 block.c               | 5 +++++
 include/block/block.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/block.c b/block.c
index 8ab724b..15fc621 100644
--- a/block.c
+++ b/block.c
@@ -5551,6 +5551,11 @@ uint64_t bdrv_dirty_bitmap_granularity(BlockDriverState *bs,
     return bitmap->granularity;
 }
 
+bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap)
+{
+    return bitmap->enabled;
+}
+
 void bdrv_dirty_iter_init(BlockDriverState *bs,
                           BdrvDirtyBitmap *bitmap, HBitmapIter *hbi)
 {
diff --git a/include/block/block.h b/include/block/block.h
index 648b0a9..7b49d98 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -449,6 +449,7 @@ BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs);
 uint64_t bdrv_get_default_bitmap_granularity(BlockDriverState *bs);
 uint64_t bdrv_dirty_bitmap_granularity(BlockDriverState *bs,
                                       BdrvDirtyBitmap *bitmap);
+bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap);
 int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, int64_t sector);
 void bdrv_set_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
                            int64_t cur_sector, int nr_sectors);
-- 
1.9.1

* [Qemu-devel] [PATCH RFC v2 6/8] block: add bdrv_next_dirty_bitmap()
  2015-01-27 10:56 [Qemu-devel] [PATCH RFC v2 0/8] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
                   ` (4 preceding siblings ...)
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 5/8] block: add bdrv_dirty_bitmap_enabled() Vladimir Sementsov-Ogievskiy
@ 2015-01-27 10:56 ` Vladimir Sementsov-Ogievskiy
  2015-02-10 21:31   ` John Snow
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 7/8] migration: add dirty parameter Vladimir Sementsov-Ogievskiy
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c Vladimir Sementsov-Ogievskiy
  7 siblings, 1 reply; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-01-27 10:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, vsementsov, stefanha, pbonzini, den, jsnow

Like bdrv_next(), bdrv_next_dirty_bitmap() provides access to the
private list of dirty bitmaps.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 block.c               | 10 ++++++++++
 include/block/block.h |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/block.c b/block.c
index 15fc621..9e59c2e 100644
--- a/block.c
+++ b/block.c
@@ -5514,6 +5514,16 @@ BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs)
     return list;
 }
 
+BdrvDirtyBitmap *bdrv_next_dirty_bitmap(BlockDriverState *bs,
+                                        BdrvDirtyBitmap *bitmap)
+{
+    if (bitmap == NULL) {
+        return QLIST_FIRST(&bs->dirty_bitmaps);
+    }
+
+    return QLIST_NEXT(bitmap, list);
+}
+
 int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, int64_t sector)
 {
     if (bitmap) {
diff --git a/include/block/block.h b/include/block/block.h
index 7b49d98..34d0259 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -474,6 +474,8 @@ void bdrv_dirty_bitmap_deserialize_part(BdrvDirtyBitmap *bitmap,
 void bdrv_dirty_bitmap_deserialize_part0(BdrvDirtyBitmap *bitmap,
                                          uint64_t start, uint64_t count);
 void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap);
+BdrvDirtyBitmap *bdrv_next_dirty_bitmap(BlockDriverState *bs,
+                                        BdrvDirtyBitmap *bitmap);
 
 HBitmap *bdrv_create_dirty_dirty_bitmap(BdrvDirtyBitmap *bitmap,
                                         uint64_t granularity);
-- 
1.9.1
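
A caller would typically use the NULL-starts-the-walk contract as a for loop. The sketch below mirrors that contract with hypothetical stand-in types (ToyBDS, ToyDirtyBitmap) rather than the real BlockDriverState. It counts the named bitmaps, the ones the cover letter says are migrated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for BlockDriverState and BdrvDirtyBitmap. */
typedef struct ToyDirtyBitmap {
    const char *name;
    struct ToyDirtyBitmap *next;
} ToyDirtyBitmap;

typedef struct {
    ToyDirtyBitmap *dirty_bitmaps;   /* head of a private list */
} ToyBDS;

/* Same contract as bdrv_next_dirty_bitmap(): NULL means "start". */
static ToyDirtyBitmap *toy_next_dirty_bitmap(ToyBDS *bs, ToyDirtyBitmap *bm)
{
    return bm ? bm->next : bs->dirty_bitmaps;
}

/* Count named bitmaps, e.g. those that would be migrated. */
static int count_named(ToyBDS *bs)
{
    int n = 0;

    for (ToyDirtyBitmap *bm = toy_next_dirty_bitmap(bs, NULL); bm;
         bm = toy_next_dirty_bitmap(bs, bm)) {
        if (bm->name) {
            n++;
        }
    }
    return n;
}
```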

* [Qemu-devel] [PATCH RFC v2 7/8] migration: add dirty parameter
  2015-01-27 10:56 [Qemu-devel] [PATCH RFC v2 0/8] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
                   ` (5 preceding siblings ...)
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 6/8] block: add bdrv_next_dirty_bitmap() Vladimir Sementsov-Ogievskiy
@ 2015-01-27 10:56 ` Vladimir Sementsov-Ogievskiy
  2015-01-27 16:20   ` Eric Blake
  2015-02-10 21:32   ` John Snow
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c Vladimir Sementsov-Ogievskiy
  7 siblings, 2 replies; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-01-27 10:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, vsementsov, stefanha, pbonzini, den, jsnow

Add a "dirty" parameter to the qmp-migrate command. If this parameter is
true, migration/block.c will migrate dirty bitmaps. This parameter can
be used without the "blk" parameter to migrate only dirty bitmaps,
skipping block migration.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 hmp-commands.hx               | 12 +++++++-----
 hmp.c                         |  4 +++-
 include/migration/migration.h |  1 +
 migration/migration.c         |  4 +++-
 qapi-schema.json              |  7 ++++++-
 qmp-commands.hx               |  5 ++++-
 savevm.c                      |  3 ++-
 7 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index a9be506..c16f93c 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -902,23 +902,25 @@ ETEXI
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
-        .params     = "[-d] [-b] [-i] uri",
+        .args_type  = "detach:-d,blk:-b,inc:-i,dirty:-D,uri:s",
+        .params     = "[-d] [-b] [-i] [-D] uri",
         .help       = "migrate to URI (using -d to not wait for completion)"
 		      "\n\t\t\t -b for migration without shared storage with"
 		      " full copy of disk\n\t\t\t -i for migration without "
-		      "shared storage with incremental copy of disk "
-		      "(base image shared between src and destination)",
+		      "shared storage with incremental copy of disk "
+		      "(base image shared between src and destination)\n\t\t\t"
+		      " -D for migration of named dirty bitmaps as well",
         .mhandler.cmd = hmp_migrate,
     },
 
 
 STEXI
-@item migrate [-d] [-b] [-i] @var{uri}
+@item migrate [-d] [-b] [-i] [-D] @var{uri}
 @findex migrate
 Migrate to @var{uri} (using -d to not wait for completion).
 	-b for migration with full copy of disk
 	-i for migration with incremental copy of disk (base image is shared)
+	-D for migration of named dirty bitmaps
 ETEXI
 
     {
diff --git a/hmp.c b/hmp.c
index a269145..0b89ee8 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1347,10 +1347,12 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     int detach = qdict_get_try_bool(qdict, "detach", 0);
     int blk = qdict_get_try_bool(qdict, "blk", 0);
     int inc = qdict_get_try_bool(qdict, "inc", 0);
+    int dirty = qdict_get_try_bool(qdict, "dirty", 0);
     const char *uri = qdict_get_str(qdict, "uri");
     Error *err = NULL;
 
-    qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, &err);
+    qmp_migrate(uri, !!blk, blk, !!inc, inc, !!dirty, dirty,
+                false, false, &err);
     if (err) {
         monitor_printf(mon, "migrate: %s\n", error_get_pretty(err));
         error_free(err);
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 3cb5ba8..48d71d3 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -37,6 +37,7 @@
 struct MigrationParams {
     bool blk;
     bool shared;
+    bool dirty;
 };
 
 typedef struct MigrationState MigrationState;
diff --git a/migration/migration.c b/migration/migration.c
index c49a05a..e7bb7f3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -404,7 +404,8 @@ void migrate_del_blocker(Error *reason)
 }
 
 void qmp_migrate(const char *uri, bool has_blk, bool blk,
-                 bool has_inc, bool inc, bool has_detach, bool detach,
+                 bool has_inc, bool inc, bool has_dirty, bool dirty,
+                 bool has_detach, bool detach,
                  Error **errp)
 {
     Error *local_err = NULL;
@@ -414,6 +415,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 
     params.blk = has_blk && blk;
     params.shared = has_inc && inc;
+    params.dirty = has_dirty && dirty;
 
     if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP ||
         s->state == MIG_STATE_CANCELLING) {
diff --git a/qapi-schema.json b/qapi-schema.json
index 1475f69..1d10d6b 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1656,12 +1656,17 @@
 # @detach: this argument exists only for compatibility reasons and
 #          is ignored by QEMU
 #
+# @dirty: #optional do dirty-bitmaps migration (can be used with or without
+#         @blk parameter)
+#         (since 2.3)
+#
 # Returns: nothing on success
 #
 # Since: 0.14.0
 ##
 { 'command': 'migrate',
-  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } }
+  'data': { 'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*dirty': 'bool',
+            '*detach': 'bool' } }
 
 # @xen-save-devices-state:
 #
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 8065715..f8e50ac 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -610,7 +610,7 @@ EQMP
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
+        .args_type  = "detach:-d,blk:-b,inc:-i,dirty:-D,uri:s",
         .mhandler.cmd_new = qmp_marshal_input_migrate,
     },
 
@@ -624,6 +624,7 @@ Arguments:
 
 - "blk": block migration, full disk copy (json-bool, optional)
 - "inc": incremental disk copy (json-bool, optional)
+- "dirty": migrate named dirty bitmaps (json-bool, optional)
 - "uri": Destination URI (json-string)
 
 Example:
@@ -638,6 +639,8 @@ Notes:
 (2) All boolean arguments default to false
 (3) The user Monitor's "detach" argument is invalid in QMP and should not
     be used
+(4) The "dirty" argument may be used without "blk", to migrate only dirty
+    bitmaps
 
 EQMP
 
diff --git a/savevm.c b/savevm.c
index 08ec678..a598d1d 100644
--- a/savevm.c
+++ b/savevm.c
@@ -784,7 +784,8 @@ static int qemu_savevm_state(QEMUFile *f)
     int ret;
     MigrationParams params = {
         .blk = 0,
-        .shared = 0
+        .shared = 0,
+        .dirty = 0
     };
 
     if (qemu_savevm_state_blocked(NULL)) {
-- 
1.9.1

* [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c
  2015-01-27 10:56 [Qemu-devel] [PATCH RFC v2 0/8] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
                   ` (6 preceding siblings ...)
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 7/8] migration: add dirty parameter Vladimir Sementsov-Ogievskiy
@ 2015-01-27 10:56 ` Vladimir Sementsov-Ogievskiy
  2015-02-10 21:33   ` John Snow
  7 siblings, 1 reply; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-01-27 10:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, vsementsov, stefanha, pbonzini, den, jsnow

Live migration of dirty bitmaps. Only named dirty bitmaps are migrated.
If the destination qemu already contains a dirty bitmap with the same
name as a migrated bitmap, their granularities must match; otherwise an
error is reported. If the destination qemu does not contain such a
bitmap, it is created.
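
The destination-side rule can be modelled as a find-or-create step. In the patch this happens in dirty_bitmap_load() with bdrv_find_dirty_bitmap(), bdrv_create_dirty_bitmap() and bdrv_dirty_bitmap_granularity(); the sketch below replaces those with a hypothetical fixed-size table (struct dest_bitmap, find_or_create()) purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_BITMAPS 8

/* Hypothetical, simplified destination-side bitmap table. */
struct dest_bitmap {
    char name[256];
    uint64_t granularity;
    bool in_use;
};

/* Return the bitmap to load into, creating it if absent. Return NULL
 * if a same-named bitmap exists with a different granularity, or if
 * there is no room (or the name is too long) to create a new one. */
static struct dest_bitmap *find_or_create(struct dest_bitmap *tbl,
                                          const char *name,
                                          uint64_t granularity)
{
    struct dest_bitmap *free_slot = NULL;

    for (int i = 0; i < MAX_BITMAPS; i++) {
        if (tbl[i].in_use && strcmp(tbl[i].name, name) == 0) {
            /* existing bitmap: granularities must match */
            return tbl[i].granularity == granularity ? &tbl[i] : NULL;
        }
        if (!tbl[i].in_use && !free_slot) {
            free_slot = &tbl[i];
        }
    }
    if (!free_slot || strlen(name) >= sizeof(free_slot->name)) {
        return NULL;
    }
    /* no such bitmap yet: create it with the source's granularity */
    strcpy(free_slot->name, name);
    free_slot->granularity = granularity;
    free_slot->in_use = true;
    return free_slot;
}
```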

format:

1 byte: flags

[ 1 byte: device name size ] \  flags & DEVICE_NAME
[ n bytes: device name     ] /

[ 1 byte: bitmap name size ] \
[ n bytes: bitmap name     ] |  flags & BITMAP_NAME
[ be64: granularity        ] /  (only if flags & GRANULARITY as well)

[ 1 byte: bitmap enabled bit ] flags & ENABLED

[ be64: start sector      ] \ flags & (NORMAL_CHUNK | ZERO_CHUNK)
[ be32: number of sectors ] /

[ be64: buffer size ] \ flags & NORMAL_CHUNK
[ n bytes: buffer   ] /
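
To make the layout above concrete, here is a minimal encoder sketch for one chunk record, written against a hypothetical in-memory byte sink instead of QEMUFile (struct sink, put_byte(), put_bytes(), put_be(), put_name() and put_chunk() are illustrative names, not QEMU API). It follows the field order of send_bitmap() in this patch and leaves out the ENABLED record, which is only sent during the complete stage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flag values copied from the patch. */
#define DIRTY_BITMAP_MIG_FLAG_EOS           0x01
#define DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK  0x02
#define DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK    0x04
#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME   0x08
#define DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME   0x10
#define DIRTY_BITMAP_MIG_FLAG_GRANULARITY   0x20
#define DIRTY_BITMAP_MIG_FLAG_ENABLED       0x40

/* Hypothetical byte sink standing in for QEMUFile. */
struct sink {
    uint8_t *p;
    size_t len, cap;
};

static void put_byte(struct sink *s, uint8_t v)
{
    assert(s->len < s->cap);
    s->p[s->len++] = v;
}

static void put_bytes(struct sink *s, const uint8_t *b, size_t n)
{
    assert(s->cap - s->len >= n);
    memcpy(s->p + s->len, b, n);
    s->len += n;
}

/* Big-endian integer of nbytes bytes (be32 or be64 on the wire). */
static void put_be(struct sink *s, uint64_t v, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--) {
        put_byte(s, (uint8_t)(v >> (8 * i)));
    }
}

/* 1-byte length followed by the name, without a trailing NUL. */
static void put_name(struct sink *s, const char *name)
{
    size_t len = strlen(name);

    assert(len < 256);
    put_byte(s, (uint8_t)len);
    put_bytes(s, (const uint8_t *)name, len);
}

/* Encode one chunk record in the field order listed above. */
static void put_chunk(struct sink *s, uint8_t flags, const char *dev,
                      const char *bitmap, uint64_t granularity,
                      uint64_t start, uint32_t nr,
                      const uint8_t *buf, uint64_t buf_size)
{
    put_byte(s, flags);
    if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
        put_name(s, dev);
    }
    if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
        put_name(s, bitmap);
        if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
            put_be(s, granularity, 8);
        }
    }
    if (flags & (DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK |
                 DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK)) {
        put_be(s, start, 8);
        put_be(s, nr, 4);
    }
    if (flags & DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK) {
        put_be(s, buf_size, 8);
        put_bytes(s, buf, (size_t)buf_size);
    }
}
```

With names and granularity included, a zero chunk for "drive0"/"bm0" takes 32 bytes: 1 flag byte, 7 + 4 bytes of names, 8 bytes of granularity and 12 bytes of sector range.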

The last chunk must have the EOS flag set. A chunk may omit the device
and/or bitmap names, in which case they are assumed to be the same as in
the previous chunk. GRANULARITY is sent with the first chunk of each
bitmap. The ENABLED byte is sent at the end of the "complete" stage of
migration, so when the destination receives the ENABLED flag it should
call deserialize_finish on the bitmap and set its enabled bit to the
received value.
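
The name-reuse and EOS rules can be sketched as a small decoder that keeps the last device and bitmap names between records. This is a simplified model of dirty_bitmap_load() from this patch over a hypothetical in-memory buffer; struct reader and the helper functions are assumptions, and bitmap data is skipped rather than deserialized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flag values copied from the patch. */
#define DIRTY_BITMAP_MIG_FLAG_EOS           0x01
#define DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK  0x02
#define DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK    0x04
#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME   0x08
#define DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME   0x10
#define DIRTY_BITMAP_MIG_FLAG_GRANULARITY   0x20
#define DIRTY_BITMAP_MIG_FLAG_ENABLED       0x40

/* Hypothetical reader standing in for QEMUFile. Names survive from one
 * record to the next; an empty string means "not received yet" (a
 * simplification of the patch's NULL bs/bitmap pointers). */
struct reader {
    const uint8_t *p;
    size_t len, pos;
    char dev[256], bitmap[256];
    int chunks;    /* NORMAL or ZERO chunk records seen */
    int finished;  /* ENABLED records seen */
};

static int get_byte(struct reader *r, uint8_t *v)
{
    if (r->pos >= r->len) {
        return -1;
    }
    *v = r->p[r->pos++];
    return 0;
}

static int get_be(struct reader *r, uint64_t *v, int nbytes)
{
    uint64_t acc = 0;

    if (r->len - r->pos < (size_t)nbytes) {
        return -1;
    }
    for (int i = 0; i < nbytes; i++) {
        acc = (acc << 8) | r->p[r->pos++];
    }
    *v = acc;
    return 0;
}

static int get_name(struct reader *r, char *name)
{
    uint8_t len;

    if (get_byte(r, &len) < 0 || r->len - r->pos < len) {
        return -1;
    }
    memcpy(name, r->p + r->pos, len);
    name[len] = '\0';
    r->pos += len;
    return 0;
}

/* Consume records until one carries EOS; -1 on malformed input. */
static int read_section(struct reader *r)
{
    uint8_t flags;
    uint64_t val;

    do {
        if (get_byte(r, &flags) < 0) {
            return -1;
        }
        if ((flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) &&
            get_name(r, r->dev) < 0) {
            return -1;
        }
        if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
            if (!r->dev[0] || get_name(r, r->bitmap) < 0) {
                return -1;
            }
            /* the real loader creates or checks the bitmap here */
            if ((flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) &&
                get_be(r, &val, 8) < 0) {
                return -1;
            }
        }
        if (flags & DIRTY_BITMAP_MIG_FLAG_ENABLED) {
            uint8_t enabled;
            if (!r->bitmap[0] || get_byte(r, &enabled) < 0) {
                return -1;
            }
            (void)enabled;  /* real loader: deserialize_finish, enable */
            r->finished++;
        }
        if (flags & (DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK |
                     DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK)) {
            if (!r->bitmap[0] || get_be(r, &val, 8) < 0 || /* start */
                get_be(r, &val, 4) < 0) {                  /* count */
                return -1;
            }
            if (flags & DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK) {
                if (get_be(r, &val, 8) < 0 || r->len - r->pos < val) {
                    return -1;
                }
                r->pos += (size_t)val;  /* skip the serialized data */
            }
            r->chunks++;
        }
    } while (!(flags & DIRTY_BITMAP_MIG_FLAG_EOS));
    return 0;
}
```

A second chunk for the same bitmap can therefore be just the flag byte plus the sector range; a chunk that arrives before any bitmap name is rejected.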

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 include/migration/block.h |   1 +
 migration/Makefile.objs   |   2 +-
 migration/dirty-bitmap.c  | 606 ++++++++++++++++++++++++++++++++++++++++++++++
 vl.c                      |   1 +
 4 files changed, 609 insertions(+), 1 deletion(-)
 create mode 100644 migration/dirty-bitmap.c

diff --git a/include/migration/block.h b/include/migration/block.h
index ffa8ac0..566bb9f 100644
--- a/include/migration/block.h
+++ b/include/migration/block.h
@@ -14,6 +14,7 @@
 #ifndef BLOCK_MIGRATION_H
 #define BLOCK_MIGRATION_H
 
+void dirty_bitmap_mig_init(void);
 void blk_mig_init(void);
 int blk_mig_active(void);
 uint64_t blk_mig_bytes_transferred(void);
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index d929e96..9adfda9 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -6,5 +6,5 @@ common-obj-y += xbzrle.o
 common-obj-$(CONFIG_RDMA) += rdma.o
 common-obj-$(CONFIG_POSIX) += exec.o unix.o fd.o
 
-common-obj-y += block.o
+common-obj-y += block.o dirty-bitmap.o
 
diff --git a/migration/dirty-bitmap.c b/migration/dirty-bitmap.c
new file mode 100644
index 0000000..8621218
--- /dev/null
+++ b/migration/dirty-bitmap.c
@@ -0,0 +1,606 @@
+/*
+ * QEMU dirty bitmap migration
+ *
+ * derived from migration/block.c
+ *
+ * Author:
+ * Sementsov-Ogievskiy Vladimir <vsementsov@parallels.com>
+ *
+ * original copyright message:
+ * =====================================================================
+ * Copyright IBM, Corp. 2009
+ *
+ * Authors:
+ *  Liran Schour   <lirans@il.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Contributions after 2012-01-13 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
+ * =====================================================================
+ */
+
+#include "block/block.h"
+#include "qemu/main-loop.h"
+#include "qemu/error-report.h"
+#include "migration/block.h"
+#include "migration/migration.h"
+#include "qemu/hbitmap.h"
+#include <assert.h>
+
+#define CHUNK_SIZE                       (1 << 20)
+
+#define DIRTY_BITMAP_MIG_FLAG_EOS           0x01
+#define DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK  0x02
+#define DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK    0x04
+#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME   0x08
+#define DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME   0x10
+#define DIRTY_BITMAP_MIG_FLAG_GRANULARITY   0x20
+#define DIRTY_BITMAP_MIG_FLAG_ENABLED       0x40
+/* flags should be <= 0xff */
+
+/* #define DEBUG_DIRTY_BITMAP_MIGRATION */
+
+#ifdef DEBUG_DIRTY_BITMAP_MIGRATION
+#define DPRINTF(fmt, ...) \
+    do { printf("dirty_migration: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
+typedef struct DirtyBitmapMigBitmapState {
+    /* Written during setup phase. */
+    BlockDriverState *bs;
+    BdrvDirtyBitmap *bitmap;
+    HBitmap *dirty_bitmap;
+    int64_t total_sectors;
+    uint64_t sectors_per_chunk;
+    QSIMPLEQ_ENTRY(DirtyBitmapMigBitmapState) entry;
+
+    /* For bulk phase. */
+    bool bulk_completed;
+    int64_t cur_sector;
+    bool granularity_sent;
+
+    /* For dirty phase. */
+    int64_t cur_dirty;
+} DirtyBitmapMigBitmapState;
+
+typedef struct DirtyBitmapMigState {
+    int migration_enable;
+    QSIMPLEQ_HEAD(dbms_list, DirtyBitmapMigBitmapState) dbms_list;
+
+    bool bulk_completed;
+
+    /* for send_bitmap() */
+    BlockDriverState *prev_bs;
+    BdrvDirtyBitmap *prev_bitmap;
+} DirtyBitmapMigState;
+
+static DirtyBitmapMigState dirty_bitmap_mig_state;
+
+/* read name from qemu file:
+ * format:
+ * 1 byte : len = name length (<256)
+ * len bytes : name without last zero byte
+ *
+ * name should point to the buffer >= 256 bytes length
+ */
+static char *qemu_get_name(QEMUFile *f, char *name)
+{
+    int len = qemu_get_byte(f);
+    qemu_get_buffer(f, (uint8_t *)name, len);
+    name[len] = '\0';
+
+    DPRINTF("get name: %d %s\n", len, name);
+
+    return name;
+}
+
+/* write name to qemu file:
+ * format:
+ * same as for qemu_get_name
+ *
+ * maximum name length is 255
+ */
+static void qemu_put_name(QEMUFile *f, const char *name)
+{
+    int len = strlen(name);
+
+    DPRINTF("put name: %d %s\n", len, name);
+
+    assert(len < 256);
+    qemu_put_byte(f, len);
+    qemu_put_buffer(f, (const uint8_t *)name, len);
+}
+
+static void send_bitmap(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
+                        uint64_t start_sector, uint32_t nr_sectors)
+{
+    BlockDriverState *bs = dbms->bs;
+    BdrvDirtyBitmap *bitmap = dbms->bitmap;
+    uint8_t flags = 0;
+    /* align for buffer_is_zero() */
+    uint64_t align = 4 * sizeof(long);
+    uint64_t buf_size =
+        (bdrv_dirty_bitmap_data_size(bitmap, nr_sectors) + align - 1) &
+        ~(align - 1);
+    uint8_t *buf = g_malloc0(buf_size);
+
+    bdrv_dirty_bitmap_serialize_part(bitmap, buf, start_sector, nr_sectors);
+
+    if (buffer_is_zero(buf, buf_size)) {
+        g_free(buf);
+        buf = NULL;
+        flags |= DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK;
+    } else {
+        flags |= DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK;
+    }
+
+    if (bs != dirty_bitmap_mig_state.prev_bs) {
+        dirty_bitmap_mig_state.prev_bs = bs;
+        flags |= DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME;
+    }
+
+    if (bitmap != dirty_bitmap_mig_state.prev_bitmap) {
+        dirty_bitmap_mig_state.prev_bitmap = bitmap;
+        flags |= DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME;
+    }
+
+    if (dbms->granularity_sent == 0) {
+        dbms->granularity_sent = 1;
+        flags |= DIRTY_BITMAP_MIG_FLAG_GRANULARITY;
+    }
+
+    DPRINTF("Enter send_bitmap"
+            "\n   flags:        %x"
+            "\n   start_sector: %" PRIu64
+            "\n   nr_sectors:   %" PRIu32
+            "\n   data_size:    %" PRIu64 "\n",
+            flags, start_sector, nr_sectors, buf_size);
+
+    qemu_put_byte(f, flags);
+
+    if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
+        qemu_put_name(f, bdrv_get_device_name(bs));
+    }
+
+    if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
+        qemu_put_name(f, bdrv_dirty_bitmap_name(bitmap));
+
+        if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
+            qemu_put_be64(f, bdrv_dirty_bitmap_granularity(bs, bitmap));
+        }
+    } else {
+        assert(!(flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY));
+    }
+
+    qemu_put_be64(f, start_sector);
+    qemu_put_be32(f, nr_sectors);
+
+    /* if a block is zero we need to flush here since the network
+     * bandwidth is now a lot higher than the storage device bandwidth.
+     * thus if we queue zero blocks we slow down the migration.
+     * also, skip writing block when migrate only dirty bitmaps. */
+    if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) {
+        qemu_fflush(f);
+        return;
+    }
+
+    qemu_put_be64(f, buf_size);
+    qemu_put_buffer(f, buf, buf_size);
+    g_free(buf);
+}
+
+
+/* Called with iothread lock taken.  */
+
+static void set_dirty_tracking(void)
+{
+    DirtyBitmapMigBitmapState *dbms;
+
+    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+        dbms->dirty_bitmap =
+            bdrv_create_dirty_dirty_bitmap(dbms->bitmap, CHUNK_SIZE);
+    }
+}
+
+static void unset_dirty_tracking(void)
+{
+    DirtyBitmapMigBitmapState *dbms;
+
+    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+        bdrv_release_dirty_dirty_bitmap(dbms->bitmap);
+    }
+}
+
+static void init_dirty_bitmap_migration(QEMUFile *f)
+{
+    BlockDriverState *bs;
+    BdrvDirtyBitmap *bitmap;
+    DirtyBitmapMigBitmapState *dbms;
+
+    dirty_bitmap_mig_state.bulk_completed = false;
+    dirty_bitmap_mig_state.prev_bs = NULL;
+    dirty_bitmap_mig_state.prev_bitmap = NULL;
+
+    for (bs = bdrv_next(NULL); bs; bs = bdrv_next(bs)) {
+        for (bitmap = bdrv_next_dirty_bitmap(bs, NULL); bitmap;
+             bitmap = bdrv_next_dirty_bitmap(bs, bitmap)) {
+            if (!bdrv_dirty_bitmap_name(bitmap)) {
+                continue;
+            }
+
+            dbms = g_new0(DirtyBitmapMigBitmapState, 1);
+            dbms->bs = bs;
+            dbms->bitmap = bitmap;
+            dbms->total_sectors = bdrv_nb_sectors(bs);
+            dbms->sectors_per_chunk = CHUNK_SIZE * 8 *
+                bdrv_dirty_bitmap_granularity(dbms->bs, dbms->bitmap)
+                >> BDRV_SECTOR_BITS;
+
+            QSIMPLEQ_INSERT_TAIL(&dirty_bitmap_mig_state.dbms_list,
+                                 dbms, entry);
+        }
+    }
+}
+
+/* Called with no lock taken.  */
+static void bulk_phase_send_chunk(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
+{
+    uint32_t nr_sectors = MIN(dbms->total_sectors - dbms->cur_sector,
+                             dbms->sectors_per_chunk);
+
+    send_bitmap(f, dbms, dbms->cur_sector, nr_sectors);
+
+    dbms->cur_sector += nr_sectors;
+    if (dbms->cur_sector >= dbms->total_sectors) {
+        dbms->bulk_completed = true;
+    }
+}
+
+/* Called with no lock taken.  */
+static void bulk_phase(QEMUFile *f, bool limit)
+{
+    DirtyBitmapMigBitmapState *dbms;
+
+    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+        while (!dbms->bulk_completed) {
+            bulk_phase_send_chunk(f, dbms);
+            if (limit && qemu_file_rate_limit(f)) {
+                return;
+            }
+        }
+    }
+
+    dirty_bitmap_mig_state.bulk_completed = true;
+}
+
+static void blk_mig_reset_dirty_cursor(void)
+{
+    DirtyBitmapMigBitmapState *dbms;
+
+    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+        dbms->cur_dirty = 0;
+    }
+}
+
+/* Called with iothread lock taken.  */
+static void dirty_phase_send_chunk(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
+{
+    uint32_t nr_sectors;
+
+    while (dbms->cur_dirty < dbms->total_sectors &&
+           !hbitmap_get(dbms->dirty_bitmap, dbms->cur_dirty)) {
+        dbms->cur_dirty += dbms->sectors_per_chunk;
+    }
+
+    if (dbms->cur_dirty >= dbms->total_sectors) {
+        return;
+    }
+
+    nr_sectors = MIN(dbms->total_sectors - dbms->cur_dirty,
+                     dbms->sectors_per_chunk);
+    send_bitmap(f, dbms, dbms->cur_dirty, nr_sectors);
+    hbitmap_reset(dbms->dirty_bitmap, dbms->cur_dirty, dbms->sectors_per_chunk);
+    dbms->cur_dirty += nr_sectors;
+}
+
+/* Called with iothread lock taken.
+ *
+ * Sends every remaining dirty chunk of every bitmap; when @limit is
+ * true, stops early as soon as the migration stream is rate limited.
+ * There is no return value.
+*/
+static void dirty_phase(QEMUFile *f, bool limit)
+{
+    DirtyBitmapMigBitmapState *dbms;
+
+    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+        while (dbms->cur_dirty < dbms->total_sectors) {
+            dirty_phase_send_chunk(f, dbms);
+            if (limit && qemu_file_rate_limit(f)) {
+                return;
+            }
+        }
+    }
+}
+
+
+/* Called with iothread lock taken.  */
+static void dirty_bitmap_mig_cleanup(void)
+{
+    DirtyBitmapMigBitmapState *dbms;
+
+    unset_dirty_tracking();
+
+    while ((dbms = QSIMPLEQ_FIRST(&dirty_bitmap_mig_state.dbms_list)) != NULL) {
+        QSIMPLEQ_REMOVE_HEAD(&dirty_bitmap_mig_state.dbms_list, entry);
+        g_free(dbms);
+    }
+}
+
+static void dirty_bitmap_migration_cancel(void *opaque)
+{
+    dirty_bitmap_mig_cleanup();
+}
+
+static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
+{
+    DPRINTF("Enter save live iterate\n");
+
+    blk_mig_reset_dirty_cursor();
+
+    if (dirty_bitmap_mig_state.bulk_completed) {
+        qemu_mutex_lock_iothread();
+        dirty_phase(f, true);
+        qemu_mutex_unlock_iothread();
+    } else {
+        bulk_phase(f, true);
+    }
+
+    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
+
+    return dirty_bitmap_mig_state.bulk_completed;
+}
+
+/* Called with iothread lock taken.  */
+
+static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
+{
+    DirtyBitmapMigBitmapState *dbms;
+    DPRINTF("Enter save live complete\n");
+
+    if (!dirty_bitmap_mig_state.bulk_completed) {
+        bulk_phase(f, false);
+    }
+
+    blk_mig_reset_dirty_cursor();
+    dirty_phase(f, false);
+
+    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+        uint8_t flags = DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME |
+                        DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME |
+                        DIRTY_BITMAP_MIG_FLAG_ENABLED;
+
+        qemu_put_byte(f, flags);
+        qemu_put_name(f, bdrv_get_device_name(dbms->bs));
+        qemu_put_name(f, bdrv_dirty_bitmap_name(dbms->bitmap));
+        qemu_put_byte(f, bdrv_dirty_bitmap_enabled(dbms->bitmap));
+    }
+
+    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
+
+    DPRINTF("Dirty bitmaps migration completed\n");
+
+    dirty_bitmap_mig_cleanup();
+    return 0;
+}
+
+static uint64_t dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
+                                          uint64_t max_size)
+{
+    DirtyBitmapMigBitmapState *dbms;
+    uint64_t pending = 0;
+
+    qemu_mutex_lock_iothread();
+
+    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+        uint64_t sectors = hbitmap_count(dbms->dirty_bitmap);
+        if (!dbms->bulk_completed) {
+            sectors += dbms->total_sectors - dbms->cur_sector;
+        }
+        pending += bdrv_dirty_bitmap_data_size(dbms->bitmap, sectors);
+    }
+
+    qemu_mutex_unlock_iothread();
+
+    DPRINTF("Enter save live pending %" PRIu64 ", max: %" PRIu64 "\n",
+            pending, max_size);
+    return pending;
+}
+
+static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
+{
+    int flags;
+
+    static char device_name[256], bitmap_name[256];
+    static BlockDriverState *bs;
+    static BdrvDirtyBitmap *bitmap;
+
+    uint8_t *buf;
+    uint64_t first_sector;
+    uint32_t  nr_sectors;
+    int ret;
+
+    DPRINTF("load start\n");
+
+    do {
+        flags = qemu_get_byte(f);
+        DPRINTF("flags: %x\n", flags);
+
+        if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
+            qemu_get_name(f, device_name);
+            bs = bdrv_find(device_name);
+            if (!bs) {
+                fprintf(stderr, "Error: unknown block device '%s'\n",
+                        device_name);
+                return -EINVAL;
+            }
+        }
+
+        if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
+            if (!bs) {
+                fprintf(stderr, "Error: block device name is not set\n");
+                return -EINVAL;
+            }
+
+            qemu_get_name(f, bitmap_name);
+            bitmap = bdrv_find_dirty_bitmap(bs, bitmap_name);
+            if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
+                /* First chunk from this bitmap */
+                uint64_t granularity = qemu_get_be64(f);
+                if (!bitmap) {
+                    Error *local_err = NULL;
+                    bitmap = bdrv_create_dirty_bitmap(bs, granularity,
+                                                      bitmap_name,
+                                                      &local_err);
+                    if (!bitmap) {
+                        error_report("%s", error_get_pretty(local_err));
+                        error_free(local_err);
+                        return -EINVAL;
+                    }
+                } else {
+                    uint64_t dest_granularity =
+                        bdrv_dirty_bitmap_granularity(bs, bitmap);
+                    if (dest_granularity != granularity) {
+                        fprintf(stderr,
+                                "Error: "
+                                "Migrated bitmap granularity (%" PRIu64 ") "
+                                "does not match destination bitmap '%s' "
+                                "granularity (%" PRIu64 ")\n",
+                                granularity,
+                                bitmap_name,
+                                dest_granularity);
+                        return -EINVAL;
+                    }
+                }
+                bdrv_disable_dirty_bitmap(bitmap);
+            }
+            if (!bitmap) {
+                fprintf(stderr, "Error: unknown dirty bitmap "
+                        "'%s' for block device '%s'\n",
+                        bitmap_name, device_name);
+                return -EINVAL;
+            }
+        }
+
+        if (flags & DIRTY_BITMAP_MIG_FLAG_ENABLED) {
+            bool enabled;
+            if (!bitmap) {
+                fprintf(stderr, "Error: dirty bitmap name is not set\n");
+                return -EINVAL;
+            }
+            bdrv_dirty_bitmap_deserialize_finish(bitmap);
+            /* complete migration */
+            enabled = qemu_get_byte(f);
+            if (enabled) {
+                bdrv_enable_dirty_bitmap(bitmap);
+            }
+        }
+
+        if (flags & (DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK |
+                     DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK)) {
+            if (!bs) {
+                fprintf(stderr, "Error: block device name is not set\n");
+                return -EINVAL;
+            }
+            if (!bitmap) {
+                fprintf(stderr, "Error: dirty bitmap name is not set\n");
+                return -EINVAL;
+            }
+
+            first_sector = qemu_get_be64(f);
+            nr_sectors = qemu_get_be32(f);
+            DPRINTF("chunk: %" PRIu64 " %" PRIu32 "\n", first_sector, nr_sectors);
+
+
+            if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) {
+                bdrv_dirty_bitmap_deserialize_part0(bitmap, first_sector,
+                                                    nr_sectors);
+            } else {
+                uint64_t buf_size = qemu_get_be64(f);
+                uint64_t needed_size =
+                    bdrv_dirty_bitmap_data_size(bitmap, nr_sectors);
+
+                if (needed_size > buf_size) {
+                    fprintf(stderr,
+                            "Error: Migrated bitmap granularity does not "
+                            "match destination bitmap granularity\n");
+                    return -EINVAL;
+                }
+
+                buf = g_malloc(buf_size);
+                qemu_get_buffer(f, buf, buf_size);
+                bdrv_dirty_bitmap_deserialize_part(bitmap, buf,
+                                                   first_sector,
+                                                   nr_sectors);
+                g_free(buf);
+            }
+        }
+
+        ret = qemu_file_get_error(f);
+        if (ret != 0) {
+            return ret;
+        }
+    } while (!(flags & DIRTY_BITMAP_MIG_FLAG_EOS));
+
+    DPRINTF("load finish\n");
+    return 0;
+}
+
+static void dirty_bitmap_set_params(const MigrationParams *params, void *opaque)
+{
+    dirty_bitmap_mig_state.migration_enable = params->dirty;
+}
+
+static bool dirty_bitmap_is_active(void *opaque)
+{
+    return dirty_bitmap_mig_state.migration_enable == 1;
+}
+
+static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
+{
+    init_dirty_bitmap_migration(f);
+
+    qemu_mutex_lock_iothread();
+    /* start tracking changes to the dirty bitmaps */
+    set_dirty_tracking();
+    qemu_mutex_unlock_iothread();
+
+    blk_mig_reset_dirty_cursor();
+    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
+
+    return 0;
+}
+
+static SaveVMHandlers savevm_block_handlers = {
+    .set_params = dirty_bitmap_set_params,
+    .save_live_setup = dirty_bitmap_save_setup,
+    .save_live_iterate = dirty_bitmap_save_iterate,
+    .save_live_complete = dirty_bitmap_save_complete,
+    .save_live_pending = dirty_bitmap_save_pending,
+    .load_state = dirty_bitmap_load,
+    .cancel = dirty_bitmap_migration_cancel,
+    .is_active = dirty_bitmap_is_active,
+};
+
+void dirty_bitmap_mig_init(void)
+{
+    QSIMPLEQ_INIT(&dirty_bitmap_mig_state.dbms_list);
+
+    register_savevm_live(NULL, "dirty-bitmap", 0, 1, &savevm_block_handlers,
+                         &dirty_bitmap_mig_state);
+}
diff --git a/vl.c b/vl.c
index a824a7d..dee7220 100644
--- a/vl.c
+++ b/vl.c
@@ -4184,6 +4184,7 @@ int main(int argc, char **argv, char **envp)
 
     blk_mig_init();
     ram_mig_init();
+    dirty_bitmap_mig_init();
 
     /* If the currently selected machine wishes to override the units-per-bus
      * property of its default HBA interface type, do so now. */
-- 
1.9.1

* Re: [Qemu-devel] [PATCH RFC v2 1/8] qmp: print dirty bitmap
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 1/8] qmp: print dirty bitmap Vladimir Sementsov-Ogievskiy
@ 2015-01-27 16:17   ` Eric Blake
  2015-01-27 16:23     ` Vladimir Sementsov-Ogievskiy
  2015-02-10 21:28   ` John Snow
  1 sibling, 1 reply; 35+ messages in thread
From: Eric Blake @ 2015-01-27 16:17 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, den, jsnow, stefanha, pbonzini

[-- Attachment #1: Type: text/plain, Size: 1821 bytes --]

On 01/27/2015 03:56 AM, Vladimir Sementsov-Ogievskiy wrote:
> Adds qmp and hmp commands to print dirty bitmap. This is needed only for
> testing.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>  block.c               | 33 +++++++++++++++++++++++++++++++++
>  blockdev.c            | 13 +++++++++++++
>  hmp-commands.hx       | 15 +++++++++++++++
>  hmp.c                 |  8 ++++++++
>  hmp.h                 |  1 +
>  include/block/block.h |  2 ++
>  qapi-schema.json      |  3 ++-
>  qapi/block-core.json  |  3 +++
>  qmp-commands.hx       |  5 +++++
>  9 files changed, 82 insertions(+), 1 deletion(-)
> 

> +void bdrv_print_dirty_bitmap(BdrvDirtyBitmap *bitmap)
> +{
> +    unsigned long a = 0, b = 0;
> +
> +    printf("bitmap '%s'\n", bitmap->name ? bitmap->name : "no name");
> +    printf("enabled: %s\n", bitmap->enabled ? "true" : "false");
> +    printf("size: %" PRId64 "\n", bitmap->size);
> +    printf("granularity: %" PRId64 "\n", bitmap->granularity);
> +    printf("dirty regions begin:\n");
> +

> +void qmp_block_dirty_bitmap_print(const char *node_ref, const char *name,
> +                                  Error **errp)
> +{
> +    BdrvDirtyBitmap *bitmap;
> +
> +    bitmap = block_dirty_bitmap_lookup(node_ref, name, NULL, errp);
> +    if (!bitmap) {
> +        return;
> +    }
> +
> +    bdrv_print_dirty_bitmap(bitmap);

Won't work.  You cannot assume that stdout is usable when invoked from
QMP.  The only sane thing to do is to bundle up the structured data into
JSON, pass that back over the QMP connection, and let the client decide
how to print it.

I'm opposed to adding this command as-is.
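
One way to follow this advice, sketched under assumptions: collect the dirty regions into a result structure and let the QMP layer serialize it, instead of printing. struct dirty_extent, collect_extents() and extents_to_json() are hypothetical; in QEMU the return type would be declared in the QAPI schema and the generated marshaller would produce the JSON, so the hand-written formatter only illustrates what a client might receive.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result element for one run of dirty bits. */
struct dirty_extent {
    uint64_t start;  /* index of the first dirty bit in the run */
    uint64_t count;  /* number of consecutive dirty bits */
};

/* Bits are stored least-significant first within each byte. */
static bool bit_is_set(const uint8_t *bits, uint64_t i)
{
    return (bits[i / 8] >> (i % 8)) & 1;
}

/* Gather runs of set bits into out[] instead of printing them.
 * Returns the number of extents stored (at most max). */
static size_t collect_extents(const uint8_t *bits, uint64_t nbits,
                              struct dirty_extent *out, size_t max)
{
    size_t n = 0;
    uint64_t i = 0;

    while (i < nbits && n < max) {
        if (!bit_is_set(bits, i)) {
            i++;
            continue;
        }
        uint64_t start = i;
        while (i < nbits && bit_is_set(bits, i)) {
            i++;
        }
        out[n].start = start;
        out[n].count = i - start;
        n++;
    }
    return n;
}

/* Stand-in for the marshaller: renders the extents as a JSON array.
 * Returns 0 on success, -1 if buf is too small. */
static int extents_to_json(const struct dirty_extent *e, size_t n,
                           char *buf, size_t size)
{
    size_t off;
    int w = snprintf(buf, size, "[");

    if (w < 0 || (size_t)w >= size) {
        return -1;
    }
    off = (size_t)w;
    for (size_t k = 0; k < n; k++) {
        w = snprintf(buf + off, size - off,
                     "%s{\"start\": %" PRIu64 ", \"count\": %" PRIu64 "}",
                     k ? ", " : "", e[k].start, e[k].count);
        if (w < 0 || (size_t)w >= size - off) {
            return -1;
        }
        off += (size_t)w;
    }
    w = snprintf(buf + off, size - off, "]");
    return (w < 0 || (size_t)w >= size - off) ? -1 : 0;
}
```

For the two-byte bitmap { 0x0E, 0x80 } this yields two extents, bits 1..3 and bit 15, which the client is then free to display however it likes.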

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

* Re: [Qemu-devel] [PATCH RFC v2 7/8] migration: add dirty parameter
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 7/8] migration: add dirty parameter Vladimir Sementsov-Ogievskiy
@ 2015-01-27 16:20   ` Eric Blake
  2015-02-04 14:42     ` Vladimir Sementsov-Ogievskiy
  2015-02-10 21:32   ` John Snow
  1 sibling, 1 reply; 35+ messages in thread
From: Eric Blake @ 2015-01-27 16:20 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, den, jsnow, stefanha, pbonzini

[-- Attachment #1: Type: text/plain, Size: 1011 bytes --]

On 01/27/2015 03:56 AM, Vladimir Sementsov-Ogievskiy wrote:
> Add dirty parameter to qmp-migrate command. If this parameter is true,
> migration/block.c will migrate dirty bitmaps. This parameter can be used
> without "blk" parameter to migrate only dirty bitmaps, skipping block
> migration.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---

> +++ b/qapi-schema.json
> @@ -1656,12 +1656,17 @@
>  # @detach: this argument exists only for compatibility reasons and
>  #          is ignored by QEMU
>  #
> +# @dirty: #optional do dirty-bitmaps migration (can be used with or without
> +#         @blk parameter)
> +#         (since 2.3)

Rather than adding it to 'migrate', where the command is not
introspectible, I'd rather you add it to 'migrate-set-capabilities',
where I can then use 'query-migrate-capabilities' to learn if it is
supported.
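
A minimal sketch of the capability-based alternative, assuming a hypothetical "dirty-bitmaps" capability. It mimics the pattern QEMU uses for existing capabilities (a bool array indexed by capability, checked by a helper such as migrate_use_xbzrle()), but the enum, names and functions below are illustrative, not QEMU's generated QAPI code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capability list; the real enum is generated from
 * qapi-schema.json, and "dirty-bitmaps" here is only an assumption. */
enum mig_cap {
    MIG_CAP_XBZRLE,
    MIG_CAP_DIRTY_BITMAPS,
    MIG_CAP__MAX,
};

static const char *const mig_cap_names[MIG_CAP__MAX] = {
    [MIG_CAP_XBZRLE] = "xbzrle",
    [MIG_CAP_DIRTY_BITMAPS] = "dirty-bitmaps",
};

struct mig_state {
    bool enabled_capabilities[MIG_CAP__MAX];
};

/* Analogue of migrate-set-capabilities: unknown names are rejected. */
static bool set_capability(struct mig_state *s, const char *name, bool on)
{
    for (int i = 0; i < MIG_CAP__MAX; i++) {
        if (strcmp(mig_cap_names[i], name) == 0) {
            s->enabled_capabilities[i] = on;
            return true;
        }
    }
    return false;
}

/* Analogue of query-migrate-capabilities: list every known name, so a
 * client can check for support before trying to enable it. */
static size_t query_capabilities(const char **names, size_t max)
{
    size_t n = 0;

    for (int i = 0; i < MIG_CAP__MAX && n < max; i++) {
        names[n++] = mig_cap_names[i];
    }
    return n;
}

/* What the dirty-bitmap unit's is_active hook might check instead of
 * MigrationParams.dirty. */
static bool migrate_dirty_bitmaps(const struct mig_state *s)
{
    return s->enabled_capabilities[MIG_CAP_DIRTY_BITMAPS];
}
```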

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

* Re: [Qemu-devel] [PATCH RFC v2 1/8] qmp: print dirty bitmap
  2015-01-27 16:17   ` Eric Blake
@ 2015-01-27 16:23     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-01-27 16:23 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, den, jsnow, stefanha, pbonzini

OK, I agree. It was just a simple way to test the other stuff. I'll
rewrite it in the next versions of both of my series.

Best regards,
Vladimir

On 27.01.2015 19:17, Eric Blake wrote:
> On 01/27/2015 03:56 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Adds qmp and hmp commands to print dirty bitmap. This is needed only for
>> testing.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>> ---
>>   block.c               | 33 +++++++++++++++++++++++++++++++++
>>   blockdev.c            | 13 +++++++++++++
>>   hmp-commands.hx       | 15 +++++++++++++++
>>   hmp.c                 |  8 ++++++++
>>   hmp.h                 |  1 +
>>   include/block/block.h |  2 ++
>>   qapi-schema.json      |  3 ++-
>>   qapi/block-core.json  |  3 +++
>>   qmp-commands.hx       |  5 +++++
>>   9 files changed, 82 insertions(+), 1 deletion(-)
>>
>> +void bdrv_print_dirty_bitmap(BdrvDirtyBitmap *bitmap)
>> +{
>> +    unsigned long a = 0, b = 0;
>> +
>> +    printf("bitmap '%s'\n", bitmap->name ? bitmap->name : "no name");
>> +    printf("enabled: %s\n", bitmap->enabled ? "true" : "false");
>> +    printf("size: %" PRId64 "\n", bitmap->size);
>> +    printf("granularity: %" PRId64 "\n", bitmap->granularity);
>> +    printf("dirty regions begin:\n");
>> +
>> +void qmp_block_dirty_bitmap_print(const char *node_ref, const char *name,
>> +                                  Error **errp)
>> +{
>> +    BdrvDirtyBitmap *bitmap;
>> +
>> +    bitmap = block_dirty_bitmap_lookup(node_ref, name, NULL, errp);
>> +    if (!bitmap) {
>> +        return;
>> +    }
>> +
>> +    bdrv_print_dirty_bitmap(bitmap);
> Won't work.  You cannot assume that stdout is usable when invoked from
> QMP.  The only sane thing to do is to bundle up the structured data into
> JSON, pass that back over the QMP connection, and let the client decide
> how to print it.
>
> I'm opposed to adding this command as-is.
>
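A minimal sketch of Eric's point (return structured data to the caller instead of printing to stdout): the run-finding loop from bdrv_print_dirty_bitmap() can fill an array of regions rather than calling printf(). The flat uint64_t word array, bit_get() and DirtyRegion are hypothetical stand-ins, not QEMU's HBitmap API; a real QMP handler would still have to convert the regions into its own QAPI return type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: one run of dirty bits, both ends inclusive. */
typedef struct DirtyRegion {
    uint64_t start;
    uint64_t end;
} DirtyRegion;

/* Hypothetical stand-in for hbitmap_get() on a flat array of 64-bit words. */
static bool bit_get(const uint64_t *words, uint64_t i)
{
    return (words[i / 64] >> (i % 64)) & 1;
}

/*
 * Scan bits [0, size) and store up to max_regions runs in out.
 * Returns the total number of runs, which may exceed max_regions, so the
 * caller can detect truncation. Nothing is printed: the caller (for
 * example a QMP handler) decides how to encode the result.
 */
static size_t collect_dirty_regions(const uint64_t *words, uint64_t size,
                                    DirtyRegion *out, size_t max_regions)
{
    size_t n = 0;
    uint64_t a = 0;

    assert(out != NULL || max_regions == 0);

    while (a < size) {
        uint64_t b;

        if (!bit_get(words, a)) {
            a++;
            continue;
        }
        /* a starts a run; find the first clean bit after it. */
        for (b = a + 1; b < size && bit_get(words, b); b++) {
            ;
        }
        if (n < max_regions) {
            out[n].start = a;
            out[n].end = b - 1;
        }
        n++;
        a = b;
    }
    return n;
}
```

Returning the total count even when the array is full lets the caller retry with a larger buffer instead of silently losing regions.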

* Re: [Qemu-devel] [PATCH RFC v2 7/8] migration: add dirty parameter
  2015-01-27 16:20   ` Eric Blake
@ 2015-02-04 14:42     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-04 14:42 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, den, jsnow, stefanha, pbonzini

On 27.01.2015 19:20, Eric Blake wrote:
> On 01/27/2015 03:56 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Add dirty parameter to qmp-migrate command. If this parameter is true,
>> migration/block.c will migrate dirty bitmaps. This parameter can be used
>> without "blk" parameter to migrate only dirty bitmaps, skipping block
>> migration.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>> ---
>> +++ b/qapi-schema.json
>> @@ -1656,12 +1656,17 @@
>>   # @detach: this argument exists only for compatibility reasons and
>>   #          is ignored by QEMU
>>   #
>> +# @dirty: #optional do dirty-bitmaps migration (can be used with or without
>> +#         @blk parameter)
>> +#         (since 2.3)
> Rather than adding it to 'migrate', where the command is not
> introspectible, I'd rather you add it to 'migrate-set-capabilities',
> where I can then use 'query-migrate-capabilities' to learn if it is
> supported.
>
Thank you. Moreover, it turned out to be simpler than using a migration
parameter. I've made the change; it will be in v3. I'm just waiting
for review of the core part of the series, migration/dirty-bitmap.c.

-- 
Best regards,
Vladimir

* Re: [Qemu-devel] [PATCH RFC v2 1/8] qmp: print dirty bitmap
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 1/8] qmp: print dirty bitmap Vladimir Sementsov-Ogievskiy
  2015-01-27 16:17   ` Eric Blake
@ 2015-02-10 21:28   ` John Snow
  1 sibling, 0 replies; 35+ messages in thread
From: John Snow @ 2015-02-10 21:28 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, pbonzini, stefanha, den



On 01/27/2015 05:56 AM, Vladimir Sementsov-Ogievskiy wrote:
> Adds qmp and hmp commands to print dirty bitmap. This is needed only for
> testing.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   block.c               | 33 +++++++++++++++++++++++++++++++++
>   blockdev.c            | 13 +++++++++++++
>   hmp-commands.hx       | 15 +++++++++++++++
>   hmp.c                 |  8 ++++++++
>   hmp.h                 |  1 +
>   include/block/block.h |  2 ++
>   qapi-schema.json      |  3 ++-
>   qapi/block-core.json  |  3 +++
>   qmp-commands.hx       |  5 +++++
>   9 files changed, 82 insertions(+), 1 deletion(-)
>
> diff --git a/block.c b/block.c
> index 2466ba8..6d3f0b2 100644
> --- a/block.c
> +++ b/block.c
> @@ -5374,6 +5374,39 @@ BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
>   }
>
>
> +void bdrv_print_dirty_bitmap(BdrvDirtyBitmap *bitmap)
> +{
> +    unsigned long a = 0, b = 0;
> +
> +    printf("bitmap '%s'\n", bitmap->name ? bitmap->name : "no name");
> +    printf("enabled: %s\n", bitmap->enabled ? "true" : "false");
> +    printf("size: %" PRId64 "\n", bitmap->size);
> +    printf("granularity: %" PRId64 "\n", bitmap->granularity);
> +    printf("dirty regions begin:\n");
> +
> +    while (true) {
> +        for (a = b; a < bitmap->size && !hbitmap_get(bitmap->bitmap, a); ++a) {
> +            ;
> +        }
> +        if (a >= bitmap->size) {
> +            break;
> +        }
> +
> +        for (b = a + 1;
> +             b < bitmap->size && hbitmap_get(bitmap->bitmap, b);
> +             ++b) {
> +            ;
> +        }
> +
> +        printf("%ld -> %ld\n", a, b - 1);
> +        if (b >= bitmap->size) {
> +            break;
> +        }
> +    }
> +
> +    printf("dirty regions end\n");
> +}
> +
>   BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
>                                             int granularity,
>                                             const char *name,
> diff --git a/blockdev.c b/blockdev.c
> index 209fedd..66f0437 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -2074,6 +2074,19 @@ void qmp_block_dirty_bitmap_add(const char *node_ref, const char *name,
>       aio_context_release(aio_context);
>   }
>
> +void qmp_block_dirty_bitmap_print(const char *node_ref, const char *name,
> +                                  Error **errp)
> +{
> +    BdrvDirtyBitmap *bitmap;
> +
> +    bitmap = block_dirty_bitmap_lookup(node_ref, name, NULL, errp);
> +    if (!bitmap) {
> +        return;
> +    }
> +
> +    bdrv_print_dirty_bitmap(bitmap);
> +}
> +
>   void qmp_block_dirty_bitmap_remove(const char *node_ref, const char *name,
>                                      Error **errp)
>   {
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index e37bc8b..a9be506 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -58,6 +58,21 @@ Quit the emulator.
>   ETEXI
>
>       {
> +        .name       = "print_dirty_bitmap",
> +        .args_type  = "device:B,bitmap:s",
> +        .params     = "device bitmap",
> +        .help       = "print dirty bitmap",
> +        .user_print = monitor_user_noop,
> +        .mhandler.cmd = hmp_print_dirty_bitmap,
> +    },
> +
> +STEXI
> +@item print_dirty_bitmap device_id bitmap_name
> +@findex print_dirty_bitmap
> +Print dirty bitmap meta information and dirty regions.
> +ETEXI
> +
> +    {
>           .name       = "block_resize",
>           .args_type  = "device:B,size:o",
>           .params     = "device size",
> diff --git a/hmp.c b/hmp.c
> index 63b19c7..a269145 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -782,6 +782,14 @@ void hmp_info_tpm(Monitor *mon, const QDict *qdict)
>       qapi_free_TPMInfoList(info_list);
>   }
>
> +void hmp_print_dirty_bitmap(Monitor *mon, const QDict *qdict)
> +{
> +    const char *device = qdict_get_str(qdict, "device");
> +    const char *name = qdict_get_str(qdict, "bitmap");
> +
> +    qmp_block_dirty_bitmap_print(device, name, NULL);
> +}
> +
>   void hmp_quit(Monitor *mon, const QDict *qdict)
>   {
>       monitor_suspend(mon);
> diff --git a/hmp.h b/hmp.h
> index 4bb5dca..6bbbc33 100644
> --- a/hmp.h
> +++ b/hmp.h
> @@ -19,6 +19,7 @@
>   #include "qapi-types.h"
>   #include "qapi/qmp/qdict.h"
>
> +void hmp_print_dirty_bitmap(Monitor *mon, const QDict *qdict);
>   void hmp_info_name(Monitor *mon, const QDict *qdict);
>   void hmp_info_version(Monitor *mon, const QDict *qdict);
>   void hmp_info_kvm(Monitor *mon, const QDict *qdict);
> diff --git a/include/block/block.h b/include/block/block.h
> index cb1f28d..6cf067f 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -459,6 +459,8 @@ void bdrv_dirty_iter_init(BlockDriverState *bs,
>   void bdrv_set_dirty_iter(struct HBitmapIter *hbi, int64_t offset);
>   int64_t bdrv_get_dirty_count(BlockDriverState *bs, BdrvDirtyBitmap *bitmap);
>
> +void bdrv_print_dirty_bitmap(BdrvDirtyBitmap *bitmap);
> +
>   void bdrv_enable_copy_on_read(BlockDriverState *bs);
>   void bdrv_disable_copy_on_read(BlockDriverState *bs);
>
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 85f55d9..1475f69 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -1263,7 +1263,8 @@
>          'blockdev-snapshot-internal-sync': 'BlockdevSnapshotInternal',
>          'block-dirty-bitmap-add': 'BlockDirtyBitmapAdd',
>          'block-dirty-bitmap-enable': 'BlockDirtyBitmap',
> -       'block-dirty-bitmap-disable': 'BlockDirtyBitmap'
> +       'block-dirty-bitmap-disable': 'BlockDirtyBitmap',
> +       'block-dirty-bitmap-print': 'BlockDirtyBitmap'
>      } }
>
>   ##
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 1892b50..3e1edb1 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -982,6 +982,9 @@
>   {'command': 'block-dirty-bitmap-disable',
>     'data': 'BlockDirtyBitmap' }
>
> +{'command': 'block-dirty-bitmap-print',
> +  'data': 'BlockDirtyBitmap' }
> +
>   ##
>   # @block_set_io_throttle:
>   #
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 479d4f5..8065715 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -1222,6 +1222,11 @@ EQMP
>           .args_type  = "node-ref:B,name:s",
>           .mhandler.cmd_new = qmp_marshal_input_block_dirty_bitmap_disable,
>       },
> +    {
> +        .name       = "block-dirty-bitmap-print",
> +        .args_type  = "node-ref:B,name:s",
> +        .mhandler.cmd_new = qmp_marshal_input_block_dirty_bitmap_print,
> +    },
>
>   SQMP
>
>

If we need it only for testing, perhaps just returning JSON data, such as
checksums or something similar, would be enough to verify that the
migration went well.

Or maybe if you really want to see every bit, just return the bitmap as 
an array of hexstrings or something reasonably compact.
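A rough sketch of the hexstring idea, assuming the lowest bitmap level is available as an array of 64-bit words: each word is written byte by byte in little-endian order (the order the serialization patch uses), so the text does not depend on host endianness. bitmap_to_hex() is a hypothetical helper; calling it once per chunk would give the array of strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Encode nwords 64-bit words as lowercase hex, two digits per byte,
 * bytes in little-endian order. out must hold nwords * 16 + 1 chars.
 */
static void bitmap_to_hex(const uint64_t *words, size_t nwords, char *out)
{
    static const char digits[] = "0123456789abcdef";
    size_t i;
    int byte;

    assert(out != NULL);

    for (i = 0; i < nwords; i++) {
        for (byte = 0; byte < 8; byte++) {
            unsigned v = (unsigned)((words[i] >> (8 * byte)) & 0xff);

            *out++ = digits[v >> 4];
            *out++ = digits[v & 0xf];
        }
    }
    *out = '\0';
}
```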

In terms of iotests, I'd be a fan of just a checksum of the lower layer 
to verify accurate transmission.
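For the checksum variant, one possibility (not something this series defines) is to hash the little-endian bytes of the lowest level on both sides and let the test compare the two values. FNV-1a is used here only as an illustrative, dependency-free choice; any stable hash would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a over the words, fed byte by byte in little-endian order. */
static uint64_t bitmap_checksum(const uint64_t *words, size_t nwords)
{
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV-1a offset basis */
    size_t i;
    int byte;

    assert(words != NULL || nwords == 0);

    for (i = 0; i < nwords; i++) {
        for (byte = 0; byte < 8; byte++) {
            h ^= (words[i] >> (8 * byte)) & 0xff;
            h *= 0x100000001b3ULL;        /* FNV-1a 64-bit prime */
        }
    }
    return h;
}
```

Because the bytes are taken in a fixed order, source and destination get the same value even if they run on hosts with different endianness.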

* Re: [Qemu-devel] [PATCH RFC v2 2/8] hbitmap: serialization
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 2/8] hbitmap: serialization Vladimir Sementsov-Ogievskiy
@ 2015-02-10 21:29   ` John Snow
  0 siblings, 0 replies; 35+ messages in thread
From: John Snow @ 2015-02-10 21:29 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, pbonzini, stefanha, den



On 01/27/2015 05:56 AM, Vladimir Sementsov-Ogievskiy wrote:
> Functions to serialize / deserialize (restore) an HBitmap. The HBitmap is
> saved as a linear sequence of bits, independent of host endianness and of
> the size of the bitmap array element (unsigned long); little endian is chosen.
>
> These functions are suitable for dirty bitmap migration, where the bitmap
> can be restored in several steps. For performance, each step writes only
> the last level of the bitmap. All other levels are rebuilt by
> hbitmap_deserialize_finish() as the last step of restoring, so the
> HBitmap is inconsistent while it is being restored.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   include/qemu/hbitmap.h | 59 +++++++++++++++++++++++++++++++
>   util/hbitmap.c         | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 155 insertions(+)
>
> diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
> index c48c50a..1d37582 100644
> --- a/include/qemu/hbitmap.h
> +++ b/include/qemu/hbitmap.h
> @@ -137,6 +137,65 @@ void hbitmap_reset(HBitmap *hb, uint64_t start, uint64_t count);
>   bool hbitmap_get(const HBitmap *hb, uint64_t item);
>
>   /**
> + * hbitmap_data_size:
> + * @hb: HBitmap to operate on.
> + * @count: Number of bits
> + *
> + * Return amount of bytes hbitmap_serialize_part needs
> + */
> +uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count);
> +
> +/**
> + * hbitmap_serialize_part
> + * @hb: HBitmap to operate on.
> + * @buf: Buffer to store serialized bitmap.
> + * @start: First bit to store.
> + * @count: Number of bits to store.
> + *
> + * Stores HBitmap data corresponding to given region. The format of saved data
> + * is linear sequence of bits, so it can be used by hbitmap_deserialize_part
> + * independently of endianness and size of HBitmap level array elements
> + */
> +void hbitmap_serialize_part(const HBitmap *hb, uint8_t *buf,
> +                            uint64_t start, uint64_t count);
> +
> +/**
> + * hbitmap_deserialize_part
> + * @hb: HBitmap to operate on.
> + * @buf: Buffer to restore bitmap data from.
> + * @start: First bit to restore.
> + * @count: Number of bits to restore.
> + *
> + * Restores HBitmap data corresponding to given region. The format is the same
> + * as for hbitmap_serialize_part.
> + *
> + * ! The bitmap becomes inconsistent after this operation.
> + * hbitmap_deserialize_finish should be called before using the bitmap after
> + * data restoring.
> + */
> +void hbitmap_deserialize_part(HBitmap *hb, uint8_t *buf,
> +                              uint64_t start, uint64_t count);
> +
> +/**
> + * hbitmap_deserialize_part0
> + * @hb: HBitmap to operate on.
> + * @start: First bit to restore.
> + * @count: Number of bits to restore.
> + *
> + * Same as hbitmap_deserialize_part, but fills the bitmap with zeroes.
> + */
> +void hbitmap_deserialize_part0(HBitmap *hb, uint64_t start, uint64_t count);
> +

Not important enough to warrant a respin on its own:
maybe "hbitmap_deserialize_zeroes"? "part0" is a little odd.

> +/**
> + * hbitmap_deserialize_finish
> + * @hb: HBitmap to operate on.
> + *
> + * Repair HBitmap after calling hbitmap_deserialize_part. Actually, all HBitmap
> + * layers are restored here.
> + */
> +void hbitmap_deserialize_finish(HBitmap *hb);
> +
> +/**
>    * hbitmap_free:
>    * @hb: HBitmap to operate on.
>    *
> diff --git a/util/hbitmap.c b/util/hbitmap.c
> index f400dcb..55226a0 100644
> --- a/util/hbitmap.c
> +++ b/util/hbitmap.c
> @@ -366,6 +366,102 @@ bool hbitmap_get(const HBitmap *hb, uint64_t item)
>       return (hb->levels[HBITMAP_LEVELS - 1][pos >> BITS_PER_LEVEL] & bit) != 0;
>   }
>
> +uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count)
> +{
> +    uint64_t size, gran;
> +
> +    if (count == 0) {
> +        return 0;
> +    }
> +
> +    gran = 1ll << hb->granularity;
> +    size = (((gran + count - 2) >> hb->granularity) >> BITS_PER_LEVEL) + 1;
> +
> +    return size * sizeof(unsigned long);
> +}
> +
> +void hbitmap_serialize_part(const HBitmap *hb, uint8_t *buf,
> +                            uint64_t start, uint64_t count)
> +{
> +    uint64_t i;
> +    uint64_t last = start + count - 1;
> +    unsigned long *out = (unsigned long *)buf;
> +
> +    if (count == 0) {
> +        return;
> +    }
> +
> +    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
> +    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
> +    count = last - start + 1;
> +
> +    for (i = start; i <= last; ++i) {
> +        unsigned long el = hb->levels[HBITMAP_LEVELS - 1][i];
> +        out[i] = (BITS_PER_LONG == 32 ? cpu_to_le32(el) : cpu_to_le64(el));
> +    }
> +}
> +
> +void hbitmap_deserialize_part(HBitmap *hb, uint8_t *buf,
> +                              uint64_t start, uint64_t count)
> +{
> +    uint64_t i;
> +    uint64_t last = start + count - 1;
> +    unsigned long *in = (unsigned long *)buf;
> +
> +    if (count == 0) {
> +        return;
> +    }
> +
> +    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
> +    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
> +    count = last - start + 1;
> +
> +    for (i = start; i <= last; ++i) {
> +        hb->levels[HBITMAP_LEVELS - 1][i] =
> +            (BITS_PER_LONG == 32 ? le32_to_cpu(in[i]) : le64_to_cpu(in[i]));
> +    }
> +}
> +
> +void hbitmap_deserialize_part0(HBitmap *hb, uint64_t start, uint64_t count)
> +{
> +    uint64_t last = start + count - 1;
> +
> +    if (count == 0) {
> +        return;
> +    }
> +
> +    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
> +    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
> +    count = last - start + 1;
> +
> +    memset(hb->levels[HBITMAP_LEVELS - 1] + start, 0,
> +           count * sizeof(unsigned long));
> +}
> +
> +void hbitmap_deserialize_finish(HBitmap *bitmap)
> +{
> +    int64_t i, size, prev_size;
> +    int lev;
> +
> +    /* restore levels starting from penultimate to zero level, assuming
> +     * that the last level is ok */
> +    size = MAX((bitmap->size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
> +    for (lev = HBITMAP_LEVELS - 1; lev-- > 0; ) {
> +        prev_size = size;
> +        size = MAX((size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
> +        memset(bitmap->levels[lev], 0, size * sizeof(unsigned long));
> +
> +        for (i = 0; i < prev_size; ++i) {
> +            if (bitmap->levels[lev + 1][i]) {
> +                bitmap->levels[lev][i >> BITS_PER_LEVEL] |=
> +                    1 << (i & (BITS_PER_LONG - 1));
> +            }
> +        }
> +    }
> +
> +    bitmap->levels[0][0] |= 1UL << (BITS_PER_LONG - 1);
> +}
> +
>   void hbitmap_free(HBitmap *hb)
>   {
>       unsigned i;
>

Reviewed-by: John Snow <jsnow@redhat.com>
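A self-contained sketch of the serialize / deserialize / finish split described in this patch, on a toy two-level bitmap of 64-bit words rather than HBitmap (all names are hypothetical). Two details differ on purpose from the quoted code: the buffer is indexed relative to the first serialized word (the quoted hbitmap_serialize_part() and hbitmap_deserialize_part() index out[i] and in[i] with the absolute word index, which seems to assume the buffer always starts at word 0), and the level rebuild shifts 1ULL, since shifting a plain int 1 by 32 or more is undefined behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store v at p as 8 little-endian bytes, whatever the host byte order. */
static void put_le64(uint8_t *p, uint64_t v)
{
    int i;

    for (i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

static uint64_t get_le64(const uint8_t *p)
{
    uint64_t v = 0;
    int i;

    for (i = 0; i < 8; i++) {
        v |= (uint64_t)p[i] << (8 * i);
    }
    return v;
}

/* Copy leaf words [first, first + n) into buf; buf starts with word 'first'. */
static void serialize_words(const uint64_t *leaf, uint8_t *buf,
                            size_t first, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        put_le64(buf + 8 * i, leaf[first + i]);
    }
}

/* Inverse of serialize_words(); the upper level is stale until rebuilt. */
static void deserialize_words(uint64_t *leaf, const uint8_t *buf,
                              size_t first, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        leaf[first + i] = get_le64(buf + 8 * i);
    }
}

/* "Finish" step: bit i of the upper level is set iff leaf word i != 0. */
static void rebuild_upper(const uint64_t *leaf, size_t nleaf, uint64_t *upper)
{
    size_t i;

    assert(leaf != NULL && upper != NULL);
    memset(upper, 0, ((nleaf + 63) / 64) * sizeof(*upper));
    for (i = 0; i < nleaf; i++) {
        if (leaf[i]) {
            upper[i / 64] |= 1ULL << (i % 64);  /* 1ULL: shifts up to 63 are defined */
        }
    }
}
```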

* Re: [Qemu-devel] [PATCH RFC v2 3/8] block: BdrvDirtyBitmap serialization interface
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 3/8] block: BdrvDirtyBitmap serialization interface Vladimir Sementsov-Ogievskiy
@ 2015-02-10 21:29   ` John Snow
  0 siblings, 0 replies; 35+ messages in thread
From: John Snow @ 2015-02-10 21:29 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, pbonzini, stefanha, den



On 01/27/2015 05:56 AM, Vladimir Sementsov-Ogievskiy wrote:
> Several functions to provide necessary access to BdrvDirtyBitmap for
> block-migration.c
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   block.c               | 36 ++++++++++++++++++++++++++++++++++++
>   include/block/block.h | 12 ++++++++++++
>   2 files changed, 48 insertions(+)
>
> diff --git a/block.c b/block.c
> index 6d3f0b2..9860fc1 100644
> --- a/block.c
> +++ b/block.c
> @@ -5552,6 +5552,42 @@ void bdrv_clear_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
>       hbitmap_reset(bitmap->bitmap, 0, bitmap->size);
>   }
>
> +const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap)
> +{
> +    return bitmap->name;
> +}
> +
> +uint64_t bdrv_dirty_bitmap_data_size(const BdrvDirtyBitmap *bitmap,
> +                                     uint64_t count)
> +{
> +    return hbitmap_data_size(bitmap->bitmap, count);
> +}
> +
> +void bdrv_dirty_bitmap_serialize_part(const BdrvDirtyBitmap *bitmap,
> +                                      uint8_t *buf, uint64_t start,
> +                                      uint64_t count)
> +{
> +    hbitmap_serialize_part(bitmap->bitmap, buf, start, count);
> +}
> +
> +void bdrv_dirty_bitmap_deserialize_part(BdrvDirtyBitmap *bitmap,
> +                                        uint8_t *buf, uint64_t start,
> +                                        uint64_t count)
> +{
> +    hbitmap_deserialize_part(bitmap->bitmap, buf, start, count);
> +}
> +
> +void bdrv_dirty_bitmap_deserialize_part0(BdrvDirtyBitmap *bitmap,
> +                                         uint64_t start, uint64_t count)
> +{
> +    hbitmap_deserialize_part0(bitmap->bitmap, start, count);
> +}
> +
> +void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap)
> +{
> +    hbitmap_deserialize_finish(bitmap->bitmap);
> +}
> +
>   static void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
>                              int nr_sectors)
>   {
> diff --git a/include/block/block.h b/include/block/block.h
> index 6cf067f..0890cd2 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -460,6 +460,18 @@ void bdrv_set_dirty_iter(struct HBitmapIter *hbi, int64_t offset);
>   int64_t bdrv_get_dirty_count(BlockDriverState *bs, BdrvDirtyBitmap *bitmap);
>
>   void bdrv_print_dirty_bitmap(BdrvDirtyBitmap *bitmap);
> +const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap);
> +uint64_t bdrv_dirty_bitmap_data_size(const BdrvDirtyBitmap *bitmap,
> +                                     uint64_t count);
> +void bdrv_dirty_bitmap_serialize_part(const BdrvDirtyBitmap *bitmap,
> +                                      uint8_t *buf, uint64_t start,
> +                                      uint64_t count);
> +void bdrv_dirty_bitmap_deserialize_part(BdrvDirtyBitmap *bitmap,
> +                                        uint8_t *buf, uint64_t start,
> +                                        uint64_t count);
> +void bdrv_dirty_bitmap_deserialize_part0(BdrvDirtyBitmap *bitmap,
> +                                         uint64_t start, uint64_t count);
> +void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap);
>
>   void bdrv_enable_copy_on_read(BlockDriverState *bs);
>   void bdrv_disable_copy_on_read(BlockDriverState *bs);
>

Reviewed-by: John Snow <jsnow@redhat.com>
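To illustrate how these wrappers might be combined, here is a hedged sketch of a chunked sender and receiver in which all-zero chunks travel without a payload and are simply cleared on the destination, which is the role bdrv_dirty_bitmap_deserialize_part0() plays. The Chunk record, CHUNK_WORDS and both functions are hypothetical; the real stream format belongs to migration/dirty-bitmap.c and is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_WORDS 4   /* hypothetical chunk size, in 64-bit words */

/* Hypothetical wire record: a zero chunk carries no payload. */
typedef struct Chunk {
    size_t first;                   /* first word covered */
    size_t n;                       /* number of words covered */
    bool zeroes;                    /* receiver just clears the range */
    uint64_t payload[CHUNK_WORDS];  /* valid only if !zeroes */
} Chunk;

static bool words_are_zero(const uint64_t *w, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (w[i]) {
            return false;
        }
    }
    return true;
}

/* Sender: like serialize_part, but skips the payload of empty chunks. */
static size_t send_bitmap(const uint64_t *bm, size_t nwords, Chunk *out)
{
    size_t first, count = 0;

    assert(bm != NULL && out != NULL);
    for (first = 0; first < nwords; first += CHUNK_WORDS) {
        Chunk *c = &out[count++];

        c->first = first;
        c->n = nwords - first < CHUNK_WORDS ? nwords - first : CHUNK_WORDS;
        c->zeroes = words_are_zero(bm + first, c->n);
        if (!c->zeroes) {
            memcpy(c->payload, bm + first, c->n * sizeof(uint64_t));
        }
    }
    return count;
}

/* Receiver: zero records act like deserialize_part0, others like _part. */
static void receive_bitmap(uint64_t *bm, const Chunk *in, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        const Chunk *c = &in[i];

        if (c->zeroes) {
            memset(bm + c->first, 0, c->n * sizeof(uint64_t));
        } else {
            memcpy(bm + c->first, c->payload, c->n * sizeof(uint64_t));
        }
    }
    /* A real receiver would now call the deserialize_finish step. */
}
```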

* Re: [Qemu-devel] [PATCH RFC v2 4/8] block: add dirty-dirty bitmaps
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 4/8] block: add dirty-dirty bitmaps Vladimir Sementsov-Ogievskiy
@ 2015-02-10 21:30   ` John Snow
  2015-02-12 10:51     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 35+ messages in thread
From: John Snow @ 2015-02-10 21:30 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, pbonzini, stefanha, den

I had hoped it wouldn't come to this :)

On 01/27/2015 05:56 AM, Vladimir Sementsov-Ogievskiy wrote:
> A dirty-dirty bitmap is a dirty bitmap for BdrvDirtyBitmap. It tracks
> set/unset changes of BdrvDirtyBitmap.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   block.c               | 44 ++++++++++++++++++++++++++++++++++++++++++++
>   include/block/block.h |  5 +++++
>   2 files changed, 49 insertions(+)
>
> diff --git a/block.c b/block.c
> index 9860fc1..8ab724b 100644
> --- a/block.c
> +++ b/block.c
> @@ -53,6 +53,7 @@
>
>   struct BdrvDirtyBitmap {
>       HBitmap *bitmap;
> +    HBitmap *dirty_dirty_bitmap;

Just opinions: Maybe we can call it the "meta_bitmap" to help keep the 
name less confusing, and accompany it with a good structure comment here 
to explain what the heck it's for.

If you feel that's a good idea; s/dirty_dirty_/meta_/g below.

Regardless, let's make sure this patch adds documentation for it.

>       BdrvDirtyBitmap *originator;
>       int64_t size;
>       int64_t granularity;
> @@ -5373,6 +5374,30 @@ BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
>       return originator;
>   }
>
> +HBitmap *bdrv_create_dirty_dirty_bitmap(BdrvDirtyBitmap *bitmap,
> +                                        uint64_t granularity)
> +{
> +    uint64_t sector_granularity;
> +
> +    assert((granularity & (granularity - 1)) == 0);
> +
> +    granularity *= 8 * bitmap->granularity;
> +    sector_granularity = granularity >> BDRV_SECTOR_BITS;
> +    assert(sector_granularity);
> +
> +    bitmap->dirty_dirty_bitmap =
> +        hbitmap_alloc(bitmap->size, ffsll(sector_granularity) - 1);
> +
> +    return bitmap->dirty_dirty_bitmap;
> +}
> +
> +void bdrv_release_dirty_dirty_bitmap(BdrvDirtyBitmap *bitmap)
> +{
> +    if (bitmap->dirty_dirty_bitmap) {
> +        hbitmap_free(bitmap->dirty_dirty_bitmap);
> +        bitmap->dirty_dirty_bitmap = NULL;
> +    }
> +}
>
>   void bdrv_print_dirty_bitmap(BdrvDirtyBitmap *bitmap)
>   {
> @@ -5447,6 +5472,9 @@ void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
>           if (bm == bitmap) {
>               QLIST_REMOVE(bitmap, list);
>               hbitmap_free(bitmap->bitmap);
> +            if (bitmap->dirty_dirty_bitmap) {
> +                hbitmap_free(bitmap->dirty_dirty_bitmap);
> +            }
>               g_free(bitmap->name);
>               g_free(bitmap);
>               return;
> @@ -5534,6 +5562,10 @@ void bdrv_set_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
>   {
>       if (bitmap->enabled) {
>           hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
> +
> +        if (bitmap->dirty_dirty_bitmap) {
> +            hbitmap_set(bitmap->dirty_dirty_bitmap, cur_sector, nr_sectors);
> +        }
>       }
>   }
>
> @@ -5541,6 +5573,9 @@ void bdrv_reset_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
>                                int64_t cur_sector, int nr_sectors)
>   {
>       hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
> +    if (bitmap->dirty_dirty_bitmap) {
> +        hbitmap_set(bitmap->dirty_dirty_bitmap, cur_sector, nr_sectors);
> +    }
>   }
>
>   /**
> @@ -5550,6 +5585,9 @@ void bdrv_reset_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
>   void bdrv_clear_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
>   {
>       hbitmap_reset(bitmap->bitmap, 0, bitmap->size);
> +    if (bitmap->dirty_dirty_bitmap) {
> +        hbitmap_set(bitmap->dirty_dirty_bitmap, 0, bitmap->size);
> +    }
>   }
>
>   const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap)
> @@ -5597,6 +5635,9 @@ static void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
>               continue;
>           }
>           hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
> +        if (bitmap->dirty_dirty_bitmap) {
> +            hbitmap_set(bitmap->dirty_dirty_bitmap, cur_sector, nr_sectors);
> +        }
>       }
>   }
>
> @@ -5606,6 +5647,9 @@ static void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
>       BdrvDirtyBitmap *bitmap;
>       QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
>           hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
> +        if (bitmap->dirty_dirty_bitmap) {
> +            hbitmap_set(bitmap->dirty_dirty_bitmap, cur_sector, nr_sectors);
> +        }
>       }
>   }
>
> diff --git a/include/block/block.h b/include/block/block.h
> index 0890cd2..648b0a9 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -4,6 +4,7 @@
>   #include "block/aio.h"
>   #include "qemu-common.h"
>   #include "qemu/option.h"
> +#include "qemu/hbitmap.h"
>   #include "block/coroutine.h"
>   #include "block/accounting.h"
>   #include "qapi/qmp/qobject.h"
> @@ -473,6 +474,10 @@ void bdrv_dirty_bitmap_deserialize_part0(BdrvDirtyBitmap *bitmap,
>                                            uint64_t start, uint64_t count);
>   void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap);
>
> +HBitmap *bdrv_create_dirty_dirty_bitmap(BdrvDirtyBitmap *bitmap,
> +                                        uint64_t granularity);
> +void bdrv_release_dirty_dirty_bitmap(BdrvDirtyBitmap *bitmap);
> +
>   void bdrv_enable_copy_on_read(BlockDriverState *bs);
>   void bdrv_disable_copy_on_read(BlockDriverState *bs);
>
>

Looks correct, it just needs documentation in my opinion to help explain 
what this extra bitmap is for.
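One possible shape for that documentation, shown on a hypothetical toy structure rather than BdrvDirtyBitmap itself: the comment says what the meta (dirty-dirty) bitmap tracks, and the helpers show that both set and reset mark it, as bdrv_set_dirty_bitmap() and bdrv_reset_dirty_bitmap() do in the patch. The toy meta bitmap tracks one bit per bit for simplicity; the patch gives it a coarser granularity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_BITS 128

/*
 * Toy dirty bitmap with an optional meta bitmap (the patch's
 * dirty_dirty_bitmap).
 *
 * bits: which sectors have been dirtied.
 * meta: which parts of 'bits' changed since 'meta' was last cleared, so a
 *       migration loop can resend only those parts. Both setting and
 *       resetting a bit count as changes: the destination must see resets.
 */
typedef struct ToyDirtyBitmap {
    uint64_t bits[TOY_BITS / 64];
    uint64_t meta[TOY_BITS / 64];
    bool has_meta;
} ToyDirtyBitmap;

static void toy_update(uint64_t *w, uint64_t start, uint64_t count, bool value)
{
    uint64_t i;

    for (i = start; i < start + count; i++) {
        uint64_t mask = 1ULL << (i % 64);

        if (value) {
            w[i / 64] |= mask;
        } else {
            w[i / 64] &= ~mask;
        }
    }
}

static void toy_set(ToyDirtyBitmap *bm, uint64_t start, uint64_t count)
{
    assert(start + count <= TOY_BITS);
    toy_update(bm->bits, start, count, true);
    if (bm->has_meta) {
        toy_update(bm->meta, start, count, true);
    }
}

static void toy_reset(ToyDirtyBitmap *bm, uint64_t start, uint64_t count)
{
    assert(start + count <= TOY_BITS);
    toy_update(bm->bits, start, count, false);
    if (bm->has_meta) {
        toy_update(bm->meta, start, count, true);   /* a reset is a change too */
    }
}
```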

Musings / opinions:

I also think that at this point, we may want to make bdrv_reset_dirty 
and bdrv_set_dirty call the bdrv_reset_dirty_bitmap and 
bdrv_set_dirty_bitmap functions instead of re-implementing the behavior.

I used to think it was fine as-is, but the more conditions we add to how 
these bitmaps should be accessed, the more I think limiting the 
interface to as few functions as possible is a good idea.
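A sketch of that refactoring on a hypothetical toy list: the all-bitmaps paths only loop and delegate, so the meta-bitmap bookkeeping lives in exactly one place. As in the quoted patch, the single-bitmap set helper honours 'enabled' while the reset path does not check it. ToyBitmap and the helpers are illustrative names, not QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy: each bitmap covers 64 sectors in a single word. */
typedef struct ToyBitmap {
    uint64_t bits;
    uint64_t meta;              /* changed-bits tracker, used if has_meta */
    bool has_meta;
    bool enabled;
    struct ToyBitmap *next;
} ToyBitmap;

static uint64_t range_mask(unsigned start, unsigned count)
{
    assert(start < 64 && count > 0 && count <= 64 - start);
    return (count == 64 ? ~0ULL : (1ULL << count) - 1) << start;
}

/* Single-bitmap helpers: the only code that knows about 'meta'. */
static void set_dirty_one(ToyBitmap *bm, unsigned start, unsigned count)
{
    uint64_t m;

    if (!bm->enabled) {
        return;
    }
    m = range_mask(start, count);
    bm->bits |= m;
    if (bm->has_meta) {
        bm->meta |= m;
    }
}

static void reset_dirty_one(ToyBitmap *bm, unsigned start, unsigned count)
{
    uint64_t m = range_mask(start, count);

    bm->bits &= ~m;
    if (bm->has_meta) {
        bm->meta |= m;
    }
}

/* All-bitmaps paths just loop and delegate. */
static void set_dirty_all(ToyBitmap *head, unsigned start, unsigned count)
{
    ToyBitmap *bm;

    for (bm = head; bm != NULL; bm = bm->next) {
        set_dirty_one(bm, start, count);
    }
}

static void reset_dirty_all(ToyBitmap *head, unsigned start, unsigned count)
{
    ToyBitmap *bm;

    for (bm = head; bm != NULL; bm = bm->next) {
        reset_dirty_one(bm, start, count);
    }
}
```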

Maybe I'll even do that myself. It might even be nice to split all the
bitmap functions off into something like block/dirty_bitmaps.c as the
complexity creeps up and we try to improve the maintainability of
block.c.

Anyway, we can worry about that in a later series.

* Re: [Qemu-devel] [PATCH RFC v2 5/8] block: add bdrv_dirty_bitmap_enabled()
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 5/8] block: add bdrv_dirty_bitmap_enabled() Vladimir Sementsov-Ogievskiy
@ 2015-02-10 21:30   ` John Snow
  2015-02-12 10:54     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 35+ messages in thread
From: John Snow @ 2015-02-10 21:30 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, pbonzini, stefanha, den



On 01/27/2015 05:56 AM, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   block.c               | 5 +++++
>   include/block/block.h | 1 +
>   2 files changed, 6 insertions(+)
>
> diff --git a/block.c b/block.c
> index 8ab724b..15fc621 100644
> --- a/block.c
> +++ b/block.c
> @@ -5551,6 +5551,11 @@ uint64_t bdrv_dirty_bitmap_granularity(BlockDriverState *bs,
>       return bitmap->granularity;
>   }
>
> +bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap)
> +{
> +    return bitmap->enabled;
> +}
> +
>   void bdrv_dirty_iter_init(BlockDriverState *bs,
>                             BdrvDirtyBitmap *bitmap, HBitmapIter *hbi)
>   {
> diff --git a/include/block/block.h b/include/block/block.h
> index 648b0a9..7b49d98 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -449,6 +449,7 @@ BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs);
>   uint64_t bdrv_get_default_bitmap_granularity(BlockDriverState *bs);
>   uint64_t bdrv_dirty_bitmap_granularity(BlockDriverState *bs,
>                                         BdrvDirtyBitmap *bitmap);
> +bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap);
>   int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, int64_t sector);
>   void bdrv_set_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
>                              int64_t cur_sector, int nr_sectors);
>

I wrote something similar in my incremental backup series. I'm 
submitting a new incremental backup series (v12!) soon which might have 
/slightly/ different semantics for enabled/disabled bitmaps.

I don't think you'll need this patch, but in case you do need it:

Reviewed-by: John Snow <jsnow@redhat.com>

* Re: [Qemu-devel] [PATCH RFC v2 6/8] block: add bdrv_next_dirty_bitmap()
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 6/8] block: add bdrv_next_dirty_bitmap() Vladimir Sementsov-Ogievskiy
@ 2015-02-10 21:31   ` John Snow
  0 siblings, 0 replies; 35+ messages in thread
From: John Snow @ 2015-02-10 21:31 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, pbonzini, stefanha, den



On 01/27/2015 05:56 AM, Vladimir Sementsov-Ogievskiy wrote:
> Like bdrv_next(), bdrv_next_dirty_bitmap() is a function to provide
> access to the private dirty bitmaps list.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   block.c               | 10 ++++++++++
>   include/block/block.h |  2 ++
>   2 files changed, 12 insertions(+)
>
> diff --git a/block.c b/block.c
> index 15fc621..9e59c2e 100644
> --- a/block.c
> +++ b/block.c
> @@ -5514,6 +5514,16 @@ BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs)
>       return list;
>   }
>
> +BdrvDirtyBitmap *bdrv_next_dirty_bitmap(BlockDriverState *bs,
> +                                        BdrvDirtyBitmap *bitmap)
> +{
> +    if (bitmap == NULL) {
> +        return QLIST_FIRST(&bs->dirty_bitmaps);
> +    }
> +
> +    return QLIST_NEXT(bitmap, list);
> +}
> +
>   int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, int64_t sector)
>   {
>       if (bitmap) {
> diff --git a/include/block/block.h b/include/block/block.h
> index 7b49d98..34d0259 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -474,6 +474,8 @@ void bdrv_dirty_bitmap_deserialize_part(BdrvDirtyBitmap *bitmap,
>   void bdrv_dirty_bitmap_deserialize_part0(BdrvDirtyBitmap *bitmap,
>                                            uint64_t start, uint64_t count);
>   void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap);
> +BdrvDirtyBitmap *bdrv_next_dirty_bitmap(BlockDriverState *bs,
> +                                        BdrvDirtyBitmap *bitmap);
>
>   HBitmap *bdrv_create_dirty_dirty_bitmap(BdrvDirtyBitmap *bitmap,
>                                           uint64_t granularity);
>

Makes sense to me.
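
The NULL-to-start cursor convention that bdrv_next_dirty_bitmap() borrows
from bdrv_next() can be illustrated with a minimal standalone sketch.
Bitmap, Device, next_bitmap() and count_named() are hypothetical
stand-ins, not QEMU's BdrvDirtyBitmap type or its QLIST macros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one dirty bitmap in a per-device list. */
typedef struct Bitmap {
    const char *name;       /* NULL for anonymous bitmaps */
    struct Bitmap *next;
} Bitmap;

/* Hypothetical stand-in for a device owning a list of bitmaps. */
typedef struct Device {
    Bitmap *bitmaps;        /* head of a singly linked list */
} Device;

/* Same contract as bdrv_next_dirty_bitmap(): pass NULL to get the first
 * element, pass the previous result to get the next one; NULL = done. */
static Bitmap *next_bitmap(Device *dev, Bitmap *prev)
{
    return prev == NULL ? dev->bitmaps : prev->next;
}

/* Typical caller: visit every bitmap, counting only the named ones. */
static int count_named(Device *dev)
{
    int n = 0;

    for (Bitmap *b = next_bitmap(dev, NULL); b; b = next_bitmap(dev, b)) {
        if (b->name) {
            n++;
        }
    }
    return n;
}
```

The loop in count_named() has the same shape as the nested loop in
init_dirty_bitmap_migration() in patch 8/8.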

Reviewed-by: John Snow <jsnow@redhat.com>

* Re: [Qemu-devel] [PATCH RFC v2 7/8] migration: add dirty parameter
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 7/8] migration: add dirty parameter Vladimir Sementsov-Ogievskiy
  2015-01-27 16:20   ` Eric Blake
@ 2015-02-10 21:32   ` John Snow
  1 sibling, 0 replies; 35+ messages in thread
From: John Snow @ 2015-02-10 21:32 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, pbonzini, stefanha, den



On 01/27/2015 05:56 AM, Vladimir Sementsov-Ogievskiy wrote:
> Add dirty parameter to qmp-migrate command. If this parameter is true,
> migration/block.c will migrate dirty bitmaps. This parameter can be used
> without "blk" parameter to migrate only dirty bitmaps, skipping block
> migration.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   hmp-commands.hx               | 12 +++++++-----
>   hmp.c                         |  4 +++-
>   include/migration/migration.h |  1 +
>   migration/migration.c         |  4 +++-
>   qapi-schema.json              |  7 ++++++-
>   qmp-commands.hx               |  5 ++++-
>   savevm.c                      |  3 ++-
>   7 files changed, 26 insertions(+), 10 deletions(-)
>
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index a9be506..c16f93c 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -902,23 +902,25 @@ ETEXI
>
>       {
>           .name       = "migrate",
> -        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
> -        .params     = "[-d] [-b] [-i] uri",
> +        .args_type  = "detach:-d,blk:-b,inc:-i,dirty:-D,uri:s",
> +        .params     = "[-d] [-b] [-i] [-D] uri",
>           .help       = "migrate to URI (using -d to not wait for completion)"
>   		      "\n\t\t\t -b for migration without shared storage with"
>   		      " full copy of disk\n\t\t\t -i for migration without "
> -		      "shared storage with incremental copy of disk "
> -		      "(base image shared between src and destination)",
> +		      "shared storage with incremental copy of disk\n\t\t\t"
> +		      " -D for migration of named dirty bitmaps as well\n\t\t\t"
> +		      " (base image shared between src and destination)",
>           .mhandler.cmd = hmp_migrate,
>       },
>
>
>   STEXI
> -@item migrate [-d] [-b] [-i] @var{uri}
> +@item migrate [-d] [-b] [-i] [-D] @var{uri}
>   @findex migrate
>   Migrate to @var{uri} (using -d to not wait for completion).
>   	-b for migration with full copy of disk
>   	-i for migration with incremental copy of disk (base image is shared)
> +	-D for migration of named dirty bitmaps
>   ETEXI
>
>       {
> diff --git a/hmp.c b/hmp.c
> index a269145..0b89ee8 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -1347,10 +1347,12 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
>       int detach = qdict_get_try_bool(qdict, "detach", 0);
>       int blk = qdict_get_try_bool(qdict, "blk", 0);
>       int inc = qdict_get_try_bool(qdict, "inc", 0);
> +    int dirty = qdict_get_try_bool(qdict, "dirty", 0);
>       const char *uri = qdict_get_str(qdict, "uri");
>       Error *err = NULL;
>
> -    qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, &err);
> +    qmp_migrate(uri, !!blk, blk, !!inc, inc, !!dirty, dirty,
> +                false, false, &err);
>       if (err) {
>           monitor_printf(mon, "migrate: %s\n", error_get_pretty(err));
>           error_free(err);
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 3cb5ba8..48d71d3 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -37,6 +37,7 @@
>   struct MigrationParams {
>       bool blk;
>       bool shared;
> +    bool dirty;
>   };
>
>   typedef struct MigrationState MigrationState;
> diff --git a/migration/migration.c b/migration/migration.c
> index c49a05a..e7bb7f3 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -404,7 +404,8 @@ void migrate_del_blocker(Error *reason)
>   }
>
>   void qmp_migrate(const char *uri, bool has_blk, bool blk,
> -                 bool has_inc, bool inc, bool has_detach, bool detach,
> +                 bool has_inc, bool inc, bool has_dirty, bool dirty,
> +                 bool has_detach, bool detach,
>                    Error **errp)
>   {
>       Error *local_err = NULL;
> @@ -414,6 +415,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
>
>       params.blk = has_blk && blk;
>       params.shared = has_inc && inc;
> +    params.dirty = has_dirty && dirty;
>
>       if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP ||
>           s->state == MIG_STATE_CANCELLING) {
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 1475f69..1d10d6b 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -1656,12 +1656,17 @@
>   # @detach: this argument exists only for compatibility reasons and
>   #          is ignored by QEMU
>   #
> +# @dirty: #optional do dirty-bitmaps migration (can be used with or without
> +#         @blk parameter)
> +#         (since 2.3)
> +#
>   # Returns: nothing on success
>   #
>   # Since: 0.14.0
>   ##
>   { 'command': 'migrate',
> -  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } }
> +  'data': { 'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*dirty': 'bool',
> +            '*detach': 'bool' } }
>
>   # @xen-save-devices-state:
>   #
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 8065715..f8e50ac 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -610,7 +610,7 @@ EQMP
>
>       {
>           .name       = "migrate",
> -        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
> +        .args_type  = "detach:-d,blk:-b,inc:-i,dirty:-D,uri:s",
>           .mhandler.cmd_new = qmp_marshal_input_migrate,
>       },
>
> @@ -624,6 +624,7 @@ Arguments:
>
>   - "blk": block migration, full disk copy (json-bool, optional)
>   - "inc": incremental disk copy (json-bool, optional)
> +- "dirty": migrate named dirty bitmaps (json-bool, optional)
>   - "uri": Destination URI (json-string)
>
>   Example:
> @@ -638,6 +639,8 @@ Notes:
>   (2) All boolean arguments default to false
>   (3) The user Monitor's "detach" argument is invalid in QMP and should not
>       be used
> +(4) The "dirty" argument may be used without "blk", to migrate only dirty
> +    bitmaps
>
>   EQMP
>
> diff --git a/savevm.c b/savevm.c
> index 08ec678..a598d1d 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -784,7 +784,8 @@ static int qemu_savevm_state(QEMUFile *f)
>       int ret;
>       MigrationParams params = {
>           .blk = 0,
> -        .shared = 0
> +        .shared = 0,
> +        .dirty = 0
>       };
>
>       if (qemu_savevm_state_blocked(NULL)) {
>

I'll defer to Eric's judgment here; it looks like you have already
re-spun this but were waiting on my lethargy before sending it to the
ML again.

* Re: [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c
  2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c Vladimir Sementsov-Ogievskiy
@ 2015-02-10 21:33   ` John Snow
  2015-02-13  8:19     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 35+ messages in thread
From: John Snow @ 2015-02-10 21:33 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, Peter Maydell,
	Juan Jose Quintela Carreira,
	Dr. David Alan Gilbert, stefanha, den, Amit Shah, pbonzini

Peter Maydell: What's the right way to license a file that is derived
from an existing one? See below, please.

Max, Markus: ctrl+f "bdrv_get_device_name" and let me know what you 
think, if you would.

Juan, Amit, David: Copying migration maintainers.

On 01/27/2015 05:56 AM, Vladimir Sementsov-Ogievskiy wrote:
> Live migration of dirty bitmaps. Only named dirty bitmaps are migrated.
> If destination qemu is already containing a dirty bitmap with the same
> name as a migrated bitmap, then their granularities should be the same,
> otherwise the error will be generated. If destination qemu doesn't
> contain such bitmap it will be created.
>
> format:
>
> 1 byte: flags
>
> [ 1 byte: node name size ] \  flags & DEVICE_NAME
> [ n bytes: node name     ] /
>
> [ 1 byte: bitmap name size ]       \
> [ n bytes: bitmap name     ]       | flags & BITMAP_NAME
> [ [ be64: granularity    ] ]  flags & GRANULARITY
>
> [ 1 byte: bitmap enabled bit ] flags & ENABLED
>
> [ be64: start sector      ] \ flags & (NORMAL_CHUNK | ZERO_CHUNK)
> [ be32: number of sectors ] /
>
> [ be64: buffer size ] \ flags & NORMAL_CHUNK
> [ n bytes: buffer   ] /
>
> The last chunk should contain flags & EOS. The chunk may skip device
> and/or bitmap names, assuming them to be the same with the previous
> chunk. GRANULARITY is sent with the first chunk for the bitmap. ENABLED
> bit is sent in the end of "complete" stage of migration. So when
> destination gets ENABLED flag it should deserialize_finish the bitmap
> and set its enabled bit to corresponding value.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   include/migration/block.h |   1 +
>   migration/Makefile.objs   |   2 +-
>   migration/dirty-bitmap.c  | 606 ++++++++++++++++++++++++++++++++++++++++++++++
>   vl.c                      |   1 +
>   4 files changed, 609 insertions(+), 1 deletion(-)
>   create mode 100644 migration/dirty-bitmap.c
>
> diff --git a/include/migration/block.h b/include/migration/block.h
> index ffa8ac0..566bb9f 100644
> --- a/include/migration/block.h
> +++ b/include/migration/block.h
> @@ -14,6 +14,7 @@
>   #ifndef BLOCK_MIGRATION_H
>   #define BLOCK_MIGRATION_H
>
> +void dirty_bitmap_mig_init(void);
>   void blk_mig_init(void);
>   int blk_mig_active(void);
>   uint64_t blk_mig_bytes_transferred(void);

OK.
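
As a reading aid for the stream format in the commit message, here is a
minimal encoder sketch for one ZERO_CHUNK record, writing into a plain
byte array instead of a QEMUFile. The flag values are copied from the
patch; put_be(), put_name() and encode_zero_chunk() are hypothetical
helpers. As in send_bitmap(), the granularity is only written inside
the bitmap-name field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flag values copied from the patch. */
#define DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK    0x04
#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME   0x08
#define DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME   0x10
#define DIRTY_BITMAP_MIG_FLAG_GRANULARITY   0x20

/* Big-endian store of the low `bytes` bytes of v (be32 or be64). */
static size_t put_be(uint8_t *p, uint64_t v, int bytes)
{
    for (int i = 0; i < bytes; i++) {
        p[i] = (uint8_t)(v >> (8 * (bytes - 1 - i)));
    }
    return (size_t)bytes;
}

/* One length byte, then the name without its trailing NUL. */
static size_t put_name(uint8_t *p, const char *name)
{
    size_t len = strlen(name);

    assert(len < 256);
    p[0] = (uint8_t)len;
    memcpy(p + 1, name, len);
    return len + 1;
}

/* Encodes one ZERO_CHUNK record (no buffer follows); returns its length. */
static size_t encode_zero_chunk(uint8_t *out, uint8_t flags,
                                const char *dev, const char *bitmap,
                                uint64_t granularity,
                                uint64_t start_sector, uint32_t nr_sectors)
{
    size_t n = 0;

    flags |= DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK;
    out[n++] = flags;
    if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
        n += put_name(out + n, dev);
    }
    if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
        n += put_name(out + n, bitmap);
        if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
            n += put_be(out + n, granularity, 8);
        }
    }
    n += put_be(out + n, start_sector, 8);
    n += put_be(out + n, nr_sectors, 4);
    return n;
}
```

A NORMAL_CHUNK record would differ only in its flag and in appending a
be64 buffer size followed by the buffer itself.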

> diff --git a/migration/Makefile.objs b/migration/Makefile.objs
> index d929e96..9adfda9 100644
> --- a/migration/Makefile.objs
> +++ b/migration/Makefile.objs
> @@ -6,5 +6,5 @@ common-obj-y += xbzrle.o
>   common-obj-$(CONFIG_RDMA) += rdma.o
>   common-obj-$(CONFIG_POSIX) += exec.o unix.o fd.o
>
> -common-obj-y += block.o
> +common-obj-y += block.o dirty-bitmap.o
>

OK.

> diff --git a/migration/dirty-bitmap.c b/migration/dirty-bitmap.c
> new file mode 100644
> index 0000000..8621218
> --- /dev/null
> +++ b/migration/dirty-bitmap.c
> @@ -0,0 +1,606 @@
> +/*
> + * QEMU dirty bitmap migration
> + *
> + * derived from migration/block.c
> + *
> + * Author:
> + * Sementsov-Ogievskiy Vladimir <vsementsov@parallels.com>
> + *
> + * original copyright message:
> + * =====================================================================
> + * Copyright IBM, Corp. 2009
> + *
> + * Authors:
> + *  Liran Schour   <lirans@il.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + * Contributions after 2012-01-13 are licensed under the terms of the
> + * GNU GPL, version 2 or (at your option) any later version.
> + * =====================================================================
> + */
> +

I'm not super familiar with the right way to do licensing here; it's
possible you don't need to copy the original notice, but I'm not sure.
You will want to make it clear what license applies to /your/ work, I
think. Maybe Peter Maydell can clue us in.

> +#include "block/block.h"
> +#include "qemu/main-loop.h"
> +#include "qemu/error-report.h"
> +#include "migration/block.h"
> +#include "migration/migration.h"
> +#include "qemu/hbitmap.h"
> +#include <assert.h>
> +
> +#define CHUNK_SIZE                       (1 << 20)
> +
> +#define DIRTY_BITMAP_MIG_FLAG_EOS           0x01
> +#define DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK  0x02
> +#define DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK    0x04
> +#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME   0x08
> +#define DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME   0x10
> +#define DIRTY_BITMAP_MIG_FLAG_GRANULARITY   0x20
> +#define DIRTY_BITMAP_MIG_FLAG_ENABLED       0x40
> +/* flags should be <= 0xff */
> +

We should give ourselves a little breathing room with the flags: they
are sent as a single byte, and we've only got room for one more.
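
To make the remaining headroom explicit, here is a sketch assuming a C11
compiler. It ORs together the seven flags defined in the patch, fails
the build if they stop fitting in one byte, and counts the unassigned
bits. DIRTY_BITMAP_MIG_ALL_FLAGS and free_flag_bits() are hypothetical.

```c
#include <assert.h>

/* Flag values copied from the patch. */
#define DIRTY_BITMAP_MIG_FLAG_EOS           0x01
#define DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK  0x02
#define DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK    0x04
#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME   0x08
#define DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME   0x10
#define DIRTY_BITMAP_MIG_FLAG_GRANULARITY   0x20
#define DIRTY_BITMAP_MIG_FLAG_ENABLED       0x40

/* Hypothetical: union of every flag currently defined. */
#define DIRTY_BITMAP_MIG_ALL_FLAGS                                       \
    (DIRTY_BITMAP_MIG_FLAG_EOS | DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK |    \
     DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK | DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME | \
     DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME | DIRTY_BITMAP_MIG_FLAG_GRANULARITY | \
     DIRTY_BITMAP_MIG_FLAG_ENABLED)

/* Breaks the build if a new flag no longer fits in the one-byte field. */
_Static_assert(DIRTY_BITMAP_MIG_ALL_FLAGS <= 0xff,
               "dirty bitmap migration flags must fit in one byte");

/* Number of bits in the flags byte that are still unassigned. */
static int free_flag_bits(void)
{
    int free_bits = 0;

    for (int bit = 0; bit < 8; bit++) {
        if (!(DIRTY_BITMAP_MIG_ALL_FLAGS & (1u << bit))) {
            free_bits++;
        }
    }
    return free_bits;
}
```

With the current definitions only 0x80 is left.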

> +/* #define DEBUG_DIRTY_BITMAP_MIGRATION */
> +
> +#ifdef DEBUG_DIRTY_BITMAP_MIGRATION
> +#define DPRINTF(fmt, ...) \
> +    do { printf("dirty_migration: " fmt, ## __VA_ARGS__); } while (0)
> +#else
> +#define DPRINTF(fmt, ...) \
> +    do { } while (0)
> +#endif
> +
> +typedef struct DirtyBitmapMigBitmapState {
> +    /* Written during setup phase. */
> +    BlockDriverState *bs;
> +    BdrvDirtyBitmap *bitmap;
> +    HBitmap *dirty_bitmap;

For my own sanity, I'd really prefer "bitmap" and "meta_bitmap" here;
"dirty_bitmap" is often used as a synonym (outside of this file) to
refer to the BdrvDirtyBitmap in general, so its usage here can be
somewhat confusing.

I'd appreciate "dirty_dirty_bitmap" as in your previous patch for 
consistency, or "meta_bitmap" as I recommend.

> +    int64_t total_sectors;
> +    uint64_t sectors_per_chunk;
> +    QSIMPLEQ_ENTRY(DirtyBitmapMigBitmapState) entry;
> +
> +    /* For bulk phase. */
> +    bool bulk_completed;
> +    int64_t cur_sector;
> +    bool granularity_sent;
> +
> +    /* For dirty phase. */
> +    int64_t cur_dirty;
> +} DirtyBitmapMigBitmapState;
> +
> +typedef struct DirtyBitmapMigState {
> +    int migration_enable;
> +    QSIMPLEQ_HEAD(dbms_list, DirtyBitmapMigBitmapState) dbms_list;
> +
> +    bool bulk_completed;
> +
> +    /* for send_bitmap() */
> +    BlockDriverState *prev_bs;
> +    BdrvDirtyBitmap *prev_bitmap;
> +} DirtyBitmapMigState;
> +
> +static DirtyBitmapMigState dirty_bitmap_mig_state;
> +
> +/* read name from qemu file:
> + * format:
> + * 1 byte : len = name length (<256)
> + * len bytes : name without last zero byte
> + *
> + * name should point to the buffer >= 256 bytes length
> + */
> +static char *qemu_get_name(QEMUFile *f, char *name)
> +{
> +    int len = qemu_get_byte(f);
> +    qemu_get_buffer(f, (uint8_t *)name, len);
> +    name[len] = '\0';
> +
> +    DPRINTF("get name: %d %s\n", len, name);
> +
> +    return name;
> +}
> +

OK. Maybe these could be named "qemu_put_string" and "qemu_get_string"
and moved to qemu-file.c so others can use them.
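
A minimal sketch of such a pair of helpers, using a hypothetical
in-memory MemFile in place of QEMUFile (real versions would wrap
qemu_put_byte()/qemu_put_buffer() and their get counterparts). The wire
format is the one described above: one length byte, then the name
without its terminating NUL. mem_put_string() and mem_get_string() are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a QEMUFile. */
typedef struct {
    uint8_t data[512];
    size_t pos;     /* write cursor */
    size_t rpos;    /* read cursor */
} MemFile;

static void mem_put_string(MemFile *f, const char *s)
{
    size_t len = strlen(s);

    assert(len < 256);                          /* length fits in one byte */
    assert(f->pos + 1 + len <= sizeof(f->data));
    f->data[f->pos++] = (uint8_t)len;
    memcpy(f->data + f->pos, s, len);           /* no NUL on the wire */
    f->pos += len;
}

/* buf must hold at least 256 bytes: 255 characters plus the NUL. */
static char *mem_get_string(MemFile *f, char *buf)
{
    size_t len;

    assert(f->rpos < f->pos);
    len = f->data[f->rpos++];
    assert(f->rpos + len <= f->pos);            /* don't read past the data */
    memcpy(buf, f->data + f->rpos, len);
    f->rpos += len;
    buf[len] = '\0';
    return buf;
}
```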

> +/* write name to qemu file:
> + * format:
> + * same as for qemu_get_name
> + *
> + * maximum name length is 255
> + */
> +static void qemu_put_name(QEMUFile *f, const char *name)
> +{
> +    int len = strlen(name);
> +
> +    DPRINTF("put name: %d %s\n", len, name);
> +
> +    assert(len < 256);
> +    qemu_put_byte(f, len);
> +    qemu_put_buffer(f, (const uint8_t *)name, len);
> +}
> +

OK.

> +static void send_bitmap(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
> +                        uint64_t start_sector, uint32_t nr_sectors)
> +{
> +    BlockDriverState *bs = dbms->bs;
> +    BdrvDirtyBitmap *bitmap = dbms->bitmap;
> +    uint8_t flags = 0;
> +    /* align for buffer_is_zero() */
> +    uint64_t align = 4 * sizeof(long);
> +    uint64_t buf_size =
> +        (bdrv_dirty_bitmap_data_size(bitmap, nr_sectors) + align - 1) &
> +        ~(align - 1);
> +    uint8_t *buf = g_malloc0(buf_size);
> +
> +    bdrv_dirty_bitmap_serialize_part(bitmap, buf, start_sector, nr_sectors);
> +
> +    if (buffer_is_zero(buf, buf_size)) {
> +        g_free(buf);
> +        buf = NULL;
> +        flags |= DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK;
> +    } else {
> +        flags |= DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK;
> +    }
> +
> +    if (bs != dirty_bitmap_mig_state.prev_bs) {
> +        dirty_bitmap_mig_state.prev_bs = bs;
> +        flags |= DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME;
> +    }
> +
> +    if (bitmap != dirty_bitmap_mig_state.prev_bitmap) {
> +        dirty_bitmap_mig_state.prev_bitmap = bitmap;
> +        flags |= DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME;
> +    }

OK, so we use the current bs/bitmap under consideration to detect if we 
have switched context, and put the names in the stream when it happens. OK.
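
The context-switch logic can be isolated as a small sketch: names are
emitted only when the (device, bitmap) pair differs from the one used
for the previous chunk. SendState and name_flags() are hypothetical;
only the two flag values come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values copied from the patch. */
#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME   0x08
#define DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME   0x10

/* Hypothetical sender state: last device/bitmap whose names were sent. */
typedef struct {
    const void *prev_dev;
    const void *prev_bitmap;
} SendState;

/* Returns the name flags for the next chunk and updates the state. */
static uint8_t name_flags(SendState *s, const void *dev, const void *bitmap)
{
    uint8_t flags = 0;

    if (dev != s->prev_dev) {
        s->prev_dev = dev;
        flags |= DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME;
    }
    if (bitmap != s->prev_bitmap) {
        s->prev_bitmap = bitmap;
        flags |= DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME;
    }
    return flags;
}
```

The receiver mirrors this by keeping the last device and bitmap it
looked up, which is why dirty_bitmap_load() holds them in static
variables.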

> +
> +    if (dbms->granularity_sent == 0) {
> +        dbms->granularity_sent = 1;
> +        flags |= DIRTY_BITMAP_MIG_FLAG_GRANULARITY;
> +    }
> +
> +    DPRINTF("Enter send_bitmap"
> +            "\n   flags:        %x"
> +            "\n   start_sector: %" PRIu64
> +            "\n   nr_sectors:   %" PRIu32
> +            "\n   data_size:    %" PRIu64 "\n",
> +            flags, start_sector, nr_sectors, buf_size);
> +
> +    qemu_put_byte(f, flags);
> +
> +    if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
> +        qemu_put_name(f, bdrv_get_device_name(bs));
> +    }

I am still not fully clear on this myself, but I think we are phasing
out bdrv_get_device_name. In the context of bitmaps, we are most
likely using them one-per-tree, but they /can/ be attached one-per-node,
so we shouldn't be trying to get the device name here, but rather the
node-name.

I /think/ we may want to be using bdrv_get_node_name, but I don't think
it is currently true that all nodes WILL be named... I am CC'ing Markus
Armbruster and Max Reitz for (A) a refresher course and (B) opinions on
what function call might make sense here, given that we wish to migrate
bitmaps attached to arbitrary nodes.

> +
> +    if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
> +        qemu_put_name(f, bdrv_dirty_bitmap_name(bitmap));
> +
> +        if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
> +            qemu_put_be64(f, bdrv_dirty_bitmap_granularity(bs, bitmap));
> +        }
> +    } else {
> +        assert(!(flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY));
> +    }
> +

I thought we were only migrating bitmaps with names?
I suppose the conditional can't hurt, but I am not clear on when we 
won't have a bitmap name here.

> +    qemu_put_be64(f, start_sector);
> +    qemu_put_be32(f, nr_sectors);
> +
> +    /* if a block is zero we need to flush here since the network
> +     * bandwidth is now a lot higher than the storage device bandwidth.
> +     * thus if we queue zero blocks we slow down the migration.
> +     * also, skip writing block when migrate only dirty bitmaps. */
> +    if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) {
> +        qemu_fflush(f);
> +        return;
> +    }
> +
> +    qemu_put_be64(f, buf_size);
> +    qemu_put_buffer(f, buf, buf_size);
> +    g_free(buf);
> +}
> +
> +
> +/* Called with iothread lock taken.  */
> +
> +static void set_dirty_tracking(void)
> +{
> +    DirtyBitmapMigBitmapState *dbms;
> +
> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
> +        dbms->dirty_bitmap =
> +            bdrv_create_dirty_dirty_bitmap(dbms->bitmap, CHUNK_SIZE);
> +    }
> +}
> +

OK: so we only have these dirty-dirty bitmaps when migration is 
starting, which makes sense.

> +static void unset_dirty_tracking(void)
> +{
> +    DirtyBitmapMigBitmapState *dbms;
> +
> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
> +        bdrv_release_dirty_dirty_bitmap(dbms->bitmap);
> +    }
> +}
> +

OK.

> +static void init_dirty_bitmap_migration(QEMUFile *f)
> +{
> +    BlockDriverState *bs;
> +    BdrvDirtyBitmap *bitmap;
> +    DirtyBitmapMigBitmapState *dbms;
> +
> +    dirty_bitmap_mig_state.bulk_completed = false;
> +    dirty_bitmap_mig_state.prev_bs = NULL;
> +    dirty_bitmap_mig_state.prev_bitmap = NULL;
> +
> +    for (bs = bdrv_next(NULL); bs; bs = bdrv_next(bs)) {
> +        for (bitmap = bdrv_next_dirty_bitmap(bs, NULL); bitmap;
> +             bitmap = bdrv_next_dirty_bitmap(bs, bitmap)) {
> +            if (!bdrv_dirty_bitmap_name(bitmap)) {
> +                continue;
> +            }
> +
> +            dbms = g_new0(DirtyBitmapMigBitmapState, 1);
> +            dbms->bs = bs;
> +            dbms->bitmap = bitmap;
> +            dbms->total_sectors = bdrv_nb_sectors(bs);
> +            dbms->sectors_per_chunk = CHUNK_SIZE * 8 *
> +                bdrv_dirty_bitmap_granularity(dbms->bs, dbms->bitmap)
> +                >> BDRV_SECTOR_BITS;
> +
> +            QSIMPLEQ_INSERT_TAIL(&dirty_bitmap_mig_state.dbms_list,
> +                                 dbms, entry);
> +        }
> +    }
> +}
> +

OK, but see the note below for dirty_bitmap_mig_init.
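
For reference, the chunk size computed above works out as follows:
CHUNK_SIZE bytes of serialized bitmap hold CHUNK_SIZE * 8 bits, each
bit covers granularity bytes of disk, so one chunk spans
CHUNK_SIZE * 8 * granularity / 512 sectors. The sketch below copies the
two constants from the patch; sectors_per_chunk() is a hypothetical
helper. For large enough granularities the result no longer fits in
the uint32_t nr_sectors used on the send path.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_SIZE          (1 << 20)   /* bytes of bitmap data per chunk */
#define BDRV_SECTOR_BITS    9           /* 512-byte sectors */

/* Disk sectors covered by one CHUNK_SIZE slice of the bitmap. */
static uint64_t sectors_per_chunk(uint64_t granularity)
{
    return (uint64_t)CHUNK_SIZE * 8 * granularity >> BDRV_SECTOR_BITS;
}
```

With a 64 KiB granularity a single chunk covers 2^30 sectors, i.e.
512 GiB of disk.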

> +/* Called with no lock taken.  */
> +static void bulk_phase_send_chunk(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
> +{
> +    uint32_t nr_sectors = MIN(dbms->total_sectors - dbms->cur_sector,
> +                             dbms->sectors_per_chunk);

What about cases where nr_sectors will put us past the end of the 
bitmap? The bitmap serialization implementation might need a touchup 
with this in mind.
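
One way to address this is to compute a single clamped length and reuse
it everywhere the chunk is touched. A sketch, with chunk_len() as a
hypothetical helper; note that dirty_phase_send_chunk() below clamps
nr_sectors but then passes the unclamped sectors_per_chunk to
hbitmap_reset().

```c
#include <assert.h>
#include <stdint.h>

/* Length of the chunk starting at `cur`, never running past `total`. */
static uint32_t chunk_len(int64_t total, int64_t cur, uint64_t per_chunk)
{
    uint64_t remaining;
    uint64_t n;

    assert(cur >= 0 && cur <= total);
    remaining = (uint64_t)(total - cur);
    n = remaining < per_chunk ? remaining : per_chunk;
    assert(n <= UINT32_MAX);    /* nr_sectors travels as a be32 */
    return (uint32_t)n;
}
```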

> +
> +    send_bitmap(f, dbms, dbms->cur_sector, nr_sectors);
> +
> +    dbms->cur_sector += nr_sectors;
> +    if (dbms->cur_sector >= dbms->total_sectors) {
> +        dbms->bulk_completed = true;
> +    }
> +}
> +
> +/* Called with no lock taken.  */
> +static void bulk_phase(QEMUFile *f, bool limit)
> +{
> +    DirtyBitmapMigBitmapState *dbms;
> +
> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
> +        while (!dbms->bulk_completed) {
> +            bulk_phase_send_chunk(f, dbms);
> +            if (limit && qemu_file_rate_limit(f)) {
> +                return;
> +            }
> +        }
> +    }
> +
> +    dirty_bitmap_mig_state.bulk_completed = true;
> +}

OK.

> +
> +static void blk_mig_reset_dirty_cursor(void)
> +{
> +    DirtyBitmapMigBitmapState *dbms;
> +
> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
> +        dbms->cur_dirty = 0;
> +    }
> +}
> +

OK.

> +/* Called with iothread lock taken.  */
> +static void dirty_phase_send_chunk(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
> +{
> +    uint32_t nr_sectors;
> +
> +    while (dbms->cur_dirty < dbms->total_sectors &&
> +           !hbitmap_get(dbms->dirty_bitmap, dbms->cur_dirty)) {
> +        dbms->cur_dirty += dbms->sectors_per_chunk;
> +    }

OK, so we fast-forward the dirty cursor while the meta-bitmap bit at the
cursor is clear. Is it not worth using an HBitmapIter here? You can reset
them everywhere you reset the dirty cursor, and then just fast-seek to
the first dirty sector.
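
A flat-bitmap sketch of the fast-seek idea: whole zero words are
skipped instead of testing one position at a time. next_set_bit() is
hypothetical and only stands in for what HBitmapIter does across its
levels; it is not the HBitmap API.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the first set bit at or after `start`, or -1 if there is none. */
static int64_t next_set_bit(const uint64_t *words, int64_t nbits,
                            int64_t start)
{
    while (start < nbits) {
        uint64_t w = words[start / 64] >> (start % 64);

        if (w == 0) {
            /* nothing left in this word: jump to the next word boundary */
            start = (start / 64 + 1) * 64;
            continue;
        }
        while (!(w & 1)) {
            w >>= 1;
            start++;
        }
        return start < nbits ? start : -1;
    }
    return -1;
}
```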

> +
> +    if (dbms->cur_dirty >= dbms->total_sectors) {
> +        return;
> +    }
> +
> +    nr_sectors = MIN(dbms->total_sectors - dbms->cur_dirty,
> +                     dbms->sectors_per_chunk);

What happens if nr_sectors goes past the end?

> +    send_bitmap(f, dbms, dbms->cur_dirty, nr_sectors);
> +    hbitmap_reset(dbms->dirty_bitmap, dbms->cur_dirty, dbms->sectors_per_chunk);
> +    dbms->cur_dirty += nr_sectors;
> +}
> +
> +/* Called with iothread lock taken.
> + *
> + * return value:
> + * 0: too much data for max_downtime
> + * 1: few enough data for max_downtime
> +*/

dirty_phase below doesn't have a return value.

> +static void dirty_phase(QEMUFile *f, bool limit)
> +{
> +    DirtyBitmapMigBitmapState *dbms;
> +
> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
> +        while (dbms->cur_dirty < dbms->total_sectors) {
> +            dirty_phase_send_chunk(f, dbms);
> +            if (limit && qemu_file_rate_limit(f)) {
> +                return;
> +            }
> +        }
> +    }
> +}
> +

OK.

> +
> +/* Called with iothread lock taken.  */
> +static void dirty_bitmap_mig_cleanup(void)
> +{
> +    DirtyBitmapMigBitmapState *dbms;
> +
> +    unset_dirty_tracking();
> +
> +    while ((dbms = QSIMPLEQ_FIRST(&dirty_bitmap_mig_state.dbms_list)) != NULL) {
> +        QSIMPLEQ_REMOVE_HEAD(&dirty_bitmap_mig_state.dbms_list, entry);
> +        g_free(dbms);
> +    }
> +}
> +

OK.

> +static void dirty_bitmap_migration_cancel(void *opaque)
> +{
> +    dirty_bitmap_mig_cleanup();
> +}
> +

OK.

> +static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
> +{
> +    DPRINTF("Enter save live iterate\n");
> +
> +    blk_mig_reset_dirty_cursor();

I suppose this is because it's easier to check whether we are finished
by starting from sector 0 every time.

A harder but faster method may be: use HBitmapIters, but don't reset
them every iteration; just iterate until the end, and check whether the
meta bitmap is empty. If it is, the dirty phase is complete. If it is
NOT empty, reset the HBI and keep iterating over the dirty phase.
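
A minimal sketch of that scheme on a flat meta bitmap: the cursor
persists across calls and only rewinds when it reaches the end while
bits are still set. DirtyPhase, meta_empty() and dirty_phase_step() are
hypothetical, and clearing a bit stands in for sending the chunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NBITS 128

typedef struct {
    uint64_t meta[NBITS / 64];  /* stand-in for the meta (dirty-dirty) bitmap */
    int64_t cursor;             /* persists across iterations */
} DirtyPhase;

static bool meta_empty(const DirtyPhase *p)
{
    for (int i = 0; i < NBITS / 64; i++) {
        if (p->meta[i]) {
            return false;
        }
    }
    return true;
}

/* Sends up to `budget` dirty chunks; returns true once the meta bitmap
 * is empty, i.e. the dirty phase has converged. */
static bool dirty_phase_step(DirtyPhase *p, int budget, int *sent)
{
    while (budget > 0) {
        if (p->cursor >= NBITS) {
            if (meta_empty(p)) {
                return true;
            }
            p->cursor = 0;      /* bits were set behind us: rewind */
        }
        uint64_t mask = 1ULL << (p->cursor % 64);
        if (p->meta[p->cursor / 64] & mask) {
            p->meta[p->cursor / 64] &= ~mask;   /* "send" this chunk */
            (*sent)++;
            budget--;
        }
        p->cursor++;
    }
    return meta_empty(p);
}
```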

> +
> +    if (dirty_bitmap_mig_state.bulk_completed) {
> +        qemu_mutex_lock_iothread();
> +        dirty_phase(f, true);
> +        qemu_mutex_unlock_iothread();
> +    } else {
> +        bulk_phase(f, true);
> +    }
> +
> +    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
> +
> +    return dirty_bitmap_mig_state.bulk_completed;
> +}
> +
> +/* Called with iothread lock taken.  */
> +
> +static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
> +{
> +    DirtyBitmapMigBitmapState *dbms;
> +    DPRINTF("Enter save live complete\n");
> +
> +    if (!dirty_bitmap_mig_state.bulk_completed) {
> +        bulk_phase(f, false);
> +    }

[Not expertly familiar with savevm:] Under what conditions can this happen?

> +
> +    blk_mig_reset_dirty_cursor();
> +    dirty_phase(f, false);
> +
> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
> +        uint8_t flags = DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME |
> +                        DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME |
> +                        DIRTY_BITMAP_MIG_FLAG_ENABLED;
> +
> +        qemu_put_byte(f, flags);
> +        qemu_put_name(f, bdrv_get_device_name(dbms->bs));
> +        qemu_put_name(f, bdrv_dirty_bitmap_name(dbms->bitmap));
> +        qemu_put_byte(f, bdrv_dirty_bitmap_enabled(dbms->bitmap));
> +    }
> +
> +    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
> +
> +    DPRINTF("Dirty bitmaps migration completed\n");
> +
> +    dirty_bitmap_mig_cleanup();
> +    return 0;
> +}
> +

I suppose we don't need a flag that explicitly SAYS this is the end
section, since we can tell by the absence of both
DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK and ZERO_CHUNK.

> +static uint64_t dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
> +                                          uint64_t max_size)
> +{
> +    DirtyBitmapMigBitmapState *dbms;
> +    uint64_t pending = 0;
> +
> +    qemu_mutex_lock_iothread();
> +
> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
> +        uint64_t sectors = hbitmap_count(dbms->dirty_bitmap);
> +        if (!dbms->bulk_completed) {
> +            sectors += dbms->total_sectors - dbms->cur_sector;
> +        }
> +        pending += bdrv_dirty_bitmap_data_size(dbms->bitmap, sectors);
> +    }
> +
> +    qemu_mutex_unlock_iothread();
> +
> +    DPRINTF("Enter save live pending %" PRIu64 ", max: %" PRIu64 "\n",
> +            pending, max_size);
> +    return pending;
> +}
> +

OK.

> +static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
> +{
> +    int flags;
> +
> +    static char device_name[256], bitmap_name[256];
> +    static BlockDriverState *bs;
> +    static BdrvDirtyBitmap *bitmap;
> +
> +    uint8_t *buf;
> +    uint64_t first_sector;
> +    uint32_t  nr_sectors;
> +    int ret;
> +
> +    DPRINTF("load start\n");
> +
> +    do {
> +        flags = qemu_get_byte(f);
> +        DPRINTF("flags: %x\n", flags);
> +
> +        if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
> +            qemu_get_name(f, device_name);
> +            bs = bdrv_find(device_name);

Similar to the above confusion, you may want bdrv_lookup_bs or similar, 
since we're going to be looking for BDS nodes instead of "devices."

> +            if (!bs) {
> +                fprintf(stderr, "Error: unknown block device '%s'\n",
> +                        device_name);
> +                return -EINVAL;
> +            }
> +        }
> +
> +        if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
> +            if (!bs) {
> +                fprintf(stderr, "Error: block device name is not set\n");
> +                return -EINVAL;
> +            }
> +
> +            qemu_get_name(f, bitmap_name);
> +            bitmap = bdrv_find_dirty_bitmap(bs, bitmap_name);
> +            if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
> +                /* First chunk from this bitmap */
> +                uint64_t granularity = qemu_get_be64(f);
> +                if (!bitmap) {
> +                    Error *local_err = NULL;
> +                    bitmap = bdrv_create_dirty_bitmap(bs, granularity,
> +                                                      bitmap_name,
> +                                                      &local_err);
> +                    if (!bitmap) {
> +                        error_report("%s", error_get_pretty(local_err));
> +                        error_free(local_err);
> +                        return -EINVAL;
> +                    }
> +                } else {
> +                    uint64_t dest_granularity =
> +                        bdrv_dirty_bitmap_granularity(bs, bitmap);
> +                    if (dest_granularity != granularity) {
> +                        fprintf(stderr,
> +                                "Error: "
> +                                "Migrated bitmap granularity (%" PRIu64 ") "
> +                                "is not match with destination bitmap '%s' "
> +                                "granularity (%" PRIu64 ")\n",
> +                                granularity,
> +                                bitmap_name,
> +                                dest_granularity);
> +                        return -EINVAL;
> +                    }
> +                }
> +                bdrv_disable_dirty_bitmap(bitmap);
> +            }
> +            if (!bitmap) {
> +                fprintf(stderr, "Error: unknown dirty bitmap "
> +                        "'%s' for block device '%s'\n",
> +                        bitmap_name, device_name);
> +                return -EINVAL;
> +            }
> +        }
> +
> +        if (flags & DIRTY_BITMAP_MIG_FLAG_ENABLED) {
> +            bool enabled;
> +            if (!bitmap) {
> +                fprintf(stderr, "Error: dirty bitmap name is not set\n");
> +                return -EINVAL;
> +            }
> +            bdrv_dirty_bitmap_deserialize_finish(bitmap);
> +            /* complete migration */
> +            enabled = qemu_get_byte(f);
> +            if (enabled) {
> +                bdrv_enable_dirty_bitmap(bitmap);
> +            }
> +        }

Oh, so you use the ENABLED flag to show that migration is over. If we 
are going to commit to a stream format for bitmaps, though, maybe it's 
best to actually create a "COMPLETION BLOCK" flag and then split this 
function into two pieces:

(1) The part that receives regular / zero blocks, and
(2) The part that receives completion data.

That way, if we change the properties that bitmaps have down the line, 
we aren't reliant on literally the "enabled" flag to decide what to do.

Also, it might help make this fairly long function a little smaller and 
more readable.
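A minimal sketch of the split suggested above. It assumes a hypothetical DIRTY_BITMAP_MIG_FLAG_COMPLETE flag that takes over the 0x40 slot currently used by ENABLED. LoadState, load_chunk(), load_complete() and load_one() are illustrative names, not existing QEMU functions, and the payload is passed as a plain byte array instead of being read from a QEMUFile.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Chunk flags as in the patch; COMPLETE is hypothetical (replaces ENABLED). */
#define DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK  0x02
#define DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK    0x04
#define DIRTY_BITMAP_MIG_FLAG_COMPLETE      0x40

/* Toy per-bitmap state on the destination side. */
typedef struct LoadState {
    int normal_chunks;
    int zero_chunks;
    int completed;   /* set once the completion block has been seen */
    int enabled;     /* property carried by the completion block */
} LoadState;

/* (1) Regular / zero chunks only. */
static int load_chunk(LoadState *s, uint8_t flags)
{
    if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) {
        s->zero_chunks++;
    } else {
        s->normal_chunks++;
    }
    return 0;
}

/* (2) Completion data only. New bitmap properties would be parsed here,
 * without touching the chunk path. */
static int load_complete(LoadState *s, const uint8_t *payload, size_t len)
{
    if (s->completed || len < 1) {
        return -1;   /* duplicate or truncated completion block */
    }
    s->enabled = payload[0] != 0;
    s->completed = 1;
    return 0;
}

/* Dispatcher: the main loop only decodes the header and picks a handler. */
static int load_one(LoadState *s, uint8_t flags,
                    const uint8_t *payload, size_t len)
{
    if (flags & DIRTY_BITMAP_MIG_FLAG_COMPLETE) {
        return load_complete(s, payload, len);
    }
    if (flags & (DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK |
                 DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK)) {
        return load_chunk(s, flags);
    }
    return 0;   /* nothing to do, e.g. a bare EOS */
}
```

With this shape, adding a new per-bitmap property only touches load_complete(). It never changes how chunks are parsed.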

> +
> +        if (flags & (DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK |
> +                     DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK)) {
> +            if (!bs) {
> +                fprintf(stderr, "Error: block device name is not set\n");
> +                return -EINVAL;
> +            }
> +            if (!bitmap) {
> +                fprintf(stderr, "Error: dirty bitmap name is not set\n");
> +                return -EINVAL;
> +            }
> +
> +            first_sector = qemu_get_be64(f);
> +            nr_sectors = qemu_get_be32(f);
> +            DPRINTF("chunk: %lu %u\n", first_sector, nr_sectors);
> +
> +
> +            if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) {
> +                bdrv_dirty_bitmap_deserialize_part0(bitmap, first_sector,
> +                                                    nr_sectors);
> +            } else {
> +                uint64_t buf_size = qemu_get_be64(f);
> +                uint64_t needed_size =
> +                    bdrv_dirty_bitmap_data_size(bitmap, nr_sectors);
> +
> +                if (needed_size > buf_size) {
> +                    fprintf(stderr,
> +                            "Error: Migrated bitmap granularity is not "
> +                            "match with destination bitmap granularity\n");
> +                    return -EINVAL;
> +                }
> +

"Migrated bitmap granularity doesn't match the destination bitmap 
granularity" perhaps.

> +                buf = g_malloc(buf_size);
> +                qemu_get_buffer(f, buf, buf_size);
> +                bdrv_dirty_bitmap_deserialize_part(bitmap, buf,
> +                                                   first_sector,
> +                                                   nr_sectors);
> +                g_free(buf);
> +            }
> +        }
> +
> +        ret = qemu_file_get_error(f);
> +        if (ret != 0) {
> +            return ret;
> +        }
> +    } while (!(flags & DIRTY_BITMAP_MIG_FLAG_EOS));
> +
> +    DPRINTF("load finish\n");
> +    return 0;
> +}
> +
> +static void dirty_bitmap_set_params(const MigrationParams *params, void *opaque)
> +{
> +    dirty_bitmap_mig_state.migration_enable = params->dirty;
> +}
> +

OK; though I am not immediately aware of what changes need to happen to 
accommodate Eric's suggestions.

> +static bool dirty_bitmap_is_active(void *opaque)
> +{
> +    return dirty_bitmap_mig_state.migration_enable == 1;
> +}
> +

OK.

> +static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
> +{
> +    init_dirty_bitmap_migration(f);
> +
> +    qemu_mutex_lock_iothread();
> +    /* start track dirtyness of dirty bitmaps */
> +    set_dirty_tracking();
> +    qemu_mutex_unlock_iothread();
> +
> +    blk_mig_reset_dirty_cursor();
> +    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
> +
> +    return 0;
> +}
> +

OK; see dirty_bitmap_mig_init below, though.

> +static SaveVMHandlers savevm_block_handlers = {
> +    .set_params = dirty_bitmap_set_params,
> +    .save_live_setup = dirty_bitmap_save_setup,
> +    .save_live_iterate = dirty_bitmap_save_iterate,
> +    .save_live_complete = dirty_bitmap_save_complete,
> +    .save_live_pending = dirty_bitmap_save_pending,
> +    .load_state = dirty_bitmap_load,
> +    .cancel = dirty_bitmap_migration_cancel,
> +    .is_active = dirty_bitmap_is_active,
> +};
> +
> +void dirty_bitmap_mig_init(void)
> +{
> +    QSIMPLEQ_INIT(&dirty_bitmap_mig_state.dbms_list);

Maybe I haven't looked thoroughly enough yet, but it's weird that part 
of the dirty_bitmap_mig_state is initialized here, and the rest of it in 
init_dirty_bitmap_migration. I'd prefer to keep it all together, if 
possible.

> +
> +    register_savevm_live(NULL, "dirty-bitmap", 0, 1, &savevm_block_handlers,
> +                         &dirty_bitmap_mig_state);
> +}

OK.

> diff --git a/vl.c b/vl.c
> index a824a7d..dee7220 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -4184,6 +4184,7 @@ int main(int argc, char **argv, char **envp)
>
>       blk_mig_init();
>       ram_mig_init();
> +    dirty_bitmap_mig_init();
>
>       /* If the currently selected machine wishes to override the units-per-bus
>        * property of its default HBA interface type, do so now. */
>

Hm, since dirty bitmaps are a sub-component of the block layer, would it 
not make sense to put this hook under blk_mig_init, perhaps?


Overall this looks very clean compared to the intermingled format in V1, 
and the code is organized pretty well. Just a few minor comments, and 
I'd like to get the opinion of the migration maintainers, but I am 
happy. Sorry it took me so long to review, please feel free to let me 
know if you disagree with any of my opinions :)

Thank you,
--John

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v2 4/8] block: add dirty-dirty bitmaps
  2015-02-10 21:30   ` John Snow
@ 2015-02-12 10:51     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-12 10:51 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: kwolf, pbonzini, stefanha, den

On 11.02.2015 00:30, John Snow wrote:
> I had hoped it wouldn't come to this :)
>
> On 01/27/2015 05:56 AM, Vladimir Sementsov-Ogievskiy wrote:
>> A dirty-dirty bitmap is a dirty bitmap for BdrvDirtyBitmap. It tracks
>> set/unset changes of BdrvDirtyBitmap.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>> ---
>>   block.c               | 44 ++++++++++++++++++++++++++++++++++++++++++++
>>   include/block/block.h |  5 +++++
>>   2 files changed, 49 insertions(+)
>>
>> diff --git a/block.c b/block.c
>> index 9860fc1..8ab724b 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -53,6 +53,7 @@
>>
>>   struct BdrvDirtyBitmap {
>>       HBitmap *bitmap;
>> +    HBitmap *dirty_dirty_bitmap;
>
> Just opinions: Maybe we can call it the "meta_bitmap" to help keep the 
> name less confusing, and accompany it with a good structure comment 
> here to explain what the heck it's for.
>
> If you feel that's a good idea; s/dirty_dirty_/meta_/g below.
>
> Regardless, let's make sure this patch adds documentation for it.
No objections. If everyone is happy with meta_bitmaps, I'll use this
name. Dirty-dirty bitmaps are just funnier, I think ;)
>
>>       BdrvDirtyBitmap *originator;
>>       int64_t size;
>>       int64_t granularity;
>> @@ -5373,6 +5374,30 @@ BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
>>       return originator;
>>   }
>>
>> +HBitmap *bdrv_create_dirty_dirty_bitmap(BdrvDirtyBitmap *bitmap,
>> +                                        uint64_t granularity)
>> +{
>> +    uint64_t sector_granularity;
>> +
>> +    assert((granularity & (granularity - 1)) == 0);
>> +
>> +    granularity *= 8 * bitmap->granularity;
>> +    sector_granularity = granularity >> BDRV_SECTOR_BITS;
>> +    assert(sector_granularity);
>> +
>> +    bitmap->dirty_dirty_bitmap =
>> +        hbitmap_alloc(bitmap->size, ffsll(sector_granularity) - 1);
>> +
>> +    return bitmap->dirty_dirty_bitmap;
>> +}
>> +
>> +void bdrv_release_dirty_dirty_bitmap(BdrvDirtyBitmap *bitmap)
>> +{
>> +    if (bitmap->dirty_dirty_bitmap) {
>> +        hbitmap_free(bitmap->dirty_dirty_bitmap);
>> +        bitmap->dirty_dirty_bitmap = NULL;
>> +    }
>> +}
>>
>>   void bdrv_print_dirty_bitmap(BdrvDirtyBitmap *bitmap)
>>   {
>> @@ -5447,6 +5472,9 @@ void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
>>           if (bm == bitmap) {
>>               QLIST_REMOVE(bitmap, list);
>>               hbitmap_free(bitmap->bitmap);
>> +            if (bitmap->dirty_dirty_bitmap) {
>> +                hbitmap_free(bitmap->dirty_dirty_bitmap);
>> +            }
>>               g_free(bitmap->name);
>>               g_free(bitmap);
>>               return;
>> @@ -5534,6 +5562,10 @@ void bdrv_set_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
>>   {
>>       if (bitmap->enabled) {
>>           hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
>> +
>> +        if (bitmap->dirty_dirty_bitmap) {
>> +            hbitmap_set(bitmap->dirty_dirty_bitmap, cur_sector, nr_sectors);
>> +        }
>>       }
>>   }
>>
>> @@ -5541,6 +5573,9 @@ void bdrv_reset_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
>>                                int64_t cur_sector, int nr_sectors)
>>   {
>>       hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
>> +    if (bitmap->dirty_dirty_bitmap) {
>> +        hbitmap_set(bitmap->dirty_dirty_bitmap, cur_sector, nr_sectors);
>> +    }
>>   }
>>
>>   /**
>> @@ -5550,6 +5585,9 @@ void bdrv_reset_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
>>   void bdrv_clear_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
>>   {
>>       hbitmap_reset(bitmap->bitmap, 0, bitmap->size);
>> +    if (bitmap->dirty_dirty_bitmap) {
>> +        hbitmap_set(bitmap->dirty_dirty_bitmap, 0, bitmap->size);
>> +    }
>>   }
>>
>>   const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap)
>> @@ -5597,6 +5635,9 @@ static void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
>>               continue;
>>           }
>>           hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
>> +        if (bitmap->dirty_dirty_bitmap) {
>> +            hbitmap_set(bitmap->dirty_dirty_bitmap, cur_sector, nr_sectors);
>> +        }
>>       }
>>   }
>>
>> @@ -5606,6 +5647,9 @@ static void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
>>       BdrvDirtyBitmap *bitmap;
>>       QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
>>           hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
>> +        if (bitmap->dirty_dirty_bitmap) {
>> +            hbitmap_set(bitmap->dirty_dirty_bitmap, cur_sector, nr_sectors);
>> +        }
>>       }
>>   }
>>
>> diff --git a/include/block/block.h b/include/block/block.h
>> index 0890cd2..648b0a9 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -4,6 +4,7 @@
>>   #include "block/aio.h"
>>   #include "qemu-common.h"
>>   #include "qemu/option.h"
>> +#include "qemu/hbitmap.h"
>>   #include "block/coroutine.h"
>>   #include "block/accounting.h"
>>   #include "qapi/qmp/qobject.h"
>> @@ -473,6 +474,10 @@ void bdrv_dirty_bitmap_deserialize_part0(BdrvDirtyBitmap *bitmap,
>>                                            uint64_t start, uint64_t count);
>>   void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap);
>>
>> +HBitmap *bdrv_create_dirty_dirty_bitmap(BdrvDirtyBitmap *bitmap,
>> +                                        uint64_t granularity);
>> +void bdrv_release_dirty_dirty_bitmap(BdrvDirtyBitmap *bitmap);
>> +
>>   void bdrv_enable_copy_on_read(BlockDriverState *bs);
>>   void bdrv_disable_copy_on_read(BlockDriverState *bs);
>>
>>
>
> Looks correct, it just needs documentation in my opinion to help 
> explain what this extra bitmap is for.
>
> Musings / opinions:
>
> I also think that at this point, we may want to make bdrv_reset_dirty 
> and bdrv_set_dirty call the bdrv_reset_dirty_bitmap and 
> bdrv_set_dirty_bitmap functions instead of re-implementing the behavior.
>
> I used to think it was fine as-is, but the more conditions we add to 
> how these bitmaps should be accessed, the more I think limiting the 
> interface to as few functions as possible is a good idea.
>
> Maybe I'll even do that myself. It might even be nice to split all
> the bitmap functions off into something like block/dirty_bitmaps.c as
> the complexity creeps up and we're trying to improve the 
> maintainability of block.c.
>
> Anyway, we can worry about that in a later series.
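A minimal sketch of that refactor, using a toy model instead of HBitmap. Each bitmap is one 64-bit word, and ToyDirtyBitmap, toy_set_dirty_bitmap(), toy_set_dirty() and the other names are hypothetical. The per-bitmap helpers are the only code that updates both the bitmap and its optional meta (dirty-dirty) bitmap. The per-device loops simply delegate to them. As in the patch, a reset still marks the meta bitmap, because an "unset" is also a change that migration must resend.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: one bit per sector, at most 64 sectors. */
typedef struct ToyDirtyBitmap {
    uint64_t bits;                /* stands in for HBitmap *bitmap */
    uint64_t *meta;               /* meta (dirty-dirty) bitmap, may be NULL */
    int enabled;
    struct ToyDirtyBitmap *next;  /* stands in for the bs->dirty_bitmaps list */
} ToyDirtyBitmap;

static uint64_t range_mask(int start, int count)
{
    assert(start >= 0 && count > 0 && start + count <= 64);
    return (count == 64 ? ~0ULL : ((1ULL << count) - 1)) << start;
}

/* The only place that sets bits: bitmap and meta bitmap together. */
static void toy_set_dirty_bitmap(ToyDirtyBitmap *bm, int start, int count)
{
    if (!bm->enabled) {
        return;
    }
    bm->bits |= range_mask(start, count);
    if (bm->meta) {
        *bm->meta |= range_mask(start, count);
    }
}

/* Clearing bits is still a change, so the meta bitmap is *set*. */
static void toy_reset_dirty_bitmap(ToyDirtyBitmap *bm, int start, int count)
{
    bm->bits &= ~range_mask(start, count);
    if (bm->meta) {
        *bm->meta |= range_mask(start, count);
    }
}

/* Per-device loops delegate instead of re-implementing the logic. */
static void toy_set_dirty(ToyDirtyBitmap *list, int start, int count)
{
    for (ToyDirtyBitmap *bm = list; bm; bm = bm->next) {
        toy_set_dirty_bitmap(bm, start, count);
    }
}

static void toy_reset_dirty(ToyDirtyBitmap *list, int start, int count)
{
    for (ToyDirtyBitmap *bm = list; bm; bm = bm->next) {
        toy_reset_dirty_bitmap(bm, start, count);
    }
}
```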


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v2 5/8] block: add bdrv_dirty_bitmap_enabled()
  2015-02-10 21:30   ` John Snow
@ 2015-02-12 10:54     ` Vladimir Sementsov-Ogievskiy
  2015-02-12 16:22       ` John Snow
  0 siblings, 1 reply; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-12 10:54 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: kwolf, pbonzini, stefanha, den

On 11.02.2015 00:30, John Snow wrote:
>
>
> On 01/27/2015 05:56 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>> ---
>>   block.c               | 5 +++++
>>   include/block/block.h | 1 +
>>   2 files changed, 6 insertions(+)
>>
>> diff --git a/block.c b/block.c
>> index 8ab724b..15fc621 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -5551,6 +5551,11 @@ uint64_t bdrv_dirty_bitmap_granularity(BlockDriverState *bs,
>>       return bitmap->granularity;
>>   }
>>
>> +bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap)
>> +{
>> +    return bitmap->enabled;
>> +}
>> +
>>   void bdrv_dirty_iter_init(BlockDriverState *bs,
>>                             BdrvDirtyBitmap *bitmap, HBitmapIter *hbi)
>>   {
>> diff --git a/include/block/block.h b/include/block/block.h
>> index 648b0a9..7b49d98 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -449,6 +449,7 @@ BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs);
>>   uint64_t bdrv_get_default_bitmap_granularity(BlockDriverState *bs);
>>   uint64_t bdrv_dirty_bitmap_granularity(BlockDriverState *bs,
>>                                         BdrvDirtyBitmap *bitmap);
>> +bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap);
>>   int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, int64_t sector);
>>   void bdrv_set_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
>>                              int64_t cur_sector, int nr_sectors);
>>
>
> I wrote something similar in my incremental backup series. I'm 
> submitting a new incremental backup series (v12!) soon which might 
> have /slightly/ different semantics for enabled/disabled bitmaps.
>
> I don't think you'll need this patch, but in case you do need it:
>
> Reviewed-by: John Snow <jsnow@redhat.com>

Ok, I don't need it after v12.

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v2 5/8] block: add bdrv_dirty_bitmap_enabled()
  2015-02-12 10:54     ` Vladimir Sementsov-Ogievskiy
@ 2015-02-12 16:22       ` John Snow
  0 siblings, 0 replies; 35+ messages in thread
From: John Snow @ 2015-02-12 16:22 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, pbonzini, stefanha, den



On 02/12/2015 05:54 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 11.02.2015 00:30, John Snow wrote:
>>
>>
>> On 01/27/2015 05:56 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>>> ---
>>>   block.c               | 5 +++++
>>>   include/block/block.h | 1 +
>>>   2 files changed, 6 insertions(+)
>>>
>>> diff --git a/block.c b/block.c
>>> index 8ab724b..15fc621 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -5551,6 +5551,11 @@ uint64_t bdrv_dirty_bitmap_granularity(BlockDriverState *bs,
>>>       return bitmap->granularity;
>>>   }
>>>
>>> +bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap)
>>> +{
>>> +    return bitmap->enabled;
>>> +}
>>> +
>>>   void bdrv_dirty_iter_init(BlockDriverState *bs,
>>>                             BdrvDirtyBitmap *bitmap, HBitmapIter *hbi)
>>>   {
>>> diff --git a/include/block/block.h b/include/block/block.h
>>> index 648b0a9..7b49d98 100644
>>> --- a/include/block/block.h
>>> +++ b/include/block/block.h
>>> @@ -449,6 +449,7 @@ BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs);
>>>   uint64_t bdrv_get_default_bitmap_granularity(BlockDriverState *bs);
>>>   uint64_t bdrv_dirty_bitmap_granularity(BlockDriverState *bs,
>>>                                         BdrvDirtyBitmap *bitmap);
>>> +bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap);
>>>   int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, int64_t sector);
>>>   void bdrv_set_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
>>>                              int64_t cur_sector, int nr_sectors);
>>>
>>
>> I wrote something similar in my incremental backup series. I'm
>> submitting a new incremental backup series (v12!) soon which might
>> have /slightly/ different semantics for enabled/disabled bitmaps.
>>
>> I don't think you'll need this patch, but in case you do need it:
>>
>> Reviewed-by: John Snow <jsnow@redhat.com>
>
> Ok, I don't need it after v12.
>

Yes, my apologies for the churn.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c
  2015-02-10 21:33   ` John Snow
@ 2015-02-13  8:19     ` Vladimir Sementsov-Ogievskiy
  2015-02-13  9:06       ` Vladimir Sementsov-Ogievskiy
  2015-02-13 20:22       ` John Snow
  0 siblings, 2 replies; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-13  8:19 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, Peter Maydell,
	Juan Jose Quintela Carreira,
	Dr. David Alan Gilbert, stefanha, den, Amit Shah, pbonzini

On 11.02.2015 00:33, John Snow wrote:
> Peter Maydell: What's the right way to license a file as copied from a 
> previous version? See below, please;
>
> Max, Markus: ctrl+f "bdrv_get_device_name" and let me know what you 
> think, if you would.
>
> Juan, Amit, David: Copying migration maintainers.
>
> On 01/27/2015 05:56 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Live migration of dirty bitmaps. Only named dirty bitmaps are migrated.
>> If the destination qemu already contains a dirty bitmap with the same
>> name as a migrated bitmap, their granularities must match; otherwise
>> an error is generated. If the destination qemu doesn't contain such a
>> bitmap, it will be created.
>>
>> format:
>>
>> 1 byte: flags
>>
>> [ 1 byte: node name size ] \  flags & DEVICE_NAME
>> [ n bytes: node name     ] /
>>
>> [ 1 byte: bitmap name size ]       \
>> [ n bytes: bitmap name     ]       | flags & BITMAP_NAME
>> [ [ be64: granularity    ] ]  flags & GRANULARITY
>>
>> [ 1 byte: bitmap enabled bit ] flags & ENABLED
>>
>> [ be64: start sector      ] \ flags & (NORMAL_CHUNK | ZERO_CHUNK)
>> [ be32: number of sectors ] /
>>
>> [ be64: buffer size ] \ flags & NORMAL_CHUNK
>> [ n bytes: buffer   ] /
>>
>> The last chunk must have the EOS flag set. A chunk may omit the device
>> and/or bitmap names, in which case they are assumed to be the same as in
>> the previous chunk. GRANULARITY is sent with the first chunk of each
>> bitmap. The ENABLED byte is sent at the end of the "complete" stage of
>> migration, so when the destination receives the ENABLED flag it should
>> deserialize_finish the bitmap and set its enabled bit to the received value.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>> ---
>>   include/migration/block.h |   1 +
>>   migration/Makefile.objs   |   2 +-
>>   migration/dirty-bitmap.c  | 606 ++++++++++++++++++++++++++++++++++++++++++++++
>>   vl.c                      |   1 +
>>   4 files changed, 609 insertions(+), 1 deletion(-)
>>   create mode 100644 migration/dirty-bitmap.c
>>
>> diff --git a/include/migration/block.h b/include/migration/block.h
>> index ffa8ac0..566bb9f 100644
>> --- a/include/migration/block.h
>> +++ b/include/migration/block.h
>> @@ -14,6 +14,7 @@
>>   #ifndef BLOCK_MIGRATION_H
>>   #define BLOCK_MIGRATION_H
>>
>> +void dirty_bitmap_mig_init(void);
>>   void blk_mig_init(void);
>>   int blk_mig_active(void);
>>   uint64_t blk_mig_bytes_transferred(void);
>
> OK.
>
>> diff --git a/migration/Makefile.objs b/migration/Makefile.objs
>> index d929e96..9adfda9 100644
>> --- a/migration/Makefile.objs
>> +++ b/migration/Makefile.objs
>> @@ -6,5 +6,5 @@ common-obj-y += xbzrle.o
>>   common-obj-$(CONFIG_RDMA) += rdma.o
>>   common-obj-$(CONFIG_POSIX) += exec.o unix.o fd.o
>>
>> -common-obj-y += block.o
>> +common-obj-y += block.o dirty-bitmap.o
>>
>
> OK.
>
>> diff --git a/migration/dirty-bitmap.c b/migration/dirty-bitmap.c
>> new file mode 100644
>> index 0000000..8621218
>> --- /dev/null
>> +++ b/migration/dirty-bitmap.c
>> @@ -0,0 +1,606 @@
>> +/*
>> + * QEMU dirty bitmap migration
>> + *
>> + * derived from migration/block.c
>> + *
>> + * Author:
>> + * Sementsov-Ogievskiy Vladimir <vsementsov@parallels.com>
>> + *
>> + * original copyright message:
>> + * =====================================================================
>> + * Copyright IBM, Corp. 2009
>> + *
>> + * Authors:
>> + *  Liran Schour <lirans@il.ibm.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> + * the COPYING file in the top-level directory.
>> + *
>> + * Contributions after 2012-01-13 are licensed under the terms of the
>> + * GNU GPL, version 2 or (at your option) any later version.
>> + * =====================================================================
>> + */
>> +
>
> Not super familiar with the right way to do licensing here; it's 
> possible you may not need to copy the original here, but I'm not sure. 
> You will want to make it clear what license applies to /your/ work, I 
> think. Maybe Peter Maydell can clue us in.
>
>> +#include "block/block.h"
>> +#include "qemu/main-loop.h"
>> +#include "qemu/error-report.h"
>> +#include "migration/block.h"
>> +#include "migration/migration.h"
>> +#include "qemu/hbitmap.h"
>> +#include <assert.h>
>> +
>> +#define CHUNK_SIZE                       (1 << 20)
>> +
>> +#define DIRTY_BITMAP_MIG_FLAG_EOS           0x01
>> +#define DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK  0x02
>> +#define DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK    0x04
>> +#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME   0x08
>> +#define DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME   0x10
>> +#define DIRTY_BITMAP_MIG_FLAG_GRANULARITY   0x20
>> +#define DIRTY_BITMAP_MIG_FLAG_ENABLED       0x40
>> +/* flags should be <= 0xff */
>> +
>
> We should give ourselves a little breathing room with the flags, since 
> we've only got room for one more.
Ok. Will one more byte be enough?
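One possible answer, sketched under assumptions. Instead of unconditionally widening the field, we could reserve the top bit (0x80, the only unused one) as a hypothetical "extra flags" marker meaning that a second flags byte follows. Every flag defined today (up to 0x40) would still encode in one byte, so current streams stay byte-for-byte identical. put_flags() and get_flags() are illustrative helpers over a plain buffer, not QEMU API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_EXTRA 0x80   /* hypothetical: another flags byte follows */

/* Encodes up to 14 flag bits: 7 per byte, top bit marks continuation. */
static size_t put_flags(uint8_t *out, uint16_t flags)
{
    assert(flags < (1u << 14));
    if (flags < 0x80) {
        out[0] = (uint8_t)flags;            /* legacy one-byte form */
        return 1;
    }
    out[0] = (uint8_t)((flags & 0x7f) | FLAG_EXTRA);
    out[1] = (uint8_t)(flags >> 7);
    return 2;
}

/* Decodes what put_flags() wrote; returns the number of bytes consumed. */
static size_t get_flags(const uint8_t *in, uint16_t *flags)
{
    if (!(in[0] & FLAG_EXTRA)) {
        *flags = in[0];
        return 1;
    }
    *flags = (uint16_t)((in[0] & 0x7f) | (in[1] << 7));
    return 2;
}
```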
>
>> +/* #define DEBUG_DIRTY_BITMAP_MIGRATION */
>> +
>> +#ifdef DEBUG_DIRTY_BITMAP_MIGRATION
>> +#define DPRINTF(fmt, ...) \
>> +    do { printf("dirty_migration: " fmt, ## __VA_ARGS__); } while (0)
>> +#else
>> +#define DPRINTF(fmt, ...) \
>> +    do { } while (0)
>> +#endif
>> +
>> +typedef struct DirtyBitmapMigBitmapState {
>> +    /* Written during setup phase. */
>> +    BlockDriverState *bs;
>> +    BdrvDirtyBitmap *bitmap;
>> +    HBitmap *dirty_bitmap;
>
> For my own sanity, I'd really prefer "bitmap" and "meta_bitmap" here; 
> "dirty_bitmap" is often used as a synonym (outside of this file) to 
> refer to the BdrvDirtyBitmap in general, so its usage here can be
> somewhat confusing.
>
> I'd appreciate "dirty_dirty_bitmap" as in your previous patch for 
> consistency, or "meta_bitmap" as I recommend.
>
Ok
>> +    int64_t total_sectors;
>> +    uint64_t sectors_per_chunk;
>> +    QSIMPLEQ_ENTRY(DirtyBitmapMigBitmapState) entry;
>> +
>> +    /* For bulk phase. */
>> +    bool bulk_completed;
>> +    int64_t cur_sector;
>> +    bool granularity_sent;
>> +
>> +    /* For dirty phase. */
>> +    int64_t cur_dirty;
>> +} DirtyBitmapMigBitmapState;
>> +
>> +typedef struct DirtyBitmapMigState {
>> +    int migration_enable;
>> +    QSIMPLEQ_HEAD(dbms_list, DirtyBitmapMigBitmapState) dbms_list;
>> +
>> +    bool bulk_completed;
>> +
>> +    /* for send_bitmap() */
>> +    BlockDriverState *prev_bs;
>> +    BdrvDirtyBitmap *prev_bitmap;
>> +} DirtyBitmapMigState;
>> +
>> +static DirtyBitmapMigState dirty_bitmap_mig_state;
>> +
>> +/* read name from qemu file:
>> + * format:
>> + * 1 byte : len = name length (<256)
>> + * len bytes : name without last zero byte
>> + *
>> + * name should point to the buffer >= 256 bytes length
>> + */
>> +static char *qemu_get_name(QEMUFile *f, char *name)
>> +{
>> +    int len = qemu_get_byte(f);
>> +    qemu_get_buffer(f, (uint8_t *)name, len);
>> +    name[len] = '\0';
>> +
>> +    DPRINTF("get name: %d %s\n", len, name);
>> +
>> +    return name;
>> +}
>> +
>
> OK. Maybe these could be "qemu_put_string" or "qemu_get_string" and 
> added to qemu-file.c so others can use them.
If there are no objections to sharing this format, I'll do it.
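A minimal sketch of what shared string helpers could look like. It uses the same format as qemu_put_name()/qemu_get_name() (a 1-byte length followed by the bytes, with no trailing NUL). ToyFile is a toy in-memory stand-in for QEMUFile, and toy_put_string()/toy_get_string() are hypothetical names. Unlike the patch version, the reader reports a truncated stream instead of trusting the length byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy in-memory stand-in for QEMUFile. */
typedef struct ToyFile {
    uint8_t buf[512];
    size_t len;   /* bytes written */
    size_t pos;   /* read cursor */
} ToyFile;

/* Writes a 1-byte length followed by the string bytes (no NUL). */
static void toy_put_string(ToyFile *f, const char *str)
{
    size_t len = strlen(str);

    assert(len < 256);
    assert(f->len + 1 + len <= sizeof(f->buf));
    f->buf[f->len++] = (uint8_t)len;
    memcpy(f->buf + f->len, str, len);
    f->len += len;
}

/* Reads into a caller buffer of at least 256 bytes. Returns the string
 * length, or -1 if the stream ends before the string does. */
static int toy_get_string(ToyFile *f, char *str)
{
    size_t len;

    if (f->pos >= f->len) {
        return -1;
    }
    len = f->buf[f->pos++];
    if (f->pos + len > f->len) {
        return -1;
    }
    memcpy(str, f->buf + f->pos, len);
    f->pos += len;
    str[len] = '\0';
    return (int)len;
}
```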
>
>> +/* write name to qemu file:
>> + * format:
>> + * same as for qemu_get_name
>> + *
>> + * maximum name length is 255
>> + */
>> +static void qemu_put_name(QEMUFile *f, const char *name)
>> +{
>> +    int len = strlen(name);
>> +
>> +    DPRINTF("put name: %d %s\n", len, name);
>> +
>> +    assert(len < 256);
>> +    qemu_put_byte(f, len);
>> +    qemu_put_buffer(f, (const uint8_t *)name, len);
>> +}
>> +
>
> OK.
>
>> +static void send_bitmap(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
>> +                        uint64_t start_sector, uint32_t nr_sectors)
>> +{
>> +    BlockDriverState *bs = dbms->bs;
>> +    BdrvDirtyBitmap *bitmap = dbms->bitmap;
>> +    uint8_t flags = 0;
>> +    /* align for buffer_is_zero() */
>> +    uint64_t align = 4 * sizeof(long);
>> +    uint64_t buf_size =
>> +        (bdrv_dirty_bitmap_data_size(bitmap, nr_sectors) + align - 1) &
>> +        ~(align - 1);
>> +    uint8_t *buf = g_malloc0(buf_size);
>> +
>> +    bdrv_dirty_bitmap_serialize_part(bitmap, buf, start_sector, nr_sectors);
>> +
>> +    if (buffer_is_zero(buf, buf_size)) {
>> +        g_free(buf);
>> +        buf = NULL;
>> +        flags |= DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK;
>> +    } else {
>> +        flags |= DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK;
>> +    }
>> +
>> +    if (bs != dirty_bitmap_mig_state.prev_bs) {
>> +        dirty_bitmap_mig_state.prev_bs = bs;
>> +        flags |= DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME;
>> +    }
>> +
>> +    if (bitmap != dirty_bitmap_mig_state.prev_bitmap) {
>> +        dirty_bitmap_mig_state.prev_bitmap = bitmap;
>> +        flags |= DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME;
>> +    }
>
> OK, so we use the current bs/bitmap under consideration to detect if 
> we have switched context, and put the names in the stream when it 
> happens. OK.
>
>> +
>> +    if (dbms->granularity_sent == 0) {
>> +        dbms->granularity_sent = 1;
>> +        flags |= DIRTY_BITMAP_MIG_FLAG_GRANULARITY;
>> +    }
>> +
>> +    DPRINTF("Enter send_bitmap"
>> +            "\n   flags:        %x"
>> +            "\n   start_sector: %" PRIu64
>> +            "\n   nr_sectors:   %" PRIu32
>> +            "\n   data_size:    %" PRIu64 "\n",
>> +            flags, start_sector, nr_sectors, buf_size);
>> +
>> +    qemu_put_byte(f, flags);
>> +
>> +    if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
>> +        qemu_put_name(f, bdrv_get_device_name(bs));
>> +    }
>
> I am still not fully clear on this myself, but I think we are phasing 
> out bdrv_get_device_name. In the context of bitmaps, we are most
> likely using them one-per-tree, but they /can/ be attached
> one-per-node, so we shouldn't be trying to get the device name here, 
> but rather, the node-name.
>
> I /think/ we may want to be using bdrv_get_node_name, but I think it 
> is not currently true that all nodes WILL be named ... I am CC'ing 
> Markus Armbruster and Max Reitz for (A) A refresher course and (B) 
> Opinions on what function call might make sense here, given that we 
> wish to migrate bitmaps attached to arbitrary nodes.
Hmm... I'm not familiar with the hierarchy of nodes and devices. As I
understand it, drives created from both the command line and QMP go
through blockdev_init, which creates both a blk (device) and a bs (node)
via blk_new_with_bs. Am I right? Also, bdrv_get_device_name is used
in migration/block.c.
>
>> +
>> +    if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
>> +        qemu_put_name(f, bdrv_dirty_bitmap_name(bitmap));
>> +
>> +        if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
>> +            qemu_put_be64(f, bdrv_dirty_bitmap_granularity(bs, bitmap));
>> +        }
>> +    } else {
>> +        assert(!(flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY));
>> +    }
>> +
>
> I thought we were only migrating bitmaps with names?
> I suppose the conditional can't hurt, but I am not clear on when we 
> won't have a bitmap name here.
You are right, the 'else' case is not possible... Hmm. I added it to
make sure the format is not corrupted when I decided to send the
granularity only together with the name. We won't have a bitmap name
only when we send the same bitmap as in the previous send_bitmap() call.
Maybe it would be better to use two separate ifs, without the else and the assert.
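A sketch of the header writer with two independent ifs, as proposed above. It writes into a plain byte buffer instead of a QEMUFile. put_be(), put_name() and put_chunk_header() are hypothetical helpers, and the field order follows the format description in the commit message. It assumes the receiver would also check GRANULARITY independently of BITMAP_NAME.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK  0x02
#define DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK    0x04
#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME   0x08
#define DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME   0x10
#define DIRTY_BITMAP_MIG_FLAG_GRANULARITY   0x20

/* Big-endian store of the low 'bytes' bytes of v. */
static size_t put_be(uint8_t *p, uint64_t v, int bytes)
{
    for (int i = 0; i < bytes; i++) {
        p[i] = (uint8_t)(v >> (8 * (bytes - 1 - i)));
    }
    return (size_t)bytes;
}

/* 1-byte length + bytes, as qemu_put_name() does. */
static size_t put_name(uint8_t *p, const char *name)
{
    size_t len = strlen(name);

    assert(len < 256);
    p[0] = (uint8_t)len;
    memcpy(p + 1, name, len);
    return 1 + len;
}

/* Writes everything before the optional buffer. Each optional field is
 * guarded only by its own flag. 'p' must hold at least 1 + 2 * 256 + 20
 * bytes. */
static size_t put_chunk_header(uint8_t *p, uint8_t flags,
                               const char *dev, const char *bitmap,
                               uint64_t granularity,
                               uint64_t start_sector, uint32_t nr_sectors)
{
    size_t n = 0;

    p[n++] = flags;
    if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
        n += put_name(p + n, dev);
    }
    if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
        n += put_name(p + n, bitmap);
    }
    if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
        n += put_be(p + n, granularity, 8);
    }
    n += put_be(p + n, start_sector, 8);
    n += put_be(p + n, nr_sectors, 4);
    return n;
}
```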
>
>> +    qemu_put_be64(f, start_sector);
>> +    qemu_put_be32(f, nr_sectors);
>> +
>> +    /* if a block is zero we need to flush here since the network
>> +     * bandwidth is now a lot higher than the storage device bandwidth.
>> +     * thus if we queue zero blocks we slow down the migration.
>> +     * also, skip writing block when migrate only dirty bitmaps. */
>> +    if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) {
>> +        qemu_fflush(f);
>> +        return;
>> +    }
>> +
>> +    qemu_put_be64(f, buf_size);
>> +    qemu_put_buffer(f, buf, buf_size);
>> +    g_free(buf);
>> +}
>> +
>> +
>> +/* Called with iothread lock taken.  */
>> +
>> +static void set_dirty_tracking(void)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms;
>> +
>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +        dbms->dirty_bitmap =
>> +            bdrv_create_dirty_dirty_bitmap(dbms->bitmap, CHUNK_SIZE);
>> +    }
>> +}
>> +
>
> OK: so we only have these dirty-dirty bitmaps when migration is 
> starting, which makes sense.
>
>> +static void unset_dirty_tracking(void)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms;
>> +
>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +        bdrv_release_dirty_dirty_bitmap(dbms->bitmap);
>> +    }
>> +}
>> +
>
> OK.
>
>> +static void init_dirty_bitmap_migration(QEMUFile *f)
>> +{
>> +    BlockDriverState *bs;
>> +    BdrvDirtyBitmap *bitmap;
>> +    DirtyBitmapMigBitmapState *dbms;
>> +
>> +    dirty_bitmap_mig_state.bulk_completed = false;
>> +    dirty_bitmap_mig_state.prev_bs = NULL;
>> +    dirty_bitmap_mig_state.prev_bitmap = NULL;
>> +
>> +    for (bs = bdrv_next(NULL); bs; bs = bdrv_next(bs)) {
>> +        for (bitmap = bdrv_next_dirty_bitmap(bs, NULL); bitmap;
>> +             bitmap = bdrv_next_dirty_bitmap(bs, bitmap)) {
>> +            if (!bdrv_dirty_bitmap_name(bitmap)) {
>> +                continue;
>> +            }
>> +
>> +            dbms = g_new0(DirtyBitmapMigBitmapState, 1);
>> +            dbms->bs = bs;
>> +            dbms->bitmap = bitmap;
>> +            dbms->total_sectors = bdrv_nb_sectors(bs);
>> +            dbms->sectors_per_chunk = CHUNK_SIZE * 8 *
>> +                bdrv_dirty_bitmap_granularity(dbms->bs, dbms->bitmap)
>> +                >> BDRV_SECTOR_BITS;
>> +
>> +            QSIMPLEQ_INSERT_TAIL(&dirty_bitmap_mig_state.dbms_list,
>> +                                 dbms, entry);
>> +        }
>> +    }
>> +}
>> +
>
> OK, but see the note below for dirty_bitmap_mig_init.
Actually it is not 'init' but 'reinit': it is called at every migration
start... Hmm. dbms_list should be cleared here before filling it again.
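A minimal sketch of a reinit that first frees the previous list and then resets the rest of the state in one place. This also keeps all the initialization together, as suggested for dirty_bitmap_mig_init. It uses a hand-rolled singly linked list with hypothetical names (ToyMigState, toy_mig_reinit()). In the real code the same effect would presumably be achieved with the QSIMPLEQ_FIRST/QSIMPLEQ_REMOVE_HEAD macros from qemu/queue.h plus g_free().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct ToyBitmapState {
    int id;
    struct ToyBitmapState *next;
} ToyBitmapState;

typedef struct ToyMigState {
    ToyBitmapState *head;
    ToyBitmapState **tail;   /* where the next entry gets linked */
    bool bulk_completed;
} ToyMigState;

/* Frees every entry and leaves an empty, valid list. */
static void toy_mig_list_clear(ToyMigState *s)
{
    ToyBitmapState *e = s->head;

    while (e) {
        ToyBitmapState *next = e->next;
        free(e);
        e = next;
    }
    s->head = NULL;
    s->tail = &s->head;
}

/* Called at every migration start: drop leftovers, reset, refill. */
static void toy_mig_reinit(ToyMigState *s, int nr_bitmaps)
{
    toy_mig_list_clear(s);
    s->bulk_completed = false;
    for (int i = 0; i < nr_bitmaps; i++) {
        ToyBitmapState *e = calloc(1, sizeof(*e));

        assert(e);
        e->id = i;
        *s->tail = e;
        s->tail = &e->next;
    }
}

static int toy_mig_count(const ToyMigState *s)
{
    int n = 0;

    for (const ToyBitmapState *e = s->head; e; e = e->next) {
        n++;
    }
    return n;
}
```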
>
>> +/* Called with no lock taken.  */
>> +static void bulk_phase_send_chunk(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
>> +{
>> +    uint32_t nr_sectors = MIN(dbms->total_sectors - dbms->cur_sector,
>> +                             dbms->sectors_per_chunk);
>
> What about cases where nr_sectors will put us past the end of the 
> bitmap? The bitmap serialization implementation might need a touchup 
> with this in mind.
I don't understand... nr_sectors <= dbms->total_sectors -
dbms->cur_sector, so it can't put us past the end.
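A worked example of the chunk arithmetic, assuming a 64 KiB bitmap granularity. One chunk carries CHUNK_SIZE = 2^20 bytes of serialized bitmap, which is 2^23 bits. Each bit covers 2^16 bytes of disk, so one chunk covers 2^39 bytes = 2^30 sectors (512 GiB). The MIN() clamp therefore only affects the final, partial chunk. The helpers below are illustrative, not part of the patch. They only show that the sector range stays in bounds. Whether serialization copes with a last chunk that does not end on a word boundary is the separate question raised above.

```c
#include <assert.h>
#include <stdint.h>

#define BDRV_SECTOR_BITS 9
#define CHUNK_SIZE       (1 << 20)   /* bytes of serialized bitmap per chunk */

/* Same arithmetic as init_dirty_bitmap_migration(): CHUNK_SIZE bytes hold
 * CHUNK_SIZE * 8 bits, and each bit covers 'granularity' bytes of disk. */
static uint64_t sectors_per_chunk(uint64_t granularity)
{
    return ((uint64_t)CHUNK_SIZE * 8 * granularity) >> BDRV_SECTOR_BITS;
}

/* Walks the disk like bulk_phase_send_chunk(), clamping each chunk with
 * MIN(total - cur, per_chunk). Returns the number of chunks and stores
 * the size of the last one in *last. */
static uint64_t count_chunks(int64_t total_sectors, uint64_t per_chunk,
                             uint64_t *last)
{
    int64_t cur = 0;
    uint64_t n = 0;

    assert(per_chunk > 0);
    while (cur < total_sectors) {
        uint64_t left = (uint64_t)(total_sectors - cur);
        uint64_t nr = left < per_chunk ? left : per_chunk;

        assert(cur + (int64_t)nr <= total_sectors);   /* never past the end */
        *last = nr;
        cur += (int64_t)nr;
        n++;
    }
    return n;
}
```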
>
>> +
>> +    send_bitmap(f, dbms, dbms->cur_sector, nr_sectors);
>> +
>> +    dbms->cur_sector += nr_sectors;
>> +    if (dbms->cur_sector >= dbms->total_sectors) {
>> +        dbms->bulk_completed = true;
>> +    }
>> +}
>> +
>> +/* Called with no lock taken.  */
>> +static void bulk_phase(QEMUFile *f, bool limit)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms;
>> +
>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +        while (!dbms->bulk_completed) {
>> +            bulk_phase_send_chunk(f, dbms);
>> +            if (limit && qemu_file_rate_limit(f)) {
>> +                return;
>> +            }
>> +        }
>> +    }
>> +
>> +    dirty_bitmap_mig_state.bulk_completed = true;
>> +}
>
> OK.
>
>> +
>> +static void blk_mig_reset_dirty_cursor(void)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms;
>> +
>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +        dbms->cur_dirty = 0;
>> +    }
>> +}
>> +
>
> OK.
>
>> +/* Called with iothread lock taken.  */
>> +static void dirty_phase_send_chunk(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
>> +{
>> +    uint32_t nr_sectors;
>> +
>> +    while (dbms->cur_dirty < dbms->total_sectors &&
>> +           !hbitmap_get(dbms->dirty_bitmap, dbms->cur_dirty)) {
>> +        dbms->cur_dirty += dbms->sectors_per_chunk;
>> +    }
>
> OK, so we fast forward the dirty cursor while the meta-bitmap is 
> empty. Is it not worth using the HBitmapIterator here? You can reset 
> them everywhere you reset the dirty cursor, and then just fast-seek to 
> the first dirty sector.
Yes, I've thought about it; I just used the simpler approach (copied from
migration/block.c) for an early version of the patch set. I will do it.
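For comparison, here is a sketch of the fast-seek idea. It works on a plain array of 64-bit words rather than QEMU's HBitmap/HBitmapIterator, whose API is not shown in this thread, and `find_next_set` is hypothetical. Instead of probing one chunk at a time, the search skips whole zero words and stops at the first set bit.

```c
#include <assert.h>
#include <stdint.h>

/* Return the index of the first set bit at or after 'start', or 'nbits'
 * if there is none.  Zero words are skipped 64 bits at a time. */
static uint64_t find_next_set(const uint64_t *words, uint64_t nbits,
                              uint64_t start)
{
    uint64_t i = start;

    while (i < nbits) {
        uint64_t w = words[i / 64] >> (i % 64);

        if (w == 0) {
            /* Nothing left in this word: jump to the next word boundary. */
            i = (i / 64 + 1) * 64;
            continue;
        }
        /* Walk to the lowest set bit in the rest of the word. */
        while (!(w & 1)) {
            w >>= 1;
            i++;
        }
        /* Ignore stray bits beyond the end of the bitmap. */
        return i < nbits ? i : nbits;
    }
    return nbits;
}
```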
>
>> +
>> +    if (dbms->cur_dirty >= dbms->total_sectors) {
>> +        return;
>> +    }
>> +
>> +    nr_sectors = MIN(dbms->total_sectors - dbms->cur_dirty,
>> +                     dbms->sectors_per_chunk);
>
> What happens if nr_sectors goes past the end?
>
>> +    send_bitmap(f, dbms, dbms->cur_dirty, nr_sectors);
>> +    hbitmap_reset(dbms->dirty_bitmap, dbms->cur_dirty, dbms->sectors_per_chunk);
>> +    dbms->cur_dirty += nr_sectors;
>> +}
>> +
>> +/* Called with iothread lock taken.
>> + *
>> + * return value:
>> + * 0: too much data for max_downtime
>> + * 1: few enough data for max_downtime
>> +*/
>
> dirty_phase below doesn't have a return value.
A leftover comment, thanks.
>
>> +static void dirty_phase(QEMUFile *f, bool limit)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms;
>> +
>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +        while (dbms->cur_dirty < dbms->total_sectors) {
>> +            dirty_phase_send_chunk(f, dbms);
>> +            if (limit && qemu_file_rate_limit(f)) {
>> +                return;
>> +            }
>> +        }
>> +    }
>> +}
>> +
>
> OK.
>
>> +
>> +/* Called with iothread lock taken.  */
>> +static void dirty_bitmap_mig_cleanup(void)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms;
>> +
>> +    unset_dirty_tracking();
>> +
>> +    while ((dbms = QSIMPLEQ_FIRST(&dirty_bitmap_mig_state.dbms_list)) != NULL) {
>> +        QSIMPLEQ_REMOVE_HEAD(&dirty_bitmap_mig_state.dbms_list, entry);
>> +        g_free(dbms);
>> +    }
>> +}
>> +
>
> OK.
>
>> +static void dirty_bitmap_migration_cancel(void *opaque)
>> +{
>> +    dirty_bitmap_mig_cleanup();
>> +}
>> +
>
> OK.
>
>> +static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
>> +{
>> +    DPRINTF("Enter save live iterate\n");
>> +
>> +    blk_mig_reset_dirty_cursor();
>
> I suppose this is because it's easier to check if we are finished by 
> starting from sector 0 every time.
>
> A harder, but faster method may be: Use HBitmapIterators, but don't 
> reset them every iteration: just iterate until the end, and check that 
> the bitmap is empty. If the meta bitmap is empty, the dirty phase is 
> complete. If the meta bitmap is NOT empty, reset the HBI and continue 
> allowing iterations over the dirty phase.
Ok, will do.
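Here is a sketch of the suggested alternative. As a simplifying assumption, the meta bitmap is modelled as a small array of per-chunk flags, and `DirtyState`, `dirty_pass` and `MAX_CHUNKS` are hypothetical. The cursor persists across iterations instead of being reset each time. Each pass sends and clears dirty chunks, and the dirty phase counts as complete only when the cursor reaches the end and the meta bitmap is empty; otherwise the cursor wraps around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHUNKS 64   /* hypothetical size of the meta bitmap */

typedef struct {
    bool meta[MAX_CHUNKS];  /* one "dirty-dirty" flag per chunk */
    size_t nchunks;
    size_t cursor;          /* persists across iterations, like an HBI */
    size_t sent;            /* chunks sent so far */
} DirtyState;

static bool meta_empty(const DirtyState *s)
{
    for (size_t i = 0; i < s->nchunks; i++) {
        if (s->meta[i]) {
            return false;
        }
    }
    return true;
}

/* Send up to 'budget' dirty chunks, continuing from the saved cursor.
 * Returns true once the dirty phase is complete. */
static bool dirty_pass(DirtyState *s, size_t budget)
{
    while (budget > 0) {
        if (s->cursor >= s->nchunks) {
            if (meta_empty(s)) {
                return true;        /* reached the end and nothing is left */
            }
            s->cursor = 0;          /* "reset the HBI" and go around again */
        }
        if (s->meta[s->cursor]) {
            s->meta[s->cursor] = false;   /* chunk sent: clear its meta bit */
            s->sent++;
            budget--;
        }
        s->cursor++;
    }
    return s->cursor >= s->nchunks && meta_empty(s);
}
```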
>
>> +
>> +    if (dirty_bitmap_mig_state.bulk_completed) {
>> +        qemu_mutex_lock_iothread();
>> +        dirty_phase(f, true);
>> +        qemu_mutex_unlock_iothread();
>> +    } else {
>> +        bulk_phase(f, true);
>> +    }
>> +
>> +    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>> +
>> +    return dirty_bitmap_mig_state.bulk_completed;
>> +}
>> +
>> +/* Called with iothread lock taken.  */
>> +
>> +static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms;
>> +    DPRINTF("Enter save live complete\n");
>> +
>> +    if (!dirty_bitmap_mig_state.bulk_completed) {
>> +        bulk_phase(f, false);
>> +    }
>
> [Not expertly familiar with savevm:] Under what conditions can this 
> happen?
This can happen. save_complete is called when savevm decides that the size
of the pending data is small enough. That was the issue in my bugfix for
pending in migration/block.c: there, save_pending returns a big value to
prevent save_complete from being called before the bulk phase is completed.
Here I decided to make save_pending more honest, so save_complete has to
finish the bulk phase if it isn't finished yet.
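Here is a sketch of such an "honest" pending estimate. It assumes one bitmap bit per `granularity` bytes of disk and 512-byte sectors. `bitmap_bytes_for` is a hypothetical stand-in for `bdrv_dirty_bitmap_data_size`, whose exact rounding is not shown in this thread. The estimate counts sectors still waiting for the bulk phase alongside sectors re-dirtied during migration. It can therefore become small enough for savevm to call save_complete before the bulk phase has finished.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512   /* assumption: 512-byte sectors */

/* Bytes of serialized bitmap needed to cover 'sectors' disk sectors,
 * with one bit per 'granularity' bytes, rounded up to whole bytes. */
static uint64_t bitmap_bytes_for(uint64_t sectors, uint64_t granularity)
{
    uint64_t bits = (sectors * SECTOR_SIZE + granularity - 1) / granularity;

    return (bits + 7) / 8;
}

/* Pending bytes for one bitmap: re-dirtied sectors plus, while the bulk
 * phase is still running, everything not yet sent. */
static uint64_t pending_bytes(uint64_t dirty_sectors, int bulk_completed,
                              uint64_t total_sectors, uint64_t cur_sector,
                              uint64_t granularity)
{
    uint64_t sectors = dirty_sectors;

    if (!bulk_completed) {
        sectors += total_sectors - cur_sector;
    }
    return bitmap_bytes_for(sectors, granularity);
}
```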
>
>> +
>> +    blk_mig_reset_dirty_cursor();
>> +    dirty_phase(f, false);
>> +
>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +        uint8_t flags = DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME |
>> +                        DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME |
>> +                        DIRTY_BITMAP_MIG_FLAG_ENABLED;
>> +
>> +        qemu_put_byte(f, flags);
>> +        qemu_put_name(f, bdrv_get_device_name(dbms->bs));
>> +        qemu_put_name(f, bdrv_dirty_bitmap_name(dbms->bitmap));
>> +        qemu_put_byte(f, bdrv_dirty_bitmap_enabled(dbms->bitmap));
>> +    }
>> +
>> +    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>> +
>> +    DPRINTF("Dirty bitmaps migration completed\n");
>> +
>> +    dirty_bitmap_mig_cleanup();
>> +    return 0;
>> +}
>> +
>
> I suppose we don't need a flag that distinctly SAYS this is the end 
> section, since we can tell by omission of 
> DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK or ZERO_CHUNK.
Hmm. I think ending each section with EOS simplifies the logic, and
migration/block.c uses the same approach. It's a question of which
format is better: "Each section for dirty_bitmap_load ends with EOS" or
"Each section for dirty_bitmap_load ends with EOS except the last one,
which may be recognized by the absence of NORMAL_CHUNK and ZERO_CHUNK".
>
>> +static uint64_t dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
>> +                                          uint64_t max_size)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms;
>> +    uint64_t pending = 0;
>> +
>> +    qemu_mutex_lock_iothread();
>> +
>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +        uint64_t sectors = hbitmap_count(dbms->dirty_bitmap);
>> +        if (!dbms->bulk_completed) {
>> +            sectors += dbms->total_sectors - dbms->cur_sector;
>> +        }
>> +        pending += bdrv_dirty_bitmap_data_size(dbms->bitmap, sectors);
>> +    }
>> +
>> +    qemu_mutex_unlock_iothread();
>> +
>> +    DPRINTF("Enter save live pending %" PRIu64 ", max: %" PRIu64 "\n",
>> +            pending, max_size);
>> +    return pending;
>> +}
>> +
>
> OK.
>
>> +static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>> +{
>> +    int flags;
>> +
>> +    static char device_name[256], bitmap_name[256];
>> +    static BlockDriverState *bs;
>> +    static BdrvDirtyBitmap *bitmap;
>> +
>> +    uint8_t *buf;
>> +    uint64_t first_sector;
>> +    uint32_t  nr_sectors;
>> +    int ret;
>> +
>> +    DPRINTF("load start\n");
>> +
>> +    do {
>> +        flags = qemu_get_byte(f);
>> +        DPRINTF("flags: %x\n", flags);
>> +
>> +        if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
>> +            qemu_get_name(f, device_name);
>> +            bs = bdrv_find(device_name);
>
> Similar to the above confusion, you may want bdrv_lookup_bs or 
> similar, since we're going to be looking for BDS nodes instead of 
> "devices."
In this case, should it be changed in migration/block.c too?
>
>> +            if (!bs) {
>> +                fprintf(stderr, "Error: unknown block device '%s'\n",
>> +                        device_name);
>> +                return -EINVAL;
>> +            }
>> +        }
>> +
>> +        if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
>> +            if (!bs) {
>> +                fprintf(stderr, "Error: block device name is not set\n");
>> +                return -EINVAL;
>> +            }
>> +
>> +            qemu_get_name(f, bitmap_name);
>> +            bitmap = bdrv_find_dirty_bitmap(bs, bitmap_name);
>> +            if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
>> +                /* First chunk from this bitmap */
>> +                uint64_t granularity = qemu_get_be64(f);
>> +                if (!bitmap) {
>> +                    Error *local_err = NULL;
>> +                    bitmap = bdrv_create_dirty_bitmap(bs, granularity,
>> +                                                      bitmap_name,
>> +                                                      &local_err);
>> +                    if (!bitmap) {
>> +                        error_report("%s", error_get_pretty(local_err));
>> +                        error_free(local_err);
>> +                        return -EINVAL;
>> +                    }
>> +                } else {
>> +                    uint64_t dest_granularity =
>> +                        bdrv_dirty_bitmap_granularity(bs, bitmap);
>> +                    if (dest_granularity != granularity) {
>> +                        fprintf(stderr,
>> +                                "Error: "
>> +                                "Migrated bitmap granularity (%" PRIu64 ") "
>> +                                "is not match with destination bitmap '%s' "
>> +                                "granularity (%" PRIu64 ")\n",
>> +                                granularity,
>> +                                bitmap_name,
>> +                                dest_granularity);
>> +                        return -EINVAL;
>> +                    }
>> +                }
>> +                bdrv_disable_dirty_bitmap(bitmap);
>> +            }
>> +            if (!bitmap) {
>> +                fprintf(stderr, "Error: unknown dirty bitmap "
>> +                        "'%s' for block device '%s'\n",
>> +                        bitmap_name, device_name);
>> +                return -EINVAL;
>> +            }
>> +        }
>> +
>> +        if (flags & DIRTY_BITMAP_MIG_FLAG_ENABLED) {
>> +            bool enabled;
>> +            if (!bitmap) {
>> +                fprintf(stderr, "Error: dirty bitmap name is not set\n");
>> +                return -EINVAL;
>> +            }
>> +            bdrv_dirty_bitmap_deserialize_finish(bitmap);
>> +            /* complete migration */
>> +            enabled = qemu_get_byte(f);
>> +            if (enabled) {
>> +                bdrv_enable_dirty_bitmap(bitmap);
>> +            }
>> +        }
>
> Oh, so you use the ENABLED flag to show that migration is over.
Yes, it was a bad idea.
> If we are going to commit to a stream format for bitmaps, though, 
> maybe it's best to actually create a "COMPLETION BLOCK" flag and then 
> split this function into two pieces:
>
> (1) The part that receives regular / zero blocks, and
> (2) The part that receives completion data.
>
> That way, if we change the properties that bitmaps have down the line, 
> we aren't reliant on literally the "enabled" flag to decide what to do.
>
> Also, it might help make this fairly long function a little smaller 
> and more readable.
Ok.
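Here is a sketch of that split. The `FLAG_COMPLETE` value and the handler names are hypothetical, chosen only for illustration, and the flags are read from an array of bytes rather than a `QEMUFile`. The loop dispatches chunk records and completion records to separate handlers, so finishing a bitmap no longer depends on the ENABLED bit.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values from the patch, plus a hypothetical completion flag. */
#define FLAG_EOS          0x01
#define FLAG_NORMAL_CHUNK 0x02
#define FLAG_ZERO_CHUNK   0x04
#define FLAG_COMPLETE     0x80  /* hypothetical "bitmap is complete" flag */

typedef struct {
    int chunks;     /* chunk records handled */
    int completed;  /* completion records handled */
} LoadStats;

static int load_chunk(LoadStats *st, uint8_t flags)
{
    (void)flags;        /* a real handler would read sector range and data */
    st->chunks++;
    return 0;
}

static int load_complete(LoadStats *st)
{
    st->completed++;    /* a real handler would finish deserialization */
    return 0;
}

/* Process flag bytes until EOS; -1 if the input ends without EOS. */
static int load_section(const uint8_t *flags, int n, LoadStats *st)
{
    for (int i = 0; i < n; i++) {
        int ret = 0;

        if (flags[i] & (FLAG_NORMAL_CHUNK | FLAG_ZERO_CHUNK)) {
            ret = load_chunk(st, flags[i]);
        } else if (flags[i] & FLAG_COMPLETE) {
            ret = load_complete(st);
        }
        if (ret < 0) {
            return ret;
        }
        if (flags[i] & FLAG_EOS) {
            return 0;
        }
    }
    return -1;
}
```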
>
>> +
>> +        if (flags & (DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK |
>> +                     DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK)) {
>> +            if (!bs) {
>> +                fprintf(stderr, "Error: block device name is not set\n");
>> +                return -EINVAL;
>> +            }
>> +            if (!bitmap) {
>> +                fprintf(stderr, "Error: dirty bitmap name is not set\n");
>> +                return -EINVAL;
>> +            }
>> +
>> +            first_sector = qemu_get_be64(f);
>> +            nr_sectors = qemu_get_be32(f);
>> +            DPRINTF("chunk: %lu %u\n", first_sector, nr_sectors);
>> +
>> +
>> +            if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) {
>> +                bdrv_dirty_bitmap_deserialize_part0(bitmap, first_sector,
>> +                                                    nr_sectors);
>> +            } else {
>> +                uint64_t buf_size = qemu_get_be64(f);
>> +                uint64_t needed_size =
>> +                    bdrv_dirty_bitmap_data_size(bitmap, nr_sectors);
>> +
>> +                if (needed_size > buf_size) {
>> +                    fprintf(stderr,
>> +                            "Error: Migrated bitmap granularity is not "
>> +                            "match with destination bitmap granularity\n");
>> +                    return -EINVAL;
>> +                }
>> +
>
> "Migrated bitmap granularity doesn't match the destination bitmap 
> granularity" perhaps.
>
>> +                buf = g_malloc(buf_size);
>> +                qemu_get_buffer(f, buf, buf_size);
>> +                bdrv_dirty_bitmap_deserialize_part(bitmap, buf,
>> +                                                   first_sector,
>> +                                                   nr_sectors);
>> +                g_free(buf);
>> +            }
>> +        }
>> +
>> +        ret = qemu_file_get_error(f);
>> +        if (ret != 0) {
>> +            return ret;
>> +        }
>> +    } while (!(flags & DIRTY_BITMAP_MIG_FLAG_EOS));
>> +
>> +    DPRINTF("load finish\n");
>> +    return 0;
>> +}
>> +
>> +static void dirty_bitmap_set_params(const MigrationParams *params, void *opaque)
>> +{
>> +    dirty_bitmap_mig_state.migration_enable = params->dirty;
>> +}
>> +
>
> OK; though I am not immediately aware of what changes need to happen 
> to accommodate Eric's suggestions.
This function will be dropped in v3.
>
>> +static bool dirty_bitmap_is_active(void *opaque)
>> +{
>> +    return dirty_bitmap_mig_state.migration_enable == 1;
>> +}
>> +
>
> OK.
>
>> +static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
>> +{
>> +    init_dirty_bitmap_migration(f);
>> +
>> +    qemu_mutex_lock_iothread();
>> +    /* start track dirtyness of dirty bitmaps */
>> +    set_dirty_tracking();
>> +    qemu_mutex_unlock_iothread();
>> +
>> +    blk_mig_reset_dirty_cursor();
>> +    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>> +
>> +    return 0;
>> +}
>> +
>
> OK; see dirty_bitmap_mig_init below, though.
>
>> +static SaveVMHandlers savevm_block_handlers = {
>> +    .set_params = dirty_bitmap_set_params,
>> +    .save_live_setup = dirty_bitmap_save_setup,
>> +    .save_live_iterate = dirty_bitmap_save_iterate,
>> +    .save_live_complete = dirty_bitmap_save_complete,
>> +    .save_live_pending = dirty_bitmap_save_pending,
>> +    .load_state = dirty_bitmap_load,
>> +    .cancel = dirty_bitmap_migration_cancel,
>> +    .is_active = dirty_bitmap_is_active,
>> +};
>> +
>> +void dirty_bitmap_mig_init(void)
>> +{
>> +    QSIMPLEQ_INIT(&dirty_bitmap_mig_state.dbms_list);
>
> Maybe I haven't looked thoroughly enough yet, but it's weird that part 
> of the dirty_bitmap_mig_state is initialized here, and the rest of it 
> in init_dirty_bitmap_migration. I'd prefer to keep it all together, if 
> possible.
dirty_bitmap_mig_init is called once, when QEMU starts, and QSIMPLEQ_INIT
should be called only once. dirty_bitmap_save_setup is called at the start
of every migration, so it acts as a 'reinitialize'.
>
>> +
>> +    register_savevm_live(NULL, "dirty-bitmap", 0, 1, &savevm_block_handlers,
>> +                         &dirty_bitmap_mig_state);
>> +}
>
> OK.
>
>> diff --git a/vl.c b/vl.c
>> index a824a7d..dee7220 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -4184,6 +4184,7 @@ int main(int argc, char **argv, char **envp)
>>
>>       blk_mig_init();
>>       ram_mig_init();
>> +    dirty_bitmap_mig_init();
>>
>>       /* If the currently selected machine wishes to override the units-per-bus
>>        * property of its default HBA interface type, do so now. */
>>
>
> Hm, since dirty bitmaps are a sub-component of the block layer, would 
> it not make sense to put this hook under blk_mig_init, perhaps?
IMHO, the reason to put it here is to keep all register_savevm_live
entities in one place.
>
>
> Overall this looks very clean compared to the intermingled format in 
> V1, and the code is organized pretty well. Just a few minor comments, 
> and I'd like to get the opinion of the migration maintainers, but I am 
> happy. Sorry it took me so long to review, please feel free to let me 
> know if you disagree with any of my opinions :)
>
> Thank you,
> --John

Thank you for reviewing my series :)

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c
  2015-02-13  8:19     ` Vladimir Sementsov-Ogievskiy
@ 2015-02-13  9:06       ` Vladimir Sementsov-Ogievskiy
  2015-02-13 17:32         ` John Snow
  2015-02-13 20:22       ` John Snow
  1 sibling, 1 reply; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-13  9:06 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, Peter Maydell,
	Juan quin >> Juan Jose Quintela Carreira,
	Dr. David Alan Gilbert, stefanha, pbonzini, amit Shah, den

>>
>>> +
>>> +    blk_mig_reset_dirty_cursor();
>>> +    dirty_phase(f, false);
>>> +
>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>>> +        uint8_t flags = DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME |
>>> +                        DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME |
>>> +                        DIRTY_BITMAP_MIG_FLAG_ENABLED;
>>> +
>>> +        qemu_put_byte(f, flags);
>>> +        qemu_put_name(f, bdrv_get_device_name(dbms->bs));
>>> +        qemu_put_name(f, bdrv_dirty_bitmap_name(dbms->bitmap));
>>> +        qemu_put_byte(f, bdrv_dirty_bitmap_enabled(dbms->bitmap));
>>> +    }
>>> +
>>> +    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>>> +
>>> +    DPRINTF("Dirty bitmaps migration completed\n");
>>> +
>>> +    dirty_bitmap_mig_cleanup();
>>> +    return 0;
>>> +}
>>> +
>>
>> I suppose we don't need a flag that distinctly SAYS this is the end 
>> section, since we can tell by omission of 
>> DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK or ZERO_CHUNK.
> Hmm. I think it simplifies the logic (to use EOS after each section). 
> And the same approach is in migration/block.c.. It's a question about 
> which format is better:  "Each section for dirty_bitmap_load ends with 
> EOS" or "Each section for dirty_bitmap_load ends with EOS except the 
> last one. The last one may be recognized by absent NORMAL_CHUNK and 
> ZERO_CHUNK"

Oh, sorry, no: the EOS is important. There are several blocks with no
*_CHUNK flag, one per bitmap, and the loop in dirty_bitmap_load reads them
one by one and finishes only when it finds EOS.
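To illustrate why the EOS byte is needed, here is a sketch of a reader for the completion section written by `dirty_bitmap_save_complete`. It works on an in-memory buffer instead of a `QEMUFile`. The `Reader` type and its helpers are hypothetical; the flag values and the one-byte length prefix for names come from the patch. Each bitmap contributes one record with DEVICE_NAME, BITMAP_NAME and ENABLED but no *_CHUNK flag, so only the trailing EOS tells the loop that no more bitmaps follow.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FLAG_EOS          0x01
#define FLAG_BITMAP_NAME  0x08
#define FLAG_DEVICE_NAME  0x10
#define FLAG_ENABLED      0x40

typedef struct {
    const uint8_t *buf;
    size_t len;
    size_t pos;
} Reader;

/* Read one byte; returns -1 at the end of the buffer. */
static int get_byte(Reader *r)
{
    return r->pos < r->len ? r->buf[r->pos++] : -1;
}

/* Read a length-prefixed name into 'name' (at least 256 bytes). */
static int get_name(Reader *r, char *name)
{
    int n = get_byte(r);

    if (n < 0 || r->pos + (size_t)n > r->len) {
        return -1;
    }
    memcpy(name, r->buf + r->pos, (size_t)n);
    name[n] = '\0';
    r->pos += (size_t)n;
    return 0;
}

/* Count completion records up to EOS; -1 on malformed input. */
static int count_completed(Reader *r)
{
    char dev[256], bmp[256];
    int count = 0;

    for (;;) {
        int flags = get_byte(r);

        if (flags < 0) {
            return -1;                  /* stream ended without EOS */
        }
        if ((flags & FLAG_DEVICE_NAME) && get_name(r, dev) < 0) {
            return -1;
        }
        if ((flags & FLAG_BITMAP_NAME) && get_name(r, bmp) < 0) {
            return -1;
        }
        if (flags & FLAG_ENABLED) {
            if (get_byte(r) < 0) {
                return -1;
            }
            count++;
        }
        if (flags & FLAG_EOS) {
            return count;
        }
    }
}
```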


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c
  2015-02-13  9:06       ` Vladimir Sementsov-Ogievskiy
@ 2015-02-13 17:32         ` John Snow
  2015-02-13 17:41           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 35+ messages in thread
From: John Snow @ 2015-02-13 17:32 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, Peter Maydell,
	Juan quin >> Juan Jose Quintela Carreira,
	Dr. David Alan Gilbert, stefanha, pbonzini, amit Shah, den



On 02/13/2015 04:06 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>
>>>> +
>>>> +    blk_mig_reset_dirty_cursor();
>>>> +    dirty_phase(f, false);
>>>> +
>>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>>>> +        uint8_t flags = DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME |
>>>> +                        DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME |
>>>> +                        DIRTY_BITMAP_MIG_FLAG_ENABLED;
>>>> +
>>>> +        qemu_put_byte(f, flags);
>>>> +        qemu_put_name(f, bdrv_get_device_name(dbms->bs));
>>>> +        qemu_put_name(f, bdrv_dirty_bitmap_name(dbms->bitmap));
>>>> +        qemu_put_byte(f, bdrv_dirty_bitmap_enabled(dbms->bitmap));
>>>> +    }
>>>> +
>>>> +    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>>>> +
>>>> +    DPRINTF("Dirty bitmaps migration completed\n");
>>>> +
>>>> +    dirty_bitmap_mig_cleanup();
>>>> +    return 0;
>>>> +}
>>>> +
>>>
>>> I suppose we don't need a flag that distinctly SAYS this is the end
>>> section, since we can tell by omission of
>>> DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK or ZERO_CHUNK.
>> Hmm. I think it simplifies the logic (to use EOS after each section).
>> And the same approach is in migration/block.c.. It's a question about
>> which format is better:  "Each section for dirty_bitmap_load ends with
>> EOS" or "Each section for dirty_bitmap_load ends with EOS except the
>> last one. The last one may be recognized by absent NORMAL_CHUNK and
>> ZERO_CHUNK"
>
> Oh, sorry, no, it's important EOS. There are several blocks with no
> *_CHUNK! Several bitmaps. And loop in dirty_bitmap_load will read them
> iteratively, and it will finish when find EOS.
>
>

Sorry, I worded that poorly. I was wondering why you didn't have an 
explicit "end of bitmap" flag, and I realized that you are doing this 
check essentially by the absence of the NORMAL_CHUNK/ZERO_CHUNK flags.

This is really just a comment on my part; I was expecting a more 
distinct "It is now safe to rebuild the bitmap" flag and was just 
commenting on why we didn't necessarily need one.

I think in another comment I point out that an "end of bitmap" flag 
might make the loading function simpler, and I still think that might be 
nice.


* Re: [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c
  2015-02-13 17:32         ` John Snow
@ 2015-02-13 17:41           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-13 17:41 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, Peter Maydell,
	Juan quin >> Juan Jose Quintela Carreira,
	Dr. David Alan Gilbert, stefanha, pbonzini, amit Shah, den

On 13.02.2015 20:32, John Snow wrote:
>
>
> On 02/13/2015 04:06 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>>
>>>>> +
>>>>> +    blk_mig_reset_dirty_cursor();
>>>>> +    dirty_phase(f, false);
>>>>> +
>>>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>>>>> +        uint8_t flags = DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME |
>>>>> +                        DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME |
>>>>> +                        DIRTY_BITMAP_MIG_FLAG_ENABLED;
>>>>> +
>>>>> +        qemu_put_byte(f, flags);
>>>>> +        qemu_put_name(f, bdrv_get_device_name(dbms->bs));
>>>>> +        qemu_put_name(f, bdrv_dirty_bitmap_name(dbms->bitmap));
>>>>> +        qemu_put_byte(f, bdrv_dirty_bitmap_enabled(dbms->bitmap));
>>>>> +    }
>>>>> +
>>>>> +    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>>>>> +
>>>>> +    DPRINTF("Dirty bitmaps migration completed\n");
>>>>> +
>>>>> +    dirty_bitmap_mig_cleanup();
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>
>>>> I suppose we don't need a flag that distinctly SAYS this is the end
>>>> section, since we can tell by omission of
>>>> DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK or ZERO_CHUNK.
>>> Hmm. I think it simplifies the logic (to use EOS after each section).
>>> And the same approach is in migration/block.c.. It's a question about
>>> which format is better:  "Each section for dirty_bitmap_load ends with
>>> EOS" or "Each section for dirty_bitmap_load ends with EOS except the
>>> last one. The last one may be recognized by absent NORMAL_CHUNK and
>>> ZERO_CHUNK"
>>
>> Oh, sorry, no, it's important EOS. There are several blocks with no
>> *_CHUNK! Several bitmaps. And loop in dirty_bitmap_load will read them
>> iteratively, and it will finish when find EOS.
>>
>>
>
> Sorry, I worded that poorly. I was wondering why you didn't have an 
> explicit "end of bitmap" flag, and I realized that you are doing this 
> check essentially by the absence of the NORMAL_CHUNK/ZERO_CHUNK flags.
>
> This is really just a comment on my part; I was expecting a more 
> distinct "It is now safe to rebuild the bitmap" flag and was just 
> commenting on why we didn't necessarily need one.
>
> I think in another comment I point out that an "end of bitmap" flag 
> might make the loading function simpler, and I still think that might 
> be nice.
Ok. Today I refactored this: there will be separate start and complete
frames for each bitmap, the first with an additional granularity field
(without any dirty bitmap data) and the second with an additional enabled
field (also without data).
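Here is a sketch of what such frames might look like on the wire. It writes into a plain byte buffer rather than a `QEMUFile`, omits the device and bitmap names for brevity, and uses hypothetical `FLAG_START`/`FLAG_COMPLETE` values and `put_*` helpers that are not part of the posted patch. The start frame carries a big-endian 64-bit granularity (the byte order `qemu_put_be64` uses). The complete frame carries a single enabled byte. Neither frame carries bitmap data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_START    0x20   /* hypothetical: first frame of a bitmap */
#define FLAG_COMPLETE 0x80   /* hypothetical: last frame of a bitmap */

typedef struct {
    uint8_t buf[64];
    size_t pos;
} Writer;

static void put_byte(Writer *w, uint8_t b)
{
    assert(w->pos < sizeof(w->buf));
    w->buf[w->pos++] = b;
}

/* Big-endian 64-bit value, most significant byte first. */
static void put_be64(Writer *w, uint64_t v)
{
    for (int shift = 56; shift >= 0; shift -= 8) {
        put_byte(w, (uint8_t)(v >> shift));
    }
}

/* Start frame: flags, then granularity; no bitmap data. */
static void put_start_frame(Writer *w, uint64_t granularity)
{
    put_byte(w, FLAG_START);
    put_be64(w, granularity);
}

/* Complete frame: flags, then the enabled byte; no bitmap data. */
static void put_complete_frame(Writer *w, int enabled)
{
    put_byte(w, FLAG_COMPLETE);
    put_byte(w, enabled ? 1 : 0);
}
```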

-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c
  2015-02-13  8:19     ` Vladimir Sementsov-Ogievskiy
  2015-02-13  9:06       ` Vladimir Sementsov-Ogievskiy
@ 2015-02-13 20:22       ` John Snow
  2015-02-16 12:06         ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 35+ messages in thread
From: John Snow @ 2015-02-13 20:22 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, Peter Maydell,
	Juan quin >> Juan Jose Quintela Carreira,
	Dr. David Alan Gilbert, stefanha, den, amit Shah, pbonzini



On 02/13/2015 03:19 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 11.02.2015 00:33, John Snow wrote:
>> Peter Maydell: What's the right way to license a file as copied from a
>> previous version? See below, please;
>>
>> Max, Markus: ctrl+f "bdrv_get_device_name" and let me know what you
>> think, if you would.
>>
>> Juan, Amit, David: Copying migration maintainers.
>>
>> On 01/27/2015 05:56 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Live migration of dirty bitmaps. Only named dirty bitmaps are migrated.
>>> If the destination qemu already contains a dirty bitmap with the same
>>> name as a migrated bitmap, their granularities must be the same;
>>> otherwise an error is generated. If the destination qemu doesn't
>>> contain such a bitmap, it will be created.
>>>
>>> format:
>>>
>>> 1 byte: flags
>>>
>>> [ 1 byte: node name size ] \  flags & DEVICE_NAME
>>> [ n bytes: node name     ] /
>>>
>>> [ 1 byte: bitmap name size ]       \
>>> [ n bytes: bitmap name     ]       | flags & BITMAP_NAME
>>> [ [ be64: granularity    ] ]  flags & GRANULARITY
>>>
>>> [ 1 byte: bitmap enabled bit ] flags & ENABLED
>>>
>>> [ be64: start sector      ] \ flags & (NORMAL_CHUNK | ZERO_CHUNK)
>>> [ be32: number of sectors ] /
>>>
>>> [ be64: buffer size ] \ flags & NORMAL_CHUNK
>>> [ n bytes: buffer   ] /
>>>
>>> The last chunk should contain flags & EOS. A chunk may omit the device
>>> and/or bitmap names, in which case they are assumed to be the same as in
>>> the previous chunk. GRANULARITY is sent with the first chunk of the bitmap.
>>> The ENABLED bit is sent at the end of the "complete" stage of migration, so
>>> when the destination gets the ENABLED flag it should deserialize_finish the
>>> bitmap and set its enabled bit to the corresponding value.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>>> ---
>>>   include/migration/block.h |   1 +
>>>   migration/Makefile.objs   |   2 +-
>>>   migration/dirty-bitmap.c  | 606 ++++++++++++++++++++++++++++++++++++++++++++++
>>>   vl.c                      |   1 +
>>>   4 files changed, 609 insertions(+), 1 deletion(-)
>>>   create mode 100644 migration/dirty-bitmap.c
>>>
>>> diff --git a/include/migration/block.h b/include/migration/block.h
>>> index ffa8ac0..566bb9f 100644
>>> --- a/include/migration/block.h
>>> +++ b/include/migration/block.h
>>> @@ -14,6 +14,7 @@
>>>   #ifndef BLOCK_MIGRATION_H
>>>   #define BLOCK_MIGRATION_H
>>>
>>> +void dirty_bitmap_mig_init(void);
>>>   void blk_mig_init(void);
>>>   int blk_mig_active(void);
>>>   uint64_t blk_mig_bytes_transferred(void);
>>
>> OK.
>>
>>> diff --git a/migration/Makefile.objs b/migration/Makefile.objs
>>> index d929e96..9adfda9 100644
>>> --- a/migration/Makefile.objs
>>> +++ b/migration/Makefile.objs
>>> @@ -6,5 +6,5 @@ common-obj-y += xbzrle.o
>>>   common-obj-$(CONFIG_RDMA) += rdma.o
>>>   common-obj-$(CONFIG_POSIX) += exec.o unix.o fd.o
>>>
>>> -common-obj-y += block.o
>>> +common-obj-y += block.o dirty-bitmap.o
>>>
>>
>> OK.
>>
>>> diff --git a/migration/dirty-bitmap.c b/migration/dirty-bitmap.c
>>> new file mode 100644
>>> index 0000000..8621218
>>> --- /dev/null
>>> +++ b/migration/dirty-bitmap.c
>>> @@ -0,0 +1,606 @@
>>> +/*
>>> + * QEMU dirty bitmap migration
>>> + *
>>> + * derived from migration/block.c
>>> + *
>>> + * Author:
>>> + * Sementsov-Ogievskiy Vladimir <vsementsov@parallels.com>
>>> + *
>>> + * original copyright message:
>>> + * =====================================================================
>>> + * Copyright IBM, Corp. 2009
>>> + *
>>> + * Authors:
>>> + *  Liran Schour <lirans@il.ibm.com>
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2. See
>>> + * the COPYING file in the top-level directory.
>>> + *
>>> + * Contributions after 2012-01-13 are licensed under the terms of the
>>> + * GNU GPL, version 2 or (at your option) any later version.
>>> + * =====================================================================
>>> + */
>>> +
>>
>> Not super familiar with the right way to do licensing here; it's
>> possible you may not need to copy the original here, but I'm not sure.
>> You will want to make it clear what license applies to /your/ work, I
>> think. Maybe Peter Maydell can clue us in.
>>
>>> +#include "block/block.h"
>>> +#include "qemu/main-loop.h"
>>> +#include "qemu/error-report.h"
>>> +#include "migration/block.h"
>>> +#include "migration/migration.h"
>>> +#include "qemu/hbitmap.h"
>>> +#include <assert.h>
>>> +
>>> +#define CHUNK_SIZE                       (1 << 20)
>>> +
>>> +#define DIRTY_BITMAP_MIG_FLAG_EOS           0x01
>>> +#define DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK  0x02
>>> +#define DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK    0x04
>>> +#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME   0x08
>>> +#define DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME   0x10
>>> +#define DIRTY_BITMAP_MIG_FLAG_GRANULARITY   0x20
>>> +#define DIRTY_BITMAP_MIG_FLAG_ENABLED       0x40
>>> +/* flags should be <= 0xff */
>>> +
>>
>> We should give ourselves a little breathing room with the flags, since
>> we've only got room for one more.
> Ok. Will one more byte be enough?

I should hope so. If you do add a completion chunk and flag, that fills 
up the first byte completely, so having a second byte is a good idea.

I might recommend reserving the last bit of the second byte to be a flag 
such as DIRTY_BITMAP_EXTRA_FLAGS that indicates the presence of 
additional byte(s) of flags, to be determined later, if we ever need 
them, but two bytes for now should be sufficient.
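Here is a sketch of that suggestion; none of it is in the posted patch. It assumes flags grow to two bytes. The first byte keeps the v2 single-byte flag values, and the second byte holds bits 8-15. The top bit of the second byte is a hypothetical `DIRTY_BITMAP_EXTRA_FLAGS` marker meaning "more flag bytes follow". Since no further bytes are defined yet, a reader rejects it.

```c
#include <assert.h>
#include <stdint.h>

#define DIRTY_BITMAP_EXTRA_FLAGS 0x8000  /* hypothetical: more flag bytes follow */

/* First byte: bits 0-7 (the v2 flags); second byte: bits 8-15. */
static void put_flags16(uint8_t *out, uint16_t flags)
{
    out[0] = (uint8_t)(flags & 0xff);
    out[1] = (uint8_t)(flags >> 8);
}

/* Decode flags; returns -1 if EXTRA_FLAGS is set, because no further
 * flag bytes are defined yet and the stream cannot be understood. */
static int get_flags16(const uint8_t *in, uint16_t *flags)
{
    *flags = (uint16_t)(in[0] | (in[1] << 8));
    if (*flags & DIRTY_BITMAP_EXTRA_FLAGS) {
        return -1;
    }
    return 0;
}
```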

>>
>>> +/* #define DEBUG_DIRTY_BITMAP_MIGRATION */
>>> +
>>> +#ifdef DEBUG_DIRTY_BITMAP_MIGRATION
>>> +#define DPRINTF(fmt, ...) \
>>> +    do { printf("dirty_migration: " fmt, ## __VA_ARGS__); } while (0)
>>> +#else
>>> +#define DPRINTF(fmt, ...) \
>>> +    do { } while (0)
>>> +#endif
>>> +
>>> +typedef struct DirtyBitmapMigBitmapState {
>>> +    /* Written during setup phase. */
>>> +    BlockDriverState *bs;
>>> +    BdrvDirtyBitmap *bitmap;
>>> +    HBitmap *dirty_bitmap;
>>
>> For my own sanity, I'd really prefer "bitmap" and "meta_bitmap" here;
>> "dirty_bitmap" is often used as a synonym (outside of this file) to
>> refer to the BdrvDirtyBitmap in general, so its usage here can be
>> somewhat confusing.
>>
>> I'd appreciate "dirty_dirty_bitmap" as in your previous patch for
>> consistency, or "meta_bitmap" as I recommend.
>>
> Ok
>>> +    int64_t total_sectors;
>>> +    uint64_t sectors_per_chunk;
>>> +    QSIMPLEQ_ENTRY(DirtyBitmapMigBitmapState) entry;
>>> +
>>> +    /* For bulk phase. */
>>> +    bool bulk_completed;
>>> +    int64_t cur_sector;
>>> +    bool granularity_sent;
>>> +
>>> +    /* For dirty phase. */
>>> +    int64_t cur_dirty;
>>> +} DirtyBitmapMigBitmapState;
>>> +
>>> +typedef struct DirtyBitmapMigState {
>>> +    int migration_enable;
>>> +    QSIMPLEQ_HEAD(dbms_list, DirtyBitmapMigBitmapState) dbms_list;
>>> +
>>> +    bool bulk_completed;
>>> +
>>> +    /* for send_bitmap() */
>>> +    BlockDriverState *prev_bs;
>>> +    BdrvDirtyBitmap *prev_bitmap;
>>> +} DirtyBitmapMigState;
>>> +
>>> +static DirtyBitmapMigState dirty_bitmap_mig_state;
>>> +
>>> +/* read name from qemu file:
>>> + * format:
>>> + * 1 byte : len = name length (<256)
>>> + * len bytes : name without last zero byte
>>> + *
>>> + * name should point to the buffer >= 256 bytes length
>>> + */
>>> +static char *qemu_get_name(QEMUFile *f, char *name)
>>> +{
>>> +    int len = qemu_get_byte(f);
>>> +    qemu_get_buffer(f, (uint8_t *)name, len);
>>> +    name[len] = '\0';
>>> +
>>> +    DPRINTF("get name: %d %s\n", len, name);
>>> +
>>> +    return name;
>>> +}
>>> +
>>
>> OK. Maybe these could be "qemu_put_string" or "qemu_get_string" and
>> added to qemu-file.c so others can use them.
> If there are no objections to sharing this format, I'll do it.
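If these helpers move to qemu-file.c as qemu_put_string()/qemu_get_string() (names proposed in the review, not an existing API), the wire format stays the same: one length byte followed by the name without its terminating NUL. The sketch below shows that encoding on plain byte buffers; unlike qemu_get_name() above, it also rejects truncated input, which is an addition for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode: 1 length byte, then the name bytes (no trailing NUL).
 * Returns bytes written, or 0 if the name is longer than 255 bytes. */
static size_t put_name(uint8_t *out, const char *name)
{
    size_t len = strlen(name);

    if (len > 255) {
        return 0;
    }
    out[0] = (uint8_t)len;
    memcpy(out + 1, name, len);
    return len + 1;
}

/* Decode into name, which must hold at least 256 bytes.
 * Returns bytes consumed, or 0 if the input is truncated. */
static size_t get_name(const uint8_t *in, size_t avail, char *name)
{
    size_t len;

    if (avail < 1) {
        return 0;
    }
    len = in[0];
    if (avail < 1 + len) {
        return 0;
    }
    memcpy(name, in + 1, len);
    name[len] = '\0';
    return len + 1;
}
```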
>>
>>> +/* write name to qemu file:
>>> + * format:
>>> + * same as for qemu_get_name
>>> + *
>>> + * maximum name length is 255
>>> + */
>>> +static void qemu_put_name(QEMUFile *f, const char *name)
>>> +{
>>> +    int len = strlen(name);
>>> +
>>> +    DPRINTF("put name: %d %s\n", len, name);
>>> +
>>> +    assert(len < 256);
>>> +    qemu_put_byte(f, len);
>>> +    qemu_put_buffer(f, (const uint8_t *)name, len);
>>> +}
>>> +
>>
>> OK.
>>
>>> +static void send_bitmap(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
>>> +                        uint64_t start_sector, uint32_t nr_sectors)
>>> +{
>>> +    BlockDriverState *bs = dbms->bs;
>>> +    BdrvDirtyBitmap *bitmap = dbms->bitmap;
>>> +    uint8_t flags = 0;
>>> +    /* align for buffer_is_zero() */
>>> +    uint64_t align = 4 * sizeof(long);
>>> +    uint64_t buf_size =
>>> +        (bdrv_dirty_bitmap_data_size(bitmap, nr_sectors) + align - 1) &
>>> +        ~(align - 1);
>>> +    uint8_t *buf = g_malloc0(buf_size);
>>> +
>>> +    bdrv_dirty_bitmap_serialize_part(bitmap, buf, start_sector, nr_sectors);
>>> +
>>> +    if (buffer_is_zero(buf, buf_size)) {
>>> +        g_free(buf);
>>> +        buf = NULL;
>>> +        flags |= DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK;
>>> +    } else {
>>> +        flags |= DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK;
>>> +    }
>>> +
>>> +    if (bs != dirty_bitmap_mig_state.prev_bs) {
>>> +        dirty_bitmap_mig_state.prev_bs = bs;
>>> +        flags |= DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME;
>>> +    }
>>> +
>>> +    if (bitmap != dirty_bitmap_mig_state.prev_bitmap) {
>>> +        dirty_bitmap_mig_state.prev_bitmap = bitmap;
>>> +        flags |= DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME;
>>> +    }
>>
>> OK, so we use the current bs/bitmap under consideration to detect if
>> we have switched context, and put the names in the stream when it
>> happens. OK.
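The context-switch logic in send_bitmap() reduces to comparing against the previously sent pair. A stand-alone sketch: opaque pointers stand in for BlockDriverState and BdrvDirtyBitmap, struct send_ctx and context_flags() are hypothetical names, and the flag values are copied from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME 0x08
#define DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME 0x10

/* Mirrors prev_bs/prev_bitmap in DirtyBitmapMigState. */
struct send_ctx {
    const void *prev_bs;
    const void *prev_bitmap;
};

/* Return the name flags needed for the next chunk, updating the context. */
static uint8_t context_flags(struct send_ctx *ctx, const void *bs,
                             const void *bitmap)
{
    uint8_t flags = 0;

    if (bs != ctx->prev_bs) {
        ctx->prev_bs = bs;
        flags |= DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME;
    }
    if (bitmap != ctx->prev_bitmap) {
        ctx->prev_bitmap = bitmap;
        flags |= DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME;
    }
    return flags;
}
```

Consecutive chunks of the same bitmap therefore carry no names at all, which is why the destination has to keep the last device/bitmap it saw.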
>>
>>> +
>>> +    if (dbms->granularity_sent == 0) {
>>> +        dbms->granularity_sent = 1;
>>> +        flags |= DIRTY_BITMAP_MIG_FLAG_GRANULARITY;
>>> +    }
>>> +
>>> +    DPRINTF("Enter send_bitmap"
>>> +            "\n   flags:        %x"
>>> +            "\n   start_sector: %" PRIu64
>>> +            "\n   nr_sectors:   %" PRIu32
>>> +            "\n   data_size:    %" PRIu64 "\n",
>>> +            flags, start_sector, nr_sectors, buf_size);
>>> +
>>> +    qemu_put_byte(f, flags);
>>> +
>>> +    if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
>>> +        qemu_put_name(f, bdrv_get_device_name(bs));
>>> +    }
>>
>> I am still not fully clear on this myself, but I think we are phasing
>> out bdrv_get_device_name. In the context of bitmaps, we are most
>> likely using them one-per-tree, but they /can/ be attached
>> one-per-node, so we shouldn't be trying to get the device name here,
>> but rather, the node-name.
>>
>> I /think/ we may want to be using bdrv_get_node_name, but I think it
>> is not currently true that all nodes WILL be named ... I am CC'ing
>> Markus Armbruster and Max Reitz for (A) A refresher course and (B)
>> Opinions on what function call might make sense here, given that we
>> wish to migrate bitmaps attached to arbitrary nodes.
> Hmm... I'm not familiar with the hierarchy of nodes and devices. As I
> understand it, drives created both from the command line and via QMP
> go through blockdev_init, which creates both the blk (device) and the
> bs (node) through blk_new_with_bs... Am I right? Also,
> bdrv_get_device_name is used in migration/block.c.

Now that I'm more awake, here's a better rundown of what's going on:

It's something that is a little bit in flux right now, unfortunately. 
We're trying to transition to a format where we have arbitrarily complex 
block trees, where the root of the tree is always a BlockBackend (see 
the big series by Max Reitz) and the configuration of the tree may 
become arbitrarily complex.

Simple trees may consist of just one BlockBackend and one 
BlockDriverState node, where I think we can refer to this BDS as the 
"root node," not to be confused with the BlockBackend "root." The 
BlockBackend is a relatively new invention, so it isn't actually used 
consistently everywhere yet.

In the future, we may have commands that make distinctions based on whether 
you want to work on the BlockBackend, the root node under the 
BlockBackend associated with a BDS, only the explicit node/BDS you 
identify, or some combination of the above semantics.

As of right now, bitmaps can be *attached* to any arbitrary node, though 
they are currently only *useful* when attached to the first child of the 
BlockBackend, the root node. It's only useful currently in cases where 
it is attached to the root because I've only proposed patches for 
adding bitmap support to produce incrementals for Drive Backup, which 
operates only on drives/devices (the root node of a tree.)

However, in the context of migrating, it could be that we want to 
migrate any bitmaps attached to /any/ nodes, so we should be careful 
about what names we are pulling - we don't necessarily want the name of 
the root node or BlockBackend, we may want the BDS and accompanying name 
of strictly the node the bitmap is attached to.

I know other areas of the code don't provide a good example for this 
distinction, yet, but the block layer people are actively working on 
fixing that. (See also the back-and-forth reviews for what to name my 
QMP parameters in the incremental backup patches for some overview of 
this semantic transition.)

That said, we should think carefully about *which* name we want to put 
in the stream and what implications it has for migration.


(1) bdrv_get_node_name and bdrv_find_node

This would migrate bitmaps as attached to their specific BDS. It would 
require the node layout on the destination to be either identical, or 
similar enough that no named bitmaps are attached to a node not 
present on the destination.

This gives us precision: bitmaps may be attached lower in the tree and 
can provide more fine-grained detail for which layers have been changed 
or modified during runtime.

This also gives us fragility: In cases where we transfer, say, a complex 
tree of nodes and collapse it to a single destination drive, we'd be 
unable to migrate bitmaps not attached to the root along with it, 
because they'd have nowhere meaningful to attach.

It is also perhaps somewhat unnecessary at this exact moment, because 
bitmaps are currently only useful on root nodes.

(2) bdrv_get_device_name and bdrv_lookup_bs(device_name, NULL, errp)

This would migrate any bitmaps in a tree and attach them to the entire 
drive on the destination.

This is simpler: You just need to make sure that the root nodes have the 
same names, which is a lot easier to manage.

This matches how drive migration currently appears to work: The entire 
tree appears to be generally squashed into a single node and transferred 
cluster-by-cluster, without general consideration as to the layout of 
the local block tree. As we both know by now, none of the metadata is 
transferred, just the data.

It rules out precise migration in exactly the cases where you WANT the 
extra complexity: if a bitmap is attached lower in the tree, re-affixing 
it to the root of a destination tree might invalidate the semantics of 
what that bitmap was meant to track, and it may become useless.


So in summary:
using device names is probably fine for now, as it matches the current 
use case of bitmaps as well as drive migration; but using node names may 
give us more power and precision later.

I talked to Max about it, and he is leaning towards using device names 
for now and switching to node names if we decide we want that power.

(...I wonder if we could use a flag, for now, that says we're including 
DEVICE names. Later, we could add a flag that says we're using NODE 
names and add an option to toggle as the usage case sees fit.)
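A rough sketch of the destination side of that idea. DIRTY_BITMAP_MIG_FLAG_NODE_NAME, its value (which assumes flags grow to two bytes, as discussed above) and struct node are hypothetical; in QEMU the two branches would roughly correspond to bdrv_find_node() and bdrv_lookup_bs(device_name, NULL, errp).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag in a second flags byte: the names in the stream are
 * node names rather than device (BlockBackend) names. */
#define DIRTY_BITMAP_MIG_FLAG_NODE_NAME 0x0100u

/* Stand-in for a block node: only root nodes have a device name. */
struct node {
    const char *device_name;
    const char *node_name;
};

/* Resolve a migrated name on the destination according to the flag. */
static const struct node *lookup_node(const struct node *nodes, size_t n,
                                      uint16_t flags, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        const char *candidate = (flags & DIRTY_BITMAP_MIG_FLAG_NODE_NAME)
                                ? nodes[i].node_name
                                : nodes[i].device_name;
        if (candidate && strcmp(candidate, name) == 0) {
            return &nodes[i];
        }
    }
    return NULL;
}
```

With the flag clear, a bitmap attached to a non-root node simply has no name to match, which is the fragility described in option (1).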


Are you confused yet? :D


>>
>>> +
>>> +    if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
>>> +        qemu_put_name(f, bdrv_dirty_bitmap_name(bitmap));
>>> +
>>> +        if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
>>> +            qemu_put_be64(f, bdrv_dirty_bitmap_granularity(bs, bitmap));
>>> +        }
>>> +    } else {
>>> +        assert(!(flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY));
>>> +    }
>>> +
>>
>> I thought we were only migrating bitmaps with names?
>> I suppose the conditional can't hurt, but I am not clear on when we
>> won't have a bitmap name here.
> You are right, the 'else' case is not possible... Hmm. I added it to be
> sure that the format is not corrupted when I decided to send granularity
> only together with the name. We won't have a bitmap name only when we
> send the same bitmap as in the previous send_bitmap() call. Maybe it
> would be better to use two separate if's without the else and the assert.

It's okay if it is just "paranoia," but I was just checking. It would 
make a decent assert().
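The restructuring discussed above (two independent if statements, with the "GRANULARITY implies BITMAP_NAME" invariant stated once as an assert) might look like this in isolation. It is a sketch that writes to a plain buffer instead of a QEMUFile; put_bitmap_header() is a hypothetical name, and the flag values are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME 0x08
#define DIRTY_BITMAP_MIG_FLAG_GRANULARITY 0x20

/* Serialize flags, optional bitmap name and optional granularity. */
static size_t put_bitmap_header(uint8_t *out, uint8_t flags,
                                const char *name, uint64_t granularity)
{
    size_t pos = 0;

    /* Granularity is only ever sent together with the bitmap name. */
    assert(!(flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) ||
           (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME));

    out[pos++] = flags;
    if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
        size_t len = strlen(name);
        assert(len < 256);
        out[pos++] = (uint8_t)len;
        memcpy(out + pos, name, len);
        pos += len;
    }
    if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
        for (int i = 0; i < 8; i++) {   /* big-endian, like qemu_put_be64() */
            out[pos++] = (uint8_t)(granularity >> (56 - 8 * i));
        }
    }
    return pos;
}
```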

>>
>>> +    qemu_put_be64(f, start_sector);
>>> +    qemu_put_be32(f, nr_sectors);
>>> +
>>> +    /* if a block is zero we need to flush here since the network
>>> +     * bandwidth is now a lot higher than the storage device bandwidth.
>>> +     * thus if we queue zero blocks we slow down the migration.
>>> +     * also, skip writing block when migrate only dirty bitmaps. */
>>> +    if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) {
>>> +        qemu_fflush(f);
>>> +        return;
>>> +    }
>>> +
>>> +    qemu_put_be64(f, buf_size);
>>> +    qemu_put_buffer(f, buf, buf_size);
>>> +    g_free(buf);
>>> +}
>>> +
>>> +
>>> +/* Called with iothread lock taken.  */
>>> +
>>> +static void set_dirty_tracking(void)
>>> +{
>>> +    DirtyBitmapMigBitmapState *dbms;
>>> +
>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>>> +        dbms->dirty_bitmap =
>>> +            bdrv_create_dirty_dirty_bitmap(dbms->bitmap, CHUNK_SIZE);
>>> +    }
>>> +}
>>> +
>>
>> OK: so we only have these dirty-dirty bitmaps when migration is
>> starting, which makes sense.
>>
>>> +static void unset_dirty_tracking(void)
>>> +{
>>> +    DirtyBitmapMigBitmapState *dbms;
>>> +
>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>>> +        bdrv_release_dirty_dirty_bitmap(dbms->bitmap);
>>> +    }
>>> +}
>>> +
>>
>> OK.
>>
>>> +static void init_dirty_bitmap_migration(QEMUFile *f)
>>> +{
>>> +    BlockDriverState *bs;
>>> +    BdrvDirtyBitmap *bitmap;
>>> +    DirtyBitmapMigBitmapState *dbms;
>>> +
>>> +    dirty_bitmap_mig_state.bulk_completed = false;
>>> +    dirty_bitmap_mig_state.prev_bs = NULL;
>>> +    dirty_bitmap_mig_state.prev_bitmap = NULL;
>>> +
>>> +    for (bs = bdrv_next(NULL); bs; bs = bdrv_next(bs)) {
>>> +        for (bitmap = bdrv_next_dirty_bitmap(bs, NULL); bitmap;
>>> +             bitmap = bdrv_next_dirty_bitmap(bs, bitmap)) {
>>> +            if (!bdrv_dirty_bitmap_name(bitmap)) {
>>> +                continue;
>>> +            }
>>> +
>>> +            dbms = g_new0(DirtyBitmapMigBitmapState, 1);
>>> +            dbms->bs = bs;
>>> +            dbms->bitmap = bitmap;
>>> +            dbms->total_sectors = bdrv_nb_sectors(bs);
>>> +            dbms->sectors_per_chunk = CHUNK_SIZE * 8 *
>>> +                bdrv_dirty_bitmap_granularity(dbms->bs, dbms->bitmap)
>>> +                >> BDRV_SECTOR_BITS;
>>> +
>>> +            QSIMPLEQ_INSERT_TAIL(&dirty_bitmap_mig_state.dbms_list,
>>> +                                 dbms, entry);
>>> +        }
>>> +    }
>>> +}
>>> +
>>
>> OK, but see the note below for dirty_bitmap_mig_init.
> Actually, it is not 'init' but 'reinit': it is called on every migration
> start... Hmm. dbms_list should be cleared here before filling it again.
>>
>>> +/* Called with no lock taken.  */
>>> +static void bulk_phase_send_chunk(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
>>> +{
>>> +    uint32_t nr_sectors = MIN(dbms->total_sectors - dbms->cur_sector,
>>> +                             dbms->sectors_per_chunk);
>>
>> What about cases where nr_sectors will put us past the end of the
>> bitmap? The bitmap serialization implementation might need a touchup
>> with this in mind.
> I don't understand... nr_sectors <= dbms->total_sectors -
> dbms->cur_sector, so it can't put us past the end...

Oh, because you take the minimum, so we don't have to worry about 
sectors_per_chunk eclipsing what we have.

Never mind, I can't read... :(

>>
>>> +
>>> +    send_bitmap(f, dbms, dbms->cur_sector, nr_sectors);
>>> +
>>> +    dbms->cur_sector += nr_sectors;
>>> +    if (dbms->cur_sector >= dbms->total_sectors) {
>>> +        dbms->bulk_completed = true;
>>> +    }
>>> +}
>>> +
>>> +/* Called with no lock taken.  */
>>> +static void bulk_phase(QEMUFile *f, bool limit)
>>> +{
>>> +    DirtyBitmapMigBitmapState *dbms;
>>> +
>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>>> +        while (!dbms->bulk_completed) {
>>> +            bulk_phase_send_chunk(f, dbms);
>>> +            if (limit && qemu_file_rate_limit(f)) {
>>> +                return;
>>> +            }
>>> +        }
>>> +    }
>>> +
>>> +    dirty_bitmap_mig_state.bulk_completed = true;
>>> +}
>>
>> OK.
>>
>>> +
>>> +static void blk_mig_reset_dirty_cursor(void)
>>> +{
>>> +    DirtyBitmapMigBitmapState *dbms;
>>> +
>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>>> +        dbms->cur_dirty = 0;
>>> +    }
>>> +}
>>> +
>>
>> OK.
>>
>>> +/* Called with iothread lock taken.  */
>>> +static void dirty_phase_send_chunk(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
>>> +{
>>> +    uint32_t nr_sectors;
>>> +
>>> +    while (dbms->cur_dirty < dbms->total_sectors &&
>>> +           !hbitmap_get(dbms->dirty_bitmap, dbms->cur_dirty)) {
>>> +        dbms->cur_dirty += dbms->sectors_per_chunk;
>>> +    }
>>
>> OK, so we fast forward the dirty cursor while the meta-bitmap is
>> empty. Is it not worth using the HBitmapIterator here? You can reset
>> them everywhere you reset the dirty cursor, and then just fast-seek to
>> the first dirty sector.
> Yes, I've thought about it, just used simpler way (copied from
> migration/block.c) for an early version of the patch set. I will do it.

Only if it doesn't make things more complicated to look at.
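For illustration only, a flat stand-in for that fast-seek: instead of stepping one chunk at a time, skip whole all-clean 64-bit words and then locate the first set bit. HBitmap and its iterator do this hierarchically, so this sketch shows the idea rather than the QEMU API; find_next_dirty() is hypothetical, and each bit is assumed to represent one chunk.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first set bit >= start, or nbits if none.
 * words must hold at least (nbits + 63) / 64 entries. */
static size_t find_next_dirty(const uint64_t *words, size_t nbits,
                              size_t start)
{
    size_t i = start;

    while (i < nbits) {
        uint64_t w = words[i / 64] >> (i % 64);
        if (w == 0) {
            /* rest of this word is clean: jump to the next word boundary */
            i = (i / 64 + 1) * 64;
            continue;
        }
        while (!(w & 1)) {
            w >>= 1;
            i++;
        }
        return i < nbits ? i : nbits;
    }
    return nbits;
}
```

A mostly clean meta bitmap then costs one comparison per 64 chunks instead of one hbitmap_get() per chunk.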

>>
>>> +
>>> +    if (dbms->cur_dirty >= dbms->total_sectors) {
>>> +        return;
>>> +    }
>>> +
>>> +    nr_sectors = MIN(dbms->total_sectors - dbms->cur_dirty,
>>> +                     dbms->sectors_per_chunk);
>>
>> What happens if nr_sectors goes past the end?

Again, I misread.

>>
>>> +    send_bitmap(f, dbms, dbms->cur_dirty, nr_sectors);
>>> +    hbitmap_reset(dbms->dirty_bitmap, dbms->cur_dirty, dbms->sectors_per_chunk);
>>> +    dbms->cur_dirty += nr_sectors;
>>> +}
>>> +
>>> +/* Called with iothread lock taken.
>>> + *
>>> + * return value:
>>> + * 0: too much data for max_downtime
>>> + * 1: few enough data for max_downtime
>>> +*/
>>
>> dirty_phase below doesn't have a return value.
> A leftover comment... thanks.
>>
>>> +static void dirty_phase(QEMUFile *f, bool limit)
>>> +{
>>> +    DirtyBitmapMigBitmapState *dbms;
>>> +
>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>>> +        while (dbms->cur_dirty < dbms->total_sectors) {
>>> +            dirty_phase_send_chunk(f, dbms);
>>> +            if (limit && qemu_file_rate_limit(f)) {
>>> +                return;
>>> +            }
>>> +        }
>>> +    }
>>> +}
>>> +
>>
>> OK.
>>
>>> +
>>> +/* Called with iothread lock taken.  */
>>> +static void dirty_bitmap_mig_cleanup(void)
>>> +{
>>> +    DirtyBitmapMigBitmapState *dbms;
>>> +
>>> +    unset_dirty_tracking();
>>> +
>>> +    while ((dbms = QSIMPLEQ_FIRST(&dirty_bitmap_mig_state.dbms_list)) != NULL) {
>>> +        QSIMPLEQ_REMOVE_HEAD(&dirty_bitmap_mig_state.dbms_list, entry);
>>> +        g_free(dbms);
>>> +    }
>>> +}
>>> +
>>
>> OK.
>>
>>> +static void dirty_bitmap_migration_cancel(void *opaque)
>>> +{
>>> +    dirty_bitmap_mig_cleanup();
>>> +}
>>> +
>>
>> OK.
>>
>>> +static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
>>> +{
>>> +    DPRINTF("Enter save live iterate\n");
>>> +
>>> +    blk_mig_reset_dirty_cursor();
>>
>> I suppose this is because it's easier to check if we are finished by
>> starting from sector 0 every time.
>>
>> A harder, but faster method may be: Use HBitmapIterators, but don't
>> reset them every iteration: just iterate until the end, and check that
>> the bitmap is empty. If the meta bitmap is empty, the dirty phase is
>> complete. If the meta bitmap is NOT empty, reset the HBI and continue
>> allowing iterations over the dirty phase.
> Ok, will do.
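A sketch of the alternative described above, using a flat array in place of HBitmap and a cursor that persists across iterations. NCHUNKS, struct meta and dirty_pass() are hypothetical; the cursor is only reset when it reaches the end while the meta bitmap is still non-empty.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NCHUNKS 256  /* hypothetical number of bits in the meta bitmap */

/* Flat stand-in for the meta ("dirty-dirty") bitmap plus a persistent cursor. */
struct meta {
    uint64_t words[NCHUNKS / 64];
    size_t cursor;
};

static bool meta_test(const struct meta *m, size_t i)
{
    return (m->words[i / 64] >> (i % 64)) & 1;
}

static void meta_clear(struct meta *m, size_t i)
{
    m->words[i / 64] &= ~(UINT64_C(1) << (i % 64));
}

static bool meta_empty(const struct meta *m)
{
    for (size_t k = 0; k < NCHUNKS / 64; k++) {
        if (m->words[k]) {
            return false;
        }
    }
    return true;
}

/* Send up to 'budget' dirty chunks, resuming where the previous call
 * stopped. Returns true once the meta bitmap is completely clean. */
static bool dirty_pass(struct meta *m, size_t budget, size_t *sent)
{
    while (budget > 0) {
        while (m->cursor < NCHUNKS && !meta_test(m, m->cursor)) {
            m->cursor++;
        }
        if (m->cursor == NCHUNKS) {
            if (meta_empty(m)) {
                return true;
            }
            m->cursor = 0;   /* chunks were re-dirtied behind us: wrap */
            continue;
        }
        meta_clear(m, m->cursor);   /* stands in for send_bitmap() */
        (*sent)++;
        m->cursor++;
        budget--;
    }
    return meta_empty(m);
}
```

Because the cursor only wraps when something is still dirty, the wrap always finds work, and "done" is simply "meta bitmap empty".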
>>
>>> +
>>> +    if (dirty_bitmap_mig_state.bulk_completed) {
>>> +        qemu_mutex_lock_iothread();
>>> +        dirty_phase(f, true);
>>> +        qemu_mutex_unlock_iothread();
>>> +    } else {
>>> +        bulk_phase(f, true);
>>> +    }
>>> +
>>> +    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>>> +
>>> +    return dirty_bitmap_mig_state.bulk_completed;
>>> +}
>>> +
>>> +/* Called with iothread lock taken.  */
>>> +
>>> +static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
>>> +{
>>> +    DirtyBitmapMigBitmapState *dbms;
>>> +    DPRINTF("Enter save live complete\n");
>>> +
>>> +    if (!dirty_bitmap_mig_state.bulk_completed) {
>>> +        bulk_phase(f, false);
>>> +    }
>>
>> [Not expertly familiar with savevm:] Under what conditions can this
>> happen?
> This can happen. save_complete is called when savevm decides that the
> pending data size is small enough. That was the case in my bugfix for
> migration/block.c about pending: to prevent save_complete while the
> bulk phase isn't completed, save_pending there returns a big value.
> Here I decided to make save_pending more honest, so save_complete has
> to finish the bulk phase itself if it is not yet complete.

OK, Gotcha.

>>
>>> +
>>> +    blk_mig_reset_dirty_cursor();
>>> +    dirty_phase(f, false);
>>> +
>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>>> +        uint8_t flags = DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME |
>>> +                        DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME |
>>> +                        DIRTY_BITMAP_MIG_FLAG_ENABLED;
>>> +
>>> +        qemu_put_byte(f, flags);
>>> +        qemu_put_name(f, bdrv_get_device_name(dbms->bs));
>>> +        qemu_put_name(f, bdrv_dirty_bitmap_name(dbms->bitmap));
>>> +        qemu_put_byte(f, bdrv_dirty_bitmap_enabled(dbms->bitmap));
>>> +    }
>>> +
>>> +    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>>> +
>>> +    DPRINTF("Dirty bitmaps migration completed\n");
>>> +
>>> +    dirty_bitmap_mig_cleanup();
>>> +    return 0;
>>> +}
>>> +
>>
>> I suppose we don't need a flag that distinctly SAYS this is the end
>> section, since we can tell by omission of
>> DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK or ZERO_CHUNK.
> Hmm. I think using EOS after each section simplifies the logic, and
> migration/block.c takes the same approach. It's a question of which
> format is better: "Each section for dirty_bitmap_load ends with EOS"
> or "Each section for dirty_bitmap_load ends with EOS except the last
> one, which may be recognized by the absence of NORMAL_CHUNK and
> ZERO_CHUNK".
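To make the trade-off concrete, here is a reader for the "every section ends with EOS" variant on a simplified stream. The EOS and NORMAL_CHUNK values (0x01, 0x02) are assumed, since their definitions fall outside this excerpt, and a one-byte payload length replaces the real start/count/size fields. The loop needs no knowledge of which section is last: the termination test is the same everywhere.

```c
#include <stddef.h>
#include <stdint.h>

#define DIRTY_BITMAP_MIG_FLAG_EOS          0x01  /* assumed value */
#define DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK 0x02  /* assumed value */
#define DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK   0x04

/* Parse one section of a simplified stream in which a NORMAL_CHUNK record
 * is followed by a 1-byte payload length and the payload. Returns bytes
 * consumed, or 0 on truncation; *chunks receives the chunk count. */
static size_t read_section(const uint8_t *in, size_t avail, unsigned *chunks)
{
    size_t pos = 0;
    uint8_t flags;

    *chunks = 0;
    do {
        if (pos >= avail) {
            return 0;
        }
        flags = in[pos++];
        if (flags & DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK) {
            if (pos >= avail || pos + 1 + in[pos] > avail) {
                return 0;
            }
            pos += 1 + in[pos];
            (*chunks)++;
        } else if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) {
            (*chunks)++;
        }
    } while (!(flags & DIRTY_BITMAP_MIG_FLAG_EOS));
    return pos;
}
```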
>>
>>> +static uint64_t dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
>>> +                                          uint64_t max_size)
>>> +{
>>> +    DirtyBitmapMigBitmapState *dbms;
>>> +    uint64_t pending = 0;
>>> +
>>> +    qemu_mutex_lock_iothread();
>>> +
>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>>> +        uint64_t sectors = hbitmap_count(dbms->dirty_bitmap);
>>> +        if (!dbms->bulk_completed) {
>>> +            sectors += dbms->total_sectors - dbms->cur_sector;
>>> +        }
>>> +        pending += bdrv_dirty_bitmap_data_size(dbms->bitmap, sectors);
>>> +    }
>>> +
>>> +    qemu_mutex_unlock_iothread();
>>> +
>>> +    DPRINTF("Enter save live pending %" PRIu64 ", max: %" PRIu64 "\n",
>>> +            pending, max_size);
>>> +    return pending;
>>> +}
>>> +
>>
>> OK.
>>
>>> +static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>>> +{
>>> +    int flags;
>>> +
>>> +    static char device_name[256], bitmap_name[256];
>>> +    static BlockDriverState *bs;
>>> +    static BdrvDirtyBitmap *bitmap;
>>> +
>>> +    uint8_t *buf;
>>> +    uint64_t first_sector;
>>> +    uint32_t  nr_sectors;
>>> +    int ret;
>>> +
>>> +    DPRINTF("load start\n");
>>> +
>>> +    do {
>>> +        flags = qemu_get_byte(f);
>>> +        DPRINTF("flags: %x\n", flags);
>>> +
>>> +        if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
>>> +            qemu_get_name(f, device_name);
>>> +            bs = bdrv_find(device_name);
>>
>> Similar to the above confusion, you may want bdrv_lookup_bs or
>> similar, since we're going to be looking for BDS nodes instead of
>> "devices."
> In this case, should it be changed in migration/block.c too?

[See discussion above!]

>>
>>> +            if (!bs) {
>>> +                fprintf(stderr, "Error: unknown block device '%s'\n",
>>> +                        device_name);
>>> +                return -EINVAL;
>>> +            }
>>> +        }
>>> +
>>> +        if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
>>> +            if (!bs) {
>>> +                fprintf(stderr, "Error: block device name is not set\n");
>>> +                return -EINVAL;
>>> +            }
>>> +
>>> +            qemu_get_name(f, bitmap_name);
>>> +            bitmap = bdrv_find_dirty_bitmap(bs, bitmap_name);
>>> +            if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
>>> +                /* First chunk from this bitmap */
>>> +                uint64_t granularity = qemu_get_be64(f);
>>> +                if (!bitmap) {
>>> +                    Error *local_err = NULL;
>>> +                    bitmap = bdrv_create_dirty_bitmap(bs, granularity,
>>> +                                                      bitmap_name,
>>> +                                                      &local_err);
>>> +                    if (!bitmap) {
>>> +                        error_report("%s", error_get_pretty(local_err));
>>> +                        error_free(local_err);
>>> +                        return -EINVAL;
>>> +                    }
>>> +                } else {
>>> +                    uint64_t dest_granularity =
>>> +                        bdrv_dirty_bitmap_granularity(bs, bitmap);
>>> +                    if (dest_granularity != granularity) {
>>> +                        fprintf(stderr,
>>> +                                "Error: "
>>> +                                "Migrated bitmap granularity (%" PRIu64 ") "
>>> +                                "is not match with destination bitmap '%s' "
>>> +                                "granularity (%" PRIu64 ")\n",
>>> +                                granularity,
>>> +                                bitmap_name,
>>> +                                dest_granularity);
>>> +                        return -EINVAL;
>>> +                    }
>>> +                }
>>> +                bdrv_disable_dirty_bitmap(bitmap);
>>> +            }
>>> +            if (!bitmap) {
>>> +                fprintf(stderr, "Error: unknown dirty bitmap "
>>> +                        "'%s' for block device '%s'\n",
>>> +                        bitmap_name, device_name);
>>> +                return -EINVAL;
>>> +            }
>>> +        }
>>> +
>>> +        if (flags & DIRTY_BITMAP_MIG_FLAG_ENABLED) {
>>> +            bool enabled;
>>> +            if (!bitmap) {
>>> +                fprintf(stderr, "Error: dirty bitmap name is not set\n");
>>> +                return -EINVAL;
>>> +            }
>>> +            bdrv_dirty_bitmap_deserialize_finish(bitmap);
>>> +            /* complete migration */
>>> +            enabled = qemu_get_byte(f);
>>> +            if (enabled) {
>>> +                bdrv_enable_dirty_bitmap(bitmap);
>>> +            }
>>> +        }
>>
>> Oh, so you use the ENABLED flag to show that migration is over.
> Yes, it was a bad idea...
>> If we are going to commit to a stream format for bitmaps, though,
>> maybe it's best to actually create a "COMPLETION BLOCK" flag and then
>> split this function into two pieces:
>>
>> (1) The part that receives regular / zero blocks, and
>> (2) The part that receives completion data.
>>
>> That way, if we change the properties that bitmaps have down the line,
>> we aren't reliant on literally the "enabled" flag to decide what to do.
>>
>> Also, it might help make this fairly long function a little smaller
>> and more readable.
> Ok.
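One possible shape for that split, reduced to the dispatch logic. DIRTY_BITMAP_MIG_FLAG_COMPLETE and its value 0x80 (the last free bit of the first byte) are hypothetical, the NORMAL_CHUNK value is assumed, and real handlers would read names, sectors and buffers from the QEMUFile rather than take a single argument byte.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK 0x02  /* assumed value */
#define DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK   0x04
#define DIRTY_BITMAP_MIG_FLAG_COMPLETE     0x80  /* hypothetical */

struct load_state {
    unsigned chunks;   /* chunks applied so far */
    bool finished;     /* completion record seen */
    bool enabled;      /* enabled state requested by the source */
};

/* Part (1): regular and zero chunks. */
static int load_chunk(struct load_state *s)
{
    if (s->finished) {
        return -EINVAL;   /* no bitmap data after completion */
    }
    s->chunks++;
    return 0;
}

/* Part (2): completion data; new bitmap properties would be added here. */
static int load_complete(struct load_state *s, uint8_t enabled)
{
    if (s->finished) {
        return -EINVAL;
    }
    s->finished = true;
    s->enabled = enabled != 0;
    return 0;
}

static int load_record(struct load_state *s, uint8_t flags, uint8_t arg)
{
    const uint8_t chunk = DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK |
                          DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK;

    if (flags & DIRTY_BITMAP_MIG_FLAG_COMPLETE) {
        return (flags & chunk) ? -EINVAL : load_complete(s, arg);
    }
    if (flags & chunk) {
        return load_chunk(s);
    }
    return 0;
}
```

With completion signalled by its own flag, the "enabled" byte becomes just one field of the completion record instead of the trigger for deserialize_finish.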
>>
>>> +
>>> +        if (flags & (DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK |
>>> +                     DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK)) {
>>> +            if (!bs) {
>>> +                fprintf(stderr, "Error: block device name is not set\n");
>>> +                return -EINVAL;
>>> +            }
>>> +            if (!bitmap) {
>>> +                fprintf(stderr, "Error: dirty bitmap name is not set\n");
>>> +                return -EINVAL;
>>> +            }
>>> +
>>> +            first_sector = qemu_get_be64(f);
>>> +            nr_sectors = qemu_get_be32(f);
>>> +            DPRINTF("chunk: %lu %u\n", first_sector, nr_sectors);
>>> +
>>> +
>>> +            if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) {
>>> +                bdrv_dirty_bitmap_deserialize_part0(bitmap, first_sector,
>>> +                                                    nr_sectors);
>>> +            } else {
>>> +                uint64_t buf_size = qemu_get_be64(f);
>>> +                uint64_t needed_size =
>>> +                    bdrv_dirty_bitmap_data_size(bitmap, nr_sectors);
>>> +
>>> +                if (needed_size > buf_size) {
>>> +                    fprintf(stderr,
>>> +                            "Error: Migrated bitmap granularity is not "
>>> +                            "match with destination bitmap granularity\n");
>>> +                    return -EINVAL;
>>> +                }
>>> +
>>
>> "Migrated bitmap granularity doesn't match the destination bitmap
>> granularity" perhaps.
>>
>>> +                buf = g_malloc(buf_size);
>>> +                qemu_get_buffer(f, buf, buf_size);
>>> +                bdrv_dirty_bitmap_deserialize_part(bitmap, buf,
>>> +                                                   first_sector,
>>> +                                                   nr_sectors);
>>> +                g_free(buf);
>>> +            }
>>> +        }
>>> +
>>> +        ret = qemu_file_get_error(f);
>>> +        if (ret != 0) {
>>> +            return ret;
>>> +        }
>>> +    } while (!(flags & DIRTY_BITMAP_MIG_FLAG_EOS));
>>> +
>>> +    DPRINTF("load finish\n");
>>> +    return 0;
>>> +}
>>> +
>>> +static void dirty_bitmap_set_params(const MigrationParams *params, void *opaque)
>>> +{
>>> +    dirty_bitmap_mig_state.migration_enable = params->dirty;
>>> +}
>>> +
>>
>> OK; though I am not immediately aware of what changes need to happen
>> to accommodate Eric's suggestions.
> This function will be dropped in v3.
>>
>>> +static bool dirty_bitmap_is_active(void *opaque)
>>> +{
>>> +    return dirty_bitmap_mig_state.migration_enable == 1;
>>> +}
>>> +
>>
>> OK.
>>
>>> +static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
>>> +{
>>> +    init_dirty_bitmap_migration(f);
>>> +
>>> +    qemu_mutex_lock_iothread();
>>> +    /* start track dirtyness of dirty bitmaps */
>>> +    set_dirty_tracking();
>>> +    qemu_mutex_unlock_iothread();
>>> +
>>> +    blk_mig_reset_dirty_cursor();
>>> +    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>
>> OK; see dirty_bitmap_mig_init below, though.
>>
>>> +static SaveVMHandlers savevm_block_handlers = {
>>> +    .set_params = dirty_bitmap_set_params,
>>> +    .save_live_setup = dirty_bitmap_save_setup,
>>> +    .save_live_iterate = dirty_bitmap_save_iterate,
>>> +    .save_live_complete = dirty_bitmap_save_complete,
>>> +    .save_live_pending = dirty_bitmap_save_pending,
>>> +    .load_state = dirty_bitmap_load,
>>> +    .cancel = dirty_bitmap_migration_cancel,
>>> +    .is_active = dirty_bitmap_is_active,
>>> +};
>>> +
>>> +void dirty_bitmap_mig_init(void)
>>> +{
>>> +    QSIMPLEQ_INIT(&dirty_bitmap_mig_state.dbms_list);
>>
>> Maybe I haven't looked thoroughly enough yet, but it's weird that part
>> of the dirty_bitmap_mig_state is initialized here, and the rest of it
>> in init_dirty_bitmap_migration. I'd prefer to keep it all together, if
>> possible.
> dirty_bitmap_mig_init is called once when qemu starts, and QSIMPLEQ_INIT
> should be called only once. dirty_bitmap_save_setup is called on every
> migration start; it acts as a 'reinitialize'.
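The split between once-at-startup and every-migration initialization, together with clearing dbms_list before refilling it (noted earlier in the thread), can be sketched with a plain singly linked list standing in for QSIMPLEQ. struct entry, struct mig_state and the function names are hypothetical; the point is only that the per-migration path empties the list before rebuilding it.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for DirtyBitmapMigBitmapState entries in dbms_list. */
struct entry {
    int id;
    struct entry *next;
};

struct mig_state {
    struct entry *head;
    struct entry **tail;
};

/* Once, at startup: the QSIMPLEQ_INIT counterpart. */
static void mig_state_init(struct mig_state *s)
{
    s->head = NULL;
    s->tail = &s->head;
}

/* Free entries left over from a previous migration. */
static void mig_state_clear(struct mig_state *s)
{
    while (s->head) {
        struct entry *e = s->head;
        s->head = e->next;
        free(e);
    }
    s->tail = &s->head;
}

/* On every migration start: drop stale entries, then rebuild the list.
 * Returns the number of entries, or -1 on allocation failure. */
static int mig_state_reinit(struct mig_state *s, const int *ids, size_t n)
{
    mig_state_clear(s);
    for (size_t i = 0; i < n; i++) {
        struct entry *e = malloc(sizeof(*e));
        if (!e) {
            return -1;
        }
        e->id = ids[i];
        e->next = NULL;
        *s->tail = e;
        s->tail = &e->next;
    }
    return (int)n;
}
```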
>>
>>> +
>>> +    register_savevm_live(NULL, "dirty-bitmap", 0, 1, &savevm_block_handlers,
>>> +                         &dirty_bitmap_mig_state);
>>> +}
>>
>> OK.
>>
>>> diff --git a/vl.c b/vl.c
>>> index a824a7d..dee7220 100644
>>> --- a/vl.c
>>> +++ b/vl.c
>>> @@ -4184,6 +4184,7 @@ int main(int argc, char **argv, char **envp)
>>>
>>>       blk_mig_init();
>>>       ram_mig_init();
>>> +    dirty_bitmap_mig_init();
>>>
>>>       /* If the currently selected machine wishes to override the units-per-bus
>>>        * property of its default HBA interface type, do so now. */
>>>
>>
>> Hm, since dirty bitmaps are a sub-component of the block layer, would
>> it not make sense to put this hook under blk_mig_init, perhaps?
> IMHO the reason to put it here is to keep all
> register_savevm_live-entities in one place.

If you still feel that way I won't withhold my R-b, but there are 
already other cases such as ppc_spapr_init which are not in this general 
area of vl.c.

Plus the dozens of devices that use register_savevm as a wrapper to 
register_savevm_live, so maybe consolidating calls to this function 
isn't that important.

>>
>>
>> Overall this looks very clean compared to the intermingled format in
>> V1, and the code is organized pretty well. Just a few minor comments,
>> and I'd like to get the opinion of the migration maintainers, but I am
>> happy. Sorry it took me so long to review, please feel free to let me
>> know if you disagree with any of my opinions :)
>>
>> Thank you,
>> --John
>
> Thank you for reviewing my series :)
>

Yup. Hopefully I didn't miss too much that will irritate the Migration 
overlords.

Once you respin on top of v12, I can run some thorough migration tests 
on it (perhaps over a weekend) and verify that it survives a couple 
hundred migrations without any kind of integrity loss.

This is what makes sense to me right now, anyway.

Do you think you'll be including the bitmap checksum in the 
BlockDirtyInfo command? That'd be convenient for iotests.

Thank you,
--John.


* Re: [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c
  2015-02-13 20:22       ` John Snow
@ 2015-02-16 12:06         ` Vladimir Sementsov-Ogievskiy
  2015-02-16 18:18           ` John Snow
  0 siblings, 1 reply; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-16 12:06 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, Peter Maydell,
	Juan Jose Quintela Carreira,
	Dr. David Alan Gilbert, stefanha, den, Amit Shah, pbonzini

On 13.02.2015 23:22, John Snow wrote:
>
>
> On 02/13/2015 03:19 AM, Vladimir Sementsov-Ogievskiy wrote:
>> On 11.02.2015 00:33, John Snow wrote:
>>> Peter Maydell: What's the right way to license a file as copied from a
>>> previous version? See below, please;
>>>
>>> Max, Markus: ctrl+f "bdrv_get_device_name" and let me know what you
>>> think, if you would.
>>>
>>> Juan, Amit, David: Copying migration maintainers.
>>>
>>> On 01/27/2015 05:56 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>> Live migration of dirty bitmaps. Only named dirty bitmaps are migrated.
>>>> If the destination qemu already contains a dirty bitmap with the same
>>>> name as a migrated bitmap, their granularities must be the same;
>>>> otherwise an error is generated. If the destination qemu doesn't
>>>> contain such a bitmap, it will be created.
>>>>
>>>> format:
>>>>
>>>> 1 byte: flags
>>>>
>>>> [ 1 byte: node name size ] \  flags & DEVICE_NAME
>>>> [ n bytes: node name     ] /
>>>>
>>>> [ 1 byte: bitmap name size ]       \
>>>> [ n bytes: bitmap name     ]       | flags & BITMAP_NAME
>>>> [ [ be64: granularity    ] ]  flags & GRANULARITY
>>>>
>>>> [ 1 byte: bitmap enabled bit ] flags & ENABLED
>>>>
>>>> [ be64: start sector      ] \ flags & (NORMAL_CHUNK | ZERO_CHUNK)
>>>> [ be32: number of sectors ] /
>>>>
>>>> [ be64: buffer size ] \ flags & NORMAL_CHUNK
>>>> [ n bytes: buffer   ] /
>>>>
>>>> The last chunk should contain flags & EOS. A chunk may omit the device
>>>> and/or bitmap names, in which case they are assumed to be the same as
>>>> in the previous chunk. GRANULARITY is sent with the first chunk for the
>>>> bitmap. The ENABLED bit is sent at the end of the "complete" stage of
>>>> migration, so when the destination gets the ENABLED flag it should
>>>> deserialize_finish the bitmap and set its enabled bit to the
>>>> corresponding value.
>>>>
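The chunk layout described above can be illustrated with a small encoder. This is a minimal sketch, not code from the patch: it writes to a plain memory buffer instead of a QEMUFile, and put_be(), put_name() and encode_chunk_header() are hypothetical helpers standing in for qemu_put_byte()/qemu_put_be32()/qemu_put_be64() and the patch's qemu_put_name(). It emits only the header; for a NORMAL_CHUNK the be64 buffer size and the buffer itself would follow.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flag values as defined in the patch. */
#define DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK  0x02
#define DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK    0x04
#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME   0x08
#define DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME   0x10
#define DIRTY_BITMAP_MIG_FLAG_GRANULARITY   0x20

/* Hypothetical: write v as an nbytes-wide big-endian integer. */
static size_t put_be(uint8_t *p, uint64_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++) {
        p[i] = (uint8_t)(v >> (8 * (nbytes - 1 - i)));
    }
    return (size_t)nbytes;
}

/* Hypothetical: 1 length byte, then the name without its NUL. */
static size_t put_name(uint8_t *p, const char *name)
{
    size_t len = strlen(name);

    assert(len < 256);
    p[0] = (uint8_t)len;
    memcpy(p + 1, name, len);
    return 1 + len;
}

/* Encode a chunk up to (not including) the be64 buffer size. */
static size_t encode_chunk_header(uint8_t *out, uint8_t flags,
                                  const char *node, const char *bitmap,
                                  uint64_t granularity,
                                  uint64_t start_sector, uint32_t nr_sectors)
{
    size_t off = put_be(out, flags, 1);

    if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
        off += put_name(out + off, node);
    }
    if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
        off += put_name(out + off, bitmap);
        if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
            off += put_be(out + off, granularity, 8);
        }
    }
    if (flags & (DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK |
                 DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK)) {
        off += put_be(out + off, start_sector, 8);
        off += put_be(out + off, nr_sectors, 4);
    }
    return off;
}
```

For a first ZERO_CHUNK of bitmap "bitmap0" on "drive0" this header is 36 bytes; a later chunk of the same bitmap carries only the flags byte and the sector range (13 bytes).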
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>>>> ---
>>>>   include/migration/block.h |   1 +
>>>>   migration/Makefile.objs   |   2 +-
>>>>   migration/dirty-bitmap.c  | 606 ++++++++++++++++++++++++++++++++++++++++++++++
>>>>   vl.c                      |   1 +
>>>>   4 files changed, 609 insertions(+), 1 deletion(-)
>>>>   create mode 100644 migration/dirty-bitmap.c
>>>>
>>>> diff --git a/include/migration/block.h b/include/migration/block.h
>>>> index ffa8ac0..566bb9f 100644
>>>> --- a/include/migration/block.h
>>>> +++ b/include/migration/block.h
>>>> @@ -14,6 +14,7 @@
>>>>   #ifndef BLOCK_MIGRATION_H
>>>>   #define BLOCK_MIGRATION_H
>>>>
>>>> +void dirty_bitmap_mig_init(void);
>>>>   void blk_mig_init(void);
>>>>   int blk_mig_active(void);
>>>>   uint64_t blk_mig_bytes_transferred(void);
>>>
>>> OK.
>>>
>>>> diff --git a/migration/Makefile.objs b/migration/Makefile.objs
>>>> index d929e96..9adfda9 100644
>>>> --- a/migration/Makefile.objs
>>>> +++ b/migration/Makefile.objs
>>>> @@ -6,5 +6,5 @@ common-obj-y += xbzrle.o
>>>>   common-obj-$(CONFIG_RDMA) += rdma.o
>>>>   common-obj-$(CONFIG_POSIX) += exec.o unix.o fd.o
>>>>
>>>> -common-obj-y += block.o
>>>> +common-obj-y += block.o dirty-bitmap.o
>>>>
>>>
>>> OK.
>>>
>>>> diff --git a/migration/dirty-bitmap.c b/migration/dirty-bitmap.c
>>>> new file mode 100644
>>>> index 0000000..8621218
>>>> --- /dev/null
>>>> +++ b/migration/dirty-bitmap.c
>>>> @@ -0,0 +1,606 @@
>>>> +/*
>>>> + * QEMU dirty bitmap migration
>>>> + *
>>>> + * derived from migration/block.c
>>>> + *
>>>> + * Author:
>>>> + * Sementsov-Ogievskiy Vladimir <vsementsov@parallels.com>
>>>> + *
>>>> + * original copyright message:
>>>> + * =====================================================================
>>>> + * Copyright IBM, Corp. 2009
>>>> + *
>>>> + * Authors:
>>>> + *  Liran Schour <lirans@il.ibm.com>
>>>> + *
>>>> + * This work is licensed under the terms of the GNU GPL, version 2. See
>>>> + * the COPYING file in the top-level directory.
>>>> + *
>>>> + * Contributions after 2012-01-13 are licensed under the terms of the
>>>> + * GNU GPL, version 2 or (at your option) any later version.
>>>> + * =====================================================================
>>>> + */
>>>> +
>>>
>>> Not super familiar with the right way to do licensing here; it's
>>> possible you may not need to copy the original here, but I'm not sure.
>>> You will want to make it clear what license applies to /your/ work, I
>>> think. Maybe Peter Maydell can clue us in.
>>>
>>>> +#include "block/block.h"
>>>> +#include "qemu/main-loop.h"
>>>> +#include "qemu/error-report.h"
>>>> +#include "migration/block.h"
>>>> +#include "migration/migration.h"
>>>> +#include "qemu/hbitmap.h"
>>>> +#include <assert.h>
>>>> +
>>>> +#define CHUNK_SIZE                       (1 << 20)
>>>> +
>>>> +#define DIRTY_BITMAP_MIG_FLAG_EOS           0x01
>>>> +#define DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK  0x02
>>>> +#define DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK    0x04
>>>> +#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME   0x08
>>>> +#define DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME   0x10
>>>> +#define DIRTY_BITMAP_MIG_FLAG_GRANULARITY   0x20
>>>> +#define DIRTY_BITMAP_MIG_FLAG_ENABLED       0x40
>>>> +/* flags should be <= 0xff */
>>>> +
>>>
>>> We should give ourselves a little breathing room with the flags, since
>>> we've only got room for one more.
>> Ok. Will one more byte be enough?
>
> I should hope so. If you do add a completion chunk and flag, that 
> fills up the first byte completely, so having a second byte is a good 
> idea.
>
> I might recommend reserving the last bit of the second byte to be a 
> flag such as DIRTY_BITMAP_EXTRA_FLAGS that indicates the presence of 
> additional byte(s) of flags, to be determined later, if we ever need 
> them, but two bytes for now should be sufficient.
Ok.
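A two-byte flags field with a reserved extension bit might look like the sketch below. It assumes the flags become a 16-bit value sent big-endian (as qemu_put_be16() would) and that the reserved bit is 0x8000, the top bit of the new second flags byte. DIRTY_BITMAP_MIG_EXTRA_FLAGS, put_flags() and get_flags() are hypothetical names; since no extra flag bytes are defined yet, a reader that sees the bit simply refuses the stream.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: set means "more flag bytes follow" (none are defined yet). */
#define DIRTY_BITMAP_MIG_EXTRA_FLAGS  0x8000

/* Write 16 bits of flags, high byte first (like qemu_put_be16). */
static void put_flags(uint8_t out[2], uint16_t flags)
{
    /* A writer must not announce extension bytes it doesn't send. */
    assert(!(flags & DIRTY_BITMAP_MIG_EXTRA_FLAGS));
    out[0] = (uint8_t)(flags >> 8);
    out[1] = (uint8_t)flags;
}

/* Read 16 bits of flags; returns false if extension bytes are announced,
 * since this reader doesn't know how to parse them. */
static bool get_flags(const uint8_t in[2], uint16_t *flags)
{
    *flags = (uint16_t)((in[0] << 8) | in[1]);
    return !(*flags & DIRTY_BITMAP_MIG_EXTRA_FLAGS);
}
```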
>
>>>
>>>> +/* #define DEBUG_DIRTY_BITMAP_MIGRATION */
>>>> +
>>>> +#ifdef DEBUG_DIRTY_BITMAP_MIGRATION
>>>> +#define DPRINTF(fmt, ...) \
>>>> +    do { printf("dirty_migration: " fmt, ## __VA_ARGS__); } while (0)
>>>> +#else
>>>> +#define DPRINTF(fmt, ...) \
>>>> +    do { } while (0)
>>>> +#endif
>>>> +
>>>> +typedef struct DirtyBitmapMigBitmapState {
>>>> +    /* Written during setup phase. */
>>>> +    BlockDriverState *bs;
>>>> +    BdrvDirtyBitmap *bitmap;
>>>> +    HBitmap *dirty_bitmap;
>>>
>>> For my own sanity, I'd really prefer "bitmap" and "meta_bitmap" here;
>>> "dirty_bitmap" is often used as a synonym (outside of this file) to
>>> refer to the BdrvDirtyBitmap in general, so its usage here can be
>>> somewhat confusing.
>>>
>>> I'd appreciate "dirty_dirty_bitmap" as in your previous patch for
>>> consistency, or "meta_bitmap" as I recommend.
>>>
>> Ok
>>>> +    int64_t total_sectors;
>>>> +    uint64_t sectors_per_chunk;
>>>> +    QSIMPLEQ_ENTRY(DirtyBitmapMigBitmapState) entry;
>>>> +
>>>> +    /* For bulk phase. */
>>>> +    bool bulk_completed;
>>>> +    int64_t cur_sector;
>>>> +    bool granularity_sent;
>>>> +
>>>> +    /* For dirty phase. */
>>>> +    int64_t cur_dirty;
>>>> +} DirtyBitmapMigBitmapState;
>>>> +
>>>> +typedef struct DirtyBitmapMigState {
>>>> +    int migration_enable;
>>>> +    QSIMPLEQ_HEAD(dbms_list, DirtyBitmapMigBitmapState) dbms_list;
>>>> +
>>>> +    bool bulk_completed;
>>>> +
>>>> +    /* for send_bitmap() */
>>>> +    BlockDriverState *prev_bs;
>>>> +    BdrvDirtyBitmap *prev_bitmap;
>>>> +} DirtyBitmapMigState;
>>>> +
>>>> +static DirtyBitmapMigState dirty_bitmap_mig_state;
>>>> +
>>>> +/* read name from qemu file:
>>>> + * format:
>>>> + * 1 byte : len = name length (<256)
>>>> + * len bytes : name without last zero byte
>>>> + *
>>>> + * name should point to the buffer >= 256 bytes length
>>>> + */
>>>> +static char *qemu_get_name(QEMUFile *f, char *name)
>>>> +{
>>>> +    int len = qemu_get_byte(f);
>>>> +    qemu_get_buffer(f, (uint8_t *)name, len);
>>>> +    name[len] = '\0';
>>>> +
>>>> +    DPRINTF("get name: %d %s\n", len, name);
>>>> +
>>>> +    return name;
>>>> +}
>>>> +
>>>
>>> OK. Maybe these could be "qemu_put_string" or "qemu_get_string" and
>>> added to qemu-file.c so others can use them.
>> If there are no objections to sharing this format, I'll do it.
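For reference, the name format is symmetric and easy to test in isolation. The sketch below uses a memory buffer instead of a QEMUFile; put_name() and get_name() are hypothetical stand-ins for the patch's qemu_put_name()/qemu_get_name() (or for the shared string helpers suggested above), not existing QEMU functions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: 1 length byte, then len bytes of name without the NUL. */
static size_t put_name(unsigned char *out, const char *name)
{
    size_t len = strlen(name);

    assert(len < 256);            /* must fit in the single length byte */
    out[0] = (unsigned char)len;
    memcpy(out + 1, name, len);
    return 1 + len;               /* bytes written */
}

/* Hypothetical reader; name must have room for 256 bytes. */
static size_t get_name(const unsigned char *in, char name[256])
{
    size_t len = in[0];           /* at most 255, so name[len] is in bounds */

    memcpy(name, in + 1, len);
    name[len] = '\0';
    return 1 + len;               /* bytes consumed */
}
```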
>>>
>>>> +/* write name to qemu file:
>>>> + * format:
>>>> + * same as for qemu_get_name
>>>> + *
>>>> + * maximum name length is 255
>>>> + */
>>>> +static void qemu_put_name(QEMUFile *f, const char *name)
>>>> +{
>>>> +    int len = strlen(name);
>>>> +
>>>> +    DPRINTF("put name: %d %s\n", len, name);
>>>> +
>>>> +    assert(len < 256);
>>>> +    qemu_put_byte(f, len);
>>>> +    qemu_put_buffer(f, (const uint8_t *)name, len);
>>>> +}
>>>> +
>>>
>>> OK.
>>>
>>>> +static void send_bitmap(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
>>>> +                        uint64_t start_sector, uint32_t nr_sectors)
>>>> +{
>>>> +    BlockDriverState *bs = dbms->bs;
>>>> +    BdrvDirtyBitmap *bitmap = dbms->bitmap;
>>>> +    uint8_t flags = 0;
>>>> +    /* align for buffer_is_zero() */
>>>> +    uint64_t align = 4 * sizeof(long);
>>>> +    uint64_t buf_size =
>>>> +        (bdrv_dirty_bitmap_data_size(bitmap, nr_sectors) + align - 1) &
>>>> +        ~(align - 1);
>>>> +    uint8_t *buf = g_malloc0(buf_size);
>>>> +
>>>> +    bdrv_dirty_bitmap_serialize_part(bitmap, buf, start_sector, nr_sectors);
>>>> +
>>>> +    if (buffer_is_zero(buf, buf_size)) {
>>>> +        g_free(buf);
>>>> +        buf = NULL;
>>>> +        flags |= DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK;
>>>> +    } else {
>>>> +        flags |= DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK;
>>>> +    }
>>>> +
>>>> +    if (bs != dirty_bitmap_mig_state.prev_bs) {
>>>> +        dirty_bitmap_mig_state.prev_bs = bs;
>>>> +        flags |= DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME;
>>>> +    }
>>>> +
>>>> +    if (bitmap != dirty_bitmap_mig_state.prev_bitmap) {
>>>> +        dirty_bitmap_mig_state.prev_bitmap = bitmap;
>>>> +        flags |= DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME;
>>>> +    }
>>>
>>> OK, so we use the current bs/bitmap under consideration to detect if
>>> we have switched context, and put the names in the stream when it
>>> happens. OK.
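The context-switch logic can be isolated as follows. This is a minimal sketch: struct sender_state and name_flags() are hypothetical, bs and bitmap are treated as opaque pointers, and the assert mirrors the one in send_bitmap() (GRANULARITY is only ever sent together with BITMAP_NAME).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME   0x08
#define DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME   0x10
#define DIRTY_BITMAP_MIG_FLAG_GRANULARITY   0x20

/* Hypothetical sender state, like prev_bs/prev_bitmap in the patch. */
struct sender_state {
    const void *prev_bs;
    const void *prev_bitmap;
};

/* Return the name/granularity flags needed for a chunk of (bs, bitmap). */
static uint8_t name_flags(struct sender_state *s, const void *bs,
                          const void *bitmap, bool *granularity_sent)
{
    uint8_t flags = 0;

    if (bs != s->prev_bs) {              /* switched node: resend its name */
        s->prev_bs = bs;
        flags |= DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME;
    }
    if (bitmap != s->prev_bitmap) {      /* switched bitmap: resend its name */
        s->prev_bitmap = bitmap;
        flags |= DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME;
    }
    if (!*granularity_sent) {            /* first chunk of this bitmap */
        *granularity_sent = true;
        flags |= DIRTY_BITMAP_MIG_FLAG_GRANULARITY;
    }
    /* Granularity is only ever sent together with the bitmap name. */
    assert(!(flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) ||
           (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME));
    return flags;
}
```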
>>>
>>>> +
>>>> +    if (dbms->granularity_sent == 0) {
>>>> +        dbms->granularity_sent = 1;
>>>> +        flags |= DIRTY_BITMAP_MIG_FLAG_GRANULARITY;
>>>> +    }
>>>> +
>>>> +    DPRINTF("Enter send_bitmap"
>>>> +            "\n   flags:        %x"
>>>> +            "\n   start_sector: %" PRIu64
>>>> +            "\n   nr_sectors:   %" PRIu32
>>>> +            "\n   data_size:    %" PRIu64 "\n",
>>>> +            flags, start_sector, nr_sectors, buf_size);
>>>> +
>>>> +    qemu_put_byte(f, flags);
>>>> +
>>>> +    if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
>>>> +        qemu_put_name(f, bdrv_get_device_name(bs));
>>>> +    }
>>>
>>> I am still not fully clear on this myself, but I think we are phasing
>>> out bdrv_get_device_name. In the context of bitmaps, we are most
>>> likely using them one-per-tree, but they /can/ be attached
>>> one-per-node, so we shouldn't be trying to get the device name here,
>>> but rather, the node-name.
>>>
>>> I /think/ we may want to be using bdrv_get_node_name, but I think it
>>> is not currently true that all nodes WILL be named ... I am CC'ing
>>> Markus Armbruster and Max Reitz for (A) A refresher course and (B)
>>> Opinions on what function call might make sense here, given that we
>>> wish to migrate bitmaps attached to arbitrary nodes.
>> Hmm. I'm not familiar with the hierarchy of nodes and devices. As I
>> understand it, drives created both from the command line and via QMP
>> go through blockdev_init, which creates both the blk (device) and the
>> bs (node) through blk_new_with_bs. Am I right? Also,
>> bdrv_get_device_name is used in migration/block.c.
>
> Now that I'm more awake, here's a better rundown of what's going on:
>
> It's something that is a little bit in flux right now, unfortunately. 
> We're trying to transition to a format where we have arbitrarily 
> complex Block trees, where the root of the tree is always a 
> BlockBackend (See the big series by Max Reitz) and the configuration 
> of the tree may become arbitrarily complex.
>
> Simple trees may consist of just one BlockBackend and one 
> BlockDriverState node, where I think we can refer to this BDS as the 
> "root node," not to be confused with the BlockBackend "root." The 
> BlockBackend is a relatively new invention, so it isn't actually used 
> consistently everywhere yet.
>
> In the future, we may have commands that make distinctions based on if 
> you want to work on the BlockBackend, the root node under the 
> blockbackend associated with a BDS, only the explicit node/BDS you 
> identify, or some combination of the above semantics.
>
> As of right now, bitmaps can be *attached* to any arbitrary node, 
> though they are currently only *useful* when attached to the first 
> child of the BlockBackend, the root node. It's only useful currently 
> in cases where it is attached to the root because I've only proposed  
> patches for adding bitmap support to produce incrementals for Drive 
> Backup, which operates only on drives/devices (the root node of a tree.)
>
> However, in the context of migrating, it could be that we want to 
> migrate any bitmaps attached to /any/ nodes, so we should be careful 
> about what names we are pulling - we don't necessarily want the name 
> of the root node or BlockBackend, we may want the BDS and accompanying 
> name of strictly the node the bitmap is attached to.
>
> I know other areas of the code don't provide a good example for this 
> distinction, yet, but the block layer people are actively working on 
> fixing that. (See also the back-and-forth reviews for what to name my 
> QMP parameters in the incremental backup patches for some overview of 
> this semantic transition.)
>
> That said, we should think carefully about *which* name we want to put 
> in the stream and what implications it has for migration.
>
>
> (1) bdrv_get_node_name and bdrv_find_node
>
> This would migrate bitmaps as attached to their specific BDS. This 
> would mean that the node layout on the destination is either 
> identical, or similar enough such that no named bitmaps are attached 
> to a node not present on the destination.
>
> This gives us precision: bitmaps may be attached lower in the tree and 
> can provide more fine-grained detail for which layers have been 
> changed or modified during runtime.
>
> This also gives us fragility: In cases where we transfer, say, a 
> complex tree of nodes and collapse it to a single destination drive, 
> we'd be unable to migrate bitmaps not attached to the root along with 
> it, because they'd have nowhere meaningful to attach.
>
> It is perhaps somewhat unnecessary at this exact moment in time, as 
> well, because bitmaps are currently only useful on root nodes.
>
> (2) bdrv_get_device_name and bdrv_lookup_bs(device_name, NULL, errp)
>
> This would migrate any bitmaps in a tree and attach them to the entire 
> drive on the destination.
>
> This is simpler: You just need to make sure that the root nodes have 
> the same names, which is a lot easier to manage.
>
> This matches how drive migration currently appears to work: The entire 
> tree appears to be generally squashed into a single node and 
> transferred cluster-by-cluster, without general consideration as to 
> the layout of the local block tree. As we both know by now, none of 
> the metadata is transferred, just the data.
>
> It prevents migration of just bitmaps where you WANT the extra 
> complexity: If a bitmap is attached lower in the tree, re-affixing it 
> to the root of a destination tree might invalidate the semantics of 
> what that bitmap was meant to track, and it may become useless.
>
>
> So in summary:
> using device names is probably fine for now, as it matches the current 
> use case of bitmaps as well as drive migration; but using node names 
> may give us more power and precision later.
>
> I talked to Max about it, and he is leaning towards using device names 
> for now and switching to node names if we decide we want that power.
>
> (...I wonder if we could use a flag, for now, that says we're 
> including DEVICE names. Later, we could add a flag that says we're 
> using NODE names and add an option to toggle as the usage case sees fit.)
>
>
> Are you confused yet? :D
Oh, thanks for the explanation. Do we really need this flag? As Markus
wrote, nodes and devices share a namespace, so we can use
bdrv_lookup_bs(name, name, errp).

Also, we could, for example, send bitmaps as follows:

  - if the node has a name, send the bitmap with that name;
  - if the node is a root but has no name, send it with the blk name;
  - otherwise, don't send the bitmap.
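The selection rule above can be written as a small function. This is a sketch under stated assumptions: struct node_info and bitmap_stream_name() are hypothetical, an empty string stands for a missing name, and in QEMU the fields would presumably be filled from the node name and the BlockBackend (device) name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of a node; "" means the name is not set. */
struct node_info {
    const char *node_name;   /* node name of the BDS */
    const char *blk_name;    /* name of the BlockBackend, if any */
    bool is_root;            /* node sits directly under a BlockBackend */
};

/* Return the name to migrate the node's bitmaps under, or NULL to skip. */
static const char *bitmap_stream_name(const struct node_info *n)
{
    assert(n != NULL);
    if (n->node_name[0] != '\0') {
        return n->node_name;                /* named node: use node name */
    }
    if (n->is_root && n->blk_name[0] != '\0') {
        return n->blk_name;                 /* unnamed root: use blk name */
    }
    return NULL;                            /* otherwise: don't send */
}
```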
>
>
>>>
>>>> +
>>>> +    if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
>>>> +        qemu_put_name(f, bdrv_dirty_bitmap_name(bitmap));
>>>> +
>>>> +        if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
>>>> +            qemu_put_be64(f, bdrv_dirty_bitmap_granularity(bs, bitmap));
>>>> +        }
>>>> +    } else {
>>>> +        assert(!(flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY));
>>>> +    }
>>>> +
>>>
>>> I thought we were only migrating bitmaps with names?
>>> I suppose the conditional can't hurt, but I am not clear on when we
>>> won't have a bitmap name here.
>> You are right, the 'else' case is not possible. Hmm. I added it to make
>> sure the format is not corrupted when I decided to send the granularity
>> only together with the name. We won't have a bitmap name only when we
>> send the same bitmap as in the previous send_bitmap() call. Maybe it
>> would be better to use two separate ifs without the else and assert.
>
> It's okay if it is just "paranoia," but I was just checking. It would 
> make a decent assert().
>
>>>
>>>> +    qemu_put_be64(f, start_sector);
>>>> +    qemu_put_be32(f, nr_sectors);
>>>> +
>>>> +    /* if a block is zero we need to flush here since the network
>>>> +     * bandwidth is now a lot higher than the storage device bandwidth.
>>>> +     * thus if we queue zero blocks we slow down the migration.
>>>> +     * also, skip writing block when migrate only dirty bitmaps. */
>>>> +    if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) {
>>>> +        qemu_fflush(f);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    qemu_put_be64(f, buf_size);
>>>> +    qemu_put_buffer(f, buf, buf_size);
>>>> +    g_free(buf);
>>>> +}
>>>> +
>>>> +
>>>> +/* Called with iothread lock taken.  */
>>>> +
>>>> +static void set_dirty_tracking(void)
>>>> +{
>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>> +
>>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>>>> +        dbms->dirty_bitmap =
>>>> +            bdrv_create_dirty_dirty_bitmap(dbms->bitmap, CHUNK_SIZE);
>>>> +    }
>>>> +}
>>>> +
>>>
>>> OK: so we only have these dirty-dirty bitmaps when migration is
>>> starting, which makes sense.
>>>
>>>> +static void unset_dirty_tracking(void)
>>>> +{
>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>> +
>>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>>>> +        bdrv_release_dirty_dirty_bitmap(dbms->bitmap);
>>>> +    }
>>>> +}
>>>> +
>>>
>>> OK.
>>>
>>>> +static void init_dirty_bitmap_migration(QEMUFile *f)
>>>> +{
>>>> +    BlockDriverState *bs;
>>>> +    BdrvDirtyBitmap *bitmap;
>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>> +
>>>> +    dirty_bitmap_mig_state.bulk_completed = false;
>>>> +    dirty_bitmap_mig_state.prev_bs = NULL;
>>>> +    dirty_bitmap_mig_state.prev_bitmap = NULL;
>>>> +
>>>> +    for (bs = bdrv_next(NULL); bs; bs = bdrv_next(bs)) {
>>>> +        for (bitmap = bdrv_next_dirty_bitmap(bs, NULL); bitmap;
>>>> +             bitmap = bdrv_next_dirty_bitmap(bs, bitmap)) {
>>>> +            if (!bdrv_dirty_bitmap_name(bitmap)) {
>>>> +                continue;
>>>> +            }
>>>> +
>>>> +            dbms = g_new0(DirtyBitmapMigBitmapState, 1);
>>>> +            dbms->bs = bs;
>>>> +            dbms->bitmap = bitmap;
>>>> +            dbms->total_sectors = bdrv_nb_sectors(bs);
>>>> +            dbms->sectors_per_chunk = CHUNK_SIZE * 8 *
>>>> +                bdrv_dirty_bitmap_granularity(dbms->bs, dbms->bitmap)
>>>> +                >> BDRV_SECTOR_BITS;
>>>> +
>>>> +            QSIMPLEQ_INSERT_TAIL(&dirty_bitmap_mig_state.dbms_list,
>>>> +                                 dbms, entry);
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>
>>> OK, but see the note below for dirty_bitmap_mig_init.
>> Actually it is not 'init' but 'reinit': it is called at every migration
>> start. Hmm. dbms_list should be cleared here before filling it again.
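For a sense of scale, the sectors_per_chunk expression in init_dirty_bitmap_migration() can be evaluated on its own. The sketch copies the patch's expression; it assumes 512-byte sectors and a power-of-two granularity of at least one sector. By that formula, with a 64 KiB granularity one 1 MiB chunk of serialized bitmap covers 2^30 sectors, i.e. 512 GiB of disk.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_SIZE        (1 << 20)   /* as in the patch */
#define BDRV_SECTOR_BITS  9           /* assumed 512-byte sectors */

/* Same expression as init_dirty_bitmap_migration(): CHUNK_SIZE * 8 bitmap
 * bits per chunk, each covering `granularity` bytes, converted to sectors. */
static uint64_t sectors_per_chunk(uint64_t granularity)
{
    /* Assumed: power-of-two granularity of at least one sector. */
    assert(granularity >= 512 && (granularity & (granularity - 1)) == 0);
    return CHUNK_SIZE * 8 * granularity >> BDRV_SECTOR_BITS;
}
```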
>>>
>>>> +/* Called with no lock taken.  */
>>>> +static void bulk_phase_send_chunk(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
>>>> +{
>>>> +    uint32_t nr_sectors = MIN(dbms->total_sectors - dbms->cur_sector,
>>>> +                             dbms->sectors_per_chunk);
>>>
>>> What about cases where nr_sectors will put us past the end of the
>>> bitmap? The bitmap serialization implementation might need a touchup
>>> with this in mind.
>> I don't understand. nr_sectors <= dbms->total_sectors -
>> dbms->cur_sector, so it can't put us past the end...
>
> Oh, because you take the minimum, so we don't have to worry about 
> sectors_per_chunk eclipsing what we have.
>
> Never mind, I can't read... :(
>
>>>
>>>> +
>>>> +    send_bitmap(f, dbms, dbms->cur_sector, nr_sectors);
>>>> +
>>>> +    dbms->cur_sector += nr_sectors;
>>>> +    if (dbms->cur_sector >= dbms->total_sectors) {
>>>> +        dbms->bulk_completed = true;
>>>> +    }
>>>> +}
>>>> +
>>>> +/* Called with no lock taken.  */
>>>> +static void bulk_phase(QEMUFile *f, bool limit)
>>>> +{
>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>> +
>>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>>>> +        while (!dbms->bulk_completed) {
>>>> +            bulk_phase_send_chunk(f, dbms);
>>>> +            if (limit && qemu_file_rate_limit(f)) {
>>>> +                return;
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +
>>>> +    dirty_bitmap_mig_state.bulk_completed = true;
>>>> +}
>>>
>>> OK.
>>>
>>>> +
>>>> +static void blk_mig_reset_dirty_cursor(void)
>>>> +{
>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>> +
>>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>>>> +        dbms->cur_dirty = 0;
>>>> +    }
>>>> +}
>>>> +
>>>
>>> OK.
>>>
>>>> +/* Called with iothread lock taken. */
>>>> +static void dirty_phase_send_chunk(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
>>>> +{
>>>> +    uint32_t nr_sectors;
>>>> +
>>>> +    while (dbms->cur_dirty < dbms->total_sectors &&
>>>> +           !hbitmap_get(dbms->dirty_bitmap, dbms->cur_dirty)) {
>>>> +        dbms->cur_dirty += dbms->sectors_per_chunk;
>>>> +    }
>>>
>>> OK, so we fast forward the dirty cursor while the meta-bitmap is
>>> empty. Is it not worth using the HBitmapIterator here? You can reset
>>> them everywhere you reset the dirty cursor, and then just fast-seek to
>>> the first dirty sector.
>> Yes, I've thought about it, just used simpler way (copied from
>> migration/block.c) for an early version of the patch set. I will do it.
>
> Only if it doesn't make things more complicated to look at.
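The fast-seek idea amounts to a find-next-set-bit over the meta bitmap. HBitmapIterator provides this in QEMU; the sketch below is a flat, single-level stand-in rather than its API, with find_next_dirty() as a hypothetical name. It skips whole 64-bit words of clean chunks at a time and relies on the GCC/Clang builtin __builtin_ctzll().

```c
#include <assert.h>
#include <stdint.h>

/* Return the first set bit at or after `start` in a flat bitmap of `nbits`
 * bits (bit i lives in words[i / 64]), or -1 if there is none. */
static int64_t find_next_dirty(const uint64_t *words, int64_t nbits,
                               int64_t start)
{
    assert(start >= 0);
    while (start < nbits) {
        uint64_t w = words[start / 64] >> (start % 64);

        if (w != 0) {
            int64_t pos = start + __builtin_ctzll(w);  /* lowest set bit */
            return pos < nbits ? pos : -1;
        }
        start = (start / 64 + 1) * 64;   /* rest of this word is clean */
    }
    return -1;
}
```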
>
>>>
>>>> +
>>>> +    if (dbms->cur_dirty >= dbms->total_sectors) {
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    nr_sectors = MIN(dbms->total_sectors - dbms->cur_dirty,
>>>> +                     dbms->sectors_per_chunk);
>>>
>>> What happens if nr_sectors goes past the end?
>
> Again, I misread.
>
>>>
>>>> +    send_bitmap(f, dbms, dbms->cur_dirty, nr_sectors);
>>>> +    hbitmap_reset(dbms->dirty_bitmap, dbms->cur_dirty, dbms->sectors_per_chunk);
>>>> +    dbms->cur_dirty += nr_sectors;
>>>> +}
>>>> +
>>>> +/* Called with iothread lock taken.
>>>> + *
>>>> + * return value:
>>>> + * 0: too much data for max_downtime
>>>> + * 1: few enough data for max_downtime
>>>> +*/
>>>
>>> dirty_phase below doesn't have a return value.
>> A leftover comment, thanks.
>>>
>>>> +static void dirty_phase(QEMUFile *f, bool limit)
>>>> +{
>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>> +
>>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>>>> +        while (dbms->cur_dirty < dbms->total_sectors) {
>>>> +            dirty_phase_send_chunk(f, dbms);
>>>> +            if (limit && qemu_file_rate_limit(f)) {
>>>> +                return;
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>
>>> OK.
>>>
>>>> +
>>>> +/* Called with iothread lock taken.  */
>>>> +static void dirty_bitmap_mig_cleanup(void)
>>>> +{
>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>> +
>>>> +    unset_dirty_tracking();
>>>> +
>>>> +    while ((dbms = QSIMPLEQ_FIRST(&dirty_bitmap_mig_state.dbms_list)) != NULL) {
>>>> +        QSIMPLEQ_REMOVE_HEAD(&dirty_bitmap_mig_state.dbms_list, entry);
>>>> +        g_free(dbms);
>>>> +    }
>>>> +}
>>>> +
>>>
>>> OK.
>>>
>>>> +static void dirty_bitmap_migration_cancel(void *opaque)
>>>> +{
>>>> +    dirty_bitmap_mig_cleanup();
>>>> +}
>>>> +
>>>
>>> OK.
>>>
>>>> +static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
>>>> +{
>>>> +    DPRINTF("Enter save live iterate\n");
>>>> +
>>>> +    blk_mig_reset_dirty_cursor();
>>>
>>> I suppose this is because it's easier to check if we are finished by
>>> starting from sector 0 every time.
>>>
>>> A harder, but faster method may be: Use HBitmapIterators, but don't
>>> reset them every iteration: just iterate until the end, and check that
>>> the bitmap is empty. If the meta bitmap is empty, the dirty phase is
>>> complete. If the meta bitmap is NOT empty, reset the HBI and continue
>>> allowing iterations over the dirty phase.
>> Ok, will do.
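The completion check described above can be sketched as a cursor that only rewinds when it reaches the end with bits still set. Everything here is hypothetical: struct dirty_cursor, dirty_step(), and a meta bitmap limited to 64 chunks held in one uint64_t; "sending" a chunk just clears its bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical dirty-phase state for at most 64 chunks. */
struct dirty_cursor {
    uint64_t meta;   /* bit i set: chunk i changed since it was last sent */
    int pos;         /* next chunk to examine; not reset every iteration */
};

/* Send (here: clear) the next dirty chunk and store its index in *sent.
 * At the end of a pass, report completion if nothing is left; otherwise
 * rewind for another pass and store -1 in *sent. */
static bool dirty_step(struct dirty_cursor *c, int *sent)
{
    assert(c->pos >= 0 && c->pos <= 64);
    while (c->pos < 64 && !((c->meta >> c->pos) & 1)) {
        c->pos++;
    }
    if (c->pos < 64) {
        c->meta &= ~(UINT64_C(1) << c->pos);
        *sent = c->pos++;
        return false;
    }
    if (c->meta == 0) {
        return true;              /* full pass ended, meta bitmap empty */
    }
    c->pos = 0;                   /* bits were set behind the cursor */
    *sent = -1;
    return false;
}
```

A chunk re-dirtied behind the cursor is picked up on the next pass, and dirty_step() reports completion only after a pass that leaves the meta bitmap empty.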
>>>
>>>> +
>>>> +    if (dirty_bitmap_mig_state.bulk_completed) {
>>>> +        qemu_mutex_lock_iothread();
>>>> +        dirty_phase(f, true);
>>>> +        qemu_mutex_unlock_iothread();
>>>> +    } else {
>>>> +        bulk_phase(f, true);
>>>> +    }
>>>> +
>>>> +    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>>>> +
>>>> +    return dirty_bitmap_mig_state.bulk_completed;
>>>> +}
>>>> +
>>>> +/* Called with iothread lock taken.  */
>>>> +
>>>> +static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
>>>> +{
>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>> +    DPRINTF("Enter save live complete\n");
>>>> +
>>>> +    if (!dirty_bitmap_mig_state.bulk_completed) {
>>>> +        bulk_phase(f, false);
>>>> +    }
>>>
>>> [Not expertly familiar with savevm:] Under what conditions can this
>>> happen?
>> This can happen. save_complete is called when savevm decides that the
>> pending data size to send is small enough. That was the case in my
>> bugfix for pending in migration/block.c: to prevent save_complete
>> while the bulk phase isn't completed, save_pending returns a big value
>> there. Here I decided to make save_pending more honest, so I need to
>> complete the bulk phase in save_complete if it isn't completed yet.
>
> OK, Gotcha.
>
>>>
>>>> +
>>>> +    blk_mig_reset_dirty_cursor();
>>>> +    dirty_phase(f, false);
>>>> +
>>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>>>> +        uint8_t flags = DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME |
>>>> +                        DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME |
>>>> +                        DIRTY_BITMAP_MIG_FLAG_ENABLED;
>>>> +
>>>> +        qemu_put_byte(f, flags);
>>>> +        qemu_put_name(f, bdrv_get_device_name(dbms->bs));
>>>> +        qemu_put_name(f, bdrv_dirty_bitmap_name(dbms->bitmap));
>>>> +        qemu_put_byte(f, bdrv_dirty_bitmap_enabled(dbms->bitmap));
>>>> +    }
>>>> +
>>>> +    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>>>> +
>>>> +    DPRINTF("Dirty bitmaps migration completed\n");
>>>> +
>>>> +    dirty_bitmap_mig_cleanup();
>>>> +    return 0;
>>>> +}
>>>> +
>>>
>>> I suppose we don't need a flag that distinctly SAYS this is the end
>>> section, since we can tell by omission of
>>> DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK or ZERO_CHUNK.
>> Hmm. I think using EOS after each section simplifies the logic, and
>> migration/block.c uses the same approach. It's a question of which
>> format is better: "Each section for dirty_bitmap_load ends with EOS" or
>> "Each section for dirty_bitmap_load ends with EOS except the last one,
>> which may be recognized by the absence of NORMAL_CHUNK and ZERO_CHUNK".
>>>
>>>> +static uint64_t dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
>>>> +                                          uint64_t max_size)
>>>> +{
>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>> +    uint64_t pending = 0;
>>>> +
>>>> +    qemu_mutex_lock_iothread();
>>>> +
>>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>>>> +        uint64_t sectors = hbitmap_count(dbms->dirty_bitmap);
>>>> +        if (!dbms->bulk_completed) {
>>>> +            sectors += dbms->total_sectors - dbms->cur_sector;
>>>> +        }
>>>> +        pending += bdrv_dirty_bitmap_data_size(dbms->bitmap, sectors);
>>>> +    }
>>>> +
>>>> +    qemu_mutex_unlock_iothread();
>>>> +
>>>> +    DPRINTF("Enter save live pending %" PRIu64 ", max: %" PRIu64 "\n",
>>>> +            pending, max_size);
>>>> +    return pending;
>>>> +}
>>>> +
>>>
>>> OK.
>>>
>>>> +static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>>>> +{
>>>> +    int flags;
>>>> +
>>>> +    static char device_name[256], bitmap_name[256];
>>>> +    static BlockDriverState *bs;
>>>> +    static BdrvDirtyBitmap *bitmap;
>>>> +
>>>> +    uint8_t *buf;
>>>> +    uint64_t first_sector;
>>>> +    uint32_t  nr_sectors;
>>>> +    int ret;
>>>> +
>>>> +    DPRINTF("load start\n");
>>>> +
>>>> +    do {
>>>> +        flags = qemu_get_byte(f);
>>>> +        DPRINTF("flags: %x\n", flags);
>>>> +
>>>> +        if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
>>>> +            qemu_get_name(f, device_name);
>>>> +            bs = bdrv_find(device_name);
>>>
>>> Similar to the above confusion, you may want bdrv_lookup_bs or
>>> similar, since we're going to be looking for BDS nodes instead of
>>> "devices."
>> In this case, should it be changed in migration/block.c too?
>
> [See discussion above!]
>
>>>
>>>> +            if (!bs) {
>>>> +                fprintf(stderr, "Error: unknown block device '%s'\n",
>>>> +                        device_name);
>>>> +                return -EINVAL;
>>>> +            }
>>>> +        }
>>>> +
>>>> +        if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
>>>> +            if (!bs) {
>>>> +                fprintf(stderr, "Error: block device name is not set\n");
>>>> +                return -EINVAL;
>>>> +            }
>>>> +
>>>> +            qemu_get_name(f, bitmap_name);
>>>> +            bitmap = bdrv_find_dirty_bitmap(bs, bitmap_name);
>>>> +            if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
>>>> +                /* First chunk from this bitmap */
>>>> +                uint64_t granularity = qemu_get_be64(f);
>>>> +                if (!bitmap) {
>>>> +                    Error *local_err = NULL;
>>>> +                    bitmap = bdrv_create_dirty_bitmap(bs, granularity,
>>>> +                                                      bitmap_name,
>>>> +                                                      &local_err);
>>>> +                    if (!bitmap) {
>>>> +                        error_report("%s", error_get_pretty(local_err));
>>>> +                        error_free(local_err);
>>>> +                        return -EINVAL;
>>>> +                    }
>>>> +                } else {
>>>> +                    uint64_t dest_granularity =
>>>> +                        bdrv_dirty_bitmap_granularity(bs, bitmap);
>>>> +                    if (dest_granularity != granularity) {
>>>> +                        fprintf(stderr,
>>>> +                                "Error: "
>>>> +                                "Migrated bitmap granularity (%" PRIu64 ") "
>>>> +                                "is not match with destination bitmap '%s' "
>>>> +                                "granularity (%" PRIu64 ")\n",
>>>> +                                granularity,
>>>> +                                bitmap_name,
>>>> +                                dest_granularity);
>>>> +                        return -EINVAL;
>>>> +                    }
>>>> +                }
>>>> +                bdrv_disable_dirty_bitmap(bitmap);
>>>> +            }
>>>> +            if (!bitmap) {
>>>> +                fprintf(stderr, "Error: unknown dirty bitmap "
>>>> +                        "'%s' for block device '%s'\n",
>>>> +                        bitmap_name, device_name);
>>>> +                return -EINVAL;
>>>> +            }
>>>> +        }
>>>> +
>>>> +        if (flags & DIRTY_BITMAP_MIG_FLAG_ENABLED) {
>>>> +            bool enabled;
>>>> +            if (!bitmap) {
>>>> +                fprintf(stderr, "Error: dirty bitmap name is not set\n");
>>>> +                return -EINVAL;
>>>> +            }
>>>> +            bdrv_dirty_bitmap_deserialize_finish(bitmap);
>>>> +            /* complete migration */
>>>> +            enabled = qemu_get_byte(f);
>>>> +            if (enabled) {
>>>> +                bdrv_enable_dirty_bitmap(bitmap);
>>>> +            }
>>>> +        }
>>>
>>> Oh, so you use the ENABLED flag to show that migration is over.
>> Yes, it was a bad idea.
>>> If we are going to commit to a stream format for bitmaps, though,
>>> maybe it's best to actually create a "COMPLETION BLOCK" flag and then
>>> split this function into two pieces:
>>>
>>> (1) The part that receives regular / zero blocks, and
>>> (2) The part that receives completion data.
>>>
>>> That way, if we change the properties that bitmaps have down the line,
>>> we aren't reliant on literally the "enabled" flag to decide what to do.
>>>
>>> Also, it might help make this fairly long function a little smaller
>>> and more readable.
>> Ok.
>>>
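A minimal sketch of the split suggested above, written in plain C rather than against the real QEMUFile API: the receive loop reads one flags byte per record and hands chunk records and completion records to two separate functions. The 0x80 COMPLETE value, the function names and the flags-only input array are assumptions made for illustration; a real stream carries names, sectors and data after each flags byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 0x01, 0x02 and 0x04 mirror the patch; 0x80 is a hypothetical
 * COMPLETE flag that is not part of the posted stream format. */
#define MIG_FLAG_EOS          0x01
#define MIG_FLAG_NORMAL_CHUNK 0x02
#define MIG_FLAG_ZERO_CHUNK   0x04
#define MIG_FLAG_COMPLETE     0x80

typedef struct LoadStats {
    int chunks;     /* chunk records handled */
    int completed;  /* completion records handled */
} LoadStats;

/* Part (1): regular and zero chunks (would deserialize bitmap data). */
static int load_chunk(uint8_t flags, LoadStats *st)
{
    assert(flags & (MIG_FLAG_NORMAL_CHUNK | MIG_FLAG_ZERO_CHUNK));
    st->chunks++;
    return 0;
}

/* Part (2): completion data (would finish deserialization and restore
 * properties such as the enabled state). */
static int load_complete(LoadStats *st)
{
    st->completed++;
    return 0;
}

/* Reads one flags byte per record and dispatches until EOS.
 * Returns 0 on EOS, a negative value on error or truncated input. */
static int load_records(const uint8_t *flags_seq, size_t n, LoadStats *st)
{
    size_t i;

    for (i = 0; i < n; i++) {
        uint8_t flags = flags_seq[i];
        int ret = 0;

        if (flags & MIG_FLAG_COMPLETE) {
            ret = load_complete(st);
        } else if (flags & (MIG_FLAG_NORMAL_CHUNK | MIG_FLAG_ZERO_CHUNK)) {
            ret = load_chunk(flags, st);
        }
        if (ret < 0) {
            return ret;
        }
        if (flags & MIG_FLAG_EOS) {
            return 0;
        }
    }
    return -1; /* input ended before EOS */
}
```

With this shape, adding a new bitmap property later only touches load_complete(), not the chunk path.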
>>>> +
>>>> +        if (flags & (DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK |
>>>> +                     DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK)) {
>>>> +            if (!bs) {
>>>> +                fprintf(stderr, "Error: block device name is not set\n");
>>>> +                return -EINVAL;
>>>> +            }
>>>> +            if (!bitmap) {
>>>> +                fprintf(stderr, "Error: dirty bitmap name is not set\n");
>>>> +                return -EINVAL;
>>>> +            }
>>>> +
>>>> +            first_sector = qemu_get_be64(f);
>>>> +            nr_sectors = qemu_get_be32(f);
>>>> +            DPRINTF("chunk: %lu %u\n", first_sector, nr_sectors);
>>>> +
>>>> +
>>>> +            if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) {
>>>> +                bdrv_dirty_bitmap_deserialize_part0(bitmap, first_sector,
>>>> +                                                    nr_sectors);
>>>> +            } else {
>>>> +                uint64_t buf_size = qemu_get_be64(f);
>>>> +                uint64_t needed_size =
>>>> +                    bdrv_dirty_bitmap_data_size(bitmap, nr_sectors);
>>>> +
>>>> +                if (needed_size > buf_size) {
>>>> +                    fprintf(stderr,
>>>> +                            "Error: Migrated bitmap granularity is not "
>>>> +                            "match with destination bitmap granularity\n");
>>>> +                    return -EINVAL;
>>>> +                }
>>>> +
>>>
>>> "Migrated bitmap granularity doesn't match the destination bitmap
>>> granularity" perhaps.
>>>
>>>> +                buf = g_malloc(buf_size);
>>>> +                qemu_get_buffer(f, buf, buf_size);
>>>> +                bdrv_dirty_bitmap_deserialize_part(bitmap, buf,
>>>> +                                                   first_sector,
>>>> +                                                   nr_sectors);
>>>> +                g_free(buf);
>>>> +            }
>>>> +        }
>>>> +
>>>> +        ret = qemu_file_get_error(f);
>>>> +        if (ret != 0) {
>>>> +            return ret;
>>>> +        }
>>>> +    } while (!(flags & DIRTY_BITMAP_MIG_FLAG_EOS));
>>>> +
>>>> +    DPRINTF("load finish\n");
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static void dirty_bitmap_set_params(const MigrationParams *params, void *opaque)
>>>> +{
>>>> +    dirty_bitmap_mig_state.migration_enable = params->dirty;
>>>> +}
>>>> +
>>>
>>> OK; though I am not immediately aware of what changes need to happen
>>> to accommodate Eric's suggestions.
>> This function will be dropped in v3.
>>>
>>>> +static bool dirty_bitmap_is_active(void *opaque)
>>>> +{
>>>> +    return dirty_bitmap_mig_state.migration_enable == 1;
>>>> +}
>>>> +
>>>
>>> OK.
>>>
>>>> +static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
>>>> +{
>>>> +    init_dirty_bitmap_migration(f);
>>>> +
>>>> +    qemu_mutex_lock_iothread();
>>>> +    /* start track dirtyness of dirty bitmaps */
>>>> +    set_dirty_tracking();
>>>> +    qemu_mutex_unlock_iothread();
>>>> +
>>>> +    blk_mig_reset_dirty_cursor();
>>>> +    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>
>>> OK; see dirty_bitmap_mig_init below, though.
>>>
>>>> +static SaveVMHandlers savevm_block_handlers = {
>>>> +    .set_params = dirty_bitmap_set_params,
>>>> +    .save_live_setup = dirty_bitmap_save_setup,
>>>> +    .save_live_iterate = dirty_bitmap_save_iterate,
>>>> +    .save_live_complete = dirty_bitmap_save_complete,
>>>> +    .save_live_pending = dirty_bitmap_save_pending,
>>>> +    .load_state = dirty_bitmap_load,
>>>> +    .cancel = dirty_bitmap_migration_cancel,
>>>> +    .is_active = dirty_bitmap_is_active,
>>>> +};
>>>> +
>>>> +void dirty_bitmap_mig_init(void)
>>>> +{
>>>> +    QSIMPLEQ_INIT(&dirty_bitmap_mig_state.dbms_list);
>>>
>>> Maybe I haven't looked thoroughly enough yet, but it's weird that part
>>> of the dirty_bitmap_mig_state is initialized here, and the rest of it
>>> in init_dirty_bitmap_migration. I'd prefer to keep it all together, if
>>> possible.
>> dirty_bitmap_mig_init is called once, when qemu starts, and QSIMPLEQ_INIT
>> should be called only once. dirty_bitmap_save_setup is called at the start
>> of every migration, so it acts as a re-initialization.
>>>
>>>> +
>>>> +    register_savevm_live(NULL, "dirty-bitmap", 0, 1, &savevm_block_handlers,
>>>> +                         &dirty_bitmap_mig_state);
>>>> +}
>>>
>>> OK.
>>>
>>>> diff --git a/vl.c b/vl.c
>>>> index a824a7d..dee7220 100644
>>>> --- a/vl.c
>>>> +++ b/vl.c
>>>> @@ -4184,6 +4184,7 @@ int main(int argc, char **argv, char **envp)
>>>>
>>>>       blk_mig_init();
>>>>       ram_mig_init();
>>>> +    dirty_bitmap_mig_init();
>>>>
>>>>       /* If the currently selected machine wishes to override the units-per-bus
>>>>        * property of its default HBA interface type, do so now. */
>>>>
>>>
>>> Hm, since dirty bitmaps are a sub-component of the block layer, would
>>> it not make sense to put this hook under blk_mig_init, perhaps?
>> IMHO the reason to put it here is to keep all
>> register_savevm_live-entities in one place.
>
> If you still feel that way I won't withhold my R-b, but there are 
> already other cases such as ppc_spapr_init which are not in this 
> general area of vl.c.
>
> Plus the dozens of devices that use register_savevm as a wrapper to 
> register_savevm_live, so maybe consolidating calls to this function 
> isn't that important.
Hm, I missed that; yes, ppc_spapr_init is not here. Another thing: dirty 
bitmap migration is separate from blk migration, and it may be used 
without blk migration (NBD + mirror migration may be used instead). Is it 
OK to hook dirty bitmap migration into blk_mig_init, which is located in 
migration/block.c, a file that may not be used at all when bitmaps are 
migrated using migration/dirty-bitmap.c?
In other words, yes, dirty bitmaps are a sub-component of the block 
layer, but dirty bitmap migration is not a sub-component of blk migration.
>
>>>
>>>
>>> Overall this looks very clean compared to the intermingled format in
>>> V1, and the code is organized pretty well. Just a few minor comments,
>>> and I'd like to get the opinion of the migration maintainers, but I am
>>> happy. Sorry it took me so long to review, please feel free to let me
>>> know if you disagree with any of my opinions :)
>>>
>>> Thank you,
>>> --John
>>
>> Thank you for reviewing my series)
>>
>
> Yup. Hopefully I didn't miss too much that will irritate the Migration 
> overlords.
>
> Once you respin on top of v12, I can run some thorough migration tests 
> on it (perhaps over a weekend) and verify that it survives a couple 
> hundred migrations without any kind of integrity loss.
I hope to have it done, along with everything else, in about two days.
>
> This is what makes sense to me right now, anyway.
>
> Do you think you'll be including the bitmap checksum in the 
> BlockDirtyInfo command? That'd be convenient for iotests.
Ok, will do. Good idea. Only two points:
1) Is it ok to include debug info into BlockDirtyInfo? Will users be 
happy with it?
2) When I was debugging my code, the information about dirty-regions was 
very useful. Now, all is working and checksums are enough for regression 
control.
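For the checksum idea, here is one way such a value could be computed over a serialized bitmap buffer. The thread does not say which function to use; 64-bit FNV-1a is chosen here only because it is short, it is not cryptographic, and bitmap_checksum is a hypothetical name, not an existing QEMU API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a over a serialized bitmap buffer (illustrative only). */
static uint64_t bitmap_checksum(const uint8_t *buf, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV-1a 64-bit offset basis */
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 0x100000001b3ULL;            /* FNV-1a 64-bit prime */
    }
    return h;
}
```

Comparing this value on source and destination after migration would catch most corruption, though a stronger hash would be preferable if it is ever exposed to users rather than only used in iotests.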
>
> Thank you,
> --John.


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c
  2015-02-16 12:06         ` Vladimir Sementsov-Ogievskiy
@ 2015-02-16 18:18           ` John Snow
  2015-02-16 18:22             ` Dr. David Alan Gilbert
  2015-02-17  8:54             ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 35+ messages in thread
From: John Snow @ 2015-02-16 18:18 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, Peter Maydell,
	Juan quin >> Juan Jose Quintela Carreira,
	Dr. David Alan Gilbert, stefanha, den, amit Shah, pbonzini



On 02/16/2015 07:06 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 13.02.2015 23:22, John Snow wrote:
>>
>>
>> On 02/13/2015 03:19 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> On 11.02.2015 00:33, John Snow wrote:
>>>> Peter Maydell: What's the right way to license a file as copied from a
>>>> previous version? See below, please;
>>>>
>>>> Max, Markus: ctrl+f "bdrv_get_device_name" and let me know what you
>>>> think, if you would.
>>>>
>>>> Juan, Amit, David: Copying migration maintainers.
>>>>
>>>> On 01/27/2015 05:56 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>>> Live migration of dirty bitmaps. Only named dirty bitmaps are
>>>>> migrated.
>>>>> If the destination qemu already contains a dirty bitmap with the same
>>>>> name as a migrated bitmap, their granularities must match;
>>>>> otherwise an error is generated. If the destination qemu doesn't
>>>>> contain such a bitmap, it will be created.
>>>>>
>>>>> format:
>>>>>
>>>>> 1 byte: flags
>>>>>
>>>>> [ 1 byte: node name size ] \  flags & DEVICE_NAME
>>>>> [ n bytes: node name     ] /
>>>>>
>>>>> [ 1 byte: bitmap name size ]       \
>>>>> [ n bytes: bitmap name     ]       | flags & BITMAP_NAME
>>>>> [ [ be64: granularity    ] ]  flags & GRANULARITY
>>>>>
>>>>> [ 1 byte: bitmap enabled bit ] flags & ENABLED
>>>>>
>>>>> [ be64: start sector      ] \ flags & (NORMAL_CHUNK | ZERO_CHUNK)
>>>>> [ be32: number of sectors ] /
>>>>>
>>>>> [ be64: buffer size ] \ flags & NORMAL_CHUNK
>>>>> [ n bytes: buffer   ] /
>>>>>
>>>>> The last chunk should contain flags & EOS. A chunk may omit the device
>>>>> and/or bitmap names, in which case they are assumed to be the same as
>>>>> in the previous chunk. GRANULARITY is sent with the first chunk for the
>>>>> bitmap. The ENABLED bit is sent at the end of the "complete" stage of
>>>>> migration, so when the destination gets the ENABLED flag it should
>>>>> deserialize_finish the bitmap and set its enabled bit to the
>>>>> corresponding value.
>>>>>
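The fixed-width part of a chunk record in the format above can be sketched as a byte-level encoder. This is a hypothetical standalone helper, not code from the patch. It covers a record whose device and bitmap names are omitted because they match the previous chunk. put_be64 and put_be32 stand in for qemu_put_be64 and qemu_put_be32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MIG_FLAG_NORMAL_CHUNK 0x02
#define MIG_FLAG_ZERO_CHUNK   0x04

/* Big-endian writers; return the number of bytes written. */
static size_t put_be64(uint8_t *p, uint64_t v)
{
    int i;
    for (i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
    return 8;
}

static size_t put_be32(uint8_t *p, uint32_t v)
{
    int i;
    for (i = 0; i < 4; i++) {
        p[i] = (uint8_t)(v >> (24 - 8 * i));
    }
    return 4;
}

/* Encodes: 1 byte flags, be64 start sector, be32 number of sectors and,
 * for NORMAL_CHUNK only, be64 buffer size (the n-byte buffer would follow).
 * out must hold at least 21 bytes. Returns the header length. */
static size_t encode_chunk_header(uint8_t *out, bool zero,
                                  uint64_t start_sector, uint32_t nr_sectors,
                                  uint64_t buf_size)
{
    size_t n = 0;

    assert(out != NULL);
    out[n++] = zero ? MIG_FLAG_ZERO_CHUNK : MIG_FLAG_NORMAL_CHUNK;
    n += put_be64(out + n, start_sector);
    n += put_be32(out + n, nr_sectors);
    if (!zero) {
        n += put_be64(out + n, buf_size);
    }
    return n;
}
```

A zero chunk therefore costs 13 bytes on the wire, and a normal chunk costs 21 bytes plus its buffer.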
>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>>>>> ---
>>>>>   include/migration/block.h |   1 +
>>>>>   migration/Makefile.objs   |   2 +-
>>>>>   migration/dirty-bitmap.c  | 606 ++++++++++++++++++++++++++++++++++++++++++++++
>>>>>   vl.c                      |   1 +
>>>>>   4 files changed, 609 insertions(+), 1 deletion(-)
>>>>>   create mode 100644 migration/dirty-bitmap.c
>>>>>
>>>>> diff --git a/include/migration/block.h b/include/migration/block.h
>>>>> index ffa8ac0..566bb9f 100644
>>>>> --- a/include/migration/block.h
>>>>> +++ b/include/migration/block.h
>>>>> @@ -14,6 +14,7 @@
>>>>>   #ifndef BLOCK_MIGRATION_H
>>>>>   #define BLOCK_MIGRATION_H
>>>>>
>>>>> +void dirty_bitmap_mig_init(void);
>>>>>   void blk_mig_init(void);
>>>>>   int blk_mig_active(void);
>>>>>   uint64_t blk_mig_bytes_transferred(void);
>>>>
>>>> OK.
>>>>
>>>>> diff --git a/migration/Makefile.objs b/migration/Makefile.objs
>>>>> index d929e96..9adfda9 100644
>>>>> --- a/migration/Makefile.objs
>>>>> +++ b/migration/Makefile.objs
>>>>> @@ -6,5 +6,5 @@ common-obj-y += xbzrle.o
>>>>>   common-obj-$(CONFIG_RDMA) += rdma.o
>>>>>   common-obj-$(CONFIG_POSIX) += exec.o unix.o fd.o
>>>>>
>>>>> -common-obj-y += block.o
>>>>> +common-obj-y += block.o dirty-bitmap.o
>>>>>
>>>>
>>>> OK.
>>>>
>>>>> diff --git a/migration/dirty-bitmap.c b/migration/dirty-bitmap.c
>>>>> new file mode 100644
>>>>> index 0000000..8621218
>>>>> --- /dev/null
>>>>> +++ b/migration/dirty-bitmap.c
>>>>> @@ -0,0 +1,606 @@
>>>>> +/*
>>>>> + * QEMU dirty bitmap migration
>>>>> + *
>>>>> + * derived from migration/block.c
>>>>> + *
>>>>> + * Author:
>>>>> + * Sementsov-Ogievskiy Vladimir <vsementsov@parallels.com>
>>>>> + *
>>>>> + * original copyright message:
>>>>> + * =====================================================================
>>>>> + * Copyright IBM, Corp. 2009
>>>>> + *
>>>>> + * Authors:
>>>>> + *  Liran Schour <lirans@il.ibm.com>
>>>>> + *
>>>>> + * This work is licensed under the terms of the GNU GPL, version 2. See
>>>>> + * the COPYING file in the top-level directory.
>>>>> + *
>>>>> + * Contributions after 2012-01-13 are licensed under the terms of the
>>>>> + * GNU GPL, version 2 or (at your option) any later version.
>>>>> + * =====================================================================
>>>>> + */
>>>>> +
>>>>
>>>> Not super familiar with the right way to do licensing here; it's
>>>> possible you may not need to copy the original here, but I'm not sure.
>>>> You will want to make it clear what license applies to /your/ work, I
>>>> think. Maybe Peter Maydell can clue us in.
>>>>
>>>>> +#include "block/block.h"
>>>>> +#include "qemu/main-loop.h"
>>>>> +#include "qemu/error-report.h"
>>>>> +#include "migration/block.h"
>>>>> +#include "migration/migration.h"
>>>>> +#include "qemu/hbitmap.h"
>>>>> +#include <assert.h>
>>>>> +
>>>>> +#define CHUNK_SIZE                       (1 << 20)
>>>>> +
>>>>> +#define DIRTY_BITMAP_MIG_FLAG_EOS           0x01
>>>>> +#define DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK  0x02
>>>>> +#define DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK    0x04
>>>>> +#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME   0x08
>>>>> +#define DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME   0x10
>>>>> +#define DIRTY_BITMAP_MIG_FLAG_GRANULARITY   0x20
>>>>> +#define DIRTY_BITMAP_MIG_FLAG_ENABLED       0x40
>>>>> +/* flags should be <= 0xff */
>>>>> +
>>>>
>>>> We should give ourselves a little breathing room with the flags, since
>>>> we've only got room for one more.
>>> Ok. Will one more byte be enough?
>>
>> I should hope so. If you do add a completion chunk and flag, that
>> fills up the first byte completely, so having a second byte is a good
>> idea.
>>
>> I might recommend reserving the last bit of the second byte to be a
>> flag such as DIRTY_BITMAP_EXTRA_FLAGS that indicates the presence of
>> additional byte(s) of flags, to be determined later, if we ever need
>> them, but two bytes for now should be sufficient.
> Ok.
>>
>>>>
>>>>> +/* #define DEBUG_DIRTY_BITMAP_MIGRATION */
>>>>> +
>>>>> +#ifdef DEBUG_DIRTY_BITMAP_MIGRATION
>>>>> +#define DPRINTF(fmt, ...) \
>>>>> +    do { printf("dirty_migration: " fmt, ## __VA_ARGS__); } while (0)
>>>>> +#else
>>>>> +#define DPRINTF(fmt, ...) \
>>>>> +    do { } while (0)
>>>>> +#endif
>>>>> +
>>>>> +typedef struct DirtyBitmapMigBitmapState {
>>>>> +    /* Written during setup phase. */
>>>>> +    BlockDriverState *bs;
>>>>> +    BdrvDirtyBitmap *bitmap;
>>>>> +    HBitmap *dirty_bitmap;
>>>>
>>>> For my own sanity, I'd really prefer "bitmap" and "meta_bitmap" here;
>>>> "dirty_bitmap" is often used as a synonym (outside of this file) to
>>>> refer to the BdrvDirtyBitmap in general, so its usage here can be
>>>> somewhat confusing.
>>>>
>>>> I'd appreciate "dirty_dirty_bitmap" as in your previous patch for
>>>> consistency, or "meta_bitmap" as I recommend.
>>>>
>>> Ok
>>>>> +    int64_t total_sectors;
>>>>> +    uint64_t sectors_per_chunk;
>>>>> +    QSIMPLEQ_ENTRY(DirtyBitmapMigBitmapState) entry;
>>>>> +
>>>>> +    /* For bulk phase. */
>>>>> +    bool bulk_completed;
>>>>> +    int64_t cur_sector;
>>>>> +    bool granularity_sent;
>>>>> +
>>>>> +    /* For dirty phase. */
>>>>> +    int64_t cur_dirty;
>>>>> +} DirtyBitmapMigBitmapState;
>>>>> +
>>>>> +typedef struct DirtyBitmapMigState {
>>>>> +    int migration_enable;
>>>>> +    QSIMPLEQ_HEAD(dbms_list, DirtyBitmapMigBitmapState) dbms_list;
>>>>> +
>>>>> +    bool bulk_completed;
>>>>> +
>>>>> +    /* for send_bitmap() */
>>>>> +    BlockDriverState *prev_bs;
>>>>> +    BdrvDirtyBitmap *prev_bitmap;
>>>>> +} DirtyBitmapMigState;
>>>>> +
>>>>> +static DirtyBitmapMigState dirty_bitmap_mig_state;
>>>>> +
>>>>> +/* read name from qemu file:
>>>>> + * format:
>>>>> + * 1 byte : len = name length (<256)
>>>>> + * len bytes : name without last zero byte
>>>>> + *
>>>>> + * name should point to the buffer >= 256 bytes length
>>>>> + */
>>>>> +static char *qemu_get_name(QEMUFile *f, char *name)
>>>>> +{
>>>>> +    int len = qemu_get_byte(f);
>>>>> +    qemu_get_buffer(f, (uint8_t *)name, len);
>>>>> +    name[len] = '\0';
>>>>> +
>>>>> +    DPRINTF("get name: %d %s\n", len, name);
>>>>> +
>>>>> +    return name;
>>>>> +}
>>>>> +
>>>>
>>>> OK. Maybe these could be "qemu_put_string" or "qemu_get_string" and
>>>> added to qemu-file.c so others can use them.
>>> If there are no objections to sharing this format, I'll do it.
>>>>
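If the name helpers move to qemu-file.c as qemu_put_string/qemu_get_string, their contract could look like the following sketch. It works on a hypothetical in-memory ByteBuf instead of QEMUFile. It also returns errors for over-long names and truncated input instead of asserting. Whether shared helpers should assert or fail is a design choice the thread leaves open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal in-memory stand-in for QEMUFile (hypothetical). */
typedef struct ByteBuf {
    uint8_t data[512];
    size_t len;   /* bytes written */
    size_t pos;   /* read cursor */
} ByteBuf;

/* Writes a 1-byte length followed by the name, without the trailing NUL.
 * Returns -1 if the name does not fit the 1-byte length or the buffer. */
static int put_string(ByteBuf *b, const char *name)
{
    size_t len;

    assert(b != NULL && name != NULL);
    len = strlen(name);
    if (len > 255 || b->len + 1 + len > sizeof(b->data)) {
        return -1;
    }
    b->data[b->len++] = (uint8_t)len;
    memcpy(b->data + b->len, name, len);
    b->len += len;
    return 0;
}

/* Reads a string written by put_string into name, which must hold at
 * least 256 bytes (255 characters plus NUL). */
static int get_string(ByteBuf *b, char name[256])
{
    size_t len;

    assert(b != NULL && name != NULL);
    if (b->pos >= b->len) {
        return -1;
    }
    len = b->data[b->pos++];
    if (b->pos + len > b->len) {
        return -1;   /* truncated stream */
    }
    memcpy(name, b->data + b->pos, len);
    name[len] = '\0';
    b->pos += len;
    return 0;
}
```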
>>>>> +/* write name to qemu file:
>>>>> + * format:
>>>>> + * same as for qemu_get_name
>>>>> + *
>>>>> + * maximum name length is 255
>>>>> + */
>>>>> +static void qemu_put_name(QEMUFile *f, const char *name)
>>>>> +{
>>>>> +    int len = strlen(name);
>>>>> +
>>>>> +    DPRINTF("put name: %d %s\n", len, name);
>>>>> +
>>>>> +    assert(len < 256);
>>>>> +    qemu_put_byte(f, len);
>>>>> +    qemu_put_buffer(f, (const uint8_t *)name, len);
>>>>> +}
>>>>> +
>>>>
>>>> OK.
>>>>
>>>>> +static void send_bitmap(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
>>>>> +                        uint64_t start_sector, uint32_t nr_sectors)
>>>>> +{
>>>>> +    BlockDriverState *bs = dbms->bs;
>>>>> +    BdrvDirtyBitmap *bitmap = dbms->bitmap;
>>>>> +    uint8_t flags = 0;
>>>>> +    /* align for buffer_is_zero() */
>>>>> +    uint64_t align = 4 * sizeof(long);
>>>>> +    uint64_t buf_size =
>>>>> +        (bdrv_dirty_bitmap_data_size(bitmap, nr_sectors) + align - 1) &
>>>>> +        ~(align - 1);
>>>>> +    uint8_t *buf = g_malloc0(buf_size);
>>>>> +
>>>>> +    bdrv_dirty_bitmap_serialize_part(bitmap, buf, start_sector, nr_sectors);
>>>>> +
>>>>> +    if (buffer_is_zero(buf, buf_size)) {
>>>>> +        g_free(buf);
>>>>> +        buf = NULL;
>>>>> +        flags |= DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK;
>>>>> +    } else {
>>>>> +        flags |= DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK;
>>>>> +    }
>>>>> +
>>>>> +    if (bs != dirty_bitmap_mig_state.prev_bs) {
>>>>> +        dirty_bitmap_mig_state.prev_bs = bs;
>>>>> +        flags |= DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME;
>>>>> +    }
>>>>> +
>>>>> +    if (bitmap != dirty_bitmap_mig_state.prev_bitmap) {
>>>>> +        dirty_bitmap_mig_state.prev_bitmap = bitmap;
>>>>> +        flags |= DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME;
>>>>> +    }
>>>>
>>>> OK, so we use the current bs/bitmap under consideration to detect if
>>>> we have switched context, and put the names in the stream when it
>>>> happens. OK.
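The context-switch logic described here can be isolated into a small helper: names are flagged for transmission only when the device or bitmap differs from the previous chunk. A sketch with opaque pointers standing in for BlockDriverState and BdrvDirtyBitmap. SendCtx and name_flags are hypothetical names; the flag values are copied from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIG_FLAG_BITMAP_NAME 0x08
#define MIG_FLAG_DEVICE_NAME 0x10

/* Sender-side context, mirroring prev_bs / prev_bitmap in the patch. */
typedef struct SendCtx {
    const void *prev_bs;
    const void *prev_bitmap;
} SendCtx;

/* Returns the name flags to add for a chunk of (bs, bitmap) and updates
 * the context, so names are written only when the context changes. */
static uint8_t name_flags(SendCtx *ctx, const void *bs, const void *bitmap)
{
    uint8_t flags = 0;

    assert(bs != NULL && bitmap != NULL);
    if (bs != ctx->prev_bs) {
        ctx->prev_bs = bs;
        flags |= MIG_FLAG_DEVICE_NAME;
    }
    if (bitmap != ctx->prev_bitmap) {
        ctx->prev_bitmap = bitmap;
        flags |= MIG_FLAG_BITMAP_NAME;
    }
    return flags;
}
```

The receiver must mirror this by keeping its current bs and bitmap across loop iterations, which the load loop earlier in the patch does.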
>>>>
>>>>> +
>>>>> +    if (dbms->granularity_sent == 0) {
>>>>> +        dbms->granularity_sent = 1;
>>>>> +        flags |= DIRTY_BITMAP_MIG_FLAG_GRANULARITY;
>>>>> +    }
>>>>> +
>>>>> +    DPRINTF("Enter send_bitmap"
>>>>> +            "\n   flags:        %x"
>>>>> +            "\n   start_sector: %" PRIu64
>>>>> +            "\n   nr_sectors:   %" PRIu32
>>>>> +            "\n   data_size:    %" PRIu64 "\n",
>>>>> +            flags, start_sector, nr_sectors, buf_size);
>>>>> +
>>>>> +    qemu_put_byte(f, flags);
>>>>> +
>>>>> +    if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
>>>>> +        qemu_put_name(f, bdrv_get_device_name(bs));
>>>>> +    }
>>>>
>>>> I am still not fully clear on this myself, but I think we are phasing
>>>> out bdrv_get_device_name. In the context of bitmaps, we are most
>>>> likely using them one-per-tree, but they /can/ be attached
>>>> one-per-node, so we shouldn't be trying to get the device name here,
>>>> but rather, the node-name.
>>>>
>>>> I /think/ we may want to be using bdrv_get_node_name, but I think it
>>>> is not currently true that all nodes WILL be named ... I am CC'ing
>>>> Markus Armbruster and Max Reitz for (A) A refresher course and (B)
>>>> Opinions on what function call might make sense here, given that we
>>>> wish to migrate bitmaps attached to arbitrary nodes.
>>> Hmm.. I'm not familiar with the hierarchy of nodes and devices. As I
>>> understand it, drives created both from the command line and via QMP
>>> go through blockdev_init, which creates both the blk (device) and the
>>> bs (node) through blk_new_with_bs. Am I right? Also,
>>> bdrv_get_device_name is used in migration/block.c.
>>
>> Now that I'm more awake, here's a better rundown of what's going on:
>>
>> It's something that is a little bit in flux right now, unfortunately.
>> We're trying to transition to a format where we have arbitrarily
>> complex Block trees, where the root of the tree is always a
>> BlockBackend (See the big series by Max Reitz) and the configuration
>> of the tree may become arbitrarily complex.
>>
>> Simple trees may consist of just one BlockBackend and one
>> BlockDriverState node, where I think we can refer to this BDS as the
>> "root node," not to be confused with the BlockBackend "root." The
>> BlockBackend is a relatively new invention, so it isn't actually used
>> consistently everywhere yet.
>>
>> In the future, we may have commands that make distinctions based on whether
>> you want to work on the BlockBackend, the root node under the
>> blockbackend associated with a BDS, only the explicit node/BDS you
>> identify, or some combination of the above semantics.
>>
>> As of right now, bitmaps can be *attached* to any arbitrary node,
>> though they are currently only *useful* when attached to the first
>> child of the BlockBackend, the root node. It's only useful currently
>> in cases where it is attached to the root because I've only proposed
>> patches for adding bitmap support to produce incrementals for Drive
>> Backup, which operates only on drives/devices (the root node of a tree.)
>>
>> However, in the context of migrating, it could be that we want to
>> migrate any bitmaps attached to /any/ nodes, so we should be careful
>> about what names we are pulling - we don't necessarily want the name
>> of the root node or BlockBackend, we may want the BDS and accompanying
>> name of strictly the node the bitmap is attached to.
>>
>> I know other areas of the code don't provide a good example for this
>> distinction, yet, but the block layer people are actively working on
>> fixing that. (See also the back-and-forth reviews for what to name my
>> QMP parameters in the incremental backup patches for some overview of
>> this semantic transition.)
>>
>> That said, we should think carefully about *which* name we want to put
>> in the stream and what implications it has for migration.
>>
>>
>> (1) bdrv_get_node_name and bdrv_find_node
>>
>> This would migrate bitmaps as attached to their specific BDS. This
>> would mean that the node layout on the destination is either
>> identical, or similar enough such that no named bitmaps are attached
>> to a node not present on the destination.
>>
>> This gives us precision: bitmaps may be attached lower in the tree and
>> can provide more fine-grained detail for which layers have been
>> changed or modified during runtime.
>>
>> This also gives us fragility: In cases where we transfer, say, a
>> complex tree of nodes and collapse it to a single destination drive,
>> we'd be unable to migrate bitmaps not attached to the root along with
>> it, because they'd have nowhere meaningful to attach.
>>
>> It is perhaps somewhat unnecessary at this exact moment in time, as
>> well, because bitmaps are currently only useful on root nodes.
>>
>> (2) bdrv_get_device_name and bdrv_lookup_bs(device_name, NULL, errp)
>>
>> This would migrate any bitmaps in a tree and attach them to the entire
>> drive on the destination.
>>
>> This is simpler: You just need to make sure that the root nodes have
>> the same names, which is a lot easier to manage.
>>
>> This matches how drive migration currently appears to work: The entire
>> tree appears to be generally squashed into a single node and
>> transferred cluster-by-cluster, without general consideration as to
>> the layout of the local block tree. As we both know by now, none of
>> the metadata is transferred, just the data.
>>
>> It prevents migration of just bitmaps where you WANT the extra
>> complexity: If a bitmap is attached lower in the tree, re-affixing it
>> to the root of a destination tree might invalidate the semantics of
>> what that bitmap was meant to track, and it may become useless.
>>
>>
>> So in summary:
>> using device names is probably fine for now, as it matches the current
>> use case of bitmaps as well as drive migration; but using node names
>> may give us more power and precision later.
>>
>> I talked to Max about it, and he is leaning towards using device names
>> for now and switching to node names if we decide we want that power.
>>
>> (...I wonder if we could use a flag, for now, that says we're
>> including DEVICE names. Later, we could add a flag that says we're
>> using NODE names and add an option to toggle as the usage case sees fit.)
>>
>>
>> Are you confused yet? :D
> Oh, thanks for the explanation. Do we really need this flag? As Markus
> wrote, nodes and devices share a namespace, so we can use
> bdrv_lookup_bs(name, name, errp).

What 'name' are you using here, though? It looked to me like in your 
backup routine we got a list of BDS entries and get the name *from* the 
BDS, so we still have to think about how we want to /get/ the name.

>
> Also, we can, for example, send bitmaps as follows:
>
> if node has name - send bitmap with this name
> if node is root, but hasn't name - send it with blk name
> otherwise - don't send the bitmap
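The three-way rule above can be written down as a small decision function. This is a sketch over a hypothetical NodeInfo struct (not QEMU's BlockDriverState). It assumes, as the thread reports from Markus, that node names and device names share one namespace, so the receiver could resolve either with a single lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a node; not QEMU's BlockDriverState. */
typedef struct NodeInfo {
    const char *node_name;  /* NULL or "" if the node is unnamed */
    const char *blk_name;   /* BlockBackend name, meaningful for roots */
    bool is_root;
} NodeInfo;

/* Picks the name to put in the stream for a bitmap attached to node,
 * or returns NULL if the bitmap should not be sent. */
static const char *bitmap_owner_name(const NodeInfo *node)
{
    assert(node != NULL);
    if (node->node_name && node->node_name[0]) {
        return node->node_name;            /* named node: use node name */
    }
    if (node->is_root && node->blk_name && node->blk_name[0]) {
        return node->blk_name;             /* unnamed root: use blk name */
    }
    return NULL;                           /* otherwise: skip the bitmap */
}
```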

The node a bitmap is attached to should always have a name -- it would 
not be possible via the existing interface to attach it to a node 
without a name.

I *think* the root node should always have a name, but I am actually 
less sure of that.

>>
>>
>>>>
>>>>> +
>>>>> +    if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
>>>>> +        qemu_put_name(f, bdrv_dirty_bitmap_name(bitmap));
>>>>> +
>>>>> +        if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
>>>>> +            qemu_put_be64(f, bdrv_dirty_bitmap_granularity(bs, bitmap));
>>>>> +        }
>>>>> +    } else {
>>>>> +        assert(!(flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY));
>>>>> +    }
>>>>> +
>>>>
>>>> I thought we were only migrating bitmaps with names?
>>>> I suppose the conditional can't hurt, but I am not clear on when we
>>>> won't have a bitmap name here.
>>> You are right, the 'else' case is not possible. Hmm. I added it to be
>>> sure that the format is not corrupted, when I decided to send the
>>> granularity only together with the name. We won't have a bitmap name
>>> only when we send the same bitmap as in the previous send_bitmap()
>>> call. Maybe it would be better to use two separate ifs without the
>>> else and the assert.
>>
>> It's okay if it is just "paranoia," but I was just checking. It would
>> make a decent assert().
>>
>>>>
>>>>> +    qemu_put_be64(f, start_sector);
>>>>> +    qemu_put_be32(f, nr_sectors);
>>>>> +
>>>>> +    /* if a block is zero we need to flush here since the network
>>>>> +     * bandwidth is now a lot higher than the storage device bandwidth.
>>>>> +     * thus if we queue zero blocks we slow down the migration.
>>>>> +     * also, skip writing block when migrate only dirty bitmaps. */
>>>>> +    if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) {
>>>>> +        qemu_fflush(f);
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>> +    qemu_put_be64(f, buf_size);
>>>>> +    qemu_put_buffer(f, buf, buf_size);
>>>>> +    g_free(buf);
>>>>> +}
>>>>> +
>>>>> +
>>>>> +/* Called with iothread lock taken.  */
>>>>> +
>>>>> +static void set_dirty_tracking(void)
>>>>> +{
>>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>>> +
>>>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>>>>> +        dbms->dirty_bitmap =
>>>>> +            bdrv_create_dirty_dirty_bitmap(dbms->bitmap, CHUNK_SIZE);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>
>>>> OK: so we only have these dirty-dirty bitmaps when migration is
>>>> starting, which makes sense.
>>>>
>>>>> +static void unset_dirty_tracking(void)
>>>>> +{
>>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>>> +
>>>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>>>>> +        bdrv_release_dirty_dirty_bitmap(dbms->bitmap);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>
>>>> OK.
>>>>
>>>>> +static void init_dirty_bitmap_migration(QEMUFile *f)
>>>>> +{
>>>>> +    BlockDriverState *bs;
>>>>> +    BdrvDirtyBitmap *bitmap;
>>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>>> +
>>>>> +    dirty_bitmap_mig_state.bulk_completed = false;
>>>>> +    dirty_bitmap_mig_state.prev_bs = NULL;
>>>>> +    dirty_bitmap_mig_state.prev_bitmap = NULL;
>>>>> +
>>>>> +    for (bs = bdrv_next(NULL); bs; bs = bdrv_next(bs)) {
>>>>> +        for (bitmap = bdrv_next_dirty_bitmap(bs, NULL); bitmap;
>>>>> +             bitmap = bdrv_next_dirty_bitmap(bs, bitmap)) {
>>>>> +            if (!bdrv_dirty_bitmap_name(bitmap)) {
>>>>> +                continue;
>>>>> +            }
>>>>> +
>>>>> +            dbms = g_new0(DirtyBitmapMigBitmapState, 1);
>>>>> +            dbms->bs = bs;
>>>>> +            dbms->bitmap = bitmap;
>>>>> +            dbms->total_sectors = bdrv_nb_sectors(bs);
>>>>> +            dbms->sectors_per_chunk = CHUNK_SIZE * 8 *
>>>>> +                bdrv_dirty_bitmap_granularity(dbms->bs, dbms->bitmap)
>>>>> +                >> BDRV_SECTOR_BITS;
>>>>> +
>>>>> +            QSIMPLEQ_INSERT_TAIL(&dirty_bitmap_mig_state.dbms_list,
>>>>> +                                 dbms, entry);
>>>>> +        }
>>>>> +    }
>>>>> +}
>>>>> +
>>>>
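Worked arithmetic for sectors_per_chunk above. Each CHUNK_SIZE-byte piece of serialized bitmap holds CHUNK_SIZE * 8 bits, and each bit covers g bytes of disk. Here g is the granularity, assumed to be in bytes, as the >> BDRV_SECTOR_BITS conversion suggests. BDRV_SECTOR_BITS = 9 converts bytes to 512-byte sectors. Since * binds tighter than >> in C, the product is formed before the shift. For an example granularity of 64 KiB:

```latex
\text{sectors\_per\_chunk}
  = \frac{\mathrm{CHUNK\_SIZE} \cdot 8 \cdot g}{2^{\mathrm{BDRV\_SECTOR\_BITS}}}
  = \frac{2^{20} \cdot 2^{3} \cdot 2^{16}}{2^{9}}
  = 2^{30}\ \text{sectors}
  = 2^{39}\ \text{bytes}
  = 512\ \text{GiB}
```

So with the default 1 MiB CHUNK_SIZE, a single chunk already describes half a terabyte of disk at this granularity.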
>>>> OK, but see the note below for dirty_bitmap_mig_init.
>>> Actually it is not 'init' but 'reinit': it is called at the start of
>>> every migration. Hmm. dbms_list should be cleared here before filling it
>>> again.
>>>>
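A sketch of the reinit-style setup discussed here. The list head is initialized once, as QSIMPLEQ_INIT is in dirty_bitmap_mig_init. Each migration setup first frees leftover entries and then refills the list. A hand-rolled singly linked list replaces QSIMPLEQ so the example is self-contained, and the struct and function names are hypothetical. In the patch itself, the equivalent would presumably pop entries with QSIMPLEQ_REMOVE_HEAD and g_free them, also releasing any meta-bitmap they hold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-bitmap state, reduced to a linked-list node. */
typedef struct BitmapState {
    int id;
    struct BitmapState *next;
} BitmapState;

typedef struct MigState {
    BitmapState *head;
    BitmapState **tail;   /* points at the last node's next, or at head */
} MigState;

/* One-time initialization at startup. */
static void mig_state_init(MigState *s)
{
    s->head = NULL;
    s->tail = &s->head;
}

/* Frees entries left over from a previous migration. */
static void mig_state_clear(MigState *s)
{
    while (s->head) {
        BitmapState *next = s->head->next;
        free(s->head);
        s->head = next;
    }
    s->tail = &s->head;
}

static int mig_state_append(MigState *s, int id)
{
    BitmapState *e = calloc(1, sizeof(*e));

    assert(s->tail != NULL);
    if (!e) {
        return -1;
    }
    e->id = id;
    *s->tail = e;
    s->tail = &e->next;
    return 0;
}

/* Per-migration setup: clear first, then refill. */
static int mig_state_reinit(MigState *s, int nr_bitmaps)
{
    int i;

    mig_state_clear(s);
    for (i = 0; i < nr_bitmaps; i++) {
        if (mig_state_append(s, i) < 0) {
            return -1;
        }
    }
    return 0;
}

static int mig_state_count(const MigState *s)
{
    const BitmapState *e;
    int n = 0;

    for (e = s->head; e; e = e->next) {
        n++;
    }
    return n;
}
```

Without the clear step, a second migration in the same process would append duplicates of every bitmap.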
>>>>> +/* Called with no lock taken.  */
>>>>> +static void bulk_phase_send_chunk(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
>>>>> +{
>>>>> +    uint32_t nr_sectors = MIN(dbms->total_sectors - dbms->cur_sector,
>>>>> +                             dbms->sectors_per_chunk);
>>>>
>>>> What about cases where nr_sectors will put us past the end of the
>>>> bitmap? The bitmap serialization implementation might need a touchup
>>>> with this in mind.
>>> I don't understand: nr_sectors <= dbms->total_sectors -
>>> dbms->cur_sector, so it can't put us past the end.
>>
>> Oh, because you take the minimum, so we don't have to worry about
>> sectors_per_chunk eclipsing what we have.
>>
>> Never mind, I can't read... :(
>>
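The clamping discussed here can be checked in isolation: stepping a cursor by MIN(remaining, sectors_per_chunk) covers exactly [0, total_sectors) and only ever shortens the final chunk. The sketch below is a hypothetical helper, not patch code. It also asserts that sectors_per_chunk fits in the uint32_t that nr_sectors is stored in. With CHUNK_SIZE = 1 MiB, sectors_per_chunk works out to 2^14 * g for a granularity of g bytes, so that narrowing is guaranteed safe regardless of disk size only for granularities up to 128 KiB. The patch may want such a check or a wider type.

```c
#include <assert.h>
#include <stdint.h>

#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

typedef struct ChunkSummary {
    int chunks;        /* number of chunks produced */
    uint32_t last_nr;  /* size of the final chunk */
    int64_t covered;   /* sum of all chunk sizes */
} ChunkSummary;

/* Walks [0, total_sectors) the way bulk_phase_send_chunk() does. */
static ChunkSummary split_into_chunks(int64_t total_sectors,
                                      uint64_t sectors_per_chunk)
{
    ChunkSummary s = { 0, 0, 0 };
    int64_t cur = 0;

    assert(sectors_per_chunk > 0 && sectors_per_chunk <= UINT32_MAX);
    while (cur < total_sectors) {
        uint32_t nr = (uint32_t)MIN((uint64_t)(total_sectors - cur),
                                    sectors_per_chunk);
        s.chunks++;
        s.last_nr = nr;
        s.covered += nr;
        cur += nr;
    }
    return s;
}
```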
>>>>
>>>>> +
>>>>> +    send_bitmap(f, dbms, dbms->cur_sector, nr_sectors);
>>>>> +
>>>>> +    dbms->cur_sector += nr_sectors;
>>>>> +    if (dbms->cur_sector >= dbms->total_sectors) {
>>>>> +        dbms->bulk_completed = true;
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +/* Called with no lock taken.  */
>>>>> +static void bulk_phase(QEMUFile *f, bool limit)
>>>>> +{
>>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>>> +
>>>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>>>>> +        while (!dbms->bulk_completed) {
>>>>> +            bulk_phase_send_chunk(f, dbms);
>>>>> +            if (limit && qemu_file_rate_limit(f)) {
>>>>> +                return;
>>>>> +            }
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    dirty_bitmap_mig_state.bulk_completed = true;
>>>>> +}
>>>>
>>>> OK.
>>>>
>>>>> +
>>>>> +static void blk_mig_reset_dirty_cursor(void)
>>>>> +{
>>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>>> +
>>>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>>>>> +        dbms->cur_dirty = 0;
>>>>> +    }
>>>>> +}
>>>>> +
>>>>
>>>> OK.
>>>>
>>>>> +/* Called with iothread lock taken. */
>>>>> +static void dirty_phase_send_chunk(QEMUFile *f,
>>>>> +                                   DirtyBitmapMigBitmapState *dbms)
>>>>> +{
>>>>> +    uint32_t nr_sectors;
>>>>> +
>>>>> +    while (dbms->cur_dirty < dbms->total_sectors &&
>>>>> +           !hbitmap_get(dbms->dirty_bitmap, dbms->cur_dirty)) {
>>>>> +        dbms->cur_dirty += dbms->sectors_per_chunk;
>>>>> +    }
>>>>
>>>> OK, so we fast forward the dirty cursor while the meta-bitmap is
>>>> empty. Is it not worth using the HBitmapIterator here? You can reset
>>>> them everywhere you reset the dirty cursor, and then just fast-seek to
>>>> the first dirty sector.
>>> Yes, I've thought about it; I just used the simpler way (copied from
>>> migration/block.c) for an early version of the patch set. I will do it.
>>
>> Only if it doesn't make things more complicated to look at.
>>
>>>>
>>>>> +
>>>>> +    if (dbms->cur_dirty >= dbms->total_sectors) {
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>> +    nr_sectors = MIN(dbms->total_sectors - dbms->cur_dirty,
>>>>> +                     dbms->sectors_per_chunk);
>>>>
>>>> What happens if nr_sectors goes past the end?
>>
>> Again, I misread.
>>
>>>>
>>>>> +    send_bitmap(f, dbms, dbms->cur_dirty, nr_sectors);
>>>>> +    hbitmap_reset(dbms->dirty_bitmap, dbms->cur_dirty,
>>>>> +                  dbms->sectors_per_chunk);
>>>>> +    dbms->cur_dirty += nr_sectors;
>>>>> +}
>>>>> +
>>>>> +/* Called with iothread lock taken.
>>>>> + *
>>>>> + * return value:
>>>>> + * 0: too much data for max_downtime
>>>>> + * 1: few enough data for max_downtime
>>>>> +*/
>>>>
>>>> dirty_phase below doesn't have a return value.
>>> A leftover comment, thanks.
>>>>
>>>>> +static void dirty_phase(QEMUFile *f, bool limit)
>>>>> +{
>>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>>> +
>>>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>>>>> +        while (dbms->cur_dirty < dbms->total_sectors) {
>>>>> +            dirty_phase_send_chunk(f, dbms);
>>>>> +            if (limit && qemu_file_rate_limit(f)) {
>>>>> +                return;
>>>>> +            }
>>>>> +        }
>>>>> +    }
>>>>> +}
>>>>> +
>>>>
>>>> OK.
>>>>
>>>>> +
>>>>> +/* Called with iothread lock taken.  */
>>>>> +static void dirty_bitmap_mig_cleanup(void)
>>>>> +{
>>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>>> +
>>>>> +    unset_dirty_tracking();
>>>>> +
>>>>> +    while ((dbms =
>>>>> +            QSIMPLEQ_FIRST(&dirty_bitmap_mig_state.dbms_list)) != NULL) {
>>>>> +        QSIMPLEQ_REMOVE_HEAD(&dirty_bitmap_mig_state.dbms_list, entry);
>>>>> +        g_free(dbms);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>
>>>> OK.
>>>>
>>>>> +static void dirty_bitmap_migration_cancel(void *opaque)
>>>>> +{
>>>>> +    dirty_bitmap_mig_cleanup();
>>>>> +}
>>>>> +
>>>>
>>>> OK.
>>>>
>>>>> +static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
>>>>> +{
>>>>> +    DPRINTF("Enter save live iterate\n");
>>>>> +
>>>>> +    blk_mig_reset_dirty_cursor();
>>>>
>>>> I suppose this is because it's easier to check if we are finished by
>>>> starting from sector 0 every time.
>>>>
>>>> A harder, but faster method may be: Use HBitmapIterators, but don't
>>>> reset them every iteration: just iterate until the end, and check that
>>>> the bitmap is empty. If the meta bitmap is empty, the dirty phase is
>>>> complete. If the meta bitmap is NOT empty, reset the HBI and continue
>>>> allowing iterations over the dirty phase.
>>> Ok, will do.
>>>>
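Here is a standalone C sketch of the non-rewinding scheme described here. The cursor persists across iterations. Only when it reaches the end does the code check whether the meta bitmap is empty; if new bits appeared behind the cursor, it rewinds and carries on. The fixed 128-bit flat bitmap is an assumption, and so is the per-call chunk budget standing in for qemu_file_rate_limit(). All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NBITS 128  /* assumed size: one bit per chunk */

/* Hypothetical dirty-phase state: a flat meta bitmap plus a cursor that
 * persists across iterations instead of being rewound each time. */
typedef struct {
    uint64_t words[SKETCH_NBITS / 64];
    uint64_t cursor;
    unsigned sent;   /* chunks "sent" so far, for illustration */
} SketchDirtyState;

static bool sketch_test_and_clear(SketchDirtyState *s, uint64_t bit)
{
    uint64_t mask = (uint64_t)1 << (bit % 64);
    bool was_set;

    assert(bit < SKETCH_NBITS);
    was_set = (s->words[bit / 64] & mask) != 0;
    s->words[bit / 64] &= ~mask;
    return was_set;
}

static bool sketch_bitmap_empty(const SketchDirtyState *s)
{
    for (unsigned i = 0; i < SKETCH_NBITS / 64; i++) {
        if (s->words[i] != 0) {
            return false;
        }
    }
    return true;
}

/* One save_live_iterate-style call: send up to `budget` dirty chunks,
 * resuming where the previous call stopped. At the end of the bitmap the
 * phase is complete only if nothing is dirty; otherwise rewind and go on.
 * Returns true when the dirty phase is complete. */
static bool sketch_dirty_iterate(SketchDirtyState *s, unsigned budget)
{
    while (budget > 0) {
        if (s->cursor >= SKETCH_NBITS) {
            if (sketch_bitmap_empty(s)) {
                return true;        /* nothing left to send */
            }
            s->cursor = 0;          /* bits were set behind the cursor */
        }
        if (sketch_test_and_clear(s, s->cursor)) {
            s->sent++;              /* a real version would send it here */
            budget--;
        }
        s->cursor++;
    }
    return false;                   /* budget used up; resume next call */
}
```

As an example, start with bits 0, 2 and 64 dirty and a budget of 2. The first call sends bits 0 and 2. If bit 1 is dirtied meanwhile, the next call sends bit 64, rewinds, sends bit 1, and then reports completion.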
>>>>> +
>>>>> +    if (dirty_bitmap_mig_state.bulk_completed) {
>>>>> +        qemu_mutex_lock_iothread();
>>>>> +        dirty_phase(f, true);
>>>>> +        qemu_mutex_unlock_iothread();
>>>>> +    } else {
>>>>> +        bulk_phase(f, true);
>>>>> +    }
>>>>> +
>>>>> +    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>>>>> +
>>>>> +    return dirty_bitmap_mig_state.bulk_completed;
>>>>> +}
>>>>> +
>>>>> +/* Called with iothread lock taken.  */
>>>>> +
>>>>> +static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
>>>>> +{
>>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>>> +    DPRINTF("Enter save live complete\n");
>>>>> +
>>>>> +    if (!dirty_bitmap_mig_state.bulk_completed) {
>>>>> +        bulk_phase(f, false);
>>>>> +    }
>>>>
>>>> [Not expertly familiar with savevm:] Under what conditions can this
>>>> happen?
>>> This can happen. save_complete is called when savevm decides that the
>>> pending data size is small enough. That was the case in my
>>> migration/block.c bugfix about pending: to prevent save_complete before
>>> the bulk phase is completed, save_pending returns a big value there.
>>> Here I decided to make save_pending more honest, so save_complete has
>>> to finish the bulk phase if it isn't already complete.
>>
>> OK, Gotcha.
>>
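The "more honest" save_pending accounting described here reduces to a small formula. It adds the sectors re-dirtied according to the meta bitmap to the part of the disk the bulk phase has not reached yet. The standalone C sketch below leaves out the conversion from sectors to bytes (bdrv_dirty_bitmap_data_size() in the patch). The struct is a hypothetical mirror of the relevant fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the fields dirty_bitmap_save_pending() reads. */
typedef struct {
    uint64_t dirty_dirty_sectors;  /* what hbitmap_count() reports */
    uint64_t total_sectors;
    uint64_t cur_sector;           /* bulk-phase cursor */
    bool bulk_completed;
} SketchPending;

/* Re-dirtied sectors plus whatever the bulk phase has not covered yet.
 * Because this can become small before the bulk phase ends, save_complete
 * must be prepared to finish the bulk phase itself. */
static uint64_t sketch_pending_sectors(const SketchPending *p)
{
    uint64_t sectors = p->dirty_dirty_sectors;

    if (!p->bulk_completed) {
        assert(p->cur_sector <= p->total_sectors);
        sectors += p->total_sectors - p->cur_sector;
    }
    return sectors;
}
```

For example, a 1000-sector disk with the bulk cursor at 400 and 10 re-dirtied sectors reports 610 pending sectors. Once the bulk phase is done it reports just 10.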
>>>>
>>>>> +
>>>>> +    blk_mig_reset_dirty_cursor();
>>>>> +    dirty_phase(f, false);
>>>>> +
>>>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>>>>> +        uint8_t flags = DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME |
>>>>> +                        DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME |
>>>>> +                        DIRTY_BITMAP_MIG_FLAG_ENABLED;
>>>>> +
>>>>> +        qemu_put_byte(f, flags);
>>>>> +        qemu_put_name(f, bdrv_get_device_name(dbms->bs));
>>>>> +        qemu_put_name(f, bdrv_dirty_bitmap_name(dbms->bitmap));
>>>>> +        qemu_put_byte(f, bdrv_dirty_bitmap_enabled(dbms->bitmap));
>>>>> +    }
>>>>> +
>>>>> +    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>>>>> +
>>>>> +    DPRINTF("Dirty bitmaps migration completed\n");
>>>>> +
>>>>> +    dirty_bitmap_mig_cleanup();
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>
>>>> I suppose we don't need a flag that distinctly SAYS this is the end
>>>> section, since we can tell by omission of
>>>> DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK or ZERO_CHUNK.
>>> Hmm. I think using EOS after each section simplifies the logic, and
>>> migration/block.c uses the same approach. It's a question of which
>>> format is better: "Each section for dirty_bitmap_load ends with EOS" or
>>> "Each section for dirty_bitmap_load ends with EOS except the last one;
>>> the last one may be recognized by the absence of NORMAL_CHUNK and
>>> ZERO_CHUNK".
>>>>
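Here is a minimal C sketch of the "every section ends with EOS" rule. It uses a made-up record layout (a flag byte, then a length byte and payload for chunks) rather than the patch's real stream format. The flag values are placeholders for the DIRTY_BITMAP_MIG_FLAG_* constants, which are not shown in this excerpt. A section holding no chunks is still just one EOS byte, so the loader needs no special case for the last section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder flag values, not the patch's real constants. */
#define SKETCH_FLAG_EOS    0x01
#define SKETCH_FLAG_CHUNK  0x02

/* Read one section from buf starting at *pos. Records are a flag byte,
 * followed by a length byte and that many payload bytes when
 * SKETCH_FLAG_CHUNK is set. Every section, including the last, ends with a
 * byte carrying SKETCH_FLAG_EOS. Returns the number of chunks read and
 * advances *pos, or returns -1 if the stream is truncated. */
static int sketch_load_section(const uint8_t *buf, size_t len, size_t *pos)
{
    int chunks = 0;
    uint8_t flags;

    assert(pos != NULL);
    do {
        if (*pos >= len) {
            return -1;
        }
        flags = buf[(*pos)++];
        if (flags & SKETCH_FLAG_CHUNK) {
            /* need the length byte plus buf[*pos] payload bytes */
            if (*pos >= len || len - *pos - 1 < buf[*pos]) {
                return -1;
            }
            *pos += 1 + buf[*pos];
            chunks++;
        }
    } while (!(flags & SKETCH_FLAG_EOS));
    return chunks;
}
```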
>>>>> +static uint64_t dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
>>>>> +                                          uint64_t max_size)
>>>>> +{
>>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>>> +    uint64_t pending = 0;
>>>>> +
>>>>> +    qemu_mutex_lock_iothread();
>>>>> +
>>>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>>>>> +        uint64_t sectors = hbitmap_count(dbms->dirty_bitmap);
>>>>> +        if (!dbms->bulk_completed) {
>>>>> +            sectors += dbms->total_sectors - dbms->cur_sector;
>>>>> +        }
>>>>> +        pending += bdrv_dirty_bitmap_data_size(dbms->bitmap, sectors);
>>>>> +    }
>>>>> +
>>>>> +    qemu_mutex_unlock_iothread();
>>>>> +
>>>>> +    DPRINTF("Enter save live pending %" PRIu64 ", max: %" PRIu64 "\n",
>>>>> +            pending, max_size);
>>>>> +    return pending;
>>>>> +}
>>>>> +
>>>>
>>>> OK.
>>>>
>>>>> +static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>>>>> +{
>>>>> +    int flags;
>>>>> +
>>>>> +    static char device_name[256], bitmap_name[256];
>>>>> +    static BlockDriverState *bs;
>>>>> +    static BdrvDirtyBitmap *bitmap;
>>>>> +
>>>>> +    uint8_t *buf;
>>>>> +    uint64_t first_sector;
>>>>> +    uint32_t  nr_sectors;
>>>>> +    int ret;
>>>>> +
>>>>> +    DPRINTF("load start\n");
>>>>> +
>>>>> +    do {
>>>>> +        flags = qemu_get_byte(f);
>>>>> +        DPRINTF("flags: %x\n", flags);
>>>>> +
>>>>> +        if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
>>>>> +            qemu_get_name(f, device_name);
>>>>> +            bs = bdrv_find(device_name);
>>>>
>>>> Similar to the above confusion, you may want bdrv_lookup_bs or
>>>> similar, since we're going to be looking for BDS nodes instead of
>>>> "devices."
>>> In this case, should it be changed in migration/block.c too?
>>
>> [See discussion above!]
>>
>>>>
>>>>> +            if (!bs) {
>>>>> +                fprintf(stderr, "Error: unknown block device '%s'\n",
>>>>> +                        device_name);
>>>>> +                return -EINVAL;
>>>>> +            }
>>>>> +        }
>>>>> +
>>>>> +        if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
>>>>> +            if (!bs) {
>>>>> +                fprintf(stderr, "Error: block device name is not set\n");
>>>>> +                return -EINVAL;
>>>>> +            }
>>>>> +
>>>>> +            qemu_get_name(f, bitmap_name);
>>>>> +            bitmap = bdrv_find_dirty_bitmap(bs, bitmap_name);
>>>>> +            if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
>>>>> +                /* First chunk from this bitmap */
>>>>> +                uint64_t granularity = qemu_get_be64(f);
>>>>> +                if (!bitmap) {
>>>>> +                    Error *local_err = NULL;
>>>>> +                    bitmap = bdrv_create_dirty_bitmap(bs, granularity,
>>>>> +                                                      bitmap_name,
>>>>> +                                                      &local_err);
>>>>> +                    if (!bitmap) {
>>>>> +                        error_report("%s", error_get_pretty(local_err));
>>>>> +                        error_free(local_err);
>>>>> +                        return -EINVAL;
>>>>> +                    }
>>>>> +                } else {
>>>>> +                    uint64_t dest_granularity =
>>>>> +                        bdrv_dirty_bitmap_granularity(bs, bitmap);
>>>>> +                    if (dest_granularity != granularity) {
>>>>> +                        fprintf(stderr,
>>>>> +                                "Error: "
>>>>> +                                "Migrated bitmap granularity (%" PRIu64 ") "
>>>>> +                                "is not match with destination bitmap '%s' "
>>>>> +                                "granularity (%" PRIu64 ")\n",
>>>>> +                                granularity,
>>>>> +                                bitmap_name,
>>>>> +                                dest_granularity);
>>>>> +                        return -EINVAL;
>>>>> +                    }
>>>>> +                }
>>>>> +                bdrv_disable_dirty_bitmap(bitmap);
>>>>> +            }
>>>>> +            if (!bitmap) {
>>>>> +                fprintf(stderr, "Error: unknown dirty bitmap "
>>>>> +                        "'%s' for block device '%s'\n",
>>>>> +                        bitmap_name, device_name);
>>>>> +                return -EINVAL;
>>>>> +            }
>>>>> +        }
>>>>> +
>>>>> +        if (flags & DIRTY_BITMAP_MIG_FLAG_ENABLED) {
>>>>> +            bool enabled;
>>>>> +            if (!bitmap) {
>>>>> +                fprintf(stderr, "Error: dirty bitmap name is not set\n");
>>>>> +                return -EINVAL;
>>>>> +            }
>>>>> +            bdrv_dirty_bitmap_deserialize_finish(bitmap);
>>>>> +            /* complete migration */
>>>>> +            enabled = qemu_get_byte(f);
>>>>> +            if (enabled) {
>>>>> +                bdrv_enable_dirty_bitmap(bitmap);
>>>>> +            }
>>>>> +        }
>>>>
>>>> Oh, so you use the ENABLED flag to show that migration is over.
>>> Yes, it was a bad idea.
>>>> If we are going to commit to a stream format for bitmaps, though,
>>>> maybe it's best to actually create a "COMPLETION BLOCK" flag and then
>>>> split this function into two pieces:
>>>>
>>>> (1) The part that receives regular / zero blocks, and
>>>> (2) The part that receives completion data.
>>>>
>>>> That way, if we change the properties that bitmaps have down the line,
>>>> we aren't reliant on literally the "enabled" flag to decide what to do.
>>>>
>>>> Also, it might help make this fairly long function a little smaller
>>>> and more readable.
>>> Ok.
>>>>
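Below is one possible shape for the split, sketched in standalone C. A dedicated completion flag routes to its own handler. The enabled byte becomes one field of the completion record instead of the end-of-migration marker. SKETCH_FLAG_COMPLETE is a hypothetical value, not an existing constant, and the state struct and handler names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; COMPLETE is the dedicated "completion block"
 * flag suggested above. */
#define SKETCH_FLAG_CHUNK     0x02
#define SKETCH_FLAG_COMPLETE  0x04

/* Hypothetical destination-side bitmap state. */
typedef struct {
    unsigned chunks;   /* chunks deserialized so far */
    bool finished;     /* deserialize-finish step has run */
    bool enabled;
} SketchBitmapState;

/* Part (1): regular/zero chunk payloads. */
static void sketch_load_chunk(SketchBitmapState *b)
{
    assert(!b->finished);   /* no data may arrive after completion */
    b->chunks++;
}

/* Part (2): completion data. "enabled" is just one field here, so new
 * bitmap properties can be added without changing what marks the end. */
static void sketch_load_complete(SketchBitmapState *b, uint8_t enabled_byte)
{
    b->finished = true;
    b->enabled = enabled_byte != 0;
}

static void sketch_dispatch(SketchBitmapState *b, uint8_t flags,
                            uint8_t payload)
{
    if (flags & SKETCH_FLAG_CHUNK) {
        sketch_load_chunk(b);
    }
    if (flags & SKETCH_FLAG_COMPLETE) {
        sketch_load_complete(b, payload);
    }
}
```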
>>>>> +
>>>>> +        if (flags & (DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK |
>>>>> +                     DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK)) {
>>>>> +            if (!bs) {
>>>>> +                fprintf(stderr, "Error: block device name is not set\n");
>>>>> +                return -EINVAL;
>>>>> +            }
>>>>> +            if (!bitmap) {
>>>>> +                fprintf(stderr, "Error: dirty bitmap name is not set\n");
>>>>> +                return -EINVAL;
>>>>> +            }
>>>>> +
>>>>> +            first_sector = qemu_get_be64(f);
>>>>> +            nr_sectors = qemu_get_be32(f);
>>>>> +            DPRINTF("chunk: %lu %u\n", first_sector, nr_sectors);
>>>>> +
>>>>> +
>>>>> +            if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) {
>>>>> +                bdrv_dirty_bitmap_deserialize_part0(bitmap, first_sector,
>>>>> +                                                    nr_sectors);
>>>>> +            } else {
>>>>> +                uint64_t buf_size = qemu_get_be64(f);
>>>>> +                uint64_t needed_size =
>>>>> +                    bdrv_dirty_bitmap_data_size(bitmap, nr_sectors);
>>>>> +
>>>>> +                if (needed_size > buf_size) {
>>>>> +                    fprintf(stderr,
>>>>> +                            "Error: Migrated bitmap granularity is not "
>>>>> +                            "match with destination bitmap granularity\n");
>>>>> +                    return -EINVAL;
>>>>> +                }
>>>>> +
>>>>
>>>> "Migrated bitmap granularity doesn't match the destination bitmap
>>>> granularity" perhaps.
>>>>
>>>>> +                buf = g_malloc(buf_size);
>>>>> +                qemu_get_buffer(f, buf, buf_size);
>>>>> +                bdrv_dirty_bitmap_deserialize_part(bitmap, buf,
>>>>> +                                                   first_sector,
>>>>> +                                                   nr_sectors);
>>>>> +                g_free(buf);
>>>>> +            }
>>>>> +        }
>>>>> +
>>>>> +        ret = qemu_file_get_error(f);
>>>>> +        if (ret != 0) {
>>>>> +            return ret;
>>>>> +        }
>>>>> +    } while (!(flags & DIRTY_BITMAP_MIG_FLAG_EOS));
>>>>> +
>>>>> +    DPRINTF("load finish\n");
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +static void dirty_bitmap_set_params(const MigrationParams *params,
>>>>> +                                    void *opaque)
>>>>> +{
>>>>> +    dirty_bitmap_mig_state.migration_enable = params->dirty;
>>>>> +}
>>>>> +
>>>>
>>>> OK; though I am not immediately aware of what changes need to happen
>>>> to accommodate Eric's suggestions.
>>> This function will be dropped in v3.
>>>>
>>>>> +static bool dirty_bitmap_is_active(void *opaque)
>>>>> +{
>>>>> +    return dirty_bitmap_mig_state.migration_enable == 1;
>>>>> +}
>>>>> +
>>>>
>>>> OK.
>>>>
>>>>> +static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
>>>>> +{
>>>>> +    init_dirty_bitmap_migration(f);
>>>>> +
>>>>> +    qemu_mutex_lock_iothread();
>>>>> +    /* start track dirtyness of dirty bitmaps */
>>>>> +    set_dirty_tracking();
>>>>> +    qemu_mutex_unlock_iothread();
>>>>> +
>>>>> +    blk_mig_reset_dirty_cursor();
>>>>> +    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>
>>>> OK; see dirty_bitmap_mig_init below, though.
>>>>
>>>>> +static SaveVMHandlers savevm_block_handlers = {
>>>>> +    .set_params = dirty_bitmap_set_params,
>>>>> +    .save_live_setup = dirty_bitmap_save_setup,
>>>>> +    .save_live_iterate = dirty_bitmap_save_iterate,
>>>>> +    .save_live_complete = dirty_bitmap_save_complete,
>>>>> +    .save_live_pending = dirty_bitmap_save_pending,
>>>>> +    .load_state = dirty_bitmap_load,
>>>>> +    .cancel = dirty_bitmap_migration_cancel,
>>>>> +    .is_active = dirty_bitmap_is_active,
>>>>> +};
>>>>> +
>>>>> +void dirty_bitmap_mig_init(void)
>>>>> +{
>>>>> +    QSIMPLEQ_INIT(&dirty_bitmap_mig_state.dbms_list);
>>>>
>>>> Maybe I haven't looked thoroughly enough yet, but it's weird that part
>>>> of the dirty_bitmap_mig_state is initialized here, and the rest of it
>>>> in init_dirty_bitmap_migration. I'd prefer to keep it all together, if
>>>> possible.
>>> dirty_bitmap_mig_init is called once, when QEMU starts, and QSIMPLEQ_INIT
>>> should be called only once. dirty_bitmap_save_setup is called at the
>>> start of every migration; it acts as a 'reinitialize'.
>>>>
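The init-once / refill-per-migration split can be sketched as follows. The sketch includes clearing the list before refilling it, as noted earlier for init_dirty_bitmap_migration(). It is standalone C with a hand-rolled tail queue in the spirit of QSIMPLEQ. SketchList and the sketch_list_* functions are hypothetical, and integer ids stand in for the real per-bitmap state.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical entry standing in for DirtyBitmapMigBitmapState. */
typedef struct SketchEntry {
    int id;
    struct SketchEntry *next;
} SketchEntry;

/* Head pointer plus a pointer to the last `next` field, so appends are
 * O(1), as with a simple tail queue. */
typedef struct {
    SketchEntry *first;
    SketchEntry **last;
} SketchList;

/* Once, at startup (the dirty_bitmap_mig_init() role). */
static void sketch_list_init(SketchList *l)
{
    l->first = NULL;
    l->last = &l->first;
}

static void sketch_list_clear(SketchList *l)
{
    while (l->first) {
        SketchEntry *e = l->first;
        l->first = e->next;
        free(e);
    }
    l->last = &l->first;
}

/* On every migration start (the save_setup role): drop entries left over
 * from a previous or cancelled migration, then refill. Returns the number
 * of entries, or -1 on allocation failure. */
static int sketch_list_refill(SketchList *l, int nr_bitmaps)
{
    assert(l->last != NULL);   /* sketch_list_init() must have run once */
    sketch_list_clear(l);
    for (int i = 0; i < nr_bitmaps; i++) {
        SketchEntry *e = calloc(1, sizeof(*e));
        if (!e) {
            return -1;
        }
        e->id = i;
        *l->last = e;
        l->last = &e->next;
    }
    return nr_bitmaps;
}

static int sketch_list_len(const SketchList *l)
{
    int n = 0;
    for (const SketchEntry *e = l->first; e; e = e->next) {
        n++;
    }
    return n;
}
```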
>>>>> +
>>>>> +    register_savevm_live(NULL, "dirty-bitmap", 0, 1,
>>>>> +                         &savevm_block_handlers,
>>>>> +                         &dirty_bitmap_mig_state);
>>>>> +}
>>>>
>>>> OK.
>>>>
>>>>> diff --git a/vl.c b/vl.c
>>>>> index a824a7d..dee7220 100644
>>>>> --- a/vl.c
>>>>> +++ b/vl.c
>>>>> @@ -4184,6 +4184,7 @@ int main(int argc, char **argv, char **envp)
>>>>>
>>>>>       blk_mig_init();
>>>>>       ram_mig_init();
>>>>> +    dirty_bitmap_mig_init();
>>>>>
>>>>>       /* If the currently selected machine wishes to override the units-per-bus
>>>>>        * property of its default HBA interface type, do so now. */
>>>>>
>>>>
>>>> Hm, since dirty bitmaps are a sub-component of the block layer, would
>>>> it not make sense to put this hook under blk_mig_init, perhaps?
>>> IMHO the reason to put it here is to keep all register_savevm_live
>>> calls in one place.
>>
>> If you still feel that way I won't withhold my R-b, but there are
>> already other cases such as ppc_spapr_init which are not in this
>> general area of vl.c.
>>
>> Plus the dozens of devices that use register_savevm as a wrapper to
>> register_savevm_live, so maybe consolidating calls to this function
>> isn't that important.
> Hm, I missed that; yes, ppc_spapr_init is not here. Another thing:
> dirty bitmap migration is separate from blk migration, and it may be
> used without blk migration (NBD + mirror migration may be used instead).
> Is it OK to hook dirty bitmap migration into blk_mig_init, which lives in
> migration/block.c, a file that may not be used at all when bitmaps are
> migrated using migration/dirty-bitmap.c?
> In other words, yes, dirty bitmaps are a sub-component of the block
> layer, but dirty bitmap migration is not a sub-component of blk migration.
>>
>>>>
>>>>
>>>> Overall this looks very clean compared to the intermingled format in
>>>> V1, and the code is organized pretty well. Just a few minor comments,
>>>> and I'd like to get the opinion of the migration maintainers, but I am
>>>> happy. Sorry it took me so long to review, please feel free to let me
>>>> know if you disagree with any of my opinions :)
>>>>
>>>> Thank you,
>>>> --John
>>>
>>> Thank you for reviewing my series :)
>>>
>>
>> Yup. Hopefully I didn't miss too much that will irritate the Migration
>> overlords.
>>
>> Once you respin on top of v12, I can run some thorough migration tests
>> on it (perhaps over a weekend) and verify that it survives a couple
>> hundred migrations without any kind of integrity loss.
> I hope to do it, along with everything else, in about two days.
>>
>> This is what makes sense to me right now, anyway.
>>
>> Do you think you'll be including the bitmap checksum in the
>> BlockDirtyInfo command? That'd be convenient for iotests.
> OK, will do. Good idea. Only two points:
> 1) Is it OK to include debug info in BlockDirtyInfo? Will users be
> happy with it?
> 2) When I was debugging my code, the information about dirty regions was
> very useful. Now everything works, and checksums are enough for
> regression control.

I think leaving some tactical debug prints in the code, disabled, is 
perfectly fine from my personal standpoint. We're not really done 
implementing all of these features yet and they may yet be useful.

I'd vote for leaving in any non-QMP/QAPI debug information you wish, for 
now. We just have to be careful about interfaces -- no QMP/HMP commands.

As long as it looks reasonably tidy, I don't think it will be a problem.

-- 
—js

* Re: [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c
  2015-02-16 18:18           ` John Snow
@ 2015-02-16 18:22             ` Dr. David Alan Gilbert
  2015-02-17  8:54             ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 35+ messages in thread
From: Dr. David Alan Gilbert @ 2015-02-16 18:22 UTC (permalink / raw)
  To: John Snow
  Cc: kwolf, Peter Maydell,
	Juan quin >> Juan Jose Quintela Carreira, qemu-devel,
	Vladimir Sementsov-Ogievskiy, stefanha, den, amit Shah, pbonzini

A small request on this patch;  please make it migration/block-dirty-bitmap.c;
we've got enough bitmaps floating around the RAM code in migration that I
wouldn't want to confuse things.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c
  2015-02-16 18:18           ` John Snow
  2015-02-16 18:22             ` Dr. David Alan Gilbert
@ 2015-02-17  8:54             ` Vladimir Sementsov-Ogievskiy
  2015-02-17 18:45               ` John Snow
  1 sibling, 1 reply; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-02-17  8:54 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, Peter Maydell,
	Juan quin >> Juan Jose Quintela Carreira,
	Dr. David Alan Gilbert, stefanha, den, amit Shah, pbonzini

On 16.02.2015 21:18, John Snow wrote:
>
>
> On 02/16/2015 07:06 AM, Vladimir Sementsov-Ogievskiy wrote:
>> On 13.02.2015 23:22, John Snow wrote:
>>>
>>>
>>> On 02/13/2015 03:19 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>> On 11.02.2015 00:33, John Snow wrote:

>>>> So in summary:
>>>> using device names is probably fine for now, as it matches the current
>>>> use case of bitmaps as well as drive migration; but using node names
>>>> may give us more power and precision later.
>>>>
>>>> I talked to Max about it, and he is leaning towards using device names
>>>> for now and switching to node names if we decide we want that power.
>>>>
>>>> (...I wonder if we could use a flag, for now, that says we're
>>>> including DEVICE names. Later, we could add a flag that says we're
>>>> using NODE names and add an option to toggle as the usage case sees 
>>>> fit.)
>>>>
>>>>
>>>> Are you confused yet? :D
>> Oh, thanks for the explanation. Do we really need this flag? As Markus
>> wrote, nodes and devices share a namespace, so we can use
>> bdrv_lookup_bs(name, name, errp).
>
> what 'name' are you using here, though? It looked to me like in your 
> backup routine we got a list of BDS entries and get the name *from* 
> the BDS, so we still have to think about how we want to /get/ the name.
>
>>
>> Also, we can, for example, send bitmaps as follows:
>>
>> if the node has a name, send the bitmap with this name
>> if the node is a root but has no name, send it with the blk name
>> otherwise, don't send the bitmap
>
> The node a bitmap is attached to should always have a name -- it would 
> not be possible via the existing interface to attach it to a node 
> without a name.
>
> I *think* the root node should always have a name, but I am actually 
> less sure of that.
>
Hmm... no? A bitmap is attached using bdrv_lookup_bs(name, name, errp),
which can find a device with this name. The QEMU option -drive
file=...,id=disk creates a blk named 'disk' whose attached node has no
name.


-- 
Best regards,
Vladimir

* Re: [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c
  2015-02-17  8:54             ` Vladimir Sementsov-Ogievskiy
@ 2015-02-17 18:45               ` John Snow
  2015-02-17 19:12                 ` Eric Blake
  0 siblings, 1 reply; 35+ messages in thread
From: John Snow @ 2015-02-17 18:45 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, Peter Maydell,
	Juan quin >> Juan Jose Quintela Carreira,
	Dr. David Alan Gilbert, stefanha, den, amit Shah, pbonzini



On 02/17/2015 03:54 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 16.02.2015 21:18, John Snow wrote:
>>
>>
>> On 02/16/2015 07:06 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> On 13.02.2015 23:22, John Snow wrote:
>>>>
>>>>
>>>> On 02/13/2015 03:19 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>>> On 11.02.2015 00:33, John Snow wrote:
>
>>>>> So in summary:
>>>>> using device names is probably fine for now, as it matches the current
>>>>> use case of bitmaps as well as drive migration; but using node names
>>>>> may give us more power and precision later.
>>>>>
>>>>> I talked to Max about it, and he is leaning towards using device names
>>>>> for now and switching to node names if we decide we want that power.
>>>>>
>>>>> (...I wonder if we could use a flag, for now, that says we're
>>>>> including DEVICE names. Later, we could add a flag that says we're
>>>>> using NODE names and add an option to toggle as the usage case sees
>>>>> fit.)
>>>>>
>>>>>
>>>>> Are you confused yet? :D
>>> Oh, thanks for the explanation. Do we really need this flag? As Markus
>>> wrote, nodes and devices share a namespace, so we can use
>>> bdrv_lookup_bs(name, name, errp).
>>
>> what 'name' are you using here, though? It looked to me like in your
>> backup routine we got a list of BDS entries and get the name *from*
>> the BDS, so we still have to think about how we want to /get/ the name.
>>
>>>
>>> Also, we can, for example, send bitmaps as follows:
>>>
>>> if the node has a name, send the bitmap with this name
>>> if the node is a root but has no name, send it with the blk name
>>> otherwise, don't send the bitmap
>>
>> The node a bitmap is attached to should always have a name -- it would
>> not be possible via the existing interface to attach it to a node
>> without a name.
>>
>> I *think* the root node should always have a name, but I am actually
>> less sure of that.
>>
> Hmm... no? A bitmap is attached using bdrv_lookup_bs(name, name, errp),
> which can find a device with this name. The QEMU option -drive
> file=...,id=disk creates a blk named 'disk' whose attached node has no
> name.
>
>

Very good point -- We use the device name as a convenience shortcut to 
the first node. So the node a bitmap is attached to might in fact not 
have a name.

I see what you mean.

Hmm. So you propose, on the sending side:

(1) Use this node's name, if available
(2) Fall back to the backend/"device" name, if present,
(3) Do not send the bitmap if neither name is present (THIS scenario 
should be impossible to achieve and should trigger an error.)
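The sending-side rule above is shown here as a small standalone C sketch. SketchNodeNames is a hypothetical stand-in for what the sender can read from a BlockDriverState: a node name, and a device/backend name such as the one created by -drive ...,id=disk. An empty string is treated as "no name". The "impossible" third case returns NULL so the caller can turn it into an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the names attached to the bitmap's node;
 * NULL or "" means "no name". */
typedef struct {
    const char *node_name;    /* e.g. set with node-name=... */
    const char *device_name;  /* backend name, e.g. from -drive id=disk */
} SketchNodeNames;

static bool sketch_has_name(const char *s)
{
    return s != NULL && s[0] != '\0';
}

/* (1) node name if available, (2) else the device/backend name,
 * (3) else NULL, which the caller should report as an error. */
static const char *sketch_pick_send_name(const SketchNodeNames *n)
{
    assert(n != NULL);
    if (sketch_has_name(n->node_name)) {
        return n->node_name;
    }
    if (sketch_has_name(n->device_name)) {
        return n->device_name;
    }
    return NULL;
}
```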

What about on the receiving end? To which node(s) do we attach the 
bitmap? I assume:

(1) Try to find a node with this name and attach it there,
(2) Fall back to attaching it to the root node of a device/BB with this name
(3) If neither is present, abort.
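The receiving-side lookup is sketched below in standalone C. It assumes, as stated earlier in the thread, that node names and device names share one namespace. Under that assumption at most one of the two searches can match, so their order does not change the result. The tables are hypothetical stand-ins for the destination's block graph, not QEMU structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical destination-side tables. */
typedef struct {
    const char *name;   /* node name, or "" if the node is unnamed */
} SketchNode;

typedef struct {
    const char *name;   /* device/backend name */
    SketchNode *root;   /* root node attached to the device */
} SketchDevice;

/* (1) a node with this name, (2) else the root node of a device with this
 * name, (3) else NULL so the caller can abort the migration. */
static SketchNode *sketch_resolve(const char *name,
                                  SketchNode *nodes, size_t nr_nodes,
                                  const SketchDevice *devs, size_t nr_devs)
{
    assert(name != NULL && name[0] != '\0');  /* "" must not match unnamed nodes */
    for (size_t i = 0; i < nr_nodes; i++) {
        if (strcmp(nodes[i].name, name) == 0) {
            return &nodes[i];
        }
    }
    for (size_t i = 0; i < nr_devs; i++) {
        if (strcmp(devs[i].name, name) == 0) {
            return devs[i].root;
        }
    }
    return NULL;
}
```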

Is that right? If so, it sounds serviceable to me, but keep in mind that 
bdrv_lookup_bs() on the receiving end will default to #2 first before #1.

Overall, this makes the migration very "fuzzy" in terms of which bitmaps 
will go where, and the onus is really on the user (or management tool) 
to keep trees compatibly similar for the purposes of bitmap migration.

I still wonder if making the exportation of names more explicit in terms 
of a flag ("this name is a node-name", "this name is a backend-name") 
might still be of use for providing more rigid and knowable migration 
semantics.

Might as well go ahead with your suggestion for now, and we'll see if 
anyone from migration or libvirt groups has a reason not to do it that 
way. I don't have any concrete reasons.

--js

* Re: [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c
  2015-02-17 18:45               ` John Snow
@ 2015-02-17 19:12                 ` Eric Blake
  0 siblings, 0 replies; 35+ messages in thread
From: Eric Blake @ 2015-02-17 19:12 UTC (permalink / raw)
  To: John Snow, Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, Peter Maydell,
	Juan quin >> Juan Jose Quintela Carreira,
	Dr. David Alan Gilbert, stefanha, pbonzini, amit Shah, den

On 02/17/2015 11:45 AM, John Snow wrote:

>>>
>> Hmm... no? A bitmap is attached using bdrv_lookup_bs(name, name, errp),
>> which can find a device with this name. The QEMU option -drive
>> file=...,id=disk creates a blk named 'disk' whose attached node has no
>> name.
>>
>>
> 
> Very good point -- We use the device name as a convenience shortcut to
> the first node. So the node a bitmap is attached to might in fact not
> have a name.

Or, if we revisit Jeff's proposal to always auto-name every node, then
you'd never have a node without a name.  Ultimately, libvirt should be
using named nodes, but we're not there yet.

> 
> I see what you mean.
> 
> Hmm. So you propose, on the sending side:
> 
> (1) Use this node's name, if available
> (2) Fall back to the backend/"device" name, if present,
> (3) Do not send the bitmap if neither name is present (THIS scenario
> should be impossible to achieve and should trigger an error.)

Seems okay to me.

> 
> What about on the receiving end? To which node(s) do we attach the
> bitmap? I assume:
> 
> (1) Try to find a node with this name and attach it there,
> (2) Fall back to attaching it to the root node of a device/BB with this
> name
> (3) If neither is present, abort.
> 

That would also work. Since node names and device names share the same
namespace, it should always resolve to the correct node.

> Is that right? If so, it sounds serviceable to me, but keep in mind that
> bdrv_lookup_bs() on the receiving end will default to #2 first before #1.

Except that if a device is named "foo", then no node has that name, so
it is unambiguous that you meant the node attached to device "foo".

> 
> Overall, this makes the migration very "fuzzy" in terms of which bitmaps
> will go where, and the onus is really on the user (or management tool)
> to keep trees compatibly similar for the purposes of bitmap migration.
> 
> I still wonder if making the exportation of names more explicit in terms
> of a flag ("this name is a node-name", "this name is a backend-name")
> might still be of use for providing more rigid and knowable migration
> semantics.
> 
> Might as well go ahead with your suggestion for now, and we'll see if
> anyone from migration or libvirt groups has a reason not to do it that
> way. I don't have any concrete reasons.
> 
> --js
> 
> 
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



end of thread

Thread overview: 35+ messages
2015-01-27 10:56 [Qemu-devel] [PATCH RFC v2 0/8] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 1/8] qmp: print dirty bitmap Vladimir Sementsov-Ogievskiy
2015-01-27 16:17   ` Eric Blake
2015-01-27 16:23     ` Vladimir Sementsov-Ogievskiy
2015-02-10 21:28   ` John Snow
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 2/8] hbitmap: serialization Vladimir Sementsov-Ogievskiy
2015-02-10 21:29   ` John Snow
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 3/8] block: BdrvDirtyBitmap serialization interface Vladimir Sementsov-Ogievskiy
2015-02-10 21:29   ` John Snow
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 4/8] block: add dirty-dirty bitmaps Vladimir Sementsov-Ogievskiy
2015-02-10 21:30   ` John Snow
2015-02-12 10:51     ` Vladimir Sementsov-Ogievskiy
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 5/8] block: add bdrv_dirty_bitmap_enabled() Vladimir Sementsov-Ogievskiy
2015-02-10 21:30   ` John Snow
2015-02-12 10:54     ` Vladimir Sementsov-Ogievskiy
2015-02-12 16:22       ` John Snow
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 6/8] block: add bdrv_next_dirty_bitmap() Vladimir Sementsov-Ogievskiy
2015-02-10 21:31   ` John Snow
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 7/8] migration: add dirty parameter Vladimir Sementsov-Ogievskiy
2015-01-27 16:20   ` Eric Blake
2015-02-04 14:42     ` Vladimir Sementsov-Ogievskiy
2015-02-10 21:32   ` John Snow
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c Vladimir Sementsov-Ogievskiy
2015-02-10 21:33   ` John Snow
2015-02-13  8:19     ` Vladimir Sementsov-Ogievskiy
2015-02-13  9:06       ` Vladimir Sementsov-Ogievskiy
2015-02-13 17:32         ` John Snow
2015-02-13 17:41           ` Vladimir Sementsov-Ogievskiy
2015-02-13 20:22       ` John Snow
2015-02-16 12:06         ` Vladimir Sementsov-Ogievskiy
2015-02-16 18:18           ` John Snow
2015-02-16 18:22             ` Dr. David Alan Gilbert
2015-02-17  8:54             ` Vladimir Sementsov-Ogievskiy
2015-02-17 18:45               ` John Snow
2015-02-17 19:12                 ` Eric Blake
