* [Qemu-devel] [PATCH 0/8] Dirty bitmaps migration
@ 2014-12-11 14:17 Vladimir Sementsov-Ogievskiy
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 1/9] block: rename bdrv_reset_dirty_bitmap Vladimir Sementsov-Ogievskiy
                   ` (8 more replies)
  0 siblings, 9 replies; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2014-12-11 14:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, den, vsementsov, jsnow, stefanha

These patches provide the dirty bitmap migration feature. Only named
dirty bitmaps are migrated. Migration is implemented as part of block
migration in block-migration.c.

Dirty bitmap migration can be enabled with the "dirty" parameter of the qmp
migrate command. If the "blk" and "inc" parameters are false while "dirty"
is true, block migration is effectively skipped: no allocations, no
bdrv_read's, no bdrv_write's; only bitmaps are migrated.
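
The interaction of the three parameters can be sketched with a toy model
(the struct and helper names below are purely illustrative, not the actual
QEMU code, where the decision is spread across block-migration.c):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the proposed parameter semantics. */
typedef struct MigParams {
    bool blk;   /* migrate full block device contents */
    bool inc;   /* incremental block migration */
    bool dirty; /* migrate named dirty bitmaps (new parameter) */
} MigParams;

/* Block data is copied only if blk or inc is requested. */
static bool migrates_block_data(const MigParams *p)
{
    return p->blk || p->inc;
}

/* Bitmaps are migrated whenever "dirty" is set, with or without data. */
static bool migrates_dirty_bitmaps(const MigParams *p)
{
    return p->dirty;
}
```

With dirty=true and blk=inc=false, the bitmaps travel alone and no block
data is read or written.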

The patch set includes two of my previous bug fixes, which are necessary
for it. The patch set is based on the incremental backup series by John
Snow.

Vladimir Sementsov-Ogievskiy (8):
  block-migration: fix pending() return value
  block: fix spoiling all dirty bitmaps by mirror and migration
  hbitmap: store / restore
  block: BdrvDirtyBitmap store/restore interface
  block-migration: tiny refactoring
  block-migration: remove not needed iothread lock
  migration: add dirty parameter
  block-migration: add named dirty bitmaps migration

 block-migration.c             | 232 +++++++++++++++++++++++++++++++++---------
 block.c                       |  76 +++++++++++++-
 block/mirror.c                |  11 +-
 hmp-commands.hx               |  10 +-
 hmp.c                         |   4 +-
 include/block/block.h         |  16 ++-
 include/migration/migration.h |   1 +
 include/qemu/hbitmap.h        |  49 +++++++++
 migration.c                   |   4 +-
 qapi-schema.json              |   2 +-
 qmp-commands.hx               |   5 +-
 savevm.c                      |   4 +-
 util/hbitmap.c                |  84 +++++++++++++++
 13 files changed, 434 insertions(+), 64 deletions(-)

-- 
1.9.1


* [Qemu-devel] [PATCH 1/9] block: rename bdrv_reset_dirty_bitmap
  2014-12-11 14:17 [Qemu-devel] [PATCH 0/8] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
@ 2014-12-11 14:17 ` Vladimir Sementsov-Ogievskiy
  2015-01-08 21:19   ` John Snow
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 2/9] block-migration: fix pending() return value Vladimir Sementsov-Ogievskiy
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2014-12-11 14:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, den, vsementsov, jsnow, stefanha

We will need functions to set/unset a subregion of a BdrvDirtyBitmap in
order to fix migration and mirror (see the following patch "block: fix
spoiling all dirty bitmaps by mirror and migration"). If we kept the old
function 'bdrv_reset_dirty_bitmap', we would have to add functions like

bdrv_set_dirty_bitmap_region and bdrv_reset_dirty_bitmap_region

But it is more consistent to have

bdrv_set_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
                      uint64_t start, uint64_t count)
bdrv_reset_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
                        uint64_t start, uint64_t count)

for more transparent access to the underlying hbitmap interface. So,
here we rename the existing function to 'bdrv_clear_dirty_bitmap'.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 block.c               | 2 +-
 block/backup.c        | 2 +-
 include/block/block.h | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index 677bc6f..a37dc2f 100644
--- a/block.c
+++ b/block.c
@@ -5364,7 +5364,7 @@ BdrvDirtyBitmap *bdrv_copy_dirty_bitmap(BlockDriverState *bs,
     return new_bitmap;
 }
 
-void bdrv_reset_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
+void bdrv_clear_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
 {
     hbitmap_reset(bitmap->bitmap, 0, bitmap->size);
 }
diff --git a/block/backup.c b/block/backup.c
index 2aab68f..da87581 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -449,7 +449,7 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
         case BITMAP_USE_MODE_RESET:
             original = sync_bitmap;
             sync_bitmap = bdrv_copy_dirty_bitmap(bs, sync_bitmap, NULL);
-            bdrv_reset_dirty_bitmap(bs, original);
+            bdrv_clear_dirty_bitmap(bs, original);
             break;
         case BITMAP_USE_MODE_CONSUME:
             bdrv_dirty_bitmap_make_anon(bs, sync_bitmap);
diff --git a/include/block/block.h b/include/block/block.h
index e535581..368a371 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -437,7 +437,7 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
 BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs,
                                         const char *name);
 void bdrv_dirty_bitmap_make_anon(BlockDriverState *bs, BdrvDirtyBitmap *bitmap);
-void bdrv_reset_dirty_bitmap(BlockDriverState *bs,
+void bdrv_clear_dirty_bitmap(BlockDriverState *bs,
                              BdrvDirtyBitmap *bitmap);
 BdrvDirtyBitmap *bdrv_copy_dirty_bitmap(BlockDriverState *bs,
                                         const BdrvDirtyBitmap *bitmap,
-- 
1.9.1


* [Qemu-devel] [PATCH 2/9] block-migration: fix pending() return value
  2014-12-11 14:17 [Qemu-devel] [PATCH 0/8] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 1/9] block: rename bdrv_reset_dirty_bitmap Vladimir Sementsov-Ogievskiy
@ 2014-12-11 14:17 ` Vladimir Sementsov-Ogievskiy
  2015-01-08 21:20   ` John Snow
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 3/9] block: fix spoiling all dirty bitmaps by mirror and migration Vladimir Sementsov-Ogievskiy
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2014-12-11 14:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, den, vsementsov, jsnow, stefanha

Because of a wrong return value of .save_live_pending() in
block-migration, migration finishes before the whole disk
is transferred. This situation occurs when the migration
process is fast enough, for example when source and destination
are on the same host.

If in the bulk phase we return something < max_size, we will skip
transferring the tail of the device. Currently we have "set pending to
BLOCK_SIZE if it is zero" for the bulk phase, but there is no guarantee
that BLOCK_SIZE is >= max_size.

The correct approach is to return something greater than max_size, for
example max_size + 1, when we are in the bulk phase.
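
The fix can be modeled with a small helper (a sketch, not the actual
block_save_pending() code; BLOCK_SIZE here is an arbitrary stand-in
constant): during the bulk phase the reported value is forced above
max_size, so the generic migration loop cannot conclude early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_SIZE (1 << 20) /* stand-in for the real constant */

/* Sketch of the corrected pending calculation: while the bulk phase is
 * not completed, always report more than max_size so that the generic
 * migration loop keeps calling us. */
static uint64_t calc_pending(uint64_t pending, uint64_t max_size,
                             bool bulk_completed)
{
    if (pending <= max_size && !bulk_completed) {
        pending = max_size + BLOCK_SIZE;
    }
    return pending;
}
```

The old code only bumped `pending` when it was exactly zero, so a small
non-zero value below max_size could still terminate migration too early.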

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 block-migration.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index 6f3aa18..8df30d9 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -757,8 +757,8 @@ static uint64_t block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size)
                        block_mig_state.read_done * BLOCK_SIZE;
 
     /* Report at least one block pending during bulk phase */
-    if (pending == 0 && !block_mig_state.bulk_completed) {
-        pending = BLOCK_SIZE;
+    if (pending <= max_size && !block_mig_state.bulk_completed) {
+        pending = max_size + BLOCK_SIZE;
     }
     blk_mig_unlock();
     qemu_mutex_unlock_iothread();
-- 
1.9.1


* [Qemu-devel] [PATCH 3/9] block: fix spoiling all dirty bitmaps by mirror and migration
  2014-12-11 14:17 [Qemu-devel] [PATCH 0/8] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 1/9] block: rename bdrv_reset_dirty_bitmap Vladimir Sementsov-Ogievskiy
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 2/9] block-migration: fix pending() return value Vladimir Sementsov-Ogievskiy
@ 2014-12-11 14:17 ` Vladimir Sementsov-Ogievskiy
  2015-01-08 21:20   ` John Snow
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 4/9] hbitmap: store / restore Vladimir Sementsov-Ogievskiy
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2014-12-11 14:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, den, vsementsov, jsnow, stefanha

Mirror and migration use dirty bitmaps for their own purposes, and since
commit [block: per caller dirty bitmap] they use their own bitmaps, not
the global one. But they still call the old functions bdrv_set_dirty and
bdrv_reset_dirty, which change all dirty bitmaps.

The named dirty bitmap series by Fam and John are affected: mirroring
and migration will spoil all named dirty bitmaps that are unrelated to
that mirroring or migration.

This patch fixes this by adding bdrv_set_dirty_bitmap and
bdrv_reset_dirty_bitmap, which change one specific bitmap. Also, to
prevent such mistakes in the future, the old functions bdrv_set_dirty
and bdrv_reset_dirty are made static, for internal block-layer use only.
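
The difference can be illustrated with a toy model (plain 64-bit masks
standing in for HBitmaps; the function names mirror the patch but the
types are heavily simplified): the old-style reset touches every bitmap
in the list, the new one only its argument.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: each "dirty bitmap" is one 64-bit mask over 64 sectors. */
enum { N_BITMAPS = 2 };
static uint64_t bitmaps[N_BITMAPS];

static uint64_t region_mask(int sector, int count)
{
    return ((count < 64 ? (1ULL << count) : 0) - 1) << sector;
}

/* Old-style behaviour: clears the region in *all* bitmaps. */
static void reset_dirty_all(int sector, int count)
{
    for (int i = 0; i < N_BITMAPS; i++) {
        bitmaps[i] &= ~region_mask(sector, count);
    }
}

/* New behaviour: clears the region only in the given bitmap. */
static void reset_dirty_bitmap(uint64_t *bitmap, int sector, int count)
{
    *bitmap &= ~region_mask(sector, count);
}
```

A migration that clears only its own bitmap leaves a concurrently
tracked named bitmap intact, which is exactly the property the patch
restores.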

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
CC: John Snow <jsnow@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Denis V. Lunev <den@openvz.org>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
---
 block-migration.c     |  5 +++--
 block.c               | 23 ++++++++++++++++++++---
 block/mirror.c        | 11 +++++++----
 include/block/block.h |  6 ++++--
 4 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index 8df30d9..5b4aa0f 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -303,7 +303,7 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
     blk->aiocb = bdrv_aio_readv(bs, cur_sector, &blk->qiov,
                                 nr_sectors, blk_mig_read_cb, blk);
 
-    bdrv_reset_dirty(bs, cur_sector, nr_sectors);
+    bdrv_reset_dirty_bitmap(bs, bmds->dirty_bitmap, cur_sector, nr_sectors);
     qemu_mutex_unlock_iothread();
 
     bmds->cur_sector = cur_sector + nr_sectors;
@@ -496,7 +496,8 @@ static int mig_save_device_dirty(QEMUFile *f, BlkMigDevState *bmds,
                 g_free(blk);
             }
 
-            bdrv_reset_dirty(bmds->bs, sector, nr_sectors);
+            bdrv_reset_dirty_bitmap(bmds->bs, bmds->dirty_bitmap, sector,
+                                    nr_sectors);
             break;
         }
         sector += BDRV_SECTORS_PER_DIRTY_CHUNK;
diff --git a/block.c b/block.c
index a37dc2f..6edf1dc 100644
--- a/block.c
+++ b/block.c
@@ -101,6 +101,10 @@ static QTAILQ_HEAD(, BlockDriverState) graph_bdrv_states =
 static QLIST_HEAD(, BlockDriver) bdrv_drivers =
     QLIST_HEAD_INITIALIZER(bdrv_drivers);
 
+static void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
+                           int nr_sectors);
+static void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
+                             int nr_sectors);
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
 
@@ -5495,8 +5499,20 @@ void bdrv_dirty_iter_set(HBitmapIter *hbi, int64_t offset)
     hbitmap_iter_init(hbi, hbi->hb, offset);
 }
 
-void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
-                    int nr_sectors)
+void bdrv_set_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
+                           int64_t cur_sector, int nr_sectors)
+{
+    hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
+}
+
+void bdrv_reset_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
+                             int64_t cur_sector, int nr_sectors)
+{
+    hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
+}
+
+static void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
+                           int nr_sectors)
 {
     BdrvDirtyBitmap *bitmap;
     QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
@@ -5507,7 +5523,8 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
     }
 }
 
-void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors)
+static void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
+                             int nr_sectors)
 {
     BdrvDirtyBitmap *bitmap;
     QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
diff --git a/block/mirror.c b/block/mirror.c
index af91ae0..f22ebd4 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -128,7 +128,8 @@ static void mirror_write_complete(void *opaque, int ret)
         BlockDriverState *source = s->common.bs;
         BlockErrorAction action;
 
-        bdrv_set_dirty(source, op->sector_num, op->nb_sectors);
+        bdrv_set_dirty_bitmap(source, s->dirty_bitmap, op->sector_num,
+                              op->nb_sectors);
         action = mirror_error_action(s, false, -ret);
         if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
             s->ret = ret;
@@ -145,7 +146,8 @@ static void mirror_read_complete(void *opaque, int ret)
         BlockDriverState *source = s->common.bs;
         BlockErrorAction action;
 
-        bdrv_set_dirty(source, op->sector_num, op->nb_sectors);
+        bdrv_set_dirty_bitmap(source, s->dirty_bitmap, op->sector_num,
+                              op->nb_sectors);
         action = mirror_error_action(s, true, -ret);
         if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
             s->ret = ret;
@@ -286,7 +288,8 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
         next_sector += sectors_per_chunk;
     }
 
-    bdrv_reset_dirty(source, sector_num, nb_sectors);
+    bdrv_reset_dirty_bitmap(source, s->dirty_bitmap, sector_num,
+                            nb_sectors);
 
     /* Copy the dirty cluster.  */
     s->in_flight++;
@@ -442,7 +445,7 @@ static void coroutine_fn mirror_run(void *opaque)
 
             assert(n > 0);
             if (ret == 1) {
-                bdrv_set_dirty(bs, sector_num, n);
+                bdrv_set_dirty_bitmap(bs, s->dirty_bitmap, sector_num, n);
                 sector_num = next;
             } else {
                 sector_num += n;
diff --git a/include/block/block.h b/include/block/block.h
index 368a371..b21233c 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -450,8 +450,10 @@ uint64_t bdrv_dbm_calc_def_granularity(BlockDriverState *bs);
 int64_t bdrv_dirty_bitmap_granularity(BlockDriverState *bs,
                                       BdrvDirtyBitmap *bitmap);
 int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, int64_t sector);
-void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
-void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
+void bdrv_set_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
+                           int64_t cur_sector, int nr_sectors);
+void bdrv_reset_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
+                             int64_t cur_sector, int nr_sectors);
 void bdrv_dirty_iter_init(BlockDriverState *bs,
                           BdrvDirtyBitmap *bitmap, struct HBitmapIter *hbi);
 void bdrv_dirty_iter_set(struct HBitmapIter *hbi, int64_t offset);
-- 
1.9.1


* [Qemu-devel] [PATCH 4/9] hbitmap: store / restore
  2014-12-11 14:17 [Qemu-devel] [PATCH 0/8] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 3/9] block: fix spoiling all dirty bitmaps by mirror and migration Vladimir Sementsov-Ogievskiy
@ 2014-12-11 14:17 ` Vladimir Sementsov-Ogievskiy
  2015-01-08 21:21   ` John Snow
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 5/9] block: BdrvDirtyBitmap store/restore interface Vladimir Sementsov-Ogievskiy
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2014-12-11 14:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, den, vsementsov, jsnow, stefanha

Add functions to store / restore an HBitmap. The HBitmap should be
saved to a linear bitmap format independently of endianness.

Restoring is done in several steps, and every step writes only the last
level of the bitmap. All other levels are rebuilt by
hbitmap_restore_finish as the final step of restoring. So, the HBitmap
is inconsistent while it is being restored.
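
The final rebuild step can be sketched on a reduced model (this is an
illustration of what hbitmap_restore_finish does for each level, not the
QEMU code itself): each bit of an upper-level word is set iff the
corresponding word of the level below it is non-zero.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced model: rebuild one parent word from up to 64 lower-level
 * words. Bit j of the result is set iff child word j is non-zero --
 * the invariant hbitmap_restore_finish re-establishes level by level. */
static uint64_t rebuild_parent_word(const uint64_t *children, int n)
{
    uint64_t parent = 0;
    for (int j = 0; j < n; j++) {
        if (children[j] != 0) {
            parent |= 1ULL << j;
        }
    }
    return parent;
}
```

Because only the last level is written during restore, running this
rebuild from the penultimate level up to level zero makes the whole
multi-level structure consistent again.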

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 include/qemu/hbitmap.h | 49 +++++++++++++++++++++++++++++
 util/hbitmap.c         | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 133 insertions(+)

diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index b645cfc..e57b610 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -126,6 +126,55 @@ void hbitmap_reset(HBitmap *hb, uint64_t start, uint64_t count);
 bool hbitmap_get(const HBitmap *hb, uint64_t item);
 
 /**
+ * hbitmap_data_size:
+ * @hb: HBitmap to operate on.
+ * @count: Number of bits
+ *
+ * Return the number of bytes that hbitmap_store_data needs
+ */
+uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count);
+
+/**
+ * hbitmap_store_data:
+ * @hb: HBitmap to operate on.
+ * @buf: Buffer to store bitmap data.
+ * @start: First bit to store.
+ * @count: Number of bits to store.
+ *
+ * Stores HBitmap data corresponding to the given region. The format of the
+ * saved data is a linear sequence of bits, so it can be used by
+ * hbitmap_restore_data independently of endianness.
+ */
+void hbitmap_store_data(const HBitmap *hb, uint8_t *buf,
+                        uint64_t start, uint64_t count);
+
+/**
+ * hbitmap_restore_data:
+ * @hb: HBitmap to operate on.
+ * @buf: Buffer to restore bitmap data from.
+ * @start: First bit to restore.
+ * @count: Number of bits to restore.
+ *
+ * Restores HBitmap data corresponding to the given region. The format is the same
+ * as for hbitmap_store_data.
+ *
+ * ! The bitmap becomes inconsistent after this operation.
+ * hbitmap_restore_finish should be called before using the bitmap after
+ * data restoring.
+ */
+void hbitmap_restore_data(HBitmap *hb, uint8_t *buf,
+                          uint64_t start, uint64_t count);
+
+/**
+ * hbitmap_restore_finish:
+ * @hb: HBitmap to operate on.
+ *
+ * Repair the HBitmap after calling hbitmap_restore_data. Actually, all
+ * HBitmap levels are restored here.
+ */
+void hbitmap_restore_finish(HBitmap *hb);
+
+/**
  * hbitmap_free:
  * @hb: HBitmap to operate on.
  *
diff --git a/util/hbitmap.c b/util/hbitmap.c
index 8aa7406..ac0323f 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -362,6 +362,90 @@ bool hbitmap_get(const HBitmap *hb, uint64_t item)
     return (hb->levels[HBITMAP_LEVELS - 1][pos >> BITS_PER_LEVEL] & bit) != 0;
 }
 
+uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count)
+{
+    uint64_t size;
+
+    if (count == 0) {
+        return 0;
+    }
+
+    size = (((count - 1) >> hb->granularity) >> BITS_PER_LEVEL) + 1;
+
+    return size * sizeof(unsigned long);
+}
+
+void hbitmap_store_data(const HBitmap *hb, uint8_t *buf,
+                        uint64_t start, uint64_t count)
+{
+    uint64_t last = start + count - 1;
+    unsigned long *out = (unsigned long *)buf;
+
+    if (count == 0) {
+        return;
+    }
+
+    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
+    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
+    count = last - start + 1;
+
+#ifdef __BIG_ENDIAN_BITFIELD
+    for (uint64_t i = start; i <= last; ++i) {
+        unsigned long el = hb->levels[HBITMAP_LEVELS - 1][i];
+        out[i - start] = BITS_PER_LONG == 32 ? cpu_to_le32(el)
+                                             : cpu_to_le64(el);
+    }
+#else
+    memcpy(out, &hb->levels[HBITMAP_LEVELS - 1][start],
+           count * sizeof(unsigned long));
+#endif
+}
+
+void hbitmap_restore_data(HBitmap *hb, uint8_t *buf,
+                          uint64_t start, uint64_t count)
+{
+    uint64_t last = start + count - 1;
+    unsigned long *in = (unsigned long *)buf;
+
+    if (count == 0) {
+        return;
+    }
+
+    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
+    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
+    count = last - start + 1;
+
+#ifdef __BIG_ENDIAN_BITFIELD
+    for (uint64_t i = start; i <= last; ++i) {
+        hb->levels[HBITMAP_LEVELS - 1][i] =
+            BITS_PER_LONG == 32 ? le32_to_cpu(in[i - start])
+                                : le64_to_cpu(in[i - start]);
+    }
+#else
+    memcpy(&hb->levels[HBITMAP_LEVELS - 1][start], in,
+           count * sizeof(unsigned long));
+#endif
+}
+
+void hbitmap_restore_finish(HBitmap *bitmap)
+{
+    int64_t i, size;
+    int lev, j;
+
+    /* restore levels starting from penultimate to zero level, assuming
+     * that the last one is ok */
+    size = MAX((bitmap->size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
+    for (lev = HBITMAP_LEVELS - 1; lev-- > 0; ) {
+        size = MAX((size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
+        for (i = 0; i < size; ++i) {
+            bitmap->levels[lev][i] = 0;
+            for (j = 0; j < BITS_PER_LONG; ++j) {
+                if (bitmap->levels[lev+1][(i << BITS_PER_LEVEL) + j]) {
+                    bitmap->levels[lev][i] |= (1UL << j);
+                }
+            }
+        }
+    }
+}
+
 void hbitmap_free(HBitmap *hb)
 {
     unsigned i;
-- 
1.9.1


* [Qemu-devel] [PATCH 5/9] block: BdrvDirtyBitmap store/restore interface
  2014-12-11 14:17 [Qemu-devel] [PATCH 0/8] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 4/9] hbitmap: store / restore Vladimir Sementsov-Ogievskiy
@ 2014-12-11 14:17 ` Vladimir Sementsov-Ogievskiy
  2015-01-08 21:22   ` John Snow
  2015-01-14 11:27   ` Vladimir Sementsov-Ogievskiy
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 6/9] block-migration: tiny refactoring Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2014-12-11 14:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, den, vsementsov, jsnow, stefanha

Add several functions to provide the necessary access to BdrvDirtyBitmap
for block-migration.c.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 block.c               | 61 +++++++++++++++++++++++++++++++++++++++++++++++++--
 include/block/block.h | 10 +++++++++
 2 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 6edf1dc..7d42620 100644
--- a/block.c
+++ b/block.c
@@ -5511,8 +5511,65 @@ void bdrv_reset_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
     hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
 }
 
-static void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
-                           int nr_sectors)
+const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap)
+{
+    return bitmap->name;
+}
+
+uint64_t bdrv_dbm_data_size(const BdrvDirtyBitmap *bitmap, uint64_t count)
+{
+    return hbitmap_data_size(bitmap->bitmap, count);
+}
+
+void bdrv_dbm_store_data(const BdrvDirtyBitmap *bitmap, uint8_t *buf,
+                         uint64_t start, uint64_t count)
+{
+    hbitmap_store_data(bitmap->bitmap, buf, start, count);
+}
+
+void bdrv_dbm_restore_data(BdrvDirtyBitmap *bitmap, uint8_t *buf,
+                           uint64_t start, uint64_t count)
+{
+    hbitmap_restore_data(bitmap->bitmap, buf, start, count);
+}
+
+BdrvDirtyBitmap **bdrv_dbm_find_all_named(BlockDriverState *bs, int *count)
+{
+    BdrvDirtyBitmap *bm, **res, **iter;
+    assert(count);
+    *count = 0;
+    QLIST_FOREACH(bm, &bs->dirty_bitmaps, list) {
+        if (bm->name != NULL) {
+            (*count)++;
+        }
+    }
+
+    iter = res = g_malloc(sizeof(*res) * (*count));
+    QLIST_FOREACH(bm, &bs->dirty_bitmaps, list) {
+        if (bm->name != NULL) {
+            *iter++ = bm;
+        }
+    }
+
+    return res;
+}
+
+void bdrv_dbm_restore_finish(void)
+{
+    BlockDriverState *bs;
+    BdrvDirtyBitmap *bm;
+
+    for (bs = bdrv_next(NULL); bs != NULL; bs = bdrv_next(bs)) {
+        QLIST_FOREACH(bm, &bs->dirty_bitmaps, list) {
+            if (bm->name != NULL) {
+                hbitmap_restore_finish(bm->bitmap);
+            }
+        }
+    }
+}
+
+void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
+                    int nr_sectors)
 {
     BdrvDirtyBitmap *bitmap;
     QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
diff --git a/include/block/block.h b/include/block/block.h
index b21233c..09eff80 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -459,6 +459,16 @@ void bdrv_dirty_iter_init(BlockDriverState *bs,
 void bdrv_dirty_iter_set(struct HBitmapIter *hbi, int64_t offset);
 int64_t bdrv_get_dirty_count(BlockDriverState *bs, BdrvDirtyBitmap *bitmap);
 
+uint64_t bdrv_dbm_data_size(const BdrvDirtyBitmap *bitmap, uint64_t count);
+void bdrv_dbm_store_data(const BdrvDirtyBitmap *bitmap, uint8_t *buf,
+                         uint64_t start, uint64_t count);
+void bdrv_dbm_restore_data(BdrvDirtyBitmap *bitmap, uint8_t *buf,
+                           uint64_t start, uint64_t count);
+bool bdrv_dbm_is_named(BdrvDirtyBitmap *bitmap);
+const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap);
+BdrvDirtyBitmap **bdrv_dbm_find_all_named(BlockDriverState *bs, int *count);
+void bdrv_dbm_restore_finish(void);
+
 void bdrv_enable_copy_on_read(BlockDriverState *bs);
 void bdrv_disable_copy_on_read(BlockDriverState *bs);
 
-- 
1.9.1


* [Qemu-devel] [PATCH 6/9] block-migration: tiny refactoring
  2014-12-11 14:17 [Qemu-devel] [PATCH 0/8] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
                   ` (4 preceding siblings ...)
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 5/9] block: BdrvDirtyBitmap store/restore interface Vladimir Sementsov-Ogievskiy
@ 2014-12-11 14:17 ` Vladimir Sementsov-Ogievskiy
  2015-01-08 21:23   ` John Snow
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 7/9] block-migration: remove not needed iothread lock Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2014-12-11 14:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, den, vsementsov, jsnow, stefanha

Add blk_create and blk_free to remove code duplication. Otherwise, the
duplication would grow in the following patches because of the extension
of the BlkMigBlock structure.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 block-migration.c | 56 +++++++++++++++++++++++++++++--------------------------
 1 file changed, 30 insertions(+), 26 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index 5b4aa0f..d0c825f 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -113,6 +113,30 @@ static void blk_mig_unlock(void)
     qemu_mutex_unlock(&block_mig_state.lock);
 }
 
+/* Only allocating and initializing structure fields, not copying any data. */
+
+static BlkMigBlock *blk_create(BlkMigDevState *bmds, int64_t sector,
+                                int nr_sectors)
+{
+    BlkMigBlock *blk = g_new(BlkMigBlock, 1);
+    blk->buf = g_malloc(BLOCK_SIZE);
+    blk->bmds = bmds;
+    blk->sector = sector;
+    blk->nr_sectors = nr_sectors;
+
+    blk->iov.iov_base = blk->buf;
+    blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
+    qemu_iovec_init_external(&blk->qiov, &blk->iov, 1);
+
+    return blk;
+}
+
+static void blk_free(BlkMigBlock *blk)
+{
+    g_free(blk->buf);
+    g_free(blk);
+}
+
 /* Must run outside of the iothread lock during the bulk phase,
  * or the VM will stall.
  */
@@ -285,15 +309,7 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
         nr_sectors = total_sectors - cur_sector;
     }
 
-    blk = g_new(BlkMigBlock, 1);
-    blk->buf = g_malloc(BLOCK_SIZE);
-    blk->bmds = bmds;
-    blk->sector = cur_sector;
-    blk->nr_sectors = nr_sectors;
-
-    blk->iov.iov_base = blk->buf;
-    blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
-    qemu_iovec_init_external(&blk->qiov, &blk->iov, 1);
+    blk = blk_create(bmds, cur_sector, nr_sectors);
 
     blk_mig_lock();
     block_mig_state.submitted++;
@@ -467,17 +483,9 @@ static int mig_save_device_dirty(QEMUFile *f, BlkMigDevState *bmds,
             } else {
                 nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
             }
-            blk = g_new(BlkMigBlock, 1);
-            blk->buf = g_malloc(BLOCK_SIZE);
-            blk->bmds = bmds;
-            blk->sector = sector;
-            blk->nr_sectors = nr_sectors;
+            blk = blk_create(bmds, sector, nr_sectors);
 
             if (is_async) {
-                blk->iov.iov_base = blk->buf;
-                blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
-                qemu_iovec_init_external(&blk->qiov, &blk->iov, 1);
-
                 blk->aiocb = bdrv_aio_readv(bmds->bs, sector, &blk->qiov,
                                             nr_sectors, blk_mig_read_cb, blk);
 
@@ -492,8 +500,7 @@ static int mig_save_device_dirty(QEMUFile *f, BlkMigDevState *bmds,
                 }
                 blk_send(f, blk);
 
-                g_free(blk->buf);
-                g_free(blk);
+                blk_free(blk);
             }
 
             bdrv_reset_dirty_bitmap(bmds->bs, bmds->dirty_bitmap, sector,
@@ -508,8 +515,7 @@ static int mig_save_device_dirty(QEMUFile *f, BlkMigDevState *bmds,
 
 error:
     DPRINTF("Error reading sector %" PRId64 "\n", sector);
-    g_free(blk->buf);
-    g_free(blk);
+    blk_free(blk);
     return ret;
 }
 
@@ -560,8 +566,7 @@ static int flush_blks(QEMUFile *f)
         blk_send(f, blk);
         blk_mig_lock();
 
-        g_free(blk->buf);
-        g_free(blk);
+        blk_free(blk);
 
         block_mig_state.read_done--;
         block_mig_state.transferred++;
@@ -612,8 +617,7 @@ static void blk_mig_cleanup(void)
 
     while ((blk = QSIMPLEQ_FIRST(&block_mig_state.blk_list)) != NULL) {
         QSIMPLEQ_REMOVE_HEAD(&block_mig_state.blk_list, entry);
-        g_free(blk->buf);
-        g_free(blk);
+        blk_free(blk);
     }
     blk_mig_unlock();
 }
-- 
1.9.1


* [Qemu-devel] [PATCH 7/9] block-migration: remove not needed iothread lock
  2014-12-11 14:17 [Qemu-devel] [PATCH 0/8] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
                   ` (5 preceding siblings ...)
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 6/9] block-migration: tiny refactoring Vladimir Sementsov-Ogievskiy
@ 2014-12-11 14:17 ` Vladimir Sementsov-Ogievskiy
  2015-01-08 21:24   ` John Snow
  2015-01-08 22:28   ` Paolo Bonzini
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 8/9] migration: add dirty parameter Vladimir Sementsov-Ogievskiy
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration Vladimir Sementsov-Ogievskiy
  8 siblings, 2 replies; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2014-12-11 14:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, den, vsementsov, jsnow, stefanha

Instead of taking the iothread lock, we can just swap these calls. If
some write to our range occurs before the bitmap is reset, it will get
into the subsequent aio read, because the read happens, in any case,
after the bitmap reset.
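
The ordering argument can be checked on a toy timeline (purely
illustrative; one chunk is modeled as a single value, and all names are
hypothetical): with reset-before-read, a guest write racing with the
save either lands in the data that is read, or re-dirties the bitmap so
the chunk is sent again later. It can never be lost.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one migrated chunk. */
typedef struct {
    int disk;   /* current chunk contents on the source */
    bool dirty; /* chunk's bit in the migration dirty bitmap */
} Chunk;

static void guest_write(Chunk *c, int val)
{
    c->disk = val;
    c->dirty = true; /* every guest write dirties the bitmap */
}

/* One save step with the patch's ordering: reset first, read second.
 * Returns the value that gets sent to the destination. */
static int save_step(Chunk *c, bool write_between, int new_val)
{
    c->dirty = false;            /* bdrv_reset_dirty_bitmap */
    if (write_between) {
        guest_write(c, new_val); /* racing guest write */
    }
    return c->disk;              /* bdrv_aio_readv + send */
}
```

In the racing case the value sent already includes the write; if the
write instead happens after the read, the dirty bit records it and the
chunk is retransmitted in the dirty phase.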

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 block-migration.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index d0c825f..908a66d 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -315,13 +315,11 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
     block_mig_state.submitted++;
     blk_mig_unlock();
 
-    qemu_mutex_lock_iothread();
+    bdrv_reset_dirty_bitmap(bs, bmds->dirty_bitmap, cur_sector, nr_sectors);
+
     blk->aiocb = bdrv_aio_readv(bs, cur_sector, &blk->qiov,
                                 nr_sectors, blk_mig_read_cb, blk);
 
-    bdrv_reset_dirty_bitmap(bs, bmds->dirty_bitmap, cur_sector, nr_sectors);
-    qemu_mutex_unlock_iothread();
-
     bmds->cur_sector = cur_sector + nr_sectors;
     return (bmds->cur_sector >= total_sectors);
 }
-- 
1.9.1

* [Qemu-devel] [PATCH 8/9] migration: add dirty parameter
  2014-12-11 14:17 [Qemu-devel] [PATCH 0/8] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
                   ` (6 preceding siblings ...)
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 7/9] block-migration: remove not needed iothread lock Vladimir Sementsov-Ogievskiy
@ 2014-12-11 14:17 ` Vladimir Sementsov-Ogievskiy
  2014-12-11 15:18   ` Eric Blake
  2015-01-08 21:51   ` John Snow
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration Vladimir Sementsov-Ogievskiy
  8 siblings, 2 replies; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2014-12-11 14:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, den, vsementsov, jsnow, stefanha

Add a "dirty" parameter to the qmp migrate command. If this parameter
is true, block-migration.c will migrate dirty bitmaps. It can be used
without the "blk" parameter to migrate only dirty bitmaps, skipping
block migration.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 hmp-commands.hx               | 10 ++++++----
 hmp.c                         |  4 +++-
 include/migration/migration.h |  1 +
 migration.c                   |  4 +++-
 qapi-schema.json              |  2 +-
 qmp-commands.hx               |  5 ++++-
 savevm.c                      |  3 ++-
 7 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index e37bc8b..d421695 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -887,23 +887,25 @@ ETEXI
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
+        .args_type  = "detach:-d,blk:-b,inc:-i,dirty:-D,uri:s",
         .params     = "[-d] [-b] [-i] uri",
         .help       = "migrate to URI (using -d to not wait for completion)"
 		      "\n\t\t\t -b for migration without shared storage with"
 		      " full copy of disk\n\t\t\t -i for migration without "
-		      "shared storage with incremental copy of disk "
-		      "(base image shared between src and destination)",
+		      "shared storage with incremental copy of disk\n\t\t\t"
+		      " -D for migration of named dirty bitmaps as well\n\t\t\t"
+		      " (base image shared between src and destination)",
         .mhandler.cmd = hmp_migrate,
     },
 
 
 STEXI
-@item migrate [-d] [-b] [-i] @var{uri}
+@item migrate [-d] [-b] [-i] [-D] @var{uri}
 @findex migrate
 Migrate to @var{uri} (using -d to not wait for completion).
 	-b for migration with full copy of disk
 	-i for migration with incremental copy of disk (base image is shared)
+	-D for migration of named dirty bitmaps
 ETEXI
 
     {
diff --git a/hmp.c b/hmp.c
index 17a2a2b..771f666 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1340,10 +1340,12 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     int detach = qdict_get_try_bool(qdict, "detach", 0);
     int blk = qdict_get_try_bool(qdict, "blk", 0);
     int inc = qdict_get_try_bool(qdict, "inc", 0);
+    int dirty = qdict_get_try_bool(qdict, "dirty", 0);
     const char *uri = qdict_get_str(qdict, "uri");
     Error *err = NULL;
 
-    qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, &err);
+    qmp_migrate(uri, !!blk, blk, !!inc, inc, !!dirty, dirty,
+                false, false, &err);
     if (err) {
         monitor_printf(mon, "migrate: %s\n", error_get_pretty(err));
         error_free(err);
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 3cb5ba8..48d71d3 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -37,6 +37,7 @@
 struct MigrationParams {
     bool blk;
     bool shared;
+    bool dirty;
 };
 
 typedef struct MigrationState MigrationState;
diff --git a/migration.c b/migration.c
index c49a05a..e7bb7f3 100644
--- a/migration.c
+++ b/migration.c
@@ -404,7 +404,8 @@ void migrate_del_blocker(Error *reason)
 }
 
 void qmp_migrate(const char *uri, bool has_blk, bool blk,
-                 bool has_inc, bool inc, bool has_detach, bool detach,
+                 bool has_inc, bool inc, bool has_dirty, bool dirty,
+                 bool has_detach, bool detach,
                  Error **errp)
 {
     Error *local_err = NULL;
@@ -414,6 +415,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 
     params.blk = has_blk && blk;
     params.shared = has_inc && inc;
+    params.dirty = has_dirty && dirty;
 
     if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP ||
         s->state == MIG_STATE_CANCELLING) {
diff --git a/qapi-schema.json b/qapi-schema.json
index 793031b..203f4f0 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1660,7 +1660,7 @@
 # Since: 0.14.0
 ##
 { 'command': 'migrate',
-  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } }
+  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*dirty': 'bool', '*detach': 'bool' } }
 
 # @xen-save-devices-state:
 #
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 1eba583..e41d033 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -610,7 +610,7 @@ EQMP
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
+        .args_type  = "detach:-d,blk:-b,inc:-i,dirty:-D,uri:s",
         .mhandler.cmd_new = qmp_marshal_input_migrate,
     },
 
@@ -624,6 +624,7 @@ Arguments:
 
 - "blk": block migration, full disk copy (json-bool, optional)
 - "inc": incremental disk copy (json-bool, optional)
+- "dirty": migrate named dirty bitmaps (json-bool, optional)
 - "uri": Destination URI (json-string)
 
 Example:
@@ -638,6 +639,8 @@ Notes:
 (2) All boolean arguments default to false
 (3) The user Monitor's "detach" argument is invalid in QMP and should not
     be used
+(4) The "dirty" argument may be used without "blk", to migrate only dirty
+    bitmaps
 
 EQMP
 
diff --git a/savevm.c b/savevm.c
index 08ec678..a598d1d 100644
--- a/savevm.c
+++ b/savevm.c
@@ -784,7 +784,8 @@ static int qemu_savevm_state(QEMUFile *f)
     int ret;
     MigrationParams params = {
         .blk = 0,
-        .shared = 0
+        .shared = 0,
+        .dirty = 0
     };
 
     if (qemu_savevm_state_blocked(NULL)) {
-- 
1.9.1

* [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
  2014-12-11 14:17 [Qemu-devel] [PATCH 0/8] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
                   ` (7 preceding siblings ...)
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 8/9] migration: add dirty parameter Vladimir Sementsov-Ogievskiy
@ 2014-12-11 14:17 ` Vladimir Sementsov-Ogievskiy
  2015-01-08 22:05   ` John Snow
                     ` (2 more replies)
  8 siblings, 3 replies; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2014-12-11 14:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, den, vsementsov, jsnow, stefanha

Migrate only those parts of the dirty bitmaps that correspond to the
block being migrated. Also, skip block migration itself if it is
disabled (the "blk" parameter of the migrate command is false).

Shared sectors are still skipped for block data, but bitmaps are
migrated independently of that: the blk is simply sent without block
data.
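For readers, the per-chunk wire format implied by blk_send() and block_load() in the diff below appears to be the following (a best-effort reconstruction, not an authoritative spec; put_be64() is a toy stand-in for QEMU's qemu_put_be64()):

```c
#include <stdint.h>

/* Per-chunk layout as reconstructed from this patch:
 *
 *   be64  (sector << BDRV_SECTOR_BITS) | flags
 *   u8    device-name length, then the name bytes
 *   if flags & BLK_MIG_FLAG_HAS_BITMAPS:
 *       be32 nr_bitmaps
 *       per bitmap: u8 name length, name bytes,
 *                   bdrv_dbm_data_size(bm, nr_sectors) payload bytes
 *   if flags & BLK_MIG_FLAG_HAS_BLOCK:
 *       BLOCK_SIZE bytes of block data
 *   (zero blocks set BLK_MIG_FLAG_ZERO_BLOCK and carry no payload)
 */

#define BDRV_SECTOR_BITS 9            /* 512-byte sectors, as in QEMU */
#define BLK_MIG_FLAG_HAS_BITMAPS 0x20 /* from this patch */

/* toy big-endian encoder standing in for qemu_put_be64() */
static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}
```

The receiver therefore has to check HAS_BLOCK explicitly rather than assuming data always follows the bitmaps section.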

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 block-migration.c | 173 +++++++++++++++++++++++++++++++++++++++++++++++-------
 savevm.c          |   1 +
 2 files changed, 154 insertions(+), 20 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index 908a66d..95d54a1 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -32,6 +32,8 @@
 #define BLK_MIG_FLAG_EOS                0x02
 #define BLK_MIG_FLAG_PROGRESS           0x04
 #define BLK_MIG_FLAG_ZERO_BLOCK         0x08
+#define BLK_MIG_FLAG_HAS_BLOCK          0x10
+#define BLK_MIG_FLAG_HAS_BITMAPS        0x20
 
 #define MAX_IS_ALLOCATED_SEARCH 65536
 
@@ -51,6 +53,8 @@ typedef struct BlkMigDevState {
     int shared_base;
     int64_t total_sectors;
     QSIMPLEQ_ENTRY(BlkMigDevState) entry;
+    int nr_bitmaps;
+    BdrvDirtyBitmap **dirty_bitmaps;
 
     /* Only used by migration thread.  Does not need a lock.  */
     int bulk_completed;
@@ -64,6 +68,11 @@ typedef struct BlkMigDevState {
     Error *blocker;
 } BlkMigDevState;
 
+typedef struct BlkMigDirtyBitmap {
+    BdrvDirtyBitmap *bitmap;
+    uint8_t *buf;
+} BlkMigDirtyBitmap;
+
 typedef struct BlkMigBlock {
     /* Only used by migration thread.  */
     uint8_t *buf;
@@ -74,6 +83,9 @@ typedef struct BlkMigBlock {
     QEMUIOVector qiov;
     BlockAIOCB *aiocb;
 
+    int nr_bitmaps;
+    BlkMigDirtyBitmap *dirty_bitmaps;
+
     /* Protected by block migration lock.  */
     int ret;
     QSIMPLEQ_ENTRY(BlkMigBlock) entry;
@@ -83,6 +95,7 @@ typedef struct BlkMigState {
     /* Written during setup phase.  Can be read without a lock.  */
     int blk_enable;
     int shared_base;
+    int dbm_enable;
     QSIMPLEQ_HEAD(bmds_list, BlkMigDevState) bmds_list;
     int64_t total_sector_sum;
     bool zero_blocks;
@@ -116,27 +129,63 @@ static void blk_mig_unlock(void)
 /* Only allocating and initializing structure fields, not copying any data. */
 
 static BlkMigBlock *blk_create(BlkMigDevState *bmds, int64_t sector,
-                                int nr_sectors)
+                               int nr_sectors, bool only_bitmaps)
 {
+    int i;
     BlkMigBlock *blk = g_new(BlkMigBlock, 1);
-    blk->buf = g_malloc(BLOCK_SIZE);
+    blk->buf = only_bitmaps ? NULL : g_malloc(BLOCK_SIZE);
     blk->bmds = bmds;
     blk->sector = sector;
     blk->nr_sectors = nr_sectors;
+    blk->dirty_bitmaps = NULL;
+    blk->nr_bitmaps = 0;
 
     blk->iov.iov_base = blk->buf;
     blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
     qemu_iovec_init_external(&blk->qiov, &blk->iov, 1);
 
+    if (block_mig_state.dbm_enable) {
+        BlkMigDirtyBitmap *mig_bm;
+
+        blk->nr_bitmaps = bmds->nr_bitmaps;
+        mig_bm = blk->dirty_bitmaps = g_new(BlkMigDirtyBitmap,
+                                            bmds->nr_bitmaps);
+        for (i = 0; i < bmds->nr_bitmaps; ++i) {
+            BdrvDirtyBitmap *bm = bmds->dirty_bitmaps[i];
+            mig_bm->buf = g_malloc(bdrv_dbm_data_size(bm, nr_sectors));
+            mig_bm->bitmap = bm;
+            mig_bm++;
+        }
+    }
+
     return blk;
 }
 
 static void blk_free(BlkMigBlock *blk)
 {
+    int i;
     g_free(blk->buf);
+
+    if (blk->dirty_bitmaps) {
+        for (i = 0; i < blk->nr_bitmaps; ++i) {
+            g_free(blk->dirty_bitmaps[i].buf);
+        }
+        g_free(blk->dirty_bitmaps);
+    }
+
     g_free(blk);
 }
 
+static void blk_store_bitmaps(BlkMigBlock *blk)
+{
+    int i;
+    for (i = 0; i < blk->nr_bitmaps; ++i) {
+        bdrv_dbm_store_data(blk->dirty_bitmaps[i].bitmap,
+                            blk->dirty_bitmaps[i].buf,
+                            blk->sector, blk->nr_sectors);
+    }
+}
+
 /* Must run outside of the iothread lock during the bulk phase,
  * or the VM will stall.
  */
@@ -146,9 +195,15 @@ static void blk_send(QEMUFile *f, BlkMigBlock * blk)
     int len;
     uint64_t flags = BLK_MIG_FLAG_DEVICE_BLOCK;
 
-    if (block_mig_state.zero_blocks &&
+    if (block_mig_state.zero_blocks && blk->buf &&
         buffer_is_zero(blk->buf, BLOCK_SIZE)) {
         flags |= BLK_MIG_FLAG_ZERO_BLOCK;
+    } else if (blk->buf) {
+        flags |= BLK_MIG_FLAG_HAS_BLOCK;
+    }
+
+    if (block_mig_state.dbm_enable) {
+        flags |= BLK_MIG_FLAG_HAS_BITMAPS;
     }
 
     /* sector number and flags */
@@ -160,10 +215,27 @@ static void blk_send(QEMUFile *f, BlkMigBlock * blk)
     qemu_put_byte(f, len);
     qemu_put_buffer(f, (uint8_t *)bdrv_get_device_name(blk->bmds->bs), len);
 
+    /* dirty bitmaps */
+    if (flags & BLK_MIG_FLAG_HAS_BITMAPS) {
+        int i;
+        qemu_put_be32(f, blk->nr_bitmaps);
+        for (i = 0; i < blk->nr_bitmaps; ++i) {
+            BdrvDirtyBitmap *bm = blk->dirty_bitmaps[i].bitmap;
+            int buf_size = bdrv_dbm_data_size(bm, blk->nr_sectors);
+            const char *name = bdrv_dirty_bitmap_name(bm);
+            int len = strlen(name);
+
+            qemu_put_byte(f, len);
+            qemu_put_buffer(f, (const uint8_t *)name, len);
+            qemu_put_buffer(f, blk->dirty_bitmaps[i].buf, buf_size);
+        }
+    }
+
     /* if a block is zero we need to flush here since the network
      * bandwidth is now a lot higher than the storage device bandwidth.
-     * thus if we queue zero blocks we slow down the migration */
-    if (flags & BLK_MIG_FLAG_ZERO_BLOCK) {
+     * thus if we queue zero blocks we slow down the migration.
+     * also, skip writing the block when migrating only dirty bitmaps. */
+    if (!(flags & BLK_MIG_FLAG_HAS_BLOCK)) {
         qemu_fflush(f);
         return;
     }
@@ -282,13 +354,20 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
     BlockDriverState *bs = bmds->bs;
     BlkMigBlock *blk;
     int nr_sectors;
+    bool skip_chunk = false;
 
     if (bmds->shared_base) {
         qemu_mutex_lock_iothread();
-        while (cur_sector < total_sectors &&
-               !bdrv_is_allocated(bs, cur_sector, MAX_IS_ALLOCATED_SEARCH,
-                                  &nr_sectors)) {
-            cur_sector += nr_sectors;
+        if (block_mig_state.dbm_enable) {
+            bdrv_is_allocated(bs, cur_sector, BDRV_SECTORS_PER_DIRTY_CHUNK,
+                              &nr_sectors);
+            skip_chunk = nr_sectors >= BDRV_SECTORS_PER_DIRTY_CHUNK;
+        } else {
+            while (cur_sector < total_sectors &&
+                   !bdrv_is_allocated(bs, cur_sector, MAX_IS_ALLOCATED_SEARCH,
+                                      &nr_sectors)) {
+                cur_sector += nr_sectors;
+            }
         }
         qemu_mutex_unlock_iothread();
     }
@@ -309,7 +388,8 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
         nr_sectors = total_sectors - cur_sector;
     }
 
-    blk = blk_create(bmds, cur_sector, nr_sectors);
+    blk = blk_create(bmds, cur_sector, nr_sectors,
+                     skip_chunk || !block_mig_state.blk_enable);
 
     blk_mig_lock();
     block_mig_state.submitted++;
@@ -317,8 +397,16 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
 
     bdrv_reset_dirty_bitmap(bs, bmds->dirty_bitmap, cur_sector, nr_sectors);
 
-    blk->aiocb = bdrv_aio_readv(bs, cur_sector, &blk->qiov,
-                                nr_sectors, blk_mig_read_cb, blk);
+    if (block_mig_state.dbm_enable) {
+        blk_store_bitmaps(blk);
+    }
+
+    if (block_mig_state.blk_enable && !skip_chunk) {
+        blk->aiocb = bdrv_aio_readv(bs, cur_sector, &blk->qiov,
+                                    nr_sectors, blk_mig_read_cb, blk);
+    } else if (block_mig_state.dbm_enable) {
+        blk_mig_read_cb(blk, 0);
+    }
 
     bmds->cur_sector = cur_sector + nr_sectors;
     return (bmds->cur_sector >= total_sectors);
@@ -403,6 +491,8 @@ static void init_blk_migration(QEMUFile *f)
             DPRINTF("Start full migration for %s\n", bdrv_get_device_name(bs));
         }
 
+        bmds->dirty_bitmaps = bdrv_dbm_find_all_named(bs, &bmds->nr_bitmaps);
+
         QSIMPLEQ_INSERT_TAIL(&block_mig_state.bmds_list, bmds, entry);
     }
 }
@@ -481,20 +571,32 @@ static int mig_save_device_dirty(QEMUFile *f, BlkMigDevState *bmds,
             } else {
                 nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
             }
-            blk = blk_create(bmds, sector, nr_sectors);
+            blk = blk_create(bmds, sector, nr_sectors,
+                             !block_mig_state.blk_enable);
+
+            if (block_mig_state.dbm_enable) {
+                blk_store_bitmaps(blk);
+            }
 
             if (is_async) {
-                blk->aiocb = bdrv_aio_readv(bmds->bs, sector, &blk->qiov,
-                                            nr_sectors, blk_mig_read_cb, blk);
+                if (block_mig_state.blk_enable) {
+                    blk->aiocb = bdrv_aio_readv(bmds->bs, sector, &blk->qiov,
+                                                nr_sectors, blk_mig_read_cb,
+                                                blk);
+                } else {
+                    blk_mig_read_cb(blk, 0);
+                }
 
                 blk_mig_lock();
                 block_mig_state.submitted++;
                 bmds_set_aio_inflight(bmds, sector, nr_sectors, 1);
                 blk_mig_unlock();
             } else {
-                ret = bdrv_read(bmds->bs, sector, blk->buf, nr_sectors);
-                if (ret < 0) {
-                    goto error;
+                if (block_mig_state.blk_enable) {
+                    ret = bdrv_read(bmds->bs, sector, blk->buf, nr_sectors);
+                    if (ret < 0) {
+                        goto error;
+                    }
                 }
                 blk_send(f, blk);
 
@@ -817,10 +919,39 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
                 nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
             }
 
+            /* load dirty bitmaps */
+            if (flags & BLK_MIG_FLAG_HAS_BITMAPS) {
+                int i, nr_bitmaps = qemu_get_be32(f);
+
+                for (i = 0; i < nr_bitmaps; ++i) {
+                    char bitmap_name[256];
+                    BdrvDirtyBitmap *bitmap;
+                    int buf_size, len;
+
+                    len = qemu_get_byte(f);
+                    qemu_get_buffer(f, (uint8_t *)bitmap_name, len);
+                    bitmap_name[len] = '\0';
+
+                    bitmap = bdrv_find_dirty_bitmap(bs, bitmap_name);
+                    if (!bitmap) {
+                        fprintf(stderr, "Error: unknown dirty bitmap %s "
+                                "for block device %s\n",
+                                bitmap_name, device_name);
+                        return -EINVAL;
+                    }
+
+                    buf_size = bdrv_dbm_data_size(bitmap, nr_sectors);
+                    buf = g_malloc(buf_size);
+                    qemu_get_buffer(f, buf, buf_size);
+                    bdrv_dbm_restore_data(bitmap, buf, addr, nr_sectors);
+                    g_free(buf);
+                }
+            }
+
             if (flags & BLK_MIG_FLAG_ZERO_BLOCK) {
                 ret = bdrv_write_zeroes(bs, addr, nr_sectors,
                                         BDRV_REQ_MAY_UNMAP);
-            } else {
+            } else if (flags & BLK_MIG_FLAG_HAS_BLOCK) {
                 buf = g_malloc(BLOCK_SIZE);
                 qemu_get_buffer(f, buf, BLOCK_SIZE);
                 ret = bdrv_write(bs, addr, buf, nr_sectors);
@@ -855,6 +986,7 @@ static void block_set_params(const MigrationParams *params, void *opaque)
 {
     block_mig_state.blk_enable = params->blk;
     block_mig_state.shared_base = params->shared;
+    block_mig_state.dbm_enable = params->dirty;
 
     /* shared base means that blk_enable = 1 */
     block_mig_state.blk_enable |= params->shared;
@@ -862,7 +994,8 @@ static void block_set_params(const MigrationParams *params, void *opaque)
 
 static bool block_is_active(void *opaque)
 {
-    return block_mig_state.blk_enable == 1;
+    return block_mig_state.blk_enable == 1 ||
+           block_mig_state.dbm_enable == 1;
 }
 
 static SaveVMHandlers savevm_block_handlers = {
diff --git a/savevm.c b/savevm.c
index a598d1d..1299faa 100644
--- a/savevm.c
+++ b/savevm.c
@@ -983,6 +983,7 @@ int qemu_loadvm_state(QEMUFile *f)
         }
     }
 
+    bdrv_dbm_restore_finish();
     cpu_synchronize_all_post_init();
 
     ret = 0;
-- 
1.9.1

* Re: [Qemu-devel] [PATCH 8/9] migration: add dirty parameter
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 8/9] migration: add dirty parameter Vladimir Sementsov-Ogievskiy
@ 2014-12-11 15:18   ` Eric Blake
  2014-12-15  8:33     ` Vladimir Sementsov-Ogievskiy
  2015-01-08 21:51   ` John Snow
  1 sibling, 1 reply; 47+ messages in thread
From: Eric Blake @ 2014-12-11 15:18 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, den, jsnow, stefanha

On 12/11/2014 07:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> Add dirty parameter to qmp-migrate command. If this parameter is true,
> block-migration.c will migrate dirty bitmaps. This parameter can be used
> without "blk" parameter to migrate only dirty bitmaps, skipping block
> migration.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---

> +++ b/qapi-schema.json
> @@ -1660,7 +1660,7 @@
>  # Since: 0.14.0
>  ##
>  { 'command': 'migrate',
> -  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } }
> +  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*dirty': 'bool', '*detach': 'bool' } }

Missing documentation of the new option (including a mention of '(since
2.3)').  Also, please keep lines shorter than 80 columns.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH 8/9] migration: add dirty parameter
  2014-12-11 15:18   ` Eric Blake
@ 2014-12-15  8:33     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2014-12-15  8:33 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, den, jsnow, stefanha

Thanks. It's interesting that checkpatch.pl doesn't warn about this line.

Best regards,
Vladimir

On 11.12.2014 18:18, Eric Blake wrote:
> On 12/11/2014 07:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Add dirty parameter to qmp-migrate command. If this parameter is true,
>> block-migration.c will migrate dirty bitmaps. This parameter can be used
>> without "blk" parameter to migrate only dirty bitmaps, skipping block
>> migration.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>> ---
>> +++ b/qapi-schema.json
>> @@ -1660,7 +1660,7 @@
>>   # Since: 0.14.0
>>   ##
>>   { 'command': 'migrate',
>> -  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } }
>> +  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*dirty': 'bool', '*detach': 'bool' } }
> Missing documentation of the new option (including a mention of '(since
> 2.3)'.  Also, please keep lines shorter than 80 columns.
>

* Re: [Qemu-devel] [PATCH 1/9] block: rename bdrv_reset_dirty_bitmap
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 1/9] block: rename bdrv_reset_dirty_bitmap Vladimir Sementsov-Ogievskiy
@ 2015-01-08 21:19   ` John Snow
  0 siblings, 0 replies; 47+ messages in thread
From: John Snow @ 2015-01-08 21:19 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, den, stefanha



On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> We will need functions for set/unset a subregion of BdrvDirtyBitmap, to
> fix migration and mirror (accordingly to the following patch "block: fix
> spoiling all dirty bitmaps by mirror and migration"). Having the
> old function 'bdrv_reset_dirty_bitmap' we'll have to add functions like
>
> bdrv_set_dirty_bitmap_region and bdrv_reset_dirty_bitmap_region
>
> But it is more consistent to have
>
> bdrv_set_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
>                        uint64_t start, uint64_t count)
> bdrv_reset_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
>                          uint64_t start, uint64_t count)
>
> for more transparent access to underlaying hbitmap interface. So, here
> we rename the considered function to 'bdrv_clear_dirty_bitmap'.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   block.c               | 2 +-
>   block/backup.c        | 2 +-
>   include/block/block.h | 2 +-
>   3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/block.c b/block.c
> index 677bc6f..a37dc2f 100644
> --- a/block.c
> +++ b/block.c
> @@ -5364,7 +5364,7 @@ BdrvDirtyBitmap *bdrv_copy_dirty_bitmap(BlockDriverState *bs,
>       return new_bitmap;
>   }
>
> -void bdrv_reset_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
> +void bdrv_clear_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
>   {
>       hbitmap_reset(bitmap->bitmap, 0, bitmap->size);
>   }
> diff --git a/block/backup.c b/block/backup.c
> index 2aab68f..da87581 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -449,7 +449,7 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
>           case BITMAP_USE_MODE_RESET:
>               original = sync_bitmap;
>               sync_bitmap = bdrv_copy_dirty_bitmap(bs, sync_bitmap, NULL);
> -            bdrv_reset_dirty_bitmap(bs, original);
> +            bdrv_clear_dirty_bitmap(bs, original);
>               break;
>           case BITMAP_USE_MODE_CONSUME:
>               bdrv_dirty_bitmap_make_anon(bs, sync_bitmap);
> diff --git a/include/block/block.h b/include/block/block.h
> index e535581..368a371 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -437,7 +437,7 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
>   BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs,
>                                           const char *name);
>   void bdrv_dirty_bitmap_make_anon(BlockDriverState *bs, BdrvDirtyBitmap *bitmap);
> -void bdrv_reset_dirty_bitmap(BlockDriverState *bs,
> +void bdrv_clear_dirty_bitmap(BlockDriverState *bs,
>                                BdrvDirtyBitmap *bitmap);
>   BdrvDirtyBitmap *bdrv_copy_dirty_bitmap(BlockDriverState *bs,
>                                           const BdrvDirtyBitmap *bitmap,
>

Looks like the same change I made in V10 of my series, so we can drop this.

* Re: [Qemu-devel] [PATCH 2/9] block-migration: fix pending() return value
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 2/9] block-migration: fix pending() return value Vladimir Sementsov-Ogievskiy
@ 2015-01-08 21:20   ` John Snow
  2015-01-09 19:01     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 47+ messages in thread
From: John Snow @ 2015-01-08 21:20 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, Juan Quintela, Dr. David Alan Gilbert, stefanha,
	Amit Shah, den



On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> Because of wrong return value of .save_live_pending() in
> block-migration, migration finishes before the whole disk
> is transferred. Such situation occures when the migration
(occurs)
> process is fast enouth, for example when source and dest
(enough)
> are on the same host.
>
> If in the bulk phase we return something < max_size, we will skip
> transferring the tail of the device. Currently we have "set pending to
> BLOCK_SIZE if it is zero" for bulk phase, but there no guarantee, that
> it will be < max_size.
>
> True approach is to return, for example, max_size+1 when we are in the
> bulk phase.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   block-migration.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/block-migration.c b/block-migration.c
> index 6f3aa18..8df30d9 100644
> --- a/block-migration.c
> +++ b/block-migration.c
> @@ -757,8 +757,8 @@ static uint64_t block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size)
>                          block_mig_state.read_done * BLOCK_SIZE;
>
>       /* Report at least one block pending during bulk phase */
> -    if (pending == 0 && !block_mig_state.bulk_completed) {
> -        pending = BLOCK_SIZE;
> +    if (pending <= max_size && !block_mig_state.bulk_completed) {
> +        pending = max_size + BLOCK_SIZE;
>       }
>       blk_mig_unlock();
>       qemu_mutex_unlock_iothread();
>

This looks sane to me, but I am CC'ing the Migration maintainers to give 
it a proper look.

Looks to me like this is to prevent this from happening, in 
migration/migration.c:

             pending_size = qemu_savevm_state_pending(s->file, max_size);
             trace_migrate_pending(pending_size, max_size);
             if (pending_size && pending_size >= max_size) {

If we fail that condition, we omit data because we do not call 
qemu_savevm_state_iterate.

* Re: [Qemu-devel] [PATCH 3/9] block: fix spoiling all dirty bitmaps by mirror and migration
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 3/9] block: fix spoiling all dirty bitmaps by mirror and migration Vladimir Sementsov-Ogievskiy
@ 2015-01-08 21:20   ` John Snow
  0 siblings, 0 replies; 47+ messages in thread
From: John Snow @ 2015-01-08 21:20 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, den, stefanha



On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> Mirror and migration use dirty bitmaps for their purposes, and since
> commit [block: per caller dirty bitmap] they use their own bitmaps, not
> the global one. But they use old functions bdrv_set_dirty and
> bdrv_reset_dirty, which change all dirty bitmaps.
>
> Named dirty bitmaps series by Fam and Snow are affected: mirroring and
> migration will spoil all (not related to this mirroring or migration)
> named dirty bitmaps.
>
> This patch fixes this by adding bdrv_set_dirty_bitmap and
> bdrv_reset_dirty_bitmap, which change concrete bitmap. Also, to prevent
> such mistakes in future, old functions bdrv_(set,reset)_dirty are made
> static, for internal block usage.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> CC: John Snow <jsnow@redhat.com>
> CC: Fam Zheng <famz@redhat.com>
> CC: Denis V. Lunev <den@openvz.org>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> ---
>   block-migration.c     |  5 +++--
>   block.c               | 23 ++++++++++++++++++++---
>   block/mirror.c        | 11 +++++++----
>   include/block/block.h |  6 ++++--
>   4 files changed, 34 insertions(+), 11 deletions(-)
>
[snip]

A version of this already went in upstream, so it can be dropped here now.

* Re: [Qemu-devel] [PATCH 4/9] hbitmap: store / restore
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 4/9] hbitmap: store / restore Vladimir Sementsov-Ogievskiy
@ 2015-01-08 21:21   ` John Snow
  2015-01-08 21:37     ` Paolo Bonzini
  2015-01-13 12:59     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 47+ messages in thread
From: John Snow @ 2015-01-08 21:21 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, den, stefanha, Paolo Bonzini



On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> Functions to store / restore HBitmap. HBitmap should be saved to linear
> bitmap format independently of endianess.
>
> Because of restoring in several steps, every step writes only the last
> level of the bitmap. All other levels are restored by
> hbitmap_restore_finish as a last step of restoring. So, HBitmap is
> inconsistent while restoring.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   include/qemu/hbitmap.h | 49 +++++++++++++++++++++++++++++
>   util/hbitmap.c         | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 133 insertions(+)
>
> diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
> index b645cfc..e57b610 100644
> --- a/include/qemu/hbitmap.h
> +++ b/include/qemu/hbitmap.h
> @@ -126,6 +126,55 @@ void hbitmap_reset(HBitmap *hb, uint64_t start, uint64_t count);
>   bool hbitmap_get(const HBitmap *hb, uint64_t item);
>
>   /**
> + * hbitmap_data_size:
> + * @hb: HBitmap to operate on.
> + * @count: Number of bits
> + *
> + * Returns the number of bytes hbitmap_store_data needs
> + */
> +uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count);
> +
> +/**
> + * hbitmap_store_data
> + * @hb: HBitmap to operate on.
> + * @buf: Buffer to store bitmap data.
> + * @start: First bit to store.
> + * @count: Number of bits to store.
> + *
> + * Stores HBitmap data corresponding to the given region. The format of the
> + * saved data is a linear sequence of bits, so it can be used by
> + * hbitmap_restore_data independently of endianness
> + */
> +void hbitmap_store_data(const HBitmap *hb, uint8_t *buf,
> +                        uint64_t start, uint64_t count);
> +
> +/**
> + * hbitmap_restore_data
> + * @hb: HBitmap to operate on.
> + * @buf: Buffer to restore bitmap data from.
> + * @start: First bit to restore.
> + * @count: Number of bits to restore.
> + *
> + * Restores HBitmap data corresponding to the given region. The format is
> + * the same as for hbitmap_store_data.
> + *
> + * ! The bitmap becomes inconsistent after this operation.
> + * hbitmap_restore_finish should be called before using the bitmap after
> + * data restoring.
> + */
> +void hbitmap_restore_data(HBitmap *hb, uint8_t *buf,
> +                          uint64_t start, uint64_t count);
> +

I guess by nature of how bitmap migration has been implemented, we 
cannot help but restore parts of the bitmap piece by piece, which 
requires us to force the bitmap into an inconsistent state.

Yuck.

It might be "nice" to turn on a disable bit inside the hbitmap that 
disallows its use until it is made consistent again, but I don't know 
what the performance hit of the extra conditionals everywhere would be like.
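A minimal sketch of what such a guard could look like. Field and function names here are purely illustrative, not the real HBitmap API; the point is just that a single flag plus an assert per query is the whole cost:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "disable bit": set while restore is in progress, cleared
 * by the finish step.  Queries assert consistency first. */
typedef struct DemoBitmap {
    uint64_t bits;       /* stand-in for the real multi-level arrays */
    bool inconsistent;   /* true between restore_data and restore_finish */
} DemoBitmap;

static void demo_restore_begin(DemoBitmap *hb)
{
    hb->inconsistent = true;
}

static void demo_restore_finish(DemoBitmap *hb)
{
    /* ...rebuild upper levels here... */
    hb->inconsistent = false;
}

static bool demo_get(const DemoBitmap *hb, unsigned item)
{
    assert(!hb->inconsistent);   /* the extra conditional in question */
    return (hb->bits >> item) & 1;
}
```

Whether that assert (or a softer error return) is worth the per-query branch is exactly the open performance question.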

> +/**
> + * hbitmap_restore_finish
> + * @hb: HBitmap to operate on.
> + *
> + * Repair HBitmap after calling hbitmap_restore_data. Actually all HBitmap
> + * levels are restored here.
> + */
> +void hbitmap_restore_finish(HBitmap *hb);
> +
> +/**
>    * hbitmap_free:
>    * @hb: HBitmap to operate on.
>    *

These are just biased opinions:

- It might be nice to name the store/restore functions "serialize" and 
"deserialize," to describe exactly what they are doing.

- I might refer to "restore_finish" as "post_load" or "post_restore" or 
something similar to mimic how device migration functions are named. I 
think hbitmap_restore_data_finalize would also be fine; the key part 
here is clearly naming it relative to "restore_data."

> diff --git a/util/hbitmap.c b/util/hbitmap.c
> index 8aa7406..ac0323f 100644
> --- a/util/hbitmap.c
> +++ b/util/hbitmap.c
> @@ -362,6 +362,90 @@ bool hbitmap_get(const HBitmap *hb, uint64_t item)
>       return (hb->levels[HBITMAP_LEVELS - 1][pos >> BITS_PER_LEVEL] & bit) != 0;
>   }
>
> +uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count)
> +{
> +    uint64_t size;
> +
> +    if (count == 0) {
> +        return 0;
> +    }
> +
> +    size = (((count - 1) >> hb->granularity) >> BITS_PER_LEVEL) + 1;
> +
> +    return size * sizeof(unsigned long);
> +}
> +

This seems flawed to me: a number of bits without an offset can't be 
mapped to a number of real bytes, because "two bits" may or may not 
cross a granularity boundary, depending on *WHERE* you start counting.

e.g.

granularity = 1 (i.e. 2^1 = 2 virtual bits per 1 real bit)
virtual: 001100
real:     0 1 0

The amount of space required to hold "two bits" here could be as little 
as one bit, if the offset is k={0,2,4} but it could be as much as two 
bits if the offset is k={1,3}.

You may never use the function in this way, but mapping virtual bits to 
an implementation byte-size seems like it is inviting an inconsistency.
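A toy calculation makes the ambiguity concrete. real_bits_needed() is a hypothetical helper, not QEMU code, assuming virtual bit v maps to real bit v >> granularity:

```c
#include <assert.h>
#include <stdint.h>

/* The number of real (implementation) bits covering virtual range
 * [start, start + count) depends on start, not just count, so a size
 * function that takes only a count cannot be exact. */
static uint64_t real_bits_needed(uint64_t start, uint64_t count,
                                 unsigned granularity)
{
    uint64_t first = start >> granularity;
    uint64_t last = (start + count - 1) >> granularity;

    return last - first + 1;
}
```

With granularity 1, a count of two virtual bits needs one real bit when aligned and two when straddling a boundary, matching the k={0,2,4} vs k={1,3} cases above.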

> +void hbitmap_store_data(const HBitmap *hb, uint8_t *buf,
> +                        uint64_t start, uint64_t count)
> +{
> +    uint64_t last = start + count - 1;
> +    unsigned long *out = (unsigned long *)buf;
> +
> +    if (count == 0) {
> +        return;
> +    }
> +
> +    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
> +    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
> +    count = last - start + 1;
> +
> +#ifdef __BIG_ENDIAN_BITFIELD
> +    for (i = start; i <= last; ++i) {
> +        unsigned long el = hb->levels[HBITMAP_LEVELS - 1][i];
> +        out[i] = (BITS_PER_LONG == 32 ? cpu_to_le32(el) : cpu_to_le64(el));
> +    }
> +#else
> +    memcpy(out, &hb->levels[HBITMAP_LEVELS - 1][start],
> +           count * sizeof(unsigned long));
> +#endif
> +}
> +

I suppose the ifdefs are trying to apply an optimization: use memcpy 
where possible and skip the conversion entirely.

Why are the conversions to little endian, though? Shouldn't we be 
serializing to a Big Endian format?

> +void hbitmap_restore_data(HBitmap *hb, uint8_t *buf,
> +                          uint64_t start, uint64_t count)
> +{
> +    uint64_t last = start + count - 1;
> +    unsigned long *in = (unsigned long *)buf;
> +
> +    if (count == 0) {
> +        return;
> +    }
> +
> +    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
> +    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
> +    count = last - start + 1;
> +
> +#ifdef __BIG_ENDIAN_BITFIELD
> +    for (i = start; i <= last; ++i) {
> +        hb->levels[HBITMAP_LEVELS - 1][i] =
> +            (BITS_PER_LONG == 32 ? be32_to_cpu(in[i]) : be64_to_cpu(in[i]));
> +    }
> +#else
> +    memcpy(&hb->levels[HBITMAP_LEVELS - 1][start], in,
> +           count * sizeof(unsigned long));
> +#endif
> +}
> +

...? We're storing as LE but restoring from BE? I'm confused.

I'm also not clear on the __BIG_ENDIAN_BITFIELD macro. Why do we want to 
pack differently based on how C-bitfields are packed by the compiler? I 
don't think that has any impact on how longs are stored (always in the 
host native format.)

I would rather see the serialize/deserialize always convert to and from 
BE explicitly with cpu_to_be[32|64] and be[32|64]_to_cpu. I think trying 
to optimize the pack/unpack in the no-op condition with an #ifdef and a 
memcpy is inviting portability problems.
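For illustration, this is roughly what the unconditional conversion looks like, with byte-shift helpers standing in for QEMU's cpu_to_be64()/be64_to_cpu() so the sketch is self-contained; the helper names are made up:

```c
#include <assert.h>
#include <stdint.h>

/* Explicit big-endian (de)serialization: correct on any host, no
 * endianness #ifdef and no memcpy fast path. */
static void put_be64(uint8_t *buf, uint64_t v)
{
    int i;
    for (i = 0; i < 8; i++) {
        buf[i] = (uint8_t)(v >> (8 * (7 - i)));  /* most significant first */
    }
}

static uint64_t get_be64(const uint8_t *buf)
{
    uint64_t v = 0;
    int i;
    for (i = 0; i < 8; i++) {
        v = (v << 8) | buf[i];
    }
    return v;
}

static uint64_t be64_roundtrip(uint64_t v)
{
    uint8_t buf[8];
    put_be64(buf, v);
    return get_be64(buf);
}

static int be64_first_byte(uint64_t v)
{
    uint8_t buf[8];
    put_be64(buf, v);
    return buf[0];
}
```

The cost is one loop per word instead of a bulk memcpy, which is the trade-off being debated.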

> +void hbitmap_restore_finish(HBitmap *bitmap)
> +{
> +    int64_t i, size;
> +    int lev, j;
> +
> +    /* restore levels starting from penultimate to zero level, assuming
> +     * that the last one is ok */
> +    size = MAX((bitmap->size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
> +    for (lev = HBITMAP_LEVELS - 1; lev-- > 0; ) {
> +        size = MAX((size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
> +        for (i = 0; i < size; ++i) {
> +            bitmap->levels[lev][i] = 0;
> +            for (j = 0; j < BITS_PER_LONG; ++j) {
> +                if (bitmap->levels[lev+1][(i << BITS_PER_LEVEL) + j]) {
> +                    bitmap->levels[lev][i] |=  (1 << j);
> +                }
> +            }

Just a musing: Should we cache the size of each level? I know we can 
keep recalculating it, but... why bother? We recalculate it in so many 
places now.
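A sketch of such caching, with made-up names and constants assuming 64-bit longs (not the actual HBitmap layout): the sizes could be computed once at creation and indexed everywhere else.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_LEVELS 7

/* Each level holds one bit per word of the level below it, so the word
 * count shrinks by a factor of 64 per level (never below one word). */
static void demo_level_sizes(uint64_t nbits, uint64_t sizes[DEMO_LEVELS])
{
    uint64_t size = nbits;
    int lev;

    for (lev = DEMO_LEVELS - 1; lev >= 0; lev--) {
        size = (size + 63) / 64;
        if (size == 0) {
            size = 1;
        }
        sizes[lev] = size;
    }
}

static uint64_t demo_size_of_level(uint64_t nbits, int lev)
{
    uint64_t sizes[DEMO_LEVELS];

    demo_level_sizes(nbits, sizes);
    return sizes[lev];
}
```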

> +        }
> +    }
> +}
> +
>   void hbitmap_free(HBitmap *hb)
>   {
>       unsigned i;
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH 5/9] block: BdrvDirtyBitmap store/restore interface
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 5/9] block: BdrvDirtyBitmap store/restore interface Vladimir Sementsov-Ogievskiy
@ 2015-01-08 21:22   ` John Snow
  2015-01-14 11:27   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 47+ messages in thread
From: John Snow @ 2015-01-08 21:22 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, den, stefanha



On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> Several functions to provide necessary access to BdrvDirtyBitmap for
> block-migration.c
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   block.c               | 61 +++++++++++++++++++++++++++++++++++++++++++++++++--
>   include/block/block.h | 10 +++++++++
>   2 files changed, 69 insertions(+), 2 deletions(-)
>
> diff --git a/block.c b/block.c
> index 6edf1dc..7d42620 100644
> --- a/block.c
> +++ b/block.c
> @@ -5511,8 +5511,65 @@ void bdrv_reset_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
>       hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
>   }
>
> -static void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
> -                           int nr_sectors)
> +const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap)
> +{
> +    return bitmap->name;
> +}
> +
> +uint64_t bdrv_dbm_data_size(const BdrvDirtyBitmap *bitmap, uint64_t count)
> +{
> +    return hbitmap_data_size(bitmap->bitmap, count);
> +}
> +

Stefan recommended for my v10 that I avoid abbreviations such as "dbm" 
as a way of shortening lengthy function names, so I removed any mention 
of "DBM" from my patches. We should coordinate and ensure that any 
abbreviations we use for the BdrvDirtyBitmap functions are applied 
consistently... or not at all.

> +void bdrv_dbm_store_data(const BdrvDirtyBitmap *bitmap, uint8_t *buf,
> +                         uint64_t start, uint64_t count)
> +{
> +    hbitmap_store_data(bitmap->bitmap, buf, start, count);
> +}
> +
> +void bdrv_dbm_restore_data(BdrvDirtyBitmap *bitmap, uint8_t *buf,
> +                           uint64_t start, uint64_t count)
> +{
> +    hbitmap_restore_data(bitmap->bitmap, buf, start, count);
> +}
> +
> +BdrvDirtyBitmap **bdrv_dbm_find_all_named(BlockDriverState *bs, int *count)
> +{
> +    BdrvDirtyBitmap *bm, **res, **iter;
> +    assert(count);
> +

Should force *count back to zero before usage.
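A simplified stand-in shows why: without zeroing the out-parameter before the loop, a stale caller-side value leaks into the result. count_named() and demo_count_stale() are illustrative only, not the QEMU API:

```c
#include <assert.h>
#include <stddef.h>

/* Counting pass analogous to bdrv_dbm_find_all_named(): only entries
 * with a non-NULL name are counted. */
static int count_named(const char *const *names, int n, int *count)
{
    int i;

    *count = 0;                 /* the initialization the review asks for */
    for (i = 0; i < n; i++) {
        if (names[i] != NULL) {
            (*count)++;
        }
    }
    return *count;
}

static int demo_count_stale(void)
{
    const char *names[3] = { "bitmap0", NULL, "bitmap1" };
    int count = 12345;          /* deliberately stale caller value */

    return count_named(names, 3, &count);
}
```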

> +    QLIST_FOREACH(bm, &bs->dirty_bitmaps, list) {
> +        if (bm->name != NULL) {
> +            (*count)++;
> +        }
> +    }
> +
> +    iter = res = g_malloc(sizeof(*res) * (*count));
> +    QLIST_FOREACH(bm, &bs->dirty_bitmaps, list) {
> +        if (bm->name != NULL) {
> +            *iter++ = bm;
> +        }
> +    }
> +
> +    return res;
> +}
> +
> +void bdrv_dbm_restore_finish(void)
> +{
> +    BlockDriverState *bs;
> +    BdrvDirtyBitmap *bm;
> +
> +    for (bs = bdrv_next(NULL); bs != NULL; bs = bdrv_next(bs)) {
> +        QLIST_FOREACH(bm, &bs->dirty_bitmaps, list) {
> +            if (bm->name != NULL) {
> +                hbitmap_restore_finish(bm->bitmap);
> +            }
> +        }
> +    }
> +}
> +
> +void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
> +                    int nr_sectors)

Is removing the static keyword here for set_dirty intentional?

>   {
>       BdrvDirtyBitmap *bitmap;
>       QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
> diff --git a/include/block/block.h b/include/block/block.h
> index b21233c..09eff80 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -459,6 +459,16 @@ void bdrv_dirty_iter_init(BlockDriverState *bs,
>   void bdrv_dirty_iter_set(struct HBitmapIter *hbi, int64_t offset);
>   int64_t bdrv_get_dirty_count(BlockDriverState *bs, BdrvDirtyBitmap *bitmap);
>
> +uint64_t bdrv_dbm_data_size(const BdrvDirtyBitmap *bitmap, uint64_t count);
> +void bdrv_dbm_store_data(const BdrvDirtyBitmap *bitmap, uint8_t *buf,
> +                         uint64_t start, uint64_t count);
> +void bdrv_dbm_restore_data(BdrvDirtyBitmap *bitmap, uint8_t *buf,
> +                           uint64_t start, uint64_t count);
> +bool bdrv_dbm_is_named(BdrvDirtyBitmap *bitmap);
> +const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap);
> +BdrvDirtyBitmap **bdrv_dbm_find_all_named(BlockDriverState *bs, int *count);
> +void bdrv_dbm_restore_finish(void);
> +
>   void bdrv_enable_copy_on_read(BlockDriverState *bs);
>   void bdrv_disable_copy_on_read(BlockDriverState *bs);
>
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH 6/9] block-migration: tiny refactoring
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 6/9] block-migration: tiny refactoring Vladimir Sementsov-Ogievskiy
@ 2015-01-08 21:23   ` John Snow
  2015-01-14 12:26     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 47+ messages in thread
From: John Snow @ 2015-01-08 21:23 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, den, stefanha



On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> Add blk_create and blk_free to remove code duplicates. Otherwise,
> duplicates will grow in the following patches because of the BlkMigBlock
> structure extension.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   block-migration.c | 56 +++++++++++++++++++++++++++++--------------------------
>   1 file changed, 30 insertions(+), 26 deletions(-)
>
> diff --git a/block-migration.c b/block-migration.c
> index 5b4aa0f..d0c825f 100644
> --- a/block-migration.c
> +++ b/block-migration.c
> @@ -113,6 +113,30 @@ static void blk_mig_unlock(void)
>       qemu_mutex_unlock(&block_mig_state.lock);
>   }
>
> +/* Only allocating and initializing structure fields, not copying any data. */
> +
> +static BlkMigBlock *blk_create(BlkMigDevState *bmds, int64_t sector,
> +                                int nr_sectors)
> +{
> +    BlkMigBlock *blk = g_new(BlkMigBlock, 1);
> +    blk->buf = g_malloc(BLOCK_SIZE);
> +    blk->bmds = bmds;
> +    blk->sector = sector;
> +    blk->nr_sectors = nr_sectors;
> +
> +    blk->iov.iov_base = blk->buf;
> +    blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
> +    qemu_iovec_init_external(&blk->qiov, &blk->iov, 1);
> +
> +    return blk;
> +}
> +
> +static void blk_free(BlkMigBlock *blk)
> +{
> +    g_free(blk->buf);
> +    g_free(blk);
> +}
> +
>   /* Must run outside of the iothread lock during the bulk phase,
>    * or the VM will stall.
>    */
> @@ -285,15 +309,7 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>           nr_sectors = total_sectors - cur_sector;
>       }
>
> -    blk = g_new(BlkMigBlock, 1);
> -    blk->buf = g_malloc(BLOCK_SIZE);
> -    blk->bmds = bmds;
> -    blk->sector = cur_sector;
> -    blk->nr_sectors = nr_sectors;
> -
> -    blk->iov.iov_base = blk->buf;
> -    blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
> -    qemu_iovec_init_external(&blk->qiov, &blk->iov, 1);
> +    blk = blk_create(bmds, cur_sector, nr_sectors);
>
>       blk_mig_lock();
>       block_mig_state.submitted++;
> @@ -467,17 +483,9 @@ static int mig_save_device_dirty(QEMUFile *f, BlkMigDevState *bmds,
>               } else {
>                   nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>               }
> -            blk = g_new(BlkMigBlock, 1);
> -            blk->buf = g_malloc(BLOCK_SIZE);
> -            blk->bmds = bmds;
> -            blk->sector = sector;
> -            blk->nr_sectors = nr_sectors;
> +            blk = blk_create(bmds, sector, nr_sectors);
>
>               if (is_async) {
> -                blk->iov.iov_base = blk->buf;
> -                blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
> -                qemu_iovec_init_external(&blk->qiov, &blk->iov, 1);
> -

I suppose in the (!is_async) branch we don't reference iov/qiov again, 
but the functional difference caught my eye: the 
qemu_iovec_init_external() call used to run only under the "is_async" 
branch, but now it executes unconditionally.

Is that fine?

>                   blk->aiocb = bdrv_aio_readv(bmds->bs, sector, &blk->qiov,
>                                               nr_sectors, blk_mig_read_cb, blk);
>
> @@ -492,8 +500,7 @@ static int mig_save_device_dirty(QEMUFile *f, BlkMigDevState *bmds,
>                   }
>                   blk_send(f, blk);
>
> -                g_free(blk->buf);
> -                g_free(blk);
> +                blk_free(blk);
>               }
>
>               bdrv_reset_dirty_bitmap(bmds->bs, bmds->dirty_bitmap, sector,
> @@ -508,8 +515,7 @@ static int mig_save_device_dirty(QEMUFile *f, BlkMigDevState *bmds,
>
>   error:
>       DPRINTF("Error reading sector %" PRId64 "\n", sector);
> -    g_free(blk->buf);
> -    g_free(blk);
> +    blk_free(blk);
>       return ret;
>   }
>
> @@ -560,8 +566,7 @@ static int flush_blks(QEMUFile *f)
>           blk_send(f, blk);
>           blk_mig_lock();
>
> -        g_free(blk->buf);
> -        g_free(blk);
> +        blk_free(blk);
>
>           block_mig_state.read_done--;
>           block_mig_state.transferred++;
> @@ -612,8 +617,7 @@ static void blk_mig_cleanup(void)
>
>       while ((blk = QSIMPLEQ_FIRST(&block_mig_state.blk_list)) != NULL) {
>           QSIMPLEQ_REMOVE_HEAD(&block_mig_state.blk_list, entry);
> -        g_free(blk->buf);
> -        g_free(blk);
> +        blk_free(blk);
>       }
>       blk_mig_unlock();
>   }
>

Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH 7/9] block-migration: remove not needed iothread lock
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 7/9] block-migration: remove not needed iothread lock Vladimir Sementsov-Ogievskiy
@ 2015-01-08 21:24   ` John Snow
  2015-01-16 12:54     ` Vladimir Sementsov-Ogievskiy
  2015-01-08 22:28   ` Paolo Bonzini
  1 sibling, 1 reply; 47+ messages in thread
From: John Snow @ 2015-01-08 21:24 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, den, stefanha



On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> Instead of locking iothread, we can just swap these calls. So, if some
> write to our range occures before resetting the bitmap, then it will
> get into subsequent aio read, becouse it occures, in any case, after
> resetting the bitmap.
>

s/occures/occurs/g
s/becouse/because/g

(I hope you are not annoyed by the spelling corrections: they are in 
good faith, and personally I would hope people would correct any of my 
spelling mistakes before they go into the git log!)

> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   block-migration.c | 6 ++----
>   1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/block-migration.c b/block-migration.c
> index d0c825f..908a66d 100644
> --- a/block-migration.c
> +++ b/block-migration.c
> @@ -315,13 +315,11 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>       block_mig_state.submitted++;
>       blk_mig_unlock();
>
> -    qemu_mutex_lock_iothread();
> +    bdrv_reset_dirty_bitmap(bs, bmds->dirty_bitmap, cur_sector, nr_sectors);
> +
>       blk->aiocb = bdrv_aio_readv(bs, cur_sector, &blk->qiov,
>                                   nr_sectors, blk_mig_read_cb, blk);
>
> -    bdrv_reset_dirty_bitmap(bs, bmds->dirty_bitmap, cur_sector, nr_sectors);
> -    qemu_mutex_unlock_iothread();
> -
>       bmds->cur_sector = cur_sector + nr_sectors;
>       return (bmds->cur_sector >= total_sectors);
>   }
>

OK, so the justification here is that by ordering it as "reset, read", 
any writes that occur after the reset will simply be included in the 
read, and we won't have reset any bits that we didn't actually get a 
chance to read.

OK.

But what about losing the ability to clear bits that are set needlessly?

e.g.:

Sector 1 is dirty. Sector 2 is clean.
We reset the bitmap. All sectors are clean.
A write occurs to sector 2, marking it dirty.
We read sectors one and two.
Sector two is now erroneously marked dirty, when it isn't.

It's not a data integrity problem, but we're swapping one inefficiency 
for another.

Do you have a justification for why this is better than the lock?

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH 4/9] hbitmap: store / restore
  2015-01-08 21:21   ` John Snow
@ 2015-01-08 21:37     ` Paolo Bonzini
  2015-01-13 12:59     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 47+ messages in thread
From: Paolo Bonzini @ 2015-01-08 21:37 UTC (permalink / raw)
  To: John Snow, Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, den, stefanha



On 08/01/2015 22:21, John Snow wrote:
> Why are the conversions to little endian, though? Shouldn't we be
> serializing to a Big Endian format?

Because reading two 32-bit little-endian longs or a 64-bit little-endian
long gives the same value.  This is not true for big-endian.

Take the following two 32-bit values:

   0x04030201    0x08070605

representing offsets 0 and 32 in the bitmap.  Parse them back as one
64-bit little endian value:

   0x0807060504030201 = 0x04030201 + 0x08070605 * 2^32

Now parse them as one 64-bit big endian value:

   0x0403020108070605 => 0x08070605 + 0x04030201 * 2^32

so the values are swapped.
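The same worked example in code, using hand-rolled little-endian helpers (hypothetical names, chosen so the demo is independent of host endianness): two 32-bit words serialized back to back reparse as one 64-bit word with the bitmap content intact.

```c
#include <assert.h>
#include <stdint.h>

static void put_le32(uint8_t *buf, uint32_t v)
{
    int i;
    for (i = 0; i < 4; i++) {
        buf[i] = (uint8_t)(v >> (8 * i));   /* least significant first */
    }
}

static uint64_t get_le64(const uint8_t *buf)
{
    uint64_t v = 0;
    int i;
    for (i = 0; i < 8; i++) {
        v |= (uint64_t)buf[i] << (8 * i);
    }
    return v;
}

/* Serialize two 32-bit LE words adjacently, reparse as one 64-bit LE
 * word: low bitmap offsets stay in the low half, as Paolo describes. */
static uint64_t combine_le(uint32_t lo_word, uint32_t hi_word)
{
    uint8_t buf[8];

    put_le32(buf, lo_word);      /* bitmap offsets 0..31  */
    put_le32(buf + 4, hi_word);  /* bitmap offsets 32..63 */
    return get_le64(buf);
}
```

The equivalent big-endian reparse would put the first word in the high half, swapping the two 32-bit ranges.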

>> +void hbitmap_restore_data(HBitmap *hb, uint8_t *buf,
>> +                          uint64_t start, uint64_t count)
>> +{
>> +    uint64_t last = start + count - 1;
>> +    unsigned long *in = (unsigned long *)buf;
>> +
>> +    if (count == 0) {
>> +        return;
>> +    }
>> +
>> +    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
>> +    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
>> +    count = last - start + 1;
>> +
>> +#ifdef __BIG_ENDIAN_BITFIELD
>> +    for (i = start; i <= last; ++i) {
>> +        hb->levels[HBITMAP_LEVELS - 1][i] =
>> +            (BITS_PER_LONG == 32 ? be32_to_cpu(in[i]) :
>> be64_to_cpu(in[i]));
>> +    }
>> +#else
>> +    memcpy(&hb->levels[HBITMAP_LEVELS - 1][start], in,
>> +           count * sizeof(unsigned long));
>> +#endif
>> +}
>> +
> 
> ...? We're storing as LE but restoring from BE? I'm confused.
> 
> I'm also not clear on the __BIG_ENDIAN_BITFIELD macro. Why do we want to
> pack differently based on how C-bitfields are packed by the compiler? I
> don't think that has any impact on how longs are stored (always in the
> host native format.)

I agree.

Paolo

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH 8/9] migration: add dirty parameter
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 8/9] migration: add dirty parameter Vladimir Sementsov-Ogievskiy
  2014-12-11 15:18   ` Eric Blake
@ 2015-01-08 21:51   ` John Snow
  2015-01-08 22:29     ` Eric Blake
  2015-01-08 22:37     ` Paolo Bonzini
  1 sibling, 2 replies; 47+ messages in thread
From: John Snow @ 2015-01-08 21:51 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, den, stefanha

CC'ing Eric Blake for monitor interface review.

On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> Add dirty parameter to qmp-migrate command. If this parameter is true,
> block-migration.c will migrate dirty bitmaps. This parameter can be used
> without "blk" parameter to migrate only dirty bitmaps, skipping block
> migration.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   hmp-commands.hx               | 10 ++++++----
>   hmp.c                         |  4 +++-
>   include/migration/migration.h |  1 +
>   migration.c                   |  4 +++-
>   qapi-schema.json              |  2 +-
>   qmp-commands.hx               |  5 ++++-
>   savevm.c                      |  3 ++-
>   7 files changed, 20 insertions(+), 9 deletions(-)
>
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index e37bc8b..d421695 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -887,23 +887,25 @@ ETEXI
>
>       {
>           .name       = "migrate",
> -        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
> +        .args_type  = "detach:-d,blk:-b,inc:-i,dirty:-D,uri:s",
>           .params     = "[-d] [-b] [-i] uri",

^ Missing the new -D parameter?

>           .help       = "migrate to URI (using -d to not wait for completion)"
>   		      "\n\t\t\t -b for migration without shared storage with"
>   		      " full copy of disk\n\t\t\t -i for migration without "
> -		      "shared storage with incremental copy of disk "
> -		      "(base image shared between src and destination)",
> +		      "shared storage with incremental copy of disk\n\t\t\t"
> +		      " -D for migration of named dirty bitmaps as well\n\t\t\t"
> +		      " (base image shared between src and destination)",
>           .mhandler.cmd = hmp_migrate,
>       },
>
>
>   STEXI
> -@item migrate [-d] [-b] [-i] @var{uri}
> +@item migrate [-d] [-b] [-i] [-D] @var{uri}
>   @findex migrate
>   Migrate to @var{uri} (using -d to not wait for completion).
>   	-b for migration with full copy of disk
>   	-i for migration with incremental copy of disk (base image is shared)
> +	-D for migration of named dirty bitmaps
>   ETEXI
>
>       {
> diff --git a/hmp.c b/hmp.c
> index 17a2a2b..771f666 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -1340,10 +1340,12 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
>       int detach = qdict_get_try_bool(qdict, "detach", 0);
>       int blk = qdict_get_try_bool(qdict, "blk", 0);
>       int inc = qdict_get_try_bool(qdict, "inc", 0);
> +    int dirty = qdict_get_try_bool(qdict, "dirty", 0);
>       const char *uri = qdict_get_str(qdict, "uri");
>       Error *err = NULL;
>
> -    qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, &err);
> +    qmp_migrate(uri, !!blk, blk, !!inc, inc, !!dirty, dirty,
> +                false, false, &err);
>       if (err) {
>           monitor_printf(mon, "migrate: %s\n", error_get_pretty(err));
>           error_free(err);
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 3cb5ba8..48d71d3 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -37,6 +37,7 @@
>   struct MigrationParams {
>       bool blk;
>       bool shared;
> +    bool dirty;
>   };
>
>   typedef struct MigrationState MigrationState;
> diff --git a/migration.c b/migration.c
> index c49a05a..e7bb7f3 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -404,7 +404,8 @@ void migrate_del_blocker(Error *reason)
>   }
>
>   void qmp_migrate(const char *uri, bool has_blk, bool blk,
> -                 bool has_inc, bool inc, bool has_detach, bool detach,
> +                 bool has_inc, bool inc, bool has_dirty, bool dirty,
> +                 bool has_detach, bool detach,
>                    Error **errp)
>   {
>       Error *local_err = NULL;
> @@ -414,6 +415,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
>
>       params.blk = has_blk && blk;
>       params.shared = has_inc && inc;
> +    params.dirty = has_dirty && dirty;
>
>       if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP ||
>           s->state == MIG_STATE_CANCELLING) {
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 793031b..203f4f0 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -1660,7 +1660,7 @@
>   # Since: 0.14.0
>   ##

You'll want to document the new parameter just above here, and add in 
the "since 2.3" blurb.

>   { 'command': 'migrate',
> -  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } }
> +  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*dirty': 'bool', '*detach': 'bool' } }
>
>   # @xen-save-devices-state:
>   #
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 1eba583..e41d033 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -610,7 +610,7 @@ EQMP
>
>       {
>           .name       = "migrate",
> -        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
> +        .args_type  = "detach:-d,blk:-b,inc:-i,dirty:-D,uri:s",
>           .mhandler.cmd_new = qmp_marshal_input_migrate,
>       },
>
> @@ -624,6 +624,7 @@ Arguments:
>
>   - "blk": block migration, full disk copy (json-bool, optional)
>   - "inc": incremental disk copy (json-bool, optional)
> +- "dirty": migrate named dirty bitmaps (json-bool, optional)
>   - "uri": Destination URI (json-string)
>
>   Example:
> @@ -638,6 +639,8 @@ Notes:
>   (2) All boolean arguments default to false
>   (3) The user Monitor's "detach" argument is invalid in QMP and should not
>       be used
> +(4) The "dirty" argument may be used without "blk", to migrate only dirty
> +    bitmaps
>
>   EQMP
>
> diff --git a/savevm.c b/savevm.c
> index 08ec678..a598d1d 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -784,7 +784,8 @@ static int qemu_savevm_state(QEMUFile *f)
>       int ret;
>       MigrationParams params = {
>           .blk = 0,
> -        .shared = 0
> +        .shared = 0,
> +        .dirty = 0
>       };
>
>       if (qemu_savevm_state_blocked(NULL)) {
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration Vladimir Sementsov-Ogievskiy
@ 2015-01-08 22:05   ` John Snow
  2015-01-17 17:17     ` Vladimir Sementsov-Ogievskiy
  2015-01-08 22:36   ` Paolo Bonzini
  2015-01-12 16:48   ` Paolo Bonzini
  2 siblings, 1 reply; 47+ messages in thread
From: John Snow @ 2015-01-08 22:05 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, Juan Q >> Juan Jose Quintela Carreira,
	gilbert >> Dr. David Alan Gilbert, stefanha, Amit Shah,
	den

CCing migration maintainers, feedback otherwise in-line.

On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> Just migrate parts of dirty bitmaps, corresponding to the block being
> migrated. Also, skip block migration if it is disabled (blk parameter
> of migrate command is false).
>
> Skipping shared sectors: bitmaps are migrated independently of this,
> just send blk without block data.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---

In terms of general approach, migrating the dirty bitmap alongside the 
blocks it describes makes sense when migrating both.

Is this a lot of overhead when that's not the case, though? If we 
utilize the "bitmap only" pathways added here, don't we iterate through 
the savevm handlers a lot to only transfer very little data per section?

If we really need migration of bitmaps apart from the data they 
describe, is it not worth developing a more condensed transfer mechanism 
to get more bitmap data per iteration, instead of just one block's worth?

>   block-migration.c | 173 +++++++++++++++++++++++++++++++++++++++++++++++-------
>   savevm.c          |   1 +
>   2 files changed, 154 insertions(+), 20 deletions(-)
>
> diff --git a/block-migration.c b/block-migration.c
> index 908a66d..95d54a1 100644
> --- a/block-migration.c
> +++ b/block-migration.c
> @@ -32,6 +32,8 @@
>   #define BLK_MIG_FLAG_EOS                0x02
>   #define BLK_MIG_FLAG_PROGRESS           0x04
>   #define BLK_MIG_FLAG_ZERO_BLOCK         0x08
> +#define BLK_MIG_FLAG_HAS_BLOCK          0x10
> +#define BLK_MIG_FLAG_HAS_BITMAPS        0x20
>

OK: as a result of allowing bitmaps to be migrated without the block 
data itself, we now must acknowledge that we can send either block data, 
bitmaps, both, or neither -- so new defines are added here to indicate 
what data can be found in the section that follows.

>   #define MAX_IS_ALLOCATED_SEARCH 65536
>
> @@ -51,6 +53,8 @@ typedef struct BlkMigDevState {
>       int shared_base;
>       int64_t total_sectors;
>       QSIMPLEQ_ENTRY(BlkMigDevState) entry;
> +    int nr_bitmaps;
> +    BdrvDirtyBitmap **dirty_bitmaps;
>
>       /* Only used by migration thread.  Does not need a lock.  */
>       int bulk_completed;
> @@ -64,6 +68,11 @@ typedef struct BlkMigDevState {
>       Error *blocker;
>   } BlkMigDevState;
>
> +typedef struct BlkMigDirtyBitmap {
> +    BdrvDirtyBitmap *bitmap;
> +    uint8_t *buf;
> +} BlkMigDirtyBitmap;
> +
>   typedef struct BlkMigBlock {
>       /* Only used by migration thread.  */
>       uint8_t *buf;
> @@ -74,6 +83,9 @@ typedef struct BlkMigBlock {
>       QEMUIOVector qiov;
>       BlockAIOCB *aiocb;
>
> +    int nr_bitmaps;
> +    BlkMigDirtyBitmap *dirty_bitmaps;
> +
>       /* Protected by block migration lock.  */
>       int ret;
>       QSIMPLEQ_ENTRY(BlkMigBlock) entry;
> @@ -83,6 +95,7 @@ typedef struct BlkMigState {
>       /* Written during setup phase.  Can be read without a lock.  */
>       int blk_enable;
>       int shared_base;
> +    int dbm_enable;

Similar to the feedback in a previous patch, we may not want to use 
'dbm' to mean "Dirty Bitmap" because we have not been applying the 
abbreviation consistently.

For now, the recommendation from Stefan is to use the full 
"bdrv_dirty_bitmap" or "dirty_bitmap" in function names.

If we do want an acronym to refer to this particular type of dirty 
bitmap, we should apply it consistently to all functions that work with 
the BdrvDirtyBitmap type.

For now, "bdrv_dirty_bitmap_enable" should suffice, even though it's a 
bit wordy.

>       QSIMPLEQ_HEAD(bmds_list, BlkMigDevState) bmds_list;
>       int64_t total_sector_sum;
>       bool zero_blocks;
> @@ -116,27 +129,63 @@ static void blk_mig_unlock(void)
>   /* Only allocating and initializing structure fields, not copying any data. */
>
>   static BlkMigBlock *blk_create(BlkMigDevState *bmds, int64_t sector,
> -                                int nr_sectors)
> +                               int nr_sectors, bool only_bitmaps)
>   {
> +    int i;
>       BlkMigBlock *blk = g_new(BlkMigBlock, 1);
> -    blk->buf = g_malloc(BLOCK_SIZE);
> +    blk->buf = only_bitmaps ? NULL : g_malloc(BLOCK_SIZE);
>       blk->bmds = bmds;
>       blk->sector = sector;
>       blk->nr_sectors = nr_sectors;
> +    blk->dirty_bitmaps = NULL;
> +    blk->nr_bitmaps = 0;
>
>       blk->iov.iov_base = blk->buf;
>       blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
>       qemu_iovec_init_external(&blk->qiov, &blk->iov, 1);
>

We can probably skip this block if only_bitmaps is true.

> +    if (block_mig_state.dbm_enable) {
> +        BlkMigDirtyBitmap *mig_bm;
> +
> +        blk->nr_bitmaps = bmds->nr_bitmaps;
> +        mig_bm = blk->dirty_bitmaps = g_new(BlkMigDirtyBitmap,
> +                                            bmds->nr_bitmaps);
> +        for (i = 0; i < bmds->nr_bitmaps; ++i) {
> +            BdrvDirtyBitmap *bm = bmds->dirty_bitmaps[i];
> +            mig_bm->buf = g_malloc(bdrv_dbm_data_size(bm, nr_sectors));
> +            mig_bm->bitmap = bm;
> +            mig_bm++;
> +        }
> +    }
> +

It's strange to me that we'd give it an "only bitmaps" boolean and then 
rely on a different condition to populate them. Is it not worthy of an 
error if we specify only_bitmaps when block_mig_state.dbm_enable is false?

>       return blk;
>   }
>
>   static void blk_free(BlkMigBlock *blk)
>   {
> +    int i;
>       g_free(blk->buf);
> +
> +    if (blk->dirty_bitmaps) {
> +        for (i = 0; i < blk->nr_bitmaps; ++i) {
> +            g_free(blk->dirty_bitmaps[i].buf);
> +        }
> +        g_free(blk->dirty_bitmaps);
> +    }
> +
>       g_free(blk);
>   }
>
> +static void blk_store_bitmaps(BlkMigBlock *blk)
> +{
> +    int i;
> +    for (i = 0; i < blk->nr_bitmaps; ++i) {
> +        bdrv_dbm_store_data(blk->dirty_bitmaps[i].bitmap,
> +                            blk->dirty_bitmaps[i].buf,
> +                            blk->sector, blk->nr_sectors);
> +    }
> +}
> +

I am worried about the case in which blk->nr_sectors is not an integral 
multiple of the bitmap sector granularity, for reasons discussed in the 
previous patches.

I'm not positive it IS a problem either, but maybe you have already 
thought about this.

>   /* Must run outside of the iothread lock during the bulk phase,
>    * or the VM will stall.
>    */
> @@ -146,9 +195,15 @@ static void blk_send(QEMUFile *f, BlkMigBlock * blk)
>       int len;
>       uint64_t flags = BLK_MIG_FLAG_DEVICE_BLOCK;
>
> -    if (block_mig_state.zero_blocks &&
> +    if (block_mig_state.zero_blocks && blk->buf &&
>           buffer_is_zero(blk->buf, BLOCK_SIZE)) {
>           flags |= BLK_MIG_FLAG_ZERO_BLOCK;
> +    } else if (blk->buf) {
> +        flags |= BLK_MIG_FLAG_HAS_BLOCK;
> +    }
> +
> +    if (block_mig_state.dbm_enable) {
> +        flags |= BLK_MIG_FLAG_HAS_BITMAPS;
>       }
>
>       /* sector number and flags */
> @@ -160,10 +215,27 @@ static void blk_send(QEMUFile *f, BlkMigBlock * blk)
>       qemu_put_byte(f, len);
>       qemu_put_buffer(f, (uint8_t *)bdrv_get_device_name(blk->bmds->bs), len);
>
> +    /* dirty bitmaps */
> +    if (flags & BLK_MIG_FLAG_HAS_BITMAPS) {
> +        int i;
> +        qemu_put_be32(f, blk->nr_bitmaps);
> +        for (i = 0; i < blk->nr_bitmaps; ++i) {
> +            BdrvDirtyBitmap *bm = blk->dirty_bitmaps[i].bitmap;
> +            int buf_size = bdrv_dbm_data_size(bm, blk->nr_sectors);
> +            const char *name = bdrv_dirty_bitmap_name(bm);
> +            int len = strlen(name);
> +
> +            qemu_put_byte(f, len);
> +            qemu_put_buffer(f, (const uint8_t *)name, len);

No length in the stream for the size of the dirty bitmap buffer?

> +            qemu_put_buffer(f, blk->dirty_bitmaps[i].buf, buf_size);
> +        }
> +    }
> +

Future patch idea: We can probably optimize this block by testing a 
range of sectors on the bitmap and skipping bitmaps that don't have data 
for this block.

Since we populated the bitmaps previously, we should already have the 
chance to have counted how many of them are nonempty, so we can use that 
number instead of blk->nr_bitmaps.

Not important for the series right now.

>       /* if a block is zero we need to flush here since the network
>        * bandwidth is now a lot higher than the storage device bandwidth.
> -     * thus if we queue zero blocks we slow down the migration */
> -    if (flags & BLK_MIG_FLAG_ZERO_BLOCK) {
> +     * thus if we queue zero blocks we slow down the migration.
> +     * also, skip writing block when migrate only dirty bitmaps. */
> +    if (!(flags & BLK_MIG_FLAG_HAS_BLOCK)) {

I'm not completely clear on the reasoning behind the original code block 
here, but changing the flush condition from "only for zero blocks" to 
"whenever there is no block payload" (zero blocks and bitmap-only 
sections alike) makes sense, since bitmap-only sections are going to 
be very sparse too.

>           qemu_fflush(f);
>           return;
>       }

A little after this, there's a call to qemu_put_buffer(f, blk->buf, 
BLOCK_SIZE), but we have the case where blk->buf may be NULL in bitmap 
only cases. I think we need to guard against that.

> @@ -282,13 +354,20 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>       BlockDriverState *bs = bmds->bs;
>       BlkMigBlock *blk;
>       int nr_sectors;
> +    bool skip_chunk = false;
>
>       if (bmds->shared_base) {
>           qemu_mutex_lock_iothread();
> -        while (cur_sector < total_sectors &&
> -               !bdrv_is_allocated(bs, cur_sector, MAX_IS_ALLOCATED_SEARCH,
> -                                  &nr_sectors)) {
> -            cur_sector += nr_sectors;
> +        if (block_mig_state.dbm_enable) {
> +            bdrv_is_allocated(bs, cur_sector, BDRV_SECTORS_PER_DIRTY_CHUNK,
> +                              &nr_sectors);
> +            skip_chunk = nr_sectors >= BDRV_SECTORS_PER_DIRTY_CHUNK;
> +        } else {
> +            while (cur_sector < total_sectors &&
> +                   !bdrv_is_allocated(bs, cur_sector, MAX_IS_ALLOCATED_SEARCH,
> +                                      &nr_sectors)) {
> +                cur_sector += nr_sectors;
> +            }

OK, so the approach taken here is that if bitmaps are enabled, we check 
only to see if this current chunk is allocated or not. If it isn't, we 
declare this a "bitmap only" chunk and set skip_chunk to true.

The else clause taken implies no bitmap data, because either dbm_enable 
or blk_enable (or both) MUST be set for us to be here.

(1) Why are you not checking the return value of bdrv_is_allocated? It 
could return either true or false AND nr_sectors == 
BDRV_SECTORS_PER_DIRTY_CHUNK, provided it was entirely allocated or 
entirely unallocated, so I think this condition is not working as you 
intend it to.

(2) This seems slightly hacky in a sparse or no-data situation. We will 
transmit many small data sections, each containing a chunk of bitmap 
data, one at a time, with no data interspersed -- which leaves a high 
metadata-to-data ratio in these cases.

Would it be an involved process to alter the nr_sectors count to be 
something that isn't a single sector in the (dbm_enable && !blk_enable) 
case? That way we can spin until we find data worth transferring and 
transfer all of the bitmap metadata in a larger chunk all at once.

Whatever approach you choose, it might be nice to have a comment in-line 
explaining the general approach.

>           }
>           qemu_mutex_unlock_iothread();
>       }
> @@ -309,7 +388,8 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>           nr_sectors = total_sectors - cur_sector;
>       }
>
> -    blk = blk_create(bmds, cur_sector, nr_sectors);
> +    blk = blk_create(bmds, cur_sector, nr_sectors,
> +                     skip_chunk || !block_mig_state.blk_enable);
>
>       blk_mig_lock();
>       block_mig_state.submitted++;
> @@ -317,8 +397,16 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>
>       bdrv_reset_dirty_bitmap(bs, bmds->dirty_bitmap, cur_sector, nr_sectors);
>
> -    blk->aiocb = bdrv_aio_readv(bs, cur_sector, &blk->qiov,
> -                                nr_sectors, blk_mig_read_cb, blk);
> +    if (block_mig_state.dbm_enable) {
> +        blk_store_bitmaps(blk);
> +    }
> +
> +    if (block_mig_state.blk_enable && !skip_chunk) {
> +        blk->aiocb = bdrv_aio_readv(bs, cur_sector, &blk->qiov,
> +                                    nr_sectors, blk_mig_read_cb, blk);
> +    } else if (block_mig_state.dbm_enable) {
> +        blk_mig_read_cb(blk, 0);
> +    }
>
>       bmds->cur_sector = cur_sector + nr_sectors;
>       return (bmds->cur_sector >= total_sectors);
> @@ -403,6 +491,8 @@ static void init_blk_migration(QEMUFile *f)
>               DPRINTF("Start full migration for %s\n", bdrv_get_device_name(bs));
>           }
>
> +        bmds->dirty_bitmaps = bdrv_dbm_find_all_named(bs, &bmds->nr_bitmaps);
> +

At this point block_mig_state.dbm_enable has already been set (or left 
disabled), so we can skip this initialization when it is not set.

>           QSIMPLEQ_INSERT_TAIL(&block_mig_state.bmds_list, bmds, entry);
>       }
>   }
> @@ -481,20 +571,32 @@ static int mig_save_device_dirty(QEMUFile *f, BlkMigDevState *bmds,
>               } else {
>                   nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>               }
> -            blk = blk_create(bmds, sector, nr_sectors);
> +            blk = blk_create(bmds, sector, nr_sectors,
> +                             !block_mig_state.blk_enable);
> +
> +            if (block_mig_state.dbm_enable) {
> +                blk_store_bitmaps(blk);
> +            }
>
>               if (is_async) {
> -                blk->aiocb = bdrv_aio_readv(bmds->bs, sector, &blk->qiov,
> -                                            nr_sectors, blk_mig_read_cb, blk);
> +                if (block_mig_state.blk_enable) {
> +                    blk->aiocb = bdrv_aio_readv(bmds->bs, sector, &blk->qiov,
> +                                                nr_sectors, blk_mig_read_cb,
> +                                                blk);
> +                } else {
> +                    blk_mig_read_cb(blk, 0);
> +                }
>
>                   blk_mig_lock();
>                   block_mig_state.submitted++;
>                   bmds_set_aio_inflight(bmds, sector, nr_sectors, 1);
>                   blk_mig_unlock();
>               } else {
> -                ret = bdrv_read(bmds->bs, sector, blk->buf, nr_sectors);
> -                if (ret < 0) {
> -                    goto error;
> +                if (block_mig_state.blk_enable) {
> +                    ret = bdrv_read(bmds->bs, sector, blk->buf, nr_sectors);
> +                    if (ret < 0) {
> +                        goto error;
> +                    }
>                   }
>                   blk_send(f, blk);
>
> @@ -817,10 +919,39 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>                   nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>               }
>
> +            /* load dirty bitmaps */
> +            if (flags & BLK_MIG_FLAG_HAS_BITMAPS) {
> +                int i, nr_bitmaps = qemu_get_be32(f);
> +
> +                for (i = 0; i < nr_bitmaps; ++i) {
> +                    char bitmap_name[256];
> +                    BdrvDirtyBitmap *bitmap;
> +                    int buf_size, len;
> +
> +                    len = qemu_get_byte(f);
> +                    qemu_get_buffer(f, (uint8_t *)bitmap_name, len);
> +                    bitmap_name[len] = '\0';
> +
> +                    bitmap = bdrv_find_dirty_bitmap(bs, bitmap_name);
> +                    if (!bitmap) {
> +                        fprintf(stderr, "Error unknown dirty bitmap\
> +                                %s for block device %s\n",
> +                                bitmap_name, device_name);
> +                        return -EINVAL;
> +                    }
> +
> +                    buf_size = bdrv_dbm_data_size(bitmap, nr_sectors);

Oh, this is why you didn't store the length... Still, since it seems as 
if you expect the bitmap to already exist on the destination, what if 
the granularity or size is different? It might be nice to explicitly 
state the size of the buffer -- one user misconfiguration and you might 
blow the rest of the migration.

This way, if the calculated size doesn't match the received size, you 
have the chance to throw a meaningful error.

> +                    buf = g_malloc(buf_size);
> +                    qemu_get_buffer(f, buf, buf_size);
> +                    bdrv_dbm_restore_data(bitmap, buf, addr, nr_sectors);
> +                    g_free(buf);
> +                }
> +            }
> +

So with this approach, the user needs to have created the bitmaps 
already? Would it be possible to create a mode where they don't, or am I 
misreading something?

We could re-create them on the fly whenever the offset is 0 on the 
receiving side, the only other piece of information we'd really need at 
that point is the granularity.

>               if (flags & BLK_MIG_FLAG_ZERO_BLOCK) {
>                   ret = bdrv_write_zeroes(bs, addr, nr_sectors,
>                                           BDRV_REQ_MAY_UNMAP);
> -            } else {
> +            } else if (flags & BLK_MIG_FLAG_HAS_BLOCK) {
>                   buf = g_malloc(BLOCK_SIZE);
>                   qemu_get_buffer(f, buf, BLOCK_SIZE);
>                   ret = bdrv_write(bs, addr, buf, nr_sectors);
> @@ -855,6 +986,7 @@ static void block_set_params(const MigrationParams *params, void *opaque)
>   {
>       block_mig_state.blk_enable = params->blk;
>       block_mig_state.shared_base = params->shared;
> +    block_mig_state.dbm_enable = params->dirty;
>
>       /* shared base means that blk_enable = 1 */
>       block_mig_state.blk_enable |= params->shared;
> @@ -862,7 +994,8 @@ static void block_set_params(const MigrationParams *params, void *opaque)
>
>   static bool block_is_active(void *opaque)
>   {
> -    return block_mig_state.blk_enable == 1;
> +    return block_mig_state.blk_enable == 1 ||
> +           block_mig_state.dbm_enable == 1;
>   }
>
>   static SaveVMHandlers savevm_block_handlers = {
> diff --git a/savevm.c b/savevm.c
> index a598d1d..1299faa 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -983,6 +983,7 @@ int qemu_loadvm_state(QEMUFile *f)
>           }
>       }
>
> +    bdrv_dbm_restore_finish();

Do we not have a cleaner way to call this? I guess we have no explicit 
callback mechanisms for when a particular BDS is completely done, 
because of the multiple passes over the data that we perform --

Still, it might be nicer to have a callback for the whole of block 
migration, and call bdrv_dbm_restore_finish() from there instead of 
putting this implementation detail all the way up inside of 
qemu_loadvm_state.

Could block migration emit an empty-data sentinel section upon 
completion, for which qemu_loadvm_state would launch a post_load 
method that invokes bdrv_dbm_restore_finish() for us?

Alternatively, can this not be safely done inside of block_load? 
Somewhere around here:

    printf("Completed %d %%%c", (int)addr,
           (addr == 100) ? '\n' : '\r');
    fflush(stdout);

The "100%" completion flag is only written in block_save_complete(),
so once we read this flag we should have necessary and sufficient 
information to conclude that it's safe to do the BdrvDirtyBitmap 
migration post-load stuff.

Of course, maybe generic section post-load callbacks are useful to other 
devices, and I'd be fine with either approach.

CCing migration people for opinions.

>       cpu_synchronize_all_post_init();
>
>       ret = 0;
>


Thank you, and sorry it took me so long to digest the series!
--John Snow

P.S.: Happy New Year! I hope 2015 treats you well :~)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH 7/9] block-migration: remove not needed iothread lock
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 7/9] block-migration: remove not needed iothread lock Vladimir Sementsov-Ogievskiy
  2015-01-08 21:24   ` John Snow
@ 2015-01-08 22:28   ` Paolo Bonzini
  2015-01-16 13:03     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 47+ messages in thread
From: Paolo Bonzini @ 2015-01-08 22:28 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, den, jsnow, stefanha



On 11/12/2014 15:17, Vladimir Sementsov-Ogievskiy wrote:
> -    qemu_mutex_lock_iothread();
> +    bdrv_reset_dirty_bitmap(bs, bmds->dirty_bitmap, cur_sector, nr_sectors);
> +
>      blk->aiocb = bdrv_aio_readv(bs, cur_sector, &blk->qiov,
>                                  nr_sectors, blk_mig_read_cb, blk);
>  
> -    bdrv_reset_dirty_bitmap(bs, bmds->dirty_bitmap, cur_sector, nr_sectors);
> -    qemu_mutex_unlock_iothread();
> -

bdrv_aio_readv is not thread safe, so it needs the iothread lock.

bdrv_reset_dirty_bitmap is also not thread safe, because
bmds->dirty_bitmap is accessed concurrently by bdrv_reset_dirty
(discard) and bdrv_set_dirty (write).  So it needs the iothread lock too.

Moving bdrv_reset_dirty_bitmap before the read is a good idea.

Paolo


* Re: [Qemu-devel] [PATCH 8/9] migration: add dirty parameter
  2015-01-08 21:51   ` John Snow
@ 2015-01-08 22:29     ` Eric Blake
  2015-01-08 22:31       ` John Snow
  2015-01-08 22:37     ` Paolo Bonzini
  1 sibling, 1 reply; 47+ messages in thread
From: Eric Blake @ 2015-01-08 22:29 UTC (permalink / raw)
  To: John Snow, Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, den, stefanha

[-- Attachment #1: Type: text/plain, Size: 934 bytes --]

On 01/08/2015 02:51 PM, John Snow wrote:
> CC'ing Eric Blake for monitor interface review.

Indeed, I already saw and reviewed the monitor interface in a mail dated
Dec 11.

> 
> On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Add dirty parameter to qmp-migrate command. If this parameter is true,
>> block-migration.c will migrate dirty bitmaps. This parameter can be used
>> without "blk" parameter to migrate only dirty bitmaps, skipping block
>> migration.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>

>> +        .args_type  = "detach:-d,blk:-b,inc:-i,dirty:-D,uri:s",
>>           .params     = "[-d] [-b] [-i] uri",
> 
> ^ Missing the new -D parameter?

But I missed this point :)  (I've been known to pay less attention to
HMP than I do to QMP)

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [PATCH 8/9] migration: add dirty parameter
  2015-01-08 22:29     ` Eric Blake
@ 2015-01-08 22:31       ` John Snow
  0 siblings, 0 replies; 47+ messages in thread
From: John Snow @ 2015-01-08 22:31 UTC (permalink / raw)
  To: Eric Blake, Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, den, stefanha



On 01/08/2015 05:29 PM, Eric Blake wrote:
> On 01/08/2015 02:51 PM, John Snow wrote:
>> CC'ing Eric Blake for monitor interface review.
>
> Indeed, I already saw and reviewed the monitor interface in a mail dated
> Dec 11.
>

Sorry, I missed that one. Thank you, though :)

>>
>> On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Add dirty parameter to qmp-migrate command. If this parameter is true,
>>> block-migration.c will migrate dirty bitmaps. This parameter can be used
>>> without "blk" parameter to migrate only dirty bitmaps, skipping block
>>> migration.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>
>>> +        .args_type  = "detach:-d,blk:-b,inc:-i,dirty:-D,uri:s",
>>>            .params     = "[-d] [-b] [-i] uri",
>>
>> ^ Missing the new -D parameter?
>
> But I missed this point :)  (I've been known to pay less attention to
> HMP than I do to QMP)
>

Sometimes robots miss things, it's fine ;)

-- 
—js


* Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration Vladimir Sementsov-Ogievskiy
  2015-01-08 22:05   ` John Snow
@ 2015-01-08 22:36   ` Paolo Bonzini
  2015-01-08 22:45     ` Eric Blake
  2015-01-12 14:20     ` Vladimir Sementsov-Ogievskiy
  2015-01-12 16:48   ` Paolo Bonzini
  2 siblings, 2 replies; 47+ messages in thread
From: Paolo Bonzini @ 2015-01-08 22:36 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, den, jsnow, stefanha



On 11/12/2014 15:17, Vladimir Sementsov-Ogievskiy wrote:
> Just migrate parts of dirty bitmaps, corresponding to the block being
> migrated. Also, skip block migration if it is disabled (blk parameter
> of migrate command is false).
> 
> Skipping shared sectors: bitmaps are migrated independently of this,
> just send blk without block data.

I think dirty bitmaps should be migrated separately from the block
migration.  It makes a lot of sense to migrate them even if you are not
using block migration.

Compared to new flags, the availability of migration capabilities can be
queried via query-migrate-capabilities before actually invoking the
migration.  So they are preferred, and you should add a migration
capability instead of a new argument.

The bitmaps are transmitted many times in their entirety, but only the
last copy actually means something.  The others are lost.  This means
you should use the non-live interface (register_savevm).  This will
simplify the code a lot.

Paolo


* Re: [Qemu-devel] [PATCH 8/9] migration: add dirty parameter
  2015-01-08 21:51   ` John Snow
  2015-01-08 22:29     ` Eric Blake
@ 2015-01-08 22:37     ` Paolo Bonzini
  1 sibling, 0 replies; 47+ messages in thread
From: Paolo Bonzini @ 2015-01-08 22:37 UTC (permalink / raw)
  To: John Snow, Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, den, stefanha



On 08/01/2015 22:51, John Snow wrote:
> CC'ing Eric Blake for monitor interface review.

See also my review of patch 9.

Paolo


* Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
  2015-01-08 22:36   ` Paolo Bonzini
@ 2015-01-08 22:45     ` Eric Blake
  2015-01-08 22:49       ` Paolo Bonzini
  2015-01-12 14:20     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 47+ messages in thread
From: Eric Blake @ 2015-01-08 22:45 UTC (permalink / raw)
  To: Paolo Bonzini, Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, den, jsnow, stefanha


On 01/08/2015 03:36 PM, Paolo Bonzini wrote:
> 
> 
> On 11/12/2014 15:17, Vladimir Sementsov-Ogievskiy wrote:
>> Just migrate parts of dirty bitmaps, corresponding to the block being
>> migrated. Also, skip block migration if it is disabled (blk parameter
>> of migrate command is false).
>>
>> Skipping shared sectors: bitmaps are migrated independently of this,
>> just send blk without block data.
> 
> I think dirty bitmaps should be migrated separately from the block
> migration.  It makes a lot of sense to migrate them even if you are not
> using block migration.
> 
> Compared to new flags, the availability of migration capabilities can be
> queried via query-migrate-capabilities before actually invoking the
> migration.  So they are preferred, and you should add a migration
> capability instead of a new argument.

Good point - we still don't have argument introspection (only command
introspection), so turning on bitmap migration with a capability is
indeed preferable over having to probe whether attempting an optional
parameter will induce failure.

> 
> The bitmaps are transmitted many times in their entirety, but only the
> last copy actually means something.  The others are lost.  This means
> you should use the non-live interface (register_savevm).  This will
> simplify the code a lot.

If I understand correctly, you are saying that:

block migration vs. assuming shared storage during migration

is independent from:

current behavior of resetting dirty tracking on destination vs. new
one-shot non-live dirty bitmap migration

and that all four combinations are potentially useful.  Also, by doing
dirty bitmap migration as one-shot at the tail end of migration (in the
small window when things are no longer live) you have a lot less effort,
although the window of non-live activity is now a bit longer if the
dirty table migration is not efficient.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
  2015-01-08 22:45     ` Eric Blake
@ 2015-01-08 22:49       ` Paolo Bonzini
  0 siblings, 0 replies; 47+ messages in thread
From: Paolo Bonzini @ 2015-01-08 22:49 UTC (permalink / raw)
  To: Eric Blake, Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, den, jsnow, stefanha




On 08/01/2015 23:45, Eric Blake wrote:
>>> The bitmaps are transmitted many times in their entirety, but
>>> only the last copy actually means something.  The others are
>>> lost.  This means you should use the non-live interface
>>> (register_savevm).  This will simplify the code a lot.
> If I understand correctly, you are saying that:
> 
> block migration vs. assuming shared storage during migration
> 
> is independent from:
> 
> current behavior of resetting dirty tracking on destination vs.
> new one-shot non-live dirty bitmap migration
> 
> and that all four combinations are potentially useful.

Yes.  For example, you could use drive-mirror to snoop writes to an
NBD destination (e.g. to a malware scanner or to a secondary machine
if you have fault tolerance).  Then you need to migrate the dirty
bitmap even if you have shared storage.

> Also, by doing dirty bitmap migration as one-shot at the tail end
> of migration (in the small window when things are no longer live)
> you have a lot less effort, although the window of non-live
> activity is now a bit longer if the dirty table migration is not
> efficient.

As of this series, dirty bitmap migration is not attempting to track
the dirtiness of the dirty bitmap itself (you might have to read that
twice :)).  So there's nothing to lose in terms of efficiency.

Paolo


* Re: [Qemu-devel] [PATCH 2/9] block-migration: fix pending() return value
  2015-01-08 21:20   ` John Snow
@ 2015-01-09 19:01     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 47+ messages in thread
From: Dr. David Alan Gilbert @ 2015-01-09 19:01 UTC (permalink / raw)
  To: John Snow
  Cc: kwolf, Juan Jose Quintela Carreira, qemu-devel,
	Vladimir Sementsov-Ogievskiy, stefanha, Amit Shah, den

* John Snow (jsnow@redhat.com) wrote:
> 
> 
> On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> >Because of wrong return value of .save_live_pending() in
> >block-migration, migration finishes before the whole disk
> >is transferred. Such situation occures when the migration
> (occurs)
> >process is fast enouth, for example when source and dest
> (enough)
> >are on the same host.
> >
> >If in the bulk phase we return something < max_size, we will skip
> >transferring the tail of the device. Currently we have "set pending to
> >BLOCK_SIZE if it is zero" for bulk phase, but there no guarantee, that
> >it will be < max_size.
> >
> >True approach is to return, for example, max_size+1 when we are in the
> >bulk phase.
> >
> >Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> >---
> >  block-migration.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> >diff --git a/block-migration.c b/block-migration.c
> >index 6f3aa18..8df30d9 100644
> >--- a/block-migration.c
> >+++ b/block-migration.c
> >@@ -757,8 +757,8 @@ static uint64_t block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size)
> >                         block_mig_state.read_done * BLOCK_SIZE;
> >
> >      /* Report at least one block pending during bulk phase */
> >-    if (pending == 0 && !block_mig_state.bulk_completed) {
> >-        pending = BLOCK_SIZE;
> >+    if (pending <= max_size && !block_mig_state.bulk_completed) {
> >+        pending = max_size + BLOCK_SIZE;
> >      }
> >      blk_mig_unlock();
> >      qemu_mutex_unlock_iothread();
> >
> 
> This looks sane to me, but I am CC'ing the Migration maintainers to give it
> a proper look.
> 
> Looks to me like this is to prevent this from happening, in
> migration/migration.c:
> 
>             pending_size = qemu_savevm_state_pending(s->file, max_size);
>             trace_migrate_pending(pending_size, max_size);
>             if (pending_size && pending_size >= max_size) {
> 
> If we fail that condition, we omit data because we do not call
> qemu_savevm_state_iterate.

Yes, I think this is safe. It doesn't feel nice, but it's safe, and I don't
know of a nicer way, unless, that is, it could calculate the actual amount of
pending data.
The other thing that would feel safe would be to make the _complete handler
check whether the bulk stage has completed and, if it hasn't, complete it.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
  2015-01-08 22:36   ` Paolo Bonzini
  2015-01-08 22:45     ` Eric Blake
@ 2015-01-12 14:20     ` Vladimir Sementsov-Ogievskiy
  2015-01-12 14:42       ` Paolo Bonzini
  1 sibling, 1 reply; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-01-12 14:20 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, den, jsnow, stefanha


Best regards,
Vladimir

On 09.01.2015 01:36, Paolo Bonzini wrote:
> The bitmaps are transmitted many times in their entirety, but only the 
> last copy actually means something. The others are lost. This means 
> you should use the non-live interface (register_savevm). This will 
> simplify the code a lot.

But what if the dirty bitmap is really big?

For example, for a 5 PB drive, a bitmap with 65536-byte granularity will be
about 2 GB in size. So, shouldn't it be migrated iteratively for live migration?
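As a back-of-the-envelope sketch (assuming one bit per granularity-sized chunk; the real HBitmap adds some overhead for its intermediate levels, and the helper name is illustrative), the bitmap size works out to roughly 2 GiB per PiB of disk at 64 KiB granularity:

```c
#include <stdint.h>

/* Rough size of a one-bit-per-chunk dirty bitmap: one bit for every
 * granularity-sized chunk of the disk, rounded up to whole bytes. */
uint64_t dirty_bitmap_bytes(uint64_t disk_bytes, uint64_t granularity)
{
    uint64_t bits = (disk_bytes + granularity - 1) / granularity;
    return (bits + 7) / 8;
}
```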


* Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
  2015-01-12 14:20     ` Vladimir Sementsov-Ogievskiy
@ 2015-01-12 14:42       ` Paolo Bonzini
  0 siblings, 0 replies; 47+ messages in thread
From: Paolo Bonzini @ 2015-01-12 14:42 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, den, jsnow, stefanha



On 12/01/2015 15:20, Vladimir Sementsov-Ogievskiy wrote:
> 
> On 09.01.2015 01:36, Paolo Bonzini wrote:
>> The bitmaps are transmitted many times in their entirety, but only the
>> last copy actually means something. The others are lost. This means
>> you should use the non-live interface (register_savevm). This will
>> simplify the code a lot.
> 
> But what if the dirty bitmap is really big?
> 
> For example, for a 5 PB drive, a bitmap with 65536-byte granularity will be
> about 2 GB in size. So, shouldn't it be migrated iteratively for live migration?

But your code is not attempting to do that.  It is not attempting to
track the dirtiness of the dirty bitmap, so to speak.

For such a huge storage, in any case, I suspect the solution is to not
use QEMU's live operations and, instead, operate at the storage level.

Paolo


* Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration Vladimir Sementsov-Ogievskiy
  2015-01-08 22:05   ` John Snow
  2015-01-08 22:36   ` Paolo Bonzini
@ 2015-01-12 16:48   ` Paolo Bonzini
  2015-01-12 17:31     ` John Snow
  2 siblings, 1 reply; 47+ messages in thread
From: Paolo Bonzini @ 2015-01-12 16:48 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, den, jsnow, stefanha



On 11/12/2014 15:17, Vladimir Sementsov-Ogievskiy wrote:
> Just migrate parts of dirty bitmaps, corresponding to the block being
> migrated. Also, skip block migration if it is disabled (blk parameter
> of migrate command is false).

So I have misread the patch. blk here:

+static void blk_store_bitmaps(BlkMigBlock *blk)
+{
+    int i;
+    for (i = 0; i < blk->nr_bitmaps; ++i) {
+        bdrv_dbm_store_data(blk->dirty_bitmaps[i].bitmap,
+                            blk->dirty_bitmaps[i].buf,
+                            blk->sector, blk->nr_sectors);
+    }

refers to a BlkMigBlock, not a BlockBackend.

Still, I need an explanation of the operation of this patch.

During the bulk phase you migrate all of the dirty bitmap.  This is okay.

Then, during the iterative phase you migrate parts of the dirty bitmap
corresponding to sectors that are dirty for block migration.  This means
parts of the dirty bitmap that have become 1.

How do you track which parts of the disk have been acted upon (e.g.
mirrored in the case of the drive-mirror command), so that they have
become 0?

This, strictly speaking, is conservative so it is not incorrect, but it
basically throws away all the work done by the block job during the
block migration---which might take several minutes.  Is a non-live
solution really that bad?  If it is, and it is not feasible to just
migrate the bitmaps non-live, can we do better to migrate bits of the
dirty bitmap that have become 0?
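The conservative behaviour described here can be sketched as follows (illustrative helpers, not the actual restore code): since only newly-dirtied chunks are retransmitted, the destination effectively accumulates the union of everything it has seen, and clears made on the source never propagate:

```c
#include <stddef.h>
#include <stdint.h>

/* Conservative merge: OR incoming chunks into the destination bitmap.
 * Bits the source cleared after the bulk phase (e.g. because a backup
 * job copied those sectors) stay set on the destination. */
void restore_chunk_or(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        dst[i] |= src[i];   /* 1 -> 0 transitions are lost */
    }
}

/* Overwriting would propagate clears, but is only correct if every
 * transmitted chunk reflects the source's current state when sent. */
void restore_chunk_overwrite(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        dst[i] = src[i];
    }
}
```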

Paolo

> Skipping shared sectors: bitmaps are migrated independently of this,
> just send blk without block data.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>  block-migration.c | 173 +++++++++++++++++++++++++++++++++++++++++++++++-------
>  savevm.c          |   1 +
>  2 files changed, 154 insertions(+), 20 deletions(-)
> 
> diff --git a/block-migration.c b/block-migration.c
> index 908a66d..95d54a1 100644
> --- a/block-migration.c
> +++ b/block-migration.c
> @@ -32,6 +32,8 @@
>  #define BLK_MIG_FLAG_EOS                0x02
>  #define BLK_MIG_FLAG_PROGRESS           0x04
>  #define BLK_MIG_FLAG_ZERO_BLOCK         0x08
> +#define BLK_MIG_FLAG_HAS_BLOCK          0x10
> +#define BLK_MIG_FLAG_HAS_BITMAPS        0x20
>  
>  #define MAX_IS_ALLOCATED_SEARCH 65536
>  
> @@ -51,6 +53,8 @@ typedef struct BlkMigDevState {
>      int shared_base;
>      int64_t total_sectors;
>      QSIMPLEQ_ENTRY(BlkMigDevState) entry;
> +    int nr_bitmaps;
> +    BdrvDirtyBitmap **dirty_bitmaps;
>  
>      /* Only used by migration thread.  Does not need a lock.  */
>      int bulk_completed;
> @@ -64,6 +68,11 @@ typedef struct BlkMigDevState {
>      Error *blocker;
>  } BlkMigDevState;
>  
> +typedef struct BlkMigDirtyBitmap {
> +    BdrvDirtyBitmap *bitmap;
> +    uint8_t *buf;
> +} BlkMigDirtyBitmap;
> +
>  typedef struct BlkMigBlock {
>      /* Only used by migration thread.  */
>      uint8_t *buf;
> @@ -74,6 +83,9 @@ typedef struct BlkMigBlock {
>      QEMUIOVector qiov;
>      BlockAIOCB *aiocb;
>  
> +    int nr_bitmaps;
> +    BlkMigDirtyBitmap *dirty_bitmaps;
> +
>      /* Protected by block migration lock.  */
>      int ret;
>      QSIMPLEQ_ENTRY(BlkMigBlock) entry;
> @@ -83,6 +95,7 @@ typedef struct BlkMigState {
>      /* Written during setup phase.  Can be read without a lock.  */
>      int blk_enable;
>      int shared_base;
> +    int dbm_enable;
>      QSIMPLEQ_HEAD(bmds_list, BlkMigDevState) bmds_list;
>      int64_t total_sector_sum;
>      bool zero_blocks;
> @@ -116,27 +129,63 @@ static void blk_mig_unlock(void)
>  /* Only allocating and initializing structure fields, not copying any data. */
>  
>  static BlkMigBlock *blk_create(BlkMigDevState *bmds, int64_t sector,
> -                                int nr_sectors)
> +                               int nr_sectors, bool only_bitmaps)
>  {
> +    int i;
>      BlkMigBlock *blk = g_new(BlkMigBlock, 1);
> -    blk->buf = g_malloc(BLOCK_SIZE);
> +    blk->buf = only_bitmaps ? NULL : g_malloc(BLOCK_SIZE);
>      blk->bmds = bmds;
>      blk->sector = sector;
>      blk->nr_sectors = nr_sectors;
> +    blk->dirty_bitmaps = NULL;
> +    blk->nr_bitmaps = 0;
>  
>      blk->iov.iov_base = blk->buf;
>      blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
>      qemu_iovec_init_external(&blk->qiov, &blk->iov, 1);
>  
> +    if (block_mig_state.dbm_enable) {
> +        BlkMigDirtyBitmap *mig_bm;
> +
> +        blk->nr_bitmaps = bmds->nr_bitmaps;
> +        mig_bm = blk->dirty_bitmaps = g_new(BlkMigDirtyBitmap,
> +                                            bmds->nr_bitmaps);
> +        for (i = 0; i < bmds->nr_bitmaps; ++i) {
> +            BdrvDirtyBitmap *bm = bmds->dirty_bitmaps[i];
> +            mig_bm->buf = g_malloc(bdrv_dbm_data_size(bm, nr_sectors));
> +            mig_bm->bitmap = bm;
> +            mig_bm++;
> +        }
> +    }
> +
>      return blk;
>  }
>  
>  static void blk_free(BlkMigBlock *blk)
>  {
> +    int i;
>      g_free(blk->buf);
> +
> +    if (blk->dirty_bitmaps) {
> +        for (i = 0; i < blk->nr_bitmaps; ++i) {
> +            g_free(blk->dirty_bitmaps[i].buf);
> +        }
> +        g_free(blk->dirty_bitmaps);
> +    }
> +
>      g_free(blk);
>  }
>  
> +static void blk_store_bitmaps(BlkMigBlock *blk)
> +{
> +    int i;
> +    for (i = 0; i < blk->nr_bitmaps; ++i) {
> +        bdrv_dbm_store_data(blk->dirty_bitmaps[i].bitmap,
> +                            blk->dirty_bitmaps[i].buf,
> +                            blk->sector, blk->nr_sectors);
> +    }
> +}
> +
>  /* Must run outside of the iothread lock during the bulk phase,
>   * or the VM will stall.
>   */
> @@ -146,9 +195,15 @@ static void blk_send(QEMUFile *f, BlkMigBlock * blk)
>      int len;
>      uint64_t flags = BLK_MIG_FLAG_DEVICE_BLOCK;
>  
> -    if (block_mig_state.zero_blocks &&
> +    if (block_mig_state.zero_blocks && blk->buf &&
>          buffer_is_zero(blk->buf, BLOCK_SIZE)) {
>          flags |= BLK_MIG_FLAG_ZERO_BLOCK;
> +    } else if (blk->buf) {
> +        flags |= BLK_MIG_FLAG_HAS_BLOCK;
> +    }
> +
> +    if (block_mig_state.dbm_enable) {
> +        flags |= BLK_MIG_FLAG_HAS_BITMAPS;
>      }
>  
>      /* sector number and flags */
> @@ -160,10 +215,27 @@ static void blk_send(QEMUFile *f, BlkMigBlock * blk)
>      qemu_put_byte(f, len);
>      qemu_put_buffer(f, (uint8_t *)bdrv_get_device_name(blk->bmds->bs), len);
>  
> +    /* dirty bitmaps */
> +    if (flags & BLK_MIG_FLAG_HAS_BITMAPS) {
> +        int i;
> +        qemu_put_be32(f, blk->nr_bitmaps);
> +        for (i = 0; i < blk->nr_bitmaps; ++i) {
> +            BdrvDirtyBitmap *bm = blk->dirty_bitmaps[i].bitmap;
> +            int buf_size = bdrv_dbm_data_size(bm, blk->nr_sectors);
> +            const char *name = bdrv_dirty_bitmap_name(bm);
> +            int len = strlen(name);
> +
> +            qemu_put_byte(f, len);
> +            qemu_put_buffer(f, (const uint8_t *)name, len);
> +            qemu_put_buffer(f, blk->dirty_bitmaps[i].buf, buf_size);
> +        }
> +    }
> +
>      /* if a block is zero we need to flush here since the network
>       * bandwidth is now a lot higher than the storage device bandwidth.
> -     * thus if we queue zero blocks we slow down the migration */
> -    if (flags & BLK_MIG_FLAG_ZERO_BLOCK) {
> +     * thus if we queue zero blocks we slow down the migration.
> +     * also, skip writing block when migrate only dirty bitmaps. */
> +    if (!(flags & BLK_MIG_FLAG_HAS_BLOCK)) {
>          qemu_fflush(f);
>          return;
>      }
> @@ -282,13 +354,20 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>      BlockDriverState *bs = bmds->bs;
>      BlkMigBlock *blk;
>      int nr_sectors;
> +    bool skip_chunk = false;
>  
>      if (bmds->shared_base) {
>          qemu_mutex_lock_iothread();
> -        while (cur_sector < total_sectors &&
> -               !bdrv_is_allocated(bs, cur_sector, MAX_IS_ALLOCATED_SEARCH,
> -                                  &nr_sectors)) {
> -            cur_sector += nr_sectors;
> +        if (block_mig_state.dbm_enable) {
> +            bdrv_is_allocated(bs, cur_sector, BDRV_SECTORS_PER_DIRTY_CHUNK,
> +                              &nr_sectors);
> +            skip_chunk = nr_sectors >= BDRV_SECTORS_PER_DIRTY_CHUNK;
> +        } else {
> +            while (cur_sector < total_sectors &&
> +                   !bdrv_is_allocated(bs, cur_sector, MAX_IS_ALLOCATED_SEARCH,
> +                                      &nr_sectors)) {
> +                cur_sector += nr_sectors;
> +            }
>          }
>          qemu_mutex_unlock_iothread();
>      }
> @@ -309,7 +388,8 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>          nr_sectors = total_sectors - cur_sector;
>      }
>  
> -    blk = blk_create(bmds, cur_sector, nr_sectors);
> +    blk = blk_create(bmds, cur_sector, nr_sectors,
> +                     skip_chunk || !block_mig_state.blk_enable);
>  
>      blk_mig_lock();
>      block_mig_state.submitted++;
> @@ -317,8 +397,16 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>  
>      bdrv_reset_dirty_bitmap(bs, bmds->dirty_bitmap, cur_sector, nr_sectors);
>  
> -    blk->aiocb = bdrv_aio_readv(bs, cur_sector, &blk->qiov,
> -                                nr_sectors, blk_mig_read_cb, blk);
> +    if (block_mig_state.dbm_enable) {
> +        blk_store_bitmaps(blk);
> +    }
> +
> +    if (block_mig_state.blk_enable && !skip_chunk) {
> +        blk->aiocb = bdrv_aio_readv(bs, cur_sector, &blk->qiov,
> +                                    nr_sectors, blk_mig_read_cb, blk);
> +    } else if (block_mig_state.dbm_enable) {
> +        blk_mig_read_cb(blk, 0);
> +    }
>  
>      bmds->cur_sector = cur_sector + nr_sectors;
>      return (bmds->cur_sector >= total_sectors);
> @@ -403,6 +491,8 @@ static void init_blk_migration(QEMUFile *f)
>              DPRINTF("Start full migration for %s\n", bdrv_get_device_name(bs));
>          }
>  
> +        bmds->dirty_bitmaps = bdrv_dbm_find_all_named(bs, &bmds->nr_bitmaps);
> +
>          QSIMPLEQ_INSERT_TAIL(&block_mig_state.bmds_list, bmds, entry);
>      }
>  }
> @@ -481,20 +571,32 @@ static int mig_save_device_dirty(QEMUFile *f, BlkMigDevState *bmds,
>              } else {
>                  nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>              }
> -            blk = blk_create(bmds, sector, nr_sectors);
> +            blk = blk_create(bmds, sector, nr_sectors,
> +                             !block_mig_state.blk_enable);
> +
> +            if (block_mig_state.dbm_enable) {
> +                blk_store_bitmaps(blk);
> +            }
>  
>              if (is_async) {
> -                blk->aiocb = bdrv_aio_readv(bmds->bs, sector, &blk->qiov,
> -                                            nr_sectors, blk_mig_read_cb, blk);
> +                if (block_mig_state.blk_enable) {
> +                    blk->aiocb = bdrv_aio_readv(bmds->bs, sector, &blk->qiov,
> +                                                nr_sectors, blk_mig_read_cb,
> +                                                blk);
> +                } else {
> +                    blk_mig_read_cb(blk, 0);
> +                }
>  
>                  blk_mig_lock();
>                  block_mig_state.submitted++;
>                  bmds_set_aio_inflight(bmds, sector, nr_sectors, 1);
>                  blk_mig_unlock();
>              } else {
> -                ret = bdrv_read(bmds->bs, sector, blk->buf, nr_sectors);
> -                if (ret < 0) {
> -                    goto error;
> +                if (block_mig_state.blk_enable) {
> +                    ret = bdrv_read(bmds->bs, sector, blk->buf, nr_sectors);
> +                    if (ret < 0) {
> +                        goto error;
> +                    }
>                  }
>                  blk_send(f, blk);
>  
> @@ -817,10 +919,39 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>                  nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>              }
>  
> +            /* load dirty bitmaps */
> +            if (flags & BLK_MIG_FLAG_HAS_BITMAPS) {
> +                int i, nr_bitmaps = qemu_get_be32(f);
> +
> +                for (i = 0; i < nr_bitmaps; ++i) {
> +                    char bitmap_name[256];
> +                    BdrvDirtyBitmap *bitmap;
> +                    int buf_size, len;
> +
> +                    len = qemu_get_byte(f);
> +                    qemu_get_buffer(f, (uint8_t *)bitmap_name, len);
> +                    bitmap_name[len] = '\0';
> +
> +                    bitmap = bdrv_find_dirty_bitmap(bs, bitmap_name);
> +                    if (!bitmap) {
> +                        fprintf(stderr, "Error unknown dirty bitmap\
> +                                %s for block device %s\n",
> +                                bitmap_name, device_name);
> +                        return -EINVAL;
> +                    }
> +
> +                    buf_size = bdrv_dbm_data_size(bitmap, nr_sectors);
> +                    buf = g_malloc(buf_size);
> +                    qemu_get_buffer(f, buf, buf_size);
> +                    bdrv_dbm_restore_data(bitmap, buf, addr, nr_sectors);
> +                    g_free(buf);
> +                }
> +            }
> +
>              if (flags & BLK_MIG_FLAG_ZERO_BLOCK) {
>                  ret = bdrv_write_zeroes(bs, addr, nr_sectors,
>                                          BDRV_REQ_MAY_UNMAP);
> -            } else {
> +            } else if (flags & BLK_MIG_FLAG_HAS_BLOCK) {
>                  buf = g_malloc(BLOCK_SIZE);
>                  qemu_get_buffer(f, buf, BLOCK_SIZE);
>                  ret = bdrv_write(bs, addr, buf, nr_sectors);
> @@ -855,6 +986,7 @@ static void block_set_params(const MigrationParams *params, void *opaque)
>  {
>      block_mig_state.blk_enable = params->blk;
>      block_mig_state.shared_base = params->shared;
> +    block_mig_state.dbm_enable = params->dirty;
>  
>      /* shared base means that blk_enable = 1 */
>      block_mig_state.blk_enable |= params->shared;
> @@ -862,7 +994,8 @@ static void block_set_params(const MigrationParams *params, void *opaque)
>  
>  static bool block_is_active(void *opaque)
>  {
> -    return block_mig_state.blk_enable == 1;
> +    return block_mig_state.blk_enable == 1 ||
> +           block_mig_state.dbm_enable == 1;
>  }
>  
>  static SaveVMHandlers savevm_block_handlers = {
> diff --git a/savevm.c b/savevm.c
> index a598d1d..1299faa 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -983,6 +983,7 @@ int qemu_loadvm_state(QEMUFile *f)
>          }
>      }
>  
> +    bdrv_dbm_restore_finish();
>      cpu_synchronize_all_post_init();
>  
>      ret = 0;
> 


* Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
  2015-01-12 16:48   ` Paolo Bonzini
@ 2015-01-12 17:31     ` John Snow
  2015-01-12 19:09       ` Paolo Bonzini
  0 siblings, 1 reply; 47+ messages in thread
From: John Snow @ 2015-01-12 17:31 UTC (permalink / raw)
  To: Paolo Bonzini, Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, den, stefanha



On 01/12/2015 11:48 AM, Paolo Bonzini wrote:
>
>
> On 11/12/2014 15:17, Vladimir Sementsov-Ogievskiy wrote:
>> Just migrate parts of dirty bitmaps, corresponding to the block being
>> migrated. Also, skip block migration if it is disabled (blk parameter
>> of migrate command is false).
>
> So I have misread the patch. blk here:
>
> +static void blk_store_bitmaps(BlkMigBlock *blk)
> +{
> +    int i;
> +    for (i = 0; i < blk->nr_bitmaps; ++i) {
> +        bdrv_dbm_store_data(blk->dirty_bitmaps[i].bitmap,
> +                            blk->dirty_bitmaps[i].buf,
> +                            blk->sector, blk->nr_sectors);
> +    }
>
> refers to a BlkMigBlock, not a BlockBackend.
>
> Still, I need an explanation of the operation of this patch.
>
> During the bulk phase you migrate all of the dirty bitmap.  This is okay.
>
> Then, during the iterative phase you migrate parts of the dirty bitmap
> corresponding to sectors that are dirty for block migration.  This means
> parts of the dirty bitmap that have become 1.
>
> How do you track which parts of the disk have been acted upon (e.g.
> mirrored in the case of the drive-mirror command), so that they have
> become 0?

Hm... so your question here is, "What happens if the named bitmap you 
are transferring gets reset to 0 during migration? Will anybody track 
this change?"

As far as I know, the only "consumer" of _named_ BdrvDirtyBitmaps is 
block/backup.c, and we do not support it for mirror (yet?)

So can a block-backup job be running WHILE we migrate? Otherwise, I 
don't think this situation (untracked bitmap resets) will arise here.

> This, strictly speaking, is conservative so it is not incorrect, but it
> basically throws away all the work done by the block job during the
> block migration---which might take several minutes.  Is a non-live
> solution really that bad?  If it is, and it is not feasible to just
> migrate the bitmaps non-live, can we do better to migrate bits of the
> dirty bitmap that have become 0?
>
> Paolo
>
>> Skipping shared sectors: bitmaps are migrated independently of this,
>> just send blk without block data.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>> ---
>>   block-migration.c | 173 +++++++++++++++++++++++++++++++++++++++++++++++-------
>>   savevm.c          |   1 +
>>   2 files changed, 154 insertions(+), 20 deletions(-)
>>
>> diff --git a/block-migration.c b/block-migration.c
>> index 908a66d..95d54a1 100644
>> --- a/block-migration.c
>> +++ b/block-migration.c
>> @@ -32,6 +32,8 @@
>>   #define BLK_MIG_FLAG_EOS                0x02
>>   #define BLK_MIG_FLAG_PROGRESS           0x04
>>   #define BLK_MIG_FLAG_ZERO_BLOCK         0x08
>> +#define BLK_MIG_FLAG_HAS_BLOCK          0x10
>> +#define BLK_MIG_FLAG_HAS_BITMAPS        0x20
>>
>>   #define MAX_IS_ALLOCATED_SEARCH 65536
>>
>> @@ -51,6 +53,8 @@ typedef struct BlkMigDevState {
>>       int shared_base;
>>       int64_t total_sectors;
>>       QSIMPLEQ_ENTRY(BlkMigDevState) entry;
>> +    int nr_bitmaps;
>> +    BdrvDirtyBitmap **dirty_bitmaps;
>>
>>       /* Only used by migration thread.  Does not need a lock.  */
>>       int bulk_completed;
>> @@ -64,6 +68,11 @@ typedef struct BlkMigDevState {
>>       Error *blocker;
>>   } BlkMigDevState;
>>
>> +typedef struct BlkMigDirtyBitmap {
>> +    BdrvDirtyBitmap *bitmap;
>> +    uint8_t *buf;
>> +} BlkMigDirtyBitmap;
>> +
>>   typedef struct BlkMigBlock {
>>       /* Only used by migration thread.  */
>>       uint8_t *buf;
>> @@ -74,6 +83,9 @@ typedef struct BlkMigBlock {
>>       QEMUIOVector qiov;
>>       BlockAIOCB *aiocb;
>>
>> +    int nr_bitmaps;
>> +    BlkMigDirtyBitmap *dirty_bitmaps;
>> +
>>       /* Protected by block migration lock.  */
>>       int ret;
>>       QSIMPLEQ_ENTRY(BlkMigBlock) entry;
>> @@ -83,6 +95,7 @@ typedef struct BlkMigState {
>>       /* Written during setup phase.  Can be read without a lock.  */
>>       int blk_enable;
>>       int shared_base;
>> +    int dbm_enable;
>>       QSIMPLEQ_HEAD(bmds_list, BlkMigDevState) bmds_list;
>>       int64_t total_sector_sum;
>>       bool zero_blocks;
>> @@ -116,27 +129,63 @@ static void blk_mig_unlock(void)
>>   /* Only allocating and initializing structure fields, not copying any data. */
>>
>>   static BlkMigBlock *blk_create(BlkMigDevState *bmds, int64_t sector,
>> -                                int nr_sectors)
>> +                               int nr_sectors, bool only_bitmaps)
>>   {
>> +    int i;
>>       BlkMigBlock *blk = g_new(BlkMigBlock, 1);
>> -    blk->buf = g_malloc(BLOCK_SIZE);
>> +    blk->buf = only_bitmaps ? NULL : g_malloc(BLOCK_SIZE);
>>       blk->bmds = bmds;
>>       blk->sector = sector;
>>       blk->nr_sectors = nr_sectors;
>> +    blk->dirty_bitmaps = NULL;
>> +    blk->nr_bitmaps = 0;
>>
>>       blk->iov.iov_base = blk->buf;
>>       blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
>>       qemu_iovec_init_external(&blk->qiov, &blk->iov, 1);
>>
>> +    if (block_mig_state.dbm_enable) {
>> +        BlkMigDirtyBitmap *mig_bm;
>> +
>> +        blk->nr_bitmaps = bmds->nr_bitmaps;
>> +        mig_bm = blk->dirty_bitmaps = g_new(BlkMigDirtyBitmap,
>> +                                            bmds->nr_bitmaps);
>> +        for (i = 0; i < bmds->nr_bitmaps; ++i) {
>> +            BdrvDirtyBitmap *bm = bmds->dirty_bitmaps[i];
>> +            mig_bm->buf = g_malloc(bdrv_dbm_data_size(bm, nr_sectors));
>> +            mig_bm->bitmap = bm;
>> +            mig_bm++;
>> +        }
>> +    }
>> +
>>       return blk;
>>   }
>>
>>   static void blk_free(BlkMigBlock *blk)
>>   {
>> +    int i;
>>       g_free(blk->buf);
>> +
>> +    if (blk->dirty_bitmaps) {
>> +        for (i = 0; i < blk->nr_bitmaps; ++i) {
>> +            g_free(blk->dirty_bitmaps[i].buf);
>> +        }
>> +        g_free(blk->dirty_bitmaps);
>> +    }
>> +
>>       g_free(blk);
>>   }
>>
>> +static void blk_store_bitmaps(BlkMigBlock *blk)
>> +{
>> +    int i;
>> +    for (i = 0; i < blk->nr_bitmaps; ++i) {
>> +        bdrv_dbm_store_data(blk->dirty_bitmaps[i].bitmap,
>> +                            blk->dirty_bitmaps[i].buf,
>> +                            blk->sector, blk->nr_sectors);
>> +    }
>> +}
>> +
>>   /* Must run outside of the iothread lock during the bulk phase,
>>    * or the VM will stall.
>>    */
>> @@ -146,9 +195,15 @@ static void blk_send(QEMUFile *f, BlkMigBlock * blk)
>>       int len;
>>       uint64_t flags = BLK_MIG_FLAG_DEVICE_BLOCK;
>>
>> -    if (block_mig_state.zero_blocks &&
>> +    if (block_mig_state.zero_blocks && blk->buf &&
>>           buffer_is_zero(blk->buf, BLOCK_SIZE)) {
>>           flags |= BLK_MIG_FLAG_ZERO_BLOCK;
>> +    } else if (blk->buf) {
>> +        flags |= BLK_MIG_FLAG_HAS_BLOCK;
>> +    }
>> +
>> +    if (block_mig_state.dbm_enable) {
>> +        flags |= BLK_MIG_FLAG_HAS_BITMAPS;
>>       }
>>
>>       /* sector number and flags */
>> @@ -160,10 +215,27 @@ static void blk_send(QEMUFile *f, BlkMigBlock * blk)
>>       qemu_put_byte(f, len);
>>       qemu_put_buffer(f, (uint8_t *)bdrv_get_device_name(blk->bmds->bs), len);
>>
>> +    /* dirty bitmaps */
>> +    if (flags & BLK_MIG_FLAG_HAS_BITMAPS) {
>> +        int i;
>> +        qemu_put_be32(f, blk->nr_bitmaps);
>> +        for (i = 0; i < blk->nr_bitmaps; ++i) {
>> +            BdrvDirtyBitmap *bm = blk->dirty_bitmaps[i].bitmap;
>> +            int buf_size = bdrv_dbm_data_size(bm, blk->nr_sectors);
>> +            const char *name = bdrv_dirty_bitmap_name(bm);
>> +            int len = strlen(name);
>> +
>> +            qemu_put_byte(f, len);
>> +            qemu_put_buffer(f, (const uint8_t *)name, len);
>> +            qemu_put_buffer(f, blk->dirty_bitmaps[i].buf, buf_size);
>> +        }
>> +    }
>> +
>>       /* if a block is zero we need to flush here since the network
>>        * bandwidth is now a lot higher than the storage device bandwidth.
>> -     * thus if we queue zero blocks we slow down the migration */
>> -    if (flags & BLK_MIG_FLAG_ZERO_BLOCK) {
>> +     * thus if we queue zero blocks we slow down the migration.
>> +     * also, skip writing block when migrate only dirty bitmaps. */
>> +    if (!(flags & BLK_MIG_FLAG_HAS_BLOCK)) {
>>           qemu_fflush(f);
>>           return;
>>       }
>> @@ -282,13 +354,20 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>>       BlockDriverState *bs = bmds->bs;
>>       BlkMigBlock *blk;
>>       int nr_sectors;
>> +    bool skip_chunk = false;
>>
>>       if (bmds->shared_base) {
>>           qemu_mutex_lock_iothread();
>> -        while (cur_sector < total_sectors &&
>> -               !bdrv_is_allocated(bs, cur_sector, MAX_IS_ALLOCATED_SEARCH,
>> -                                  &nr_sectors)) {
>> -            cur_sector += nr_sectors;
>> +        if (block_mig_state.dbm_enable) {
>> +            bdrv_is_allocated(bs, cur_sector, BDRV_SECTORS_PER_DIRTY_CHUNK,
>> +                              &nr_sectors);
>> +            skip_chunk = nr_sectors >= BDRV_SECTORS_PER_DIRTY_CHUNK;
>> +        } else {
>> +            while (cur_sector < total_sectors &&
>> +                   !bdrv_is_allocated(bs, cur_sector, MAX_IS_ALLOCATED_SEARCH,
>> +                                      &nr_sectors)) {
>> +                cur_sector += nr_sectors;
>> +            }
>>           }
>>           qemu_mutex_unlock_iothread();
>>       }
>> @@ -309,7 +388,8 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>>           nr_sectors = total_sectors - cur_sector;
>>       }
>>
>> -    blk = blk_create(bmds, cur_sector, nr_sectors);
>> +    blk = blk_create(bmds, cur_sector, nr_sectors,
>> +                     skip_chunk || !block_mig_state.blk_enable);
>>
>>       blk_mig_lock();
>>       block_mig_state.submitted++;
>> @@ -317,8 +397,16 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>>
>>       bdrv_reset_dirty_bitmap(bs, bmds->dirty_bitmap, cur_sector, nr_sectors);
>>
>> -    blk->aiocb = bdrv_aio_readv(bs, cur_sector, &blk->qiov,
>> -                                nr_sectors, blk_mig_read_cb, blk);
>> +    if (block_mig_state.dbm_enable) {
>> +        blk_store_bitmaps(blk);
>> +    }
>> +
>> +    if (block_mig_state.blk_enable && !skip_chunk) {
>> +        blk->aiocb = bdrv_aio_readv(bs, cur_sector, &blk->qiov,
>> +                                    nr_sectors, blk_mig_read_cb, blk);
>> +    } else if (block_mig_state.dbm_enable) {
>> +        blk_mig_read_cb(blk, 0);
>> +    }
>>
>>       bmds->cur_sector = cur_sector + nr_sectors;
>>       return (bmds->cur_sector >= total_sectors);
>> @@ -403,6 +491,8 @@ static void init_blk_migration(QEMUFile *f)
>>               DPRINTF("Start full migration for %s\n", bdrv_get_device_name(bs));
>>           }
>>
>> +        bmds->dirty_bitmaps = bdrv_dbm_find_all_named(bs, &bmds->nr_bitmaps);
>> +
>>           QSIMPLEQ_INSERT_TAIL(&block_mig_state.bmds_list, bmds, entry);
>>       }
>>   }
>> @@ -481,20 +571,32 @@ static int mig_save_device_dirty(QEMUFile *f, BlkMigDevState *bmds,
>>               } else {
>>                   nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>>               }
>> -            blk = blk_create(bmds, sector, nr_sectors);
>> +            blk = blk_create(bmds, sector, nr_sectors,
>> +                             !block_mig_state.blk_enable);
>> +
>> +            if (block_mig_state.dbm_enable) {
>> +                blk_store_bitmaps(blk);
>> +            }
>>
>>               if (is_async) {
>> -                blk->aiocb = bdrv_aio_readv(bmds->bs, sector, &blk->qiov,
>> -                                            nr_sectors, blk_mig_read_cb, blk);
>> +                if (block_mig_state.blk_enable) {
>> +                    blk->aiocb = bdrv_aio_readv(bmds->bs, sector, &blk->qiov,
>> +                                                nr_sectors, blk_mig_read_cb,
>> +                                                blk);
>> +                } else {
>> +                    blk_mig_read_cb(blk, 0);
>> +                }
>>
>>                   blk_mig_lock();
>>                   block_mig_state.submitted++;
>>                   bmds_set_aio_inflight(bmds, sector, nr_sectors, 1);
>>                   blk_mig_unlock();
>>               } else {
>> -                ret = bdrv_read(bmds->bs, sector, blk->buf, nr_sectors);
>> -                if (ret < 0) {
>> -                    goto error;
>> +                if (block_mig_state.blk_enable) {
>> +                    ret = bdrv_read(bmds->bs, sector, blk->buf, nr_sectors);
>> +                    if (ret < 0) {
>> +                        goto error;
>> +                    }
>>                   }
>>                   blk_send(f, blk);
>>
>> @@ -817,10 +919,39 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>>                   nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>>               }
>>
>> +            /* load dirty bitmaps */
>> +            if (flags & BLK_MIG_FLAG_HAS_BITMAPS) {
>> +                int i, nr_bitmaps = qemu_get_be32(f);
>> +
>> +                for (i = 0; i < nr_bitmaps; ++i) {
>> +                    char bitmap_name[256];
>> +                    BdrvDirtyBitmap *bitmap;
>> +                    int buf_size, len;
>> +
>> +                    len = qemu_get_byte(f);
>> +                    qemu_get_buffer(f, (uint8_t *)bitmap_name, len);
>> +                    bitmap_name[len] = '\0';
>> +
>> +                    bitmap = bdrv_find_dirty_bitmap(bs, bitmap_name);
>> +                    if (!bitmap) {
>> +                        fprintf(stderr, "Error unknown dirty bitmap\
>> +                                %s for block device %s\n",
>> +                                bitmap_name, device_name);
>> +                        return -EINVAL;
>> +                    }
>> +
>> +                    buf_size = bdrv_dbm_data_size(bitmap, nr_sectors);
>> +                    buf = g_malloc(buf_size);
>> +                    qemu_get_buffer(f, buf, buf_size);
>> +                    bdrv_dbm_restore_data(bitmap, buf, addr, nr_sectors);
>> +                    g_free(buf);
>> +                }
>> +            }
>> +
>>               if (flags & BLK_MIG_FLAG_ZERO_BLOCK) {
>>                   ret = bdrv_write_zeroes(bs, addr, nr_sectors,
>>                                           BDRV_REQ_MAY_UNMAP);
>> -            } else {
>> +            } else if (flags & BLK_MIG_FLAG_HAS_BLOCK) {
>>                   buf = g_malloc(BLOCK_SIZE);
>>                   qemu_get_buffer(f, buf, BLOCK_SIZE);
>>                   ret = bdrv_write(bs, addr, buf, nr_sectors);
>> @@ -855,6 +986,7 @@ static void block_set_params(const MigrationParams *params, void *opaque)
>>   {
>>       block_mig_state.blk_enable = params->blk;
>>       block_mig_state.shared_base = params->shared;
>> +    block_mig_state.dbm_enable = params->dirty;
>>
>>       /* shared base means that blk_enable = 1 */
>>       block_mig_state.blk_enable |= params->shared;
>> @@ -862,7 +994,8 @@ static void block_set_params(const MigrationParams *params, void *opaque)
>>
>>   static bool block_is_active(void *opaque)
>>   {
>> -    return block_mig_state.blk_enable == 1;
>> +    return block_mig_state.blk_enable == 1 ||
>> +           block_mig_state.dbm_enable == 1;
>>   }
>>
>>   static SaveVMHandlers savevm_block_handlers = {
>> diff --git a/savevm.c b/savevm.c
>> index a598d1d..1299faa 100644
>> --- a/savevm.c
>> +++ b/savevm.c
>> @@ -983,6 +983,7 @@ int qemu_loadvm_state(QEMUFile *f)
>>           }
>>       }
>>
>> +    bdrv_dbm_restore_finish();
>>       cpu_synchronize_all_post_init();
>>
>>       ret = 0;
>>
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
  2015-01-12 17:31     ` John Snow
@ 2015-01-12 19:09       ` Paolo Bonzini
  2015-01-13  9:16         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 47+ messages in thread
From: Paolo Bonzini @ 2015-01-12 19:09 UTC (permalink / raw)
  To: John Snow, Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, den, stefanha



On 12/01/2015 18:31, John Snow wrote:
>> How do you track which parts of the disk have been acted upon (e.g.
>> mirrored in the case of the drive-mirror command), so that they have
>> become 0?
> 
> Hm... so your question here is, "What happens if the named bitmap you
> are transferring  gets reset to 0 during migration? Will anybody track
> this change?"
> 
> As far as I know, the only "consumer" of _named_ BdrvDirtyBitmaps is
> block/backup.c, and we do not support it for mirror (yet?)

Right.

> So can a block-backup job be running WHILE we migrate?

I think so.

Paolo


* Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
  2015-01-12 19:09       ` Paolo Bonzini
@ 2015-01-13  9:16         ` Vladimir Sementsov-Ogievskiy
  2015-01-13 16:35           ` John Snow
  0 siblings, 1 reply; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-01-13  9:16 UTC (permalink / raw)
  To: Paolo Bonzini, John Snow, qemu-devel; +Cc: kwolf, den, stefanha


On 12.01.2015 17:42, Paolo Bonzini wrote:
> On 12/01/2015 15:20, Vladimir Sementsov-Ogievskiy wrote:
>> On 09.01.2015 01:36, Paolo Bonzini wrote:
>>> The bitmaps are transmitted many times in their entirety, but only the
>>> last copy actually means something. The others are lost. This means
>>> you should use the non-live interface (register_savevm). This will
>>> simplify the code a lot.
>> But what if the dirty bitmap is really big?
>>
>> For example, for 5 Pb drive the bitmap with granularity 65536 will be of
>> 2 Gb size. So, shouldn't it be migrated iteratively for live migration?
> But your code is not attempting to do that.  It is not attempting to
> track the dirtiness of the dirty bitmap, so to speak.
>
> For such a huge storage, in any case, I suspect the solution is to not
> use QEMU's live operations and, instead, operate at the storage level.
>
> Paolo

On 12.01.2015 22:09, Paolo Bonzini wrote:
> On 12/01/2015 18:31, John Snow wrote:
>>> How do you track which parts of the disk have been acted upon (e.g.
>>> mirrored in the case of the drive-mirror command), so that they have
>>> become 0?
>> Hm... so your question here is, "What happens if the named bitmap you
>> are transferring  gets reset to 0 during migration? Will anybody track
>> this change?"
>>
>> As far as I know, the only "consumer" of _named_ BdrvDirtyBitmaps is
>> block/backup.c, and we do not support it for mirror (yet?)
> Right.
>
>> So can a block-backup job be running WHILE we migrate?
> I think so.
>
> Paolo

There is no dirty bitmap for the dirty bitmap in my solution; the block 
migration dirty bitmap is actually used for this purpose. It tracks 
bitmap bits being set to 1 (when the corresponding blocks are written), 
and if the bitmap changes only in this way (no other block jobs are 
working with the migrated bitmap), then the bitmap will be migrated 
iteratively as it is.

Such an approach doesn't track resetting the bitmap to 0, so those 
changes are not migrated. If the bitmap is in use by any block job, it 
may have additional set bits after migration.

If we want to migrate more precisely, we need a separate dirty bitmap 
for every migrated bitmap, and tracking of this dirty-dirty bitmap would 
have to be injected into the BdrvDirtyBitmap interface.

But do we really need precise migration of a bitmap that is in use by 
some block job (backup, for example)? In that case we would also have to 
"migrate" the backup process itself, i.e. migrate a running block job. 
Is that possible, and is it planned to be implemented?

Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH 4/9] hbitmap: store / restore
  2015-01-08 21:21   ` John Snow
  2015-01-08 21:37     ` Paolo Bonzini
@ 2015-01-13 12:59     ` Vladimir Sementsov-Ogievskiy
  2015-01-13 17:08       ` John Snow
  1 sibling, 1 reply; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-01-13 12:59 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, den, stefanha, pbonzini (Paolo Bonzini)


Best regards,
Vladimir

On 09.01.2015 00:21, John Snow wrote:
>
>
> On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Functions to store / restore HBitmap. HBitmap should be saved to linear
>> bitmap format independently of endianness.
>>
>> Because of restoring in several steps, every step writes only the last
>> level of the bitmap. All other levels are restored by
>> hbitmap_restore_finish as a last step of restoring. So, HBitmap is
>> inconsistent while restoring.
.....
>
> I guess by nature of how bitmap migration has been implemented, we 
> cannot help but restore parts of the bitmap piece by piece, which 
> requires us to force the bitmap into an inconsistent state.
>
> Yuck.
>
> It might be "nice" to turn on a disable bit inside the hbitmap that 
> disallows its use until it is made consistent again, but I don't know 
> what the performance hit of the extra conditionals everywhere would be 
> like.
>

Another approach is to restore the bitmap after each piece is set. It is 
a bit slower, of course, but the bitmap will be consistent after each 
hbitmap_restore_data() call.

>> +/**
>> + * hbitmap_restore_finish
>> + * @hb: HBitmap to operate on.
>> + *
>> + * Repair HBitmap after calling hbitmap_restore_data. Actually all
>> + * HBitmap layers are restored here.
>> + */
>> +void hbitmap_restore_finish(HBitmap *hb);
>> +
>> +/**
>>    * hbitmap_free:
>>    * @hb: HBitmap to operate on.
>>    *
>
> These are just biased opinions:
>
> - It might be nice to name the store/restore functions "serialize" and 
> "deserialize," to describe exactly what they are doing.
>
> - I might refer to "restore_finish" as "post_load" or "post_restore" 
> or something similar to mimic how device migration functions are 
> named. I think hbitmap_restore_data_finalize would also be fine; the 
> key part here is clearly naming it relative to "restore_data."
>

Hmm. Ok, what about the following set:
     hbitmap_serialize()
     hbitmap_deserialize_part()
     hbitmap_deserialize_finish()

>> diff --git a/util/hbitmap.c b/util/hbitmap.c
>> index 8aa7406..ac0323f 100644
>> --- a/util/hbitmap.c
>> +++ b/util/hbitmap.c
>> @@ -362,6 +362,90 @@ bool hbitmap_get(const HBitmap *hb, uint64_t item)
>>       return (hb->levels[HBITMAP_LEVELS - 1][pos >> BITS_PER_LEVEL] & 
>> bit) != 0;
>>   }
>>
>> +uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count)
>> +{
>> +    uint64_t size;
>> +
>> +    if (count == 0) {
>> +        return 0;
>> +    }
>> +
>> +    size = (((count - 1) >> hb->granularity) >> BITS_PER_LEVEL) + 1;
>> +
>> +    return size * sizeof(unsigned long);
>> +}
>> +
>
> This seems flawed to me: number of bits without an offset can't be 
> mapped to a number of real bytes, because "two bits" may or may not 
> cross a granularity boundary, depending on *WHERE* you start counting.
>
> e.g.
>
> granularity = 1 (i.e. 2^1 = 2 virtual bits per 1 real bit)
> virtual: 001100
> real:     0 1 0
>
> The amount of space required to hold "two bits" here could be as 
> little as one bit, if the offset is k={0,2,4} but it could be as much 
> as two bits if the offset is k={1,3}.
>
> You may never use the function in this way, but mapping virtual bits 
> to an implementation byte-size seems like it is inviting an 
> inconsistency.
I dislike this function too, but unfortunately we need the size in 
bytes used for serialization.

Hmm. OK, without loss of generality, let the offset be less than the 
granularity. The precise formula should look like:

size = (((offset + count - 1) >> hb->granularity) >> BITS_PER_LEVEL) + 1;

So,

size = ((((1 << hb->granularity) + count - 2) >> hb->granularity)
        >> BITS_PER_LEVEL) + 1;

would be enough in any case, since it is the worst case of the formula 
above over all such offsets. OK?



>
>> +void hbitmap_store_data(const HBitmap *hb, uint8_t *buf,
>> +                        uint64_t start, uint64_t count)
>> +{
>> +    uint64_t last = start + count - 1;
>> +    unsigned long *out = (unsigned long *)buf;
>> +
>> +    if (count == 0) {
>> +        return;
>> +    }
>> +
>> +    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
>> +    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
>> +    count = last - start + 1;
>> +
>> +#ifdef __BIG_ENDIAN_BITFIELD
>> +    for (i = start; i <= last; ++i) {
>> +        unsigned long el = hb->levels[HBITMAP_LEVELS - 1][i];
>> +        out[i] = (BITS_PER_LONG == 32 ? cpu_to_le32(el) : 
>> cpu_to_le64(el));
>> +    }
>> +#else
>> +    memcpy(out, &hb->levels[HBITMAP_LEVELS - 1][start],
>> +           count * sizeof(unsigned long));
>> +#endif
>> +}
>> +
>
> I suppose the ifdefs are trying to take an optimization by using 
> memcpy if at all possible, without a conversion.
>
> Why are the conversions to little endian, though? Shouldn't we be 
> serializing to a Big Endian format?
As Paolo already explained, it is for portability. The LE bitmap 
representation is independent of the size of the bitmap array elements, 
which may be 32-bit, 64-bit, or whatever else.
>
>> +void hbitmap_restore_data(HBitmap *hb, uint8_t *buf,
>> +                          uint64_t start, uint64_t count)
>> +{
>> +    uint64_t last = start + count - 1;
>> +    unsigned long *in = (unsigned long *)buf;
>> +
>> +    if (count == 0) {
>> +        return;
>> +    }
>> +
>> +    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
>> +    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
>> +    count = last - start + 1;
>> +
>> +#ifdef __BIG_ENDIAN_BITFIELD
>> +    for (i = start; i <= last; ++i) {
>> +        hb->levels[HBITMAP_LEVELS - 1][i] =
>> +            (BITS_PER_LONG == 32 ? be32_to_cpu(in[i]) : 
>> be64_to_cpu(in[i]));
>> +    }
>> +#else
>> +    memcpy(&hb->levels[HBITMAP_LEVELS - 1][start], in,
>> +           count * sizeof(unsigned long));
>> +#endif
>> +}
>> +
>
> ...? We're storing as LE but restoring from BE? I'm confused.
Oh, it's a mistake, thanks. I missed it when testing because the 
#ifdef __BIG_ENDIAN_BITFIELD branch was never compiled.
>
> I'm also not clear on the __BIG_ENDIAN_BITFIELD macro. Why do we want 
> to pack differently based on how C-bitfields are packed by the 
> compiler? I don't think that has any impact on how longs are stored 
> (always in the host native format.)
>
> I would rather see the serialize/deserialize always convert to and 
> from BE explicitly with cpu_to_be[32|64] and be[32|64]_to_cpu. I think 
> trying to optimize the pack/unpack in the no-op condition with a # 
> define and a memcpy is inviting portability problems.
OK, I'll leave only the branch with the for (...) loop in v2.
>
>> +void hbitmap_restore_finish(HBitmap *bitmap)
>> +{
>> +    int64_t i, size;
>> +    int lev, j;
>> +
>> +    /* restore levels starting from penultimate to zero level, assuming
>> +     * that the last one is ok */
>> +    size = MAX((bitmap->size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 
>> 1);
>> +    for (lev = HBITMAP_LEVELS - 1; lev-- > 0; ) {
>> +        size = MAX((size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
>> +        for (i = 0; i < size; ++i) {
>> +            bitmap->levels[lev][i] = 0;
>> +            for (j = 0; j < BITS_PER_LONG; ++j) {
>> +                if (bitmap->levels[lev+1][(i << BITS_PER_LEVEL) + j]) {
>> +                    bitmap->levels[lev][i] |=  (1 << j);
>> +                }
>> +            }
>
> Just a musing: Should we cache the size of each level? I know we can 
> keep recalculating it, but... why bother? We recalculate it in so many 
> places now.
Interesting point.

However, there is another real bug: there may be no longs on the next 
level for the last bits of the current one. A new version of this 
function will be in v2.
>
>> +        }
>> +    }
>> +}
>> +
>>   void hbitmap_free(HBitmap *hb)
>>   {
>>       unsigned i;
>>


* Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
  2015-01-13  9:16         ` Vladimir Sementsov-Ogievskiy
@ 2015-01-13 16:35           ` John Snow
  0 siblings, 0 replies; 47+ messages in thread
From: John Snow @ 2015-01-13 16:35 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Paolo Bonzini, stefanha
  Cc: kwolf, den, qemu-devel



On 01/13/2015 04:16 AM, Vladimir Sementsov-Ogievskiy wrote:
>
> On 12.01.2015 17:42, Paolo Bonzini wrote:
>> On 12/01/2015 15:20, Vladimir Sementsov-Ogievskiy wrote:
>>> On 09.01.2015 01:36, Paolo Bonzini wrote:
>>>> The bitmaps are transmitted many times in their entirety, but only the
>>>> last copy actually means something. The others are lost. This means
>>>> you should use the non-live interface (register_savevm). This will
>>>> simplify the code a lot.
>>> But what if the dirty bitmap is really big?
>>>
>>> For example, for 5 Pb drive the bitmap with granularity 65536 will be of
>>> 2 Gb size. So, shouldn't it be migrated iteratively for live migration?
>> But your code is not attempting to do that.  It is not attempting to
>> track the dirtiness of the dirty bitmap, so to speak.
>>
>> For such a huge storage, in any case, I suspect the solution is to not
>> use QEMU's live operations and, instead, operate at the storage level.
>>
>> Paolo
>
> On 12.01.2015 22:09, Paolo Bonzini wrote:
>> On 12/01/2015 18:31, John Snow wrote:
>>>> How do you track which parts of the disk have been acted upon (e.g.
>>>> mirrored in the case of the drive-mirror command), so that they have
>>>> become 0?
>>> Hm... so your question here is, "What happens if the named bitmap you
>>> are transferring  gets reset to 0 during migration? Will anybody track
>>> this change?"
>>>
>>> As far as I know, the only "consumer" of _named_ BdrvDirtyBitmaps is
>>> block/backup.c, and we do not support it for mirror (yet?)
>> Right.
>>
>>> So can a block-backup job be running WHILE we migrate?
>> I think so.
>>
>> Paolo
>
> There is no dirty bitmap for the dirty bitmap in my solution; the block
> migration dirty bitmap is actually used for this purpose. It tracks
> bitmap bits being set to 1 (when the corresponding blocks are written),
> and if the bitmap changes only in this way (no other block jobs are
> working with the migrated bitmap), then the bitmap will be migrated
> iteratively as it is.
>
> Such an approach doesn't track resetting the bitmap to 0, so those
> changes are not migrated. If the bitmap is in use by any block job, it
> may have additional set bits after migration.
>
> If we want to migrate more precisely, we need a separate dirty bitmap
> for every migrated bitmap, and tracking of this dirty-dirty bitmap would
> have to be injected into the BdrvDirtyBitmap interface.
>
> But do we really need precise migration of a bitmap that is in use by
> some block job (backup, for example)? In that case we would also have to
> "migrate" the backup process itself, i.e. migrate a running block job.
> Is that possible, and is it planned to be implemented?
>
> Best regards,
> Vladimir
>
>
>

This is my question, too. The approach presented here will at least work 
correctly, as far as I can tell. The only problem may arise if we try to 
migrate while doing an incremental backup, which will lead to too many 
dirty bits being migrated; that is just an inefficiency.

I'm not convinced it's important ...
Stefan, any thoughts?


* Re: [Qemu-devel] [PATCH 4/9] hbitmap: store / restore
  2015-01-13 12:59     ` Vladimir Sementsov-Ogievskiy
@ 2015-01-13 17:08       ` John Snow
  2015-01-14 10:29         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 47+ messages in thread
From: John Snow @ 2015-01-13 17:08 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, den, stefanha, pbonzini (Paolo Bonzini)



On 01/13/2015 07:59 AM, Vladimir Sementsov-Ogievskiy wrote:
>
> Best regards,
> Vladimir
>
> On 09.01.2015 00:21, John Snow wrote:
>>
>>
>> On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Functions to store / restore HBitmap. HBitmap should be saved to linear
>>> bitmap format independently of endianness.
>>>
>>> Because of restoring in several steps, every step writes only the last
>>> level of the bitmap. All other levels are restored by
>>> hbitmap_restore_finish as a last step of restoring. So, HBitmap is
>>> inconsistent while restoring.
> .....
>>
>> I guess by nature of how bitmap migration has been implemented, we
>> cannot help but restore parts of the bitmap piece by piece, which
>> requires us to force the bitmap into an inconsistent state.
>>
>> Yuck.
>>
>> It might be "nice" to turn on a disable bit inside the hbitmap that
>> disallows its use until it is made consistent again, but I don't know
>> what the performance hit of the extra conditionals everywhere would be
>> like.
>>
>
> Another approach is to restore the bitmap after each piece is set. It is
> a bit slower, of course, but the bitmap will be consistent after each
> hbitmap_restore_data() call.
>

Yeah, that's not great either. The approach you have already taken is 
probably the best.

>>> +/**
>>> + * hbitmap_restore_finish
>>> + * @hb: HBitmap to operate on.
>>> + *
>>> + * Repair HBitmap after calling hbitmap_restore_data. Actually all
>>> + * HBitmap layers are restored here.
>>> + */
>>> +void hbitmap_restore_finish(HBitmap *hb);
>>> +
>>> +/**
>>>    * hbitmap_free:
>>>    * @hb: HBitmap to operate on.
>>>    *
>>
>> These are just biased opinions:
>>
>> - It might be nice to name the store/restore functions "serialize" and
>> "deserialize," to describe exactly what they are doing.
>>
>> - I might refer to "restore_finish" as "post_load" or "post_restore"
>> or something similar to mimic how device migration functions are
>> named. I think hbitmap_restore_data_finalize would also be fine; the
>> key part here is clearly naming it relative to "restore_data."
>>
>
> Hmm. Ok, what about the following set:
>      hbitmap_serialize()
>      hbitmap_deserialize_part()
>      hbitmap_deserialize_finish()
>

Looks good to me!

>>> diff --git a/util/hbitmap.c b/util/hbitmap.c
>>> index 8aa7406..ac0323f 100644
>>> --- a/util/hbitmap.c
>>> +++ b/util/hbitmap.c
>>> @@ -362,6 +362,90 @@ bool hbitmap_get(const HBitmap *hb, uint64_t item)
>>>       return (hb->levels[HBITMAP_LEVELS - 1][pos >> BITS_PER_LEVEL] &
>>> bit) != 0;
>>>   }
>>>
>>> +uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count)
>>> +{
>>> +    uint64_t size;
>>> +
>>> +    if (count == 0) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    size = (((count - 1) >> hb->granularity) >> BITS_PER_LEVEL) + 1;
>>> +
>>> +    return size * sizeof(unsigned long);
>>> +}
>>> +
>>
>> This seems flawed to me: number of bits without an offset can't be
>> mapped to a number of real bytes, because "two bits" may or may not
>> cross a granularity boundary, depending on *WHERE* you start counting.
>>
>> e.g.
>>
>> granularity = 1 (i.e. 2^1 = 2 virtual bits per 1 real bit)
>> virtual: 001100
>> real:     0 1 0
>>
>> The amount of space required to hold "two bits" here could be as
>> little as one bit, if the offset is k={0,2,4} but it could be as much
>> as two bits if the offset is k={1,3}.
>>
>> You may never use the function in this way, but mapping virtual bits
>> to an implementation byte-size seems like it is inviting an
>> inconsistency.
> I dislike this function too, but unfortunately we need the size in
> bytes used for serialization.
>
> Hmm. OK, without loss of generality, let the offset be less than the
> granularity. The precise formula should look like:
>
> size = (((offset + count - 1) >> hb->granularity) >> BITS_PER_LEVEL) + 1;
>
> So,
>
> size = ((((1 << hb->granularity) + count - 2) >> hb->granularity)
>         >> BITS_PER_LEVEL) + 1;
>
> would be enough in any case. OK?
>

I think so, as long as the object deserializes correctly regardless of 
whether or not you start on an even multiple of the granularity.

>
>
>>
>>> +void hbitmap_store_data(const HBitmap *hb, uint8_t *buf,
>>> +                        uint64_t start, uint64_t count)
>>> +{
>>> +    uint64_t last = start + count - 1;
>>> +    unsigned long *out = (unsigned long *)buf;
>>> +
>>> +    if (count == 0) {
>>> +        return;
>>> +    }
>>> +
>>> +    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
>>> +    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
>>> +    count = last - start + 1;
>>> +
>>> +#ifdef __BIG_ENDIAN_BITFIELD
>>> +    for (i = start; i <= last; ++i) {
>>> +        unsigned long el = hb->levels[HBITMAP_LEVELS - 1][i];
>>> +        out[i] = (BITS_PER_LONG == 32 ? cpu_to_le32(el) :
>>> cpu_to_le64(el));
>>> +    }
>>> +#else
>>> +    memcpy(out, &hb->levels[HBITMAP_LEVELS - 1][start],
>>> +           count * sizeof(unsigned long));
>>> +#endif
>>> +}
>>> +
>>
>> I suppose the ifdefs are trying to take an optimization by using
>> memcpy if at all possible, without a conversion.
>>
>> Why are the conversions to little endian, though? Shouldn't we be
>> serializing to a Big Endian format?
> As Paolo already explained, it is for portability. The LE bitmap
> representation is independent of the size of the bitmap array elements,
> which may be 32-bit, 64-bit, or whatever else.

Yes, that makes sense. :)

>>
>>> +void hbitmap_restore_data(HBitmap *hb, uint8_t *buf,
>>> +                          uint64_t start, uint64_t count)
>>> +{
>>> +    uint64_t last = start + count - 1;
>>> +    unsigned long *in = (unsigned long *)buf;
>>> +
>>> +    if (count == 0) {
>>> +        return;
>>> +    }
>>> +
>>> +    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
>>> +    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
>>> +    count = last - start + 1;
>>> +
>>> +#ifdef __BIG_ENDIAN_BITFIELD
>>> +    for (i = start; i <= last; ++i) {
>>> +        hb->levels[HBITMAP_LEVELS - 1][i] =
>>> +            (BITS_PER_LONG == 32 ? be32_to_cpu(in[i]) :
>>> be64_to_cpu(in[i]));
>>> +    }
>>> +#else
>>> +    memcpy(&hb->levels[HBITMAP_LEVELS - 1][start], in,
>>> +           count * sizeof(unsigned long));
>>> +#endif
>>> +}
>>> +
>>
>> ...? We're storing as LE but restoring from BE? I'm confused.
> Oh, it's a mistake, thanks. I missed it when testing because the
> #ifdef __BIG_ENDIAN_BITFIELD branch was never compiled.

Glad I am not crazy :)

>>
>> I'm also not clear on the __BIG_ENDIAN_BITFIELD macro. Why do we want
>> to pack differently based on how C-bitfields are packed by the
>> compiler? I don't think that has any impact on how longs are stored
>> (always in the host native format.)
>>
>> I would rather see the serialize/deserialize always convert to and
>> from BE explicitly with cpu_to_be[32|64] and be[32|64]_to_cpu. I think
>> trying to optimize the pack/unpack in the no-op condition with a #
>> define and a memcpy is inviting portability problems.
> OK, I'll leave only the branch with the for (...) loop in v2.

You can correct me if I am mistaken, but I seem to recall that 
determining endianness in a platform-, compiler-, and libc-independent 
way is very difficult at the preprocessor stage.

If there _IS_ some way to do this, the optimization is fine, but I do 
not think it will work as written.

The un-optimized version is the safest.

>>
>>> +void hbitmap_restore_finish(HBitmap *bitmap)
>>> +{
>>> +    int64_t i, size;
>>> +    int lev, j;
>>> +
>>> +    /* restore levels starting from penultimate to zero level, assuming
>>> +     * that the last one is ok */
>>> +    size = MAX((bitmap->size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL,
>>> 1);
>>> +    for (lev = HBITMAP_LEVELS - 1; lev-- > 0; ) {
>>> +        size = MAX((size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
>>> +        for (i = 0; i < size; ++i) {
>>> +            bitmap->levels[lev][i] = 0;
>>> +            for (j = 0; j < BITS_PER_LONG; ++j) {
>>> +                if (bitmap->levels[lev+1][(i << BITS_PER_LEVEL) + j]) {
>>> +                    bitmap->levels[lev][i] |=  (1 << j);
>>> +                }
>>> +            }
>>
>> Just a musing: Should we cache the size of each level? I know we can
>> keep recalculating it, but... why bother? We recalculate it in so many
>> places now.
> Interesting point.
>
> However, there is another real bug: there may be no longs on the next
> level for the last bits of the current one. A new version of this
> function will be in v2.

Great :)

>>
>>> +        }
>>> +    }
>>> +}
>>> +
>>>   void hbitmap_free(HBitmap *hb)
>>>   {
>>>       unsigned i;
>>>
>

Thank you!
--js


* Re: [Qemu-devel] [PATCH 4/9] hbitmap: store / restore
  2015-01-13 17:08       ` John Snow
@ 2015-01-14 10:29         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-01-14 10:29 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, den, stefanha, pbonzini (Paolo Bonzini)


Best regards,
Vladimir

On 13.01.2015 20:08, John Snow wrote:
> On 01/13/2015 07:59 AM, Vladimir Sementsov-Ogievskiy wrote:
>> On 09.01.2015 00:21, John Snow wrote:
>>> On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> ....
>>>> +/**
>>>> + * hbitmap_restore_finish
>>>> + * @hb: HBitmap to operate on.
>>>> + *
>>>> + * Repair HBitmap after calling hbitmap_restore_data. Actually all
>>>> + * HBitmap layers are restored here.
>>>> + */
>>>> +void hbitmap_restore_finish(HBitmap *hb);
>>>> +
>>>> +/**
>>>>    * hbitmap_free:
>>>>    * @hb: HBitmap to operate on.
>>>>    *
>>>
>>> These are just biased opinions:
>>>
>>> - It might be nice to name the store/restore functions "serialize" and
>>> "deserialize," to describe exactly what they are doing.
>>>
>>> - I might refer to "restore_finish" as "post_load" or "post_restore"
>>> or something similar to mimic how device migration functions are
>>> named. I think hbitmap_restore_data_finalize would also be fine; the
>>> key part here is clearly naming it relative to "restore_data."
>>>
>>
>> Hmm. Ok, what about the following set:
>>      hbitmap_serialize()
>>      hbitmap_deserialize_part()
>>      hbitmap_deserialize_finish()
>>
>
> Looks good to me!
* hbitmap_serialize_part() would be better, to match its paired function.
>
>>>> diff --git a/util/hbitmap.c b/util/hbitmap.c
>>>> index 8aa7406..ac0323f 100644
>>>> --- a/util/hbitmap.c
>>>> +++ b/util/hbitmap.c
>>>> @@ -362,6 +362,90 @@ bool hbitmap_get(const HBitmap *hb, uint64_t 
>>>> item)
>>>>       return (hb->levels[HBITMAP_LEVELS - 1][pos >> BITS_PER_LEVEL] &
>>>> bit) != 0;
>>>>   }
>>>>
>>>> +uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count)
>>>> +{
>>>> +    uint64_t size;
>>>> +
>>>> +    if (count == 0) {
>>>> +        return 0;
>>>> +    }
>>>> +
>>>> +    size = (((count - 1) >> hb->granularity) >> BITS_PER_LEVEL) + 1;
>>>> +
>>>> +    return size * sizeof(unsigned long);
>>>> +}
>>>> +
>>>
>>> This seems flawed to me: number of bits without an offset can't be
>>> mapped to a number of real bytes, because "two bits" may or may not
>>> cross a granularity boundary, depending on *WHERE* you start counting.
>>>
>>> e.g.
>>>
>>> granularity = 1 (i.e. 2^1 = 2 virtual bits per 1 real bit)
>>> virtual: 001100
>>> real:     0 1 0
>>>
>>> The amount of space required to hold "two bits" here could be as
>>> little as one bit, if the offset is k={0,2,4} but it could be as much
>>> as two bits if the offset is k={1,3}.
>>>
>>> You may never use the function in this way, but mapping virtual bits
>>> to an implementation byte-size seems like it is inviting an
>>> inconsistency.
>> I dislike this function too, but unfortunately we need the size in
>> bytes used for serialization.
>>
>> Hmm. OK, without loss of generality, let the offset be less than the
>> granularity. The precise formula should look like:
>>
>> size = (((offset + count - 1) >> hb->granularity) >> BITS_PER_LEVEL) + 1;
>>
>> So,
>>
>> size = ((((1 << hb->granularity) + count - 2) >> hb->granularity)
>>         >> BITS_PER_LEVEL) + 1;
>>
>> would be enough in any case. OK?
>>
>
> I think so, as long as the object deserializes correctly regardless of
> whether or not you start on an even multiple of the granularity.
>
Maybe also rename hbitmap_data_size() to hbitmap_serialize_size(), or 
even drop it and make hbitmap_store_data() return the necessary size 
when buf is NULL.


* Re: [Qemu-devel] [PATCH 5/9] block: BdrvDirtyBitmap store/restore interface
  2014-12-11 14:17 ` [Qemu-devel] [PATCH 5/9] block: BdrvDirtyBitmap store/restore interface Vladimir Sementsov-Ogievskiy
  2015-01-08 21:22   ` John Snow
@ 2015-01-14 11:27   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-01-14 11:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, den, jsnow, stefanha

As in the previous patch, rename store/restore to serialize/deserialize...

Hmm. In this case, wouldn't it be better to include serialization of the 
granularity, name, and name length in these functions?

Best regards,
Vladimir

On 11.12.2014 17:17, Vladimir Sementsov-Ogievskiy wrote:
> Several functions to provide necessary access to BdrvDirtyBitmap for
> block-migration.c
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   block.c               | 61 +++++++++++++++++++++++++++++++++++++++++++++++++--
>   include/block/block.h | 10 +++++++++
>   2 files changed, 69 insertions(+), 2 deletions(-)
>
> diff --git a/block.c b/block.c
> index 6edf1dc..7d42620 100644
> --- a/block.c
> +++ b/block.c
> @@ -5511,8 +5511,65 @@ void bdrv_reset_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
>       hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
>   }
>   
> -static void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
> -                           int nr_sectors)
> +const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap)
> +{
> +    return bitmap->name;
> +}
> +
> +uint64_t bdrv_dbm_data_size(const BdrvDirtyBitmap *bitmap, uint64_t count)
> +{
> +    return hbitmap_data_size(bitmap->bitmap, count);
> +}
> +
> +void bdrv_dbm_store_data(const BdrvDirtyBitmap *bitmap, uint8_t *buf,
> +                         uint64_t start, uint64_t count)
> +{
> +    hbitmap_store_data(bitmap->bitmap, buf, start, count);
> +}
> +
> +void bdrv_dbm_restore_data(BdrvDirtyBitmap *bitmap, uint8_t *buf,
> +                           uint64_t start, uint64_t count)
> +{
> +    hbitmap_restore_data(bitmap->bitmap, buf, start, count);
> +}
> +
> +BdrvDirtyBitmap **bdrv_dbm_find_all_named(BlockDriverState *bs, int *count)
> +{
> +    BdrvDirtyBitmap *bm, **res, **iter;
> +    assert(count);
> +
> +    QLIST_FOREACH(bm, &bs->dirty_bitmaps, list) {
> +        if (bm->name != NULL) {
> +            (*count)++;
> +        }
> +    }
> +
> +    iter = res = g_malloc(sizeof(*res) * (*count));
> +    QLIST_FOREACH(bm, &bs->dirty_bitmaps, list) {
> +        if (bm->name != NULL) {
> +            *iter++ = bm;
> +        }
> +    }
> +
> +    return res;
> +}
> +
> +void bdrv_dbm_restore_finish(void)
> +{
> +    BlockDriverState *bs;
> +    BdrvDirtyBitmap *bm;
> +
> +    for (bs = bdrv_next(NULL); bs != NULL; bs = bdrv_next(bs)) {
> +        QLIST_FOREACH(bm, &bs->dirty_bitmaps, list) {
> +            if (bm->name != NULL) {
> +                hbitmap_restore_finish(bm->bitmap);
> +            }
> +        }
> +    }
> +}
> +
> +void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
> +                    int nr_sectors)
>   {
>       BdrvDirtyBitmap *bitmap;
>       QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
> diff --git a/include/block/block.h b/include/block/block.h
> index b21233c..09eff80 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -459,6 +459,16 @@ void bdrv_dirty_iter_init(BlockDriverState *bs,
>   void bdrv_dirty_iter_set(struct HBitmapIter *hbi, int64_t offset);
>   int64_t bdrv_get_dirty_count(BlockDriverState *bs, BdrvDirtyBitmap *bitmap);
>   
> +uint64_t bdrv_dbm_data_size(const BdrvDirtyBitmap *bitmap, uint64_t count);
> +void bdrv_dbm_store_data(const BdrvDirtyBitmap *bitmap, uint8_t *buf,
> +                         uint64_t start, uint64_t count);
> +void bdrv_dbm_restore_data(BdrvDirtyBitmap *bitmap, uint8_t *buf,
> +                           uint64_t start, uint64_t count);
> +bool bdrv_dbm_is_named(BdrvDirtyBitmap *bitmap);
> +const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap);
> +BdrvDirtyBitmap **bdrv_dbm_find_all_named(BlockDriverState *bs, int *count);
> +void bdrv_dbm_restore_finish(void);
> +
>   void bdrv_enable_copy_on_read(BlockDriverState *bs);
>   void bdrv_disable_copy_on_read(BlockDriverState *bs);
>   

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH 6/9] block-migration: tiny refactoring
  2015-01-08 21:23   ` John Snow
@ 2015-01-14 12:26     ` Vladimir Sementsov-Ogievskiy
  2015-01-14 16:53       ` John Snow
  0 siblings, 1 reply; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-01-14 12:26 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: kwolf, den, stefanha

On 09.01.2015 00:23, John Snow wrote:
>
>
> On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Add blk_create and blk_free to remove code duplicates. Otherwise,
>> duplicates will rise in the following patches because of BlkMigBlock
>> sturcture extendin.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>> ---
>>   block-migration.c | 56 
>> +++++++++++++++++++++++++++++--------------------------
>>   1 file changed, 30 insertions(+), 26 deletions(-)
>>
>> diff --git a/block-migration.c b/block-migration.c
>> index 5b4aa0f..d0c825f 100644
>> --- a/block-migration.c
>> +++ b/block-migration.c
>> @@ -113,6 +113,30 @@ static void blk_mig_unlock(void)
>>       qemu_mutex_unlock(&block_mig_state.lock);
>>   }
>>
>> +/* Only allocating and initializing structure fields, not copying 
>> any data. */
>> +
>> +static BlkMigBlock *blk_create(BlkMigDevState *bmds, int64_t sector,
>> +                                int nr_sectors)
>> +{
>> +    BlkMigBlock *blk = g_new(BlkMigBlock, 1);
>> +    blk->buf = g_malloc(BLOCK_SIZE);
>> +    blk->bmds = bmds;
>> +    blk->sector = sector;
>> +    blk->nr_sectors = nr_sectors;
>> +
>> +    blk->iov.iov_base = blk->buf;
>> +    blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
>> +    qemu_iovec_init_external(&blk->qiov, &blk->iov, 1);
>> +
>> +    return blk;
>> +}
>> +
>> +static void blk_free(BlkMigBlock *blk)
>> +{
>> +    g_free(blk->buf);
>> +    g_free(blk);
>> +}
>> +
>>   /* Must run outside of the iothread lock during the bulk phase,
>>    * or the VM will stall.
>>    */
>> @@ -285,15 +309,7 @@ static int mig_save_device_bulk(QEMUFile *f, 
>> BlkMigDevState *bmds)
>>           nr_sectors = total_sectors - cur_sector;
>>       }
>>
>> -    blk = g_new(BlkMigBlock, 1);
>> -    blk->buf = g_malloc(BLOCK_SIZE);
>> -    blk->bmds = bmds;
>> -    blk->sector = cur_sector;
>> -    blk->nr_sectors = nr_sectors;
>> -
>> -    blk->iov.iov_base = blk->buf;
>> -    blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
>> -    qemu_iovec_init_external(&blk->qiov, &blk->iov, 1);
>> +    blk = blk_create(bmds, cur_sector, nr_sectors);
>>
>>       blk_mig_lock();
>>       block_mig_state.submitted++;
>> @@ -467,17 +483,9 @@ static int mig_save_device_dirty(QEMUFile *f, 
>> BlkMigDevState *bmds,
>>               } else {
>>                   nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>>               }
>> -            blk = g_new(BlkMigBlock, 1);
>> -            blk->buf = g_malloc(BLOCK_SIZE);
>> -            blk->bmds = bmds;
>> -            blk->sector = sector;
>> -            blk->nr_sectors = nr_sectors;
>> +            blk = blk_create(bmds, sector, nr_sectors);
>>
>>               if (is_async) {
>> -                blk->iov.iov_base = blk->buf;
>> -                blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
>> -                qemu_iovec_init_external(&blk->qiov, &blk->iov, 1);
>> -
>
> I suppose in the (!is_async) branch we don't reference iov/qiov again, 
> but the functional difference caught my eye. It used to only be called 
> under the "is_async" branch, but now is going to be executed 
> unconditionally.
>
> Is that fine?
I think it doesn't matter. I can add an 'is_async' parameter to 
blk_create(), but which is worse - an excess parameter or an excess 
initialization? And why not initialize the whole structure in 
blk_create() unconditionally?


* Re: [Qemu-devel] [PATCH 6/9] block-migration: tiny refactoring
  2015-01-14 12:26     ` Vladimir Sementsov-Ogievskiy
@ 2015-01-14 16:53       ` John Snow
  0 siblings, 0 replies; 47+ messages in thread
From: John Snow @ 2015-01-14 16:53 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: kwolf, den, stefanha



On 01/14/2015 07:26 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 09.01.2015 00:23, John Snow wrote:
>>
>>
>> On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Add blk_create and blk_free to remove code duplicates. Otherwise,
>>> duplicates will rise in the following patches because of BlkMigBlock
>>> sturcture extendin.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>>> ---
>>>   block-migration.c | 56
>>> +++++++++++++++++++++++++++++--------------------------
>>>   1 file changed, 30 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/block-migration.c b/block-migration.c
>>> index 5b4aa0f..d0c825f 100644
>>> --- a/block-migration.c
>>> +++ b/block-migration.c
>>> @@ -113,6 +113,30 @@ static void blk_mig_unlock(void)
>>>       qemu_mutex_unlock(&block_mig_state.lock);
>>>   }
>>>
>>> +/* Only allocating and initializing structure fields, not copying
>>> any data. */
>>> +
>>> +static BlkMigBlock *blk_create(BlkMigDevState *bmds, int64_t sector,
>>> +                                int nr_sectors)
>>> +{
>>> +    BlkMigBlock *blk = g_new(BlkMigBlock, 1);
>>> +    blk->buf = g_malloc(BLOCK_SIZE);
>>> +    blk->bmds = bmds;
>>> +    blk->sector = sector;
>>> +    blk->nr_sectors = nr_sectors;
>>> +
>>> +    blk->iov.iov_base = blk->buf;
>>> +    blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
>>> +    qemu_iovec_init_external(&blk->qiov, &blk->iov, 1);
>>> +
>>> +    return blk;
>>> +}
>>> +
>>> +static void blk_free(BlkMigBlock *blk)
>>> +{
>>> +    g_free(blk->buf);
>>> +    g_free(blk);
>>> +}
>>> +
>>>   /* Must run outside of the iothread lock during the bulk phase,
>>>    * or the VM will stall.
>>>    */
>>> @@ -285,15 +309,7 @@ static int mig_save_device_bulk(QEMUFile *f,
>>> BlkMigDevState *bmds)
>>>           nr_sectors = total_sectors - cur_sector;
>>>       }
>>>
>>> -    blk = g_new(BlkMigBlock, 1);
>>> -    blk->buf = g_malloc(BLOCK_SIZE);
>>> -    blk->bmds = bmds;
>>> -    blk->sector = cur_sector;
>>> -    blk->nr_sectors = nr_sectors;
>>> -
>>> -    blk->iov.iov_base = blk->buf;
>>> -    blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
>>> -    qemu_iovec_init_external(&blk->qiov, &blk->iov, 1);
>>> +    blk = blk_create(bmds, cur_sector, nr_sectors);
>>>
>>>       blk_mig_lock();
>>>       block_mig_state.submitted++;
>>> @@ -467,17 +483,9 @@ static int mig_save_device_dirty(QEMUFile *f,
>>> BlkMigDevState *bmds,
>>>               } else {
>>>                   nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>>>               }
>>> -            blk = g_new(BlkMigBlock, 1);
>>> -            blk->buf = g_malloc(BLOCK_SIZE);
>>> -            blk->bmds = bmds;
>>> -            blk->sector = sector;
>>> -            blk->nr_sectors = nr_sectors;
>>> +            blk = blk_create(bmds, sector, nr_sectors);
>>>
>>>               if (is_async) {
>>> -                blk->iov.iov_base = blk->buf;
>>> -                blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
>>> -                qemu_iovec_init_external(&blk->qiov, &blk->iov, 1);
>>> -
>>
>> I suppose in the (!is_async) branch we don't reference iov/qiov again,
>> but the functional difference caught my eye. It used to only be called
>> under the "is_async" branch, but now is going to be executed
>> unconditionally.
>>
>> Is that fine?
> I think it doesn't matter. I can add an 'is_async' parameter to
> blk_create(), but which is worse - an excess parameter or an excess
> initialization? And why not initialize the whole structure in
> blk_create() unconditionally?
>

If it's not a problem, leave it as-is. If I am not sure immediately 
myself, I like to ask questions.

Your answer to the question can always be "Yes, that's fine!"


* Re: [Qemu-devel] [PATCH 7/9] block-migration: remove not needed iothread lock
  2015-01-08 21:24   ` John Snow
@ 2015-01-16 12:54     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-01-16 12:54 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: kwolf, den, stefanha

On 09.01.2015 00:24, John Snow wrote:
>
>
> On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Instead of locking iothread, we can just swap these calls. So, if some
>> write to our range occures before resetting the bitmap, then it will
>> get into subsequent aio read, becouse it occures, in any case, after
>> resetting the bitmap.
>>
>
> s/occures/occurs/g
> s/becouse/because/g
>
> (I hope you are not annoyed by the spelling corrections: They are in 
> good faith and personally I would hope people would correct any of my 
> spelling mistakes before it goes in the git log!)
>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>> ---
>>   block-migration.c | 6 ++----
>>   1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/block-migration.c b/block-migration.c
>> index d0c825f..908a66d 100644
>> --- a/block-migration.c
>> +++ b/block-migration.c
>> @@ -315,13 +315,11 @@ static int mig_save_device_bulk(QEMUFile *f, 
>> BlkMigDevState *bmds)
>>       block_mig_state.submitted++;
>>       blk_mig_unlock();
>>
>> -    qemu_mutex_lock_iothread();
>> +    bdrv_reset_dirty_bitmap(bs, bmds->dirty_bitmap, cur_sector, 
>> nr_sectors);
>> +
>>       blk->aiocb = bdrv_aio_readv(bs, cur_sector, &blk->qiov,
>>                                   nr_sectors, blk_mig_read_cb, blk);
>>
>> -    bdrv_reset_dirty_bitmap(bs, bmds->dirty_bitmap, cur_sector, 
>> nr_sectors);
>> -    qemu_mutex_unlock_iothread();
>> -
>>       bmds->cur_sector = cur_sector + nr_sectors;
>>       return (bmds->cur_sector >= total_sectors);
>>   }
>>
>
> OK, so the justification here is that by ordering it as "reset, read" 
> that any writes that occur after the reset will be simply included in 
> the read, and we won't have reset any bits that we didn't actually get 
> a chance to read.
>
> OK.
>
> But what about losing the ability to clear bits that are set needlessly?
>
> e.g.:
>
> Sector 1 is dirty. Sector 2 is clean.
> We reset the bitmap. All sectors are clean.
> A write occurs to sector 2, marking it dirty.
> We read sectors one and two.
> Sector two is now erroneously marked dirty, when it isn't.
>
> It's not a data integrity problem, but we're swapping one inefficiency 
> for another.
>
> Do you have a justification for why this is better than the lock?

I think going without the lock is always better here. Lock-free 
operations are usually preferable, aren't they? So I wasn't thinking 
about that kind of efficiency, only about the fact that this lock is 
needless. If we don't swap the reset and the read, the lock is of 
course necessary.

Furthermore, if we had kept the lock, this write (to sector 2) would 
simply happen after the unlock, and the sector would be marked dirty 
anyway. So the efficiency is the same, but we can drop the lock.
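The ordering argument can be illustrated with a toy single-chunk model
(hypothetical names, nothing from QEMU): a write racing in after the
reset is still picked up by the read that follows, and it also re-marks
the bit - which is exactly the redundant retransmission discussed above,
but never a correctness problem.

```c
#include <assert.h>
#include <stdbool.h>

/* One chunk of guest storage plus its dirty bit (toy model). */
static int chunk_data;
static bool chunk_dirty;

static void guest_write(int v)
{
    chunk_data = v;
    chunk_dirty = true;   /* what bdrv_set_dirty() would do */
}

/* The two halves of the patched migration step, in the new order. */
static void migrate_reset(void)
{
    chunk_dirty = false;  /* bdrv_reset_dirty_bitmap() first... */
}

static int migrate_read(void)
{
    return chunk_data;    /* ...then the (asynchronous) read */
}
```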


* Re: [Qemu-devel] [PATCH 7/9] block-migration: remove not needed iothread lock
  2015-01-08 22:28   ` Paolo Bonzini
@ 2015-01-16 13:03     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-01-16 13:03 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, den, jsnow, stefanha


Best regards,
Vladimir

On 09.01.2015 01:28, Paolo Bonzini wrote:
>
> On 11/12/2014 15:17, Vladimir Sementsov-Ogievskiy wrote:
>> -    qemu_mutex_lock_iothread();
>> +    bdrv_reset_dirty_bitmap(bs, bmds->dirty_bitmap, cur_sector, nr_sectors);
>> +
>>       blk->aiocb = bdrv_aio_readv(bs, cur_sector, &blk->qiov,
>>                                   nr_sectors, blk_mig_read_cb, blk);
>>   
>> -    bdrv_reset_dirty_bitmap(bs, bmds->dirty_bitmap, cur_sector, nr_sectors);
>> -    qemu_mutex_unlock_iothread();
>> -
> bdrv_aio_readv is not thread safe, so it needs the iothread lock.
>
> bdrv_reset_dirty_bitmap is also not thread safe, because
> bmds->dirty_bitmap is accessed concurrently by bdrv_reset_dirty
> (discard) and bdrv_set_dirty (write).  So it needs the iothread lock too.
>
> Moving bdrv_reset_dirty_bitmap before the read is a good idea.
>
> Paolo
Ok.


* Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
  2015-01-08 22:05   ` John Snow
@ 2015-01-17 17:17     ` Vladimir Sementsov-Ogievskiy
  2015-01-20 17:25       ` John Snow
  0 siblings, 1 reply; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-01-17 17:17 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, Juan Jose Quintela Carreira, Dr. David Alan Gilbert,
	stefanha, Amit Shah, den


Best regards,
Vladimir

On 09.01.2015 01:05, John Snow wrote:
> CCing migration maintainers, feedback otherwise in-line.
>
> On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Just migrate parts of dirty bitmaps, corresponding to the block being
>> migrated. Also, skip block migration if it is disabled (blk parameter
>> of migrate command is false).
>>
>> Skipping shared sectors: bitmaps are migrated independently of this,
>> just send blk without block data.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>> ---
>
> In terms of general approach, migrating the dirty bitmap alongside the 
> blocks it describes makes sense when migrating both.
>
> Is this a lot of overhead when that's not the case, though? If we 
> utilize the "bitmap only" pathways added here, don't we iterate 
> through the savevm handlers a lot to only transfer very little data 
> per section?
>
> If we really need migration of bitmaps apart from the data they 
> describe, is it not worth developing a more condensed transfer 
> mechanism to get more bitmap data per iteration, instead of just one 
> block's worth?
The first stage of migration can be optimized easily in this case - just 
take larger bitmap pieces. The second stage is not as simple: if we take 
larger pieces at each step, we just increase the amount of dirty data 
transferred. One approach is to maintain a "dirty region" - two numbers 
describing the region of set bits in the migration dirty bitmap - and 
send the data only when this region is large enough.
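The "dirty region" idea could look roughly like this (a hypothetical
helper, not part of the series): track only the lowest and highest dirty
chunk seen since the last flush, and send once the span crosses a
threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accumulator for a contiguous span of dirty chunks. */
typedef struct DirtyRegion {
    int64_t start;  /* first dirty chunk, -1 when the region is empty */
    int64_t end;    /* one past the last dirty chunk */
} DirtyRegion;

static void region_add(DirtyRegion *r, int64_t chunk)
{
    if (r->start < 0) {
        r->start = chunk;
        r->end = chunk + 1;
        return;
    }
    if (chunk < r->start) {
        r->start = chunk;
    }
    if (chunk + 1 > r->end) {
        r->end = chunk + 1;
    }
}

/* Send only once enough bitmap data has accumulated. */
static bool region_should_send(const DirtyRegion *r, int64_t threshold)
{
    return r->start >= 0 && (r->end - r->start) >= threshold;
}
```

The trade-off is that the span may cover clean chunks in between, so
this trades metadata overhead for some redundant bitmap bytes.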

>
>>   block-migration.c | 173 
>> +++++++++++++++++++++++++++++++++++++++++++++++-------
>>   savevm.c          |   1 +
>>   2 files changed, 154 insertions(+), 20 deletions(-)
>>
>> diff --git a/block-migration.c b/block-migration.c
>> index 908a66d..95d54a1 100644
>> --- a/block-migration.c
>> +++ b/block-migration.c
>> @@ -32,6 +32,8 @@
>>   #define BLK_MIG_FLAG_EOS                0x02
>>   #define BLK_MIG_FLAG_PROGRESS           0x04
>>   #define BLK_MIG_FLAG_ZERO_BLOCK         0x08
>> +#define BLK_MIG_FLAG_HAS_BLOCK          0x10
>> +#define BLK_MIG_FLAG_HAS_BITMAPS        0x20
>>
>
> OK: As a result of allowing bitmaps to be migrated without the block 
> data itself, we now must acknowledge the concept that we can send 
> either block data, bitmaps, both, or neither -- so new defines are 
> added here to indicate what data can be found in the section following.
>
>>   #define MAX_IS_ALLOCATED_SEARCH 65536
>>
>> @@ -51,6 +53,8 @@ typedef struct BlkMigDevState {
>>       int shared_base;
>>       int64_t total_sectors;
>>       QSIMPLEQ_ENTRY(BlkMigDevState) entry;
>> +    int nr_bitmaps;
>> +    BdrvDirtyBitmap **dirty_bitmaps;
>>
>>       /* Only used by migration thread.  Does not need a lock. */
>>       int bulk_completed;
>> @@ -64,6 +68,11 @@ typedef struct BlkMigDevState {
>>       Error *blocker;
>>   } BlkMigDevState;
>>
>> +typedef struct BlkMigDirtyBitmap {
>> +    BdrvDirtyBitmap *bitmap;
>> +    uint8_t *buf;
>> +} BlkMigDirtyBitmap;
>> +
>>   typedef struct BlkMigBlock {
>>       /* Only used by migration thread.  */
>>       uint8_t *buf;
>> @@ -74,6 +83,9 @@ typedef struct BlkMigBlock {
>>       QEMUIOVector qiov;
>>       BlockAIOCB *aiocb;
>>
>> +    int nr_bitmaps;
>> +    BlkMigDirtyBitmap *dirty_bitmaps;
>> +
>>       /* Protected by block migration lock.  */
>>       int ret;
>>       QSIMPLEQ_ENTRY(BlkMigBlock) entry;
>> @@ -83,6 +95,7 @@ typedef struct BlkMigState {
>>       /* Written during setup phase.  Can be read without a lock.  */
>>       int blk_enable;
>>       int shared_base;
>> +    int dbm_enable;
>
> Similar to the feedback in a previous patch, we may not want to use 
> 'dbm' to mean "Dirty Bitmap" because we have not been applying the 
> abbreviation consistently.
>
> For now, the recommendation from stefan is to use the full 
> "bdrv_dirty_bitmap" or "dirty_bitmap" in function names.
>
> If we do want an acronym to refer to this particular type of dirty 
> bitmap, we should apply it consistently to all functions that work 
> with the BdrvDirtyBitmap type.
>
> For now, "bdrv_dirty_bitmap_enable" should suffice, even though it's a 
> bit wordy.
It's a flag that enables the migration of dirty bitmaps, just as 
blk_enable enables the migration of blocks. Should it then be 
"dirty_bitmap_migration_enable"? Isn't that too long for a simple flag 
variable?
>
>>       QSIMPLEQ_HEAD(bmds_list, BlkMigDevState) bmds_list;
>>       int64_t total_sector_sum;
>>       bool zero_blocks;
>> @@ -116,27 +129,63 @@ static void blk_mig_unlock(void)
>>   /* Only allocating and initializing structure fields, not copying 
>> any data. */
>>
>>   static BlkMigBlock *blk_create(BlkMigDevState *bmds, int64_t sector,
>> -                                int nr_sectors)
>> +                               int nr_sectors, bool only_bitmaps)
>>   {
>> +    int i;
>>       BlkMigBlock *blk = g_new(BlkMigBlock, 1);
>> -    blk->buf = g_malloc(BLOCK_SIZE);
>> +    blk->buf = only_bitmaps ? NULL : g_malloc(BLOCK_SIZE);
>>       blk->bmds = bmds;
>>       blk->sector = sector;
>>       blk->nr_sectors = nr_sectors;
>> +    blk->dirty_bitmaps = NULL;
>> +    blk->nr_bitmaps = 0;
>>
>>       blk->iov.iov_base = blk->buf;
>>       blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
>>       qemu_iovec_init_external(&blk->qiov, &blk->iov, 1);
>>
>
> We can probably skip this block if only_bitmaps is true.
I agree.
>
>> +    if (block_mig_state.dbm_enable) {
>> +        BlkMigDirtyBitmap *mig_bm;
>> +
>> +        blk->nr_bitmaps = bmds->nr_bitmaps;
>> +        mig_bm = blk->dirty_bitmaps = g_new(BlkMigDirtyBitmap,
>> + bmds->nr_bitmaps);
>> +        for (i = 0; i < bmds->nr_bitmaps; ++i) {
>> +            BdrvDirtyBitmap *bm = bmds->dirty_bitmaps[i];
>> +            mig_bm->buf = g_malloc(bdrv_dbm_data_size(bm, nr_sectors));
>> +            mig_bm->bitmap = bm;
>> +            mig_bm++;
>> +        }
>> +    }
>> +
>
> It's strange to me that we'd give it an "only bitmaps" boolean and 
> then rely on a different condition to populate them. Is it not worthy 
> of an error if we specify only_bitmaps when block_mig_state.dbm_enable 
> is false?
only_bitmaps means: no blocks, only bitmaps. But there is another case - 
blocks together with bitmaps - and in that case this "if ()" should 
still execute.

{only_bitmaps = true} while {block_mig_state.dbm_enable = false} is of 
course an error. I'll add an assert() for this.
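Put together with the earlier point about skipping the iov setup, the
adjusted constructor might look like this sketch (simplified stand-in
structures, plain malloc instead of glib, hypothetical names):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK_SIZE (1 << 20)   /* stand-in for QEMU's 1 MiB migration chunk */

/* Simplified stand-ins for the QEMU structures; fields trimmed down. */
typedef struct BlkMigDevState BlkMigDevState;

typedef struct BlkMigBlock {
    uint8_t *buf;
    BlkMigDevState *bmds;
    int64_t sector;
    int nr_sectors;
} BlkMigBlock;

static bool dbm_enable;   /* stands in for block_mig_state.dbm_enable */

static BlkMigBlock *blk_create(BlkMigDevState *bmds, int64_t sector,
                               int nr_sectors, bool only_bitmaps)
{
    BlkMigBlock *blk = malloc(sizeof(*blk));

    /* only_bitmaps with bitmap migration disabled is a caller bug */
    assert(!only_bitmaps || dbm_enable);

    blk->buf = only_bitmaps ? NULL : malloc(BLOCK_SIZE);
    blk->bmds = bmds;
    blk->sector = sector;
    blk->nr_sectors = nr_sectors;

    /* the iov/qiov setup is skipped entirely when there is no data buffer */

    return blk;
}
```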
>
>>       return blk;
>>   }
>>
>>   static void blk_free(BlkMigBlock *blk)
>>   {
>> +    int i;
>>       g_free(blk->buf);
>> +
>> +    if (blk->dirty_bitmaps) {
>> +        for (i = 0; i < blk->nr_bitmaps; ++i) {
>> +            g_free(blk->dirty_bitmaps[i].buf);
>> +        }
>> +        g_free(blk->dirty_bitmaps);
>> +    }
>> +
>>       g_free(blk);
>>   }
>>
>> +static void blk_store_bitmaps(BlkMigBlock *blk)
>> +{
>> +    int i;
>> +    for (i = 0; i < blk->nr_bitmaps; ++i) {
>> +        bdrv_dbm_store_data(blk->dirty_bitmaps[i].bitmap,
>> +                            blk->dirty_bitmaps[i].buf,
>> +                            blk->sector, blk->nr_sectors);
>> +    }
>> +}
>> +
>
> I am worried about the case in which blk->nr_sectors is not an 
> integral multiple of the bitmap sector granularity, for reasons 
> discussed in the previous patches.
>
> I'm not positive it IS a problem either, but maybe you have already 
> thought about this.
Only the allocated memory should be extended in the previous patch, as 
I've already proposed. As a general approach it's OK, because we always 
calculate the real bit number in the bitmap from the virtual one in the 
same way, so the stored region will be restored in the same place.
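The round-trip argument can be made concrete with a toy size formula
(hypothetical - the real computation lives in hbitmap_data_size()): as
long as sender and receiver derive the buffer size from the same
(granularity, nr_sectors) pair with the same rounding, the stored region
lands back in the same place.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: bytes needed to serialize the dirty bits covering
 * nr_sectors, one bit per granularity_sectors sectors, rounded up.
 * Hypothetical formula, not the actual hbitmap implementation. */
static int64_t bitmap_data_size(int64_t nr_sectors, int64_t granularity_sectors)
{
    int64_t bits = (nr_sectors + granularity_sectors - 1) / granularity_sectors;
    return (bits + 7) / 8;
}
```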
>
>>   /* Must run outside of the iothread lock during the bulk phase,
>>    * or the VM will stall.
>>    */
>> @@ -146,9 +195,15 @@ static void blk_send(QEMUFile *f, BlkMigBlock * 
>> blk)
>>       int len;
>>       uint64_t flags = BLK_MIG_FLAG_DEVICE_BLOCK;
>>
>> -    if (block_mig_state.zero_blocks &&
>> +    if (block_mig_state.zero_blocks && blk->buf &&
>>           buffer_is_zero(blk->buf, BLOCK_SIZE)) {
>>           flags |= BLK_MIG_FLAG_ZERO_BLOCK;
>> +    } else if (blk->buf) {
>> +        flags |= BLK_MIG_FLAG_HAS_BLOCK;
>> +    }
>> +
>> +    if (block_mig_state.dbm_enable) {
>> +        flags |= BLK_MIG_FLAG_HAS_BITMAPS;
>>       }
>>
>>       /* sector number and flags */
>> @@ -160,10 +215,27 @@ static void blk_send(QEMUFile *f, BlkMigBlock * 
>> blk)
>>       qemu_put_byte(f, len);
>>       qemu_put_buffer(f, (uint8_t 
>> *)bdrv_get_device_name(blk->bmds->bs), len);
>>
>> +    /* dirty bitmaps */
>> +    if (flags & BLK_MIG_FLAG_HAS_BITMAPS) {
>> +        int i;
>> +        qemu_put_be32(f, blk->nr_bitmaps);
>> +        for (i = 0; i < blk->nr_bitmaps; ++i) {
>> +            BdrvDirtyBitmap *bm = blk->dirty_bitmaps[i].bitmap;
>> +            int buf_size = bdrv_dbm_data_size(bm, blk->nr_sectors);
>> +            const char *name = bdrv_dirty_bitmap_name(bm);
>> +            int len = strlen(name);
>> +
>> +            qemu_put_byte(f, len);
>> +            qemu_put_buffer(f, (const uint8_t *)name, len);
>
> No length in the stream for the size of the dirty bitmap buffer?
>
>> +            qemu_put_buffer(f, blk->dirty_bitmaps[i].buf, buf_size);
>> +        }
>> +    }
>> +
>
> Future patch idea: We can probably optimize this block by testing a 
> range of sectors on the bitmap and skipping bitmaps that don't have 
> data for this block.
>
> Since we populated the bitmaps previously, we should already have the 
> chance to have counted how many of them are nonempty, so we can use 
> that number instead of blk->nr_bitmaps.
>
> Not important for the series right now.
>
>>       /* if a block is zero we need to flush here since the network
>>        * bandwidth is now a lot higher than the storage device 
>> bandwidth.
>> -     * thus if we queue zero blocks we slow down the migration */
>> -    if (flags & BLK_MIG_FLAG_ZERO_BLOCK) {
>> +     * thus if we queue zero blocks we slow down the migration.
>> +     * also, skip writing block when migrate only dirty bitmaps. */
>> +    if (!(flags & BLK_MIG_FLAG_HAS_BLOCK)) {
>
> I'm not completely clear on the reasoning behind the original 
> codeblock here, but changing it from "Only when zeroes" to "Not when 
> we have data: Zeroes and Bitmaps" makes sense, since bitmap-only 
> sections are going to be very sparse too.
>
>>           qemu_fflush(f);
>>           return;
>>       }
>
> A little after this, there's a call to qemu_put_buffer(f, blk->buf, 
> BLOCK_SIZE), but we have the case where blk->buf may be NULL in bitmap 
> only cases. I think we need to guard against that.
In this case (flags & BLK_MIG_FLAG_HAS_BLOCK) should be zero, so the 
buffer is never written. However, an assert here won't hurt.
>
>> @@ -282,13 +354,20 @@ static int mig_save_device_bulk(QEMUFile *f, 
>> BlkMigDevState *bmds)
>>       BlockDriverState *bs = bmds->bs;
>>       BlkMigBlock *blk;
>>       int nr_sectors;
>> +    bool skip_chunk = false;
>>
>>       if (bmds->shared_base) {
>>           qemu_mutex_lock_iothread();
>> -        while (cur_sector < total_sectors &&
>> -               !bdrv_is_allocated(bs, cur_sector, 
>> MAX_IS_ALLOCATED_SEARCH,
>> -                                  &nr_sectors)) {
>> -            cur_sector += nr_sectors;
>> +        if (block_mig_state.dbm_enable) {
>> +            bdrv_is_allocated(bs, cur_sector, 
>> BDRV_SECTORS_PER_DIRTY_CHUNK,
>> +                              &nr_sectors);
>> +            skip_chunk = nr_sectors >= BDRV_SECTORS_PER_DIRTY_CHUNK;
>> +        } else {
>> +            while (cur_sector < total_sectors &&
>> +                   !bdrv_is_allocated(bs, cur_sector, 
>> MAX_IS_ALLOCATED_SEARCH,
>> +                                      &nr_sectors)) {
>> +                cur_sector += nr_sectors;
>> +            }
>
> OK, so the approach taken here is that if bitmaps are enabled, we 
> check only to see if this current chunk is allocated or not. If it 
> isn't, we declare this a "bitmap only" chunk and set skip_chunk to true.
>
> The else clause taken implies no bitmap data, because either 
> dbm_enable or blk_enable (or both) MUST be set for us to be here.
>
> (1) Why are you not checking the return value of bdrv_is_allocated? It 
> could return either true or false AND nr_sectors == 
> BDRV_SECTORS_PER_DIRTY_CHUNK, provided it was entirely allocated or 
> entirely unallocated, so I think this condition is not working as you 
> intend it to.
Yes, it's my mistake.
>
> (2) This seems slightly hackey in a sparse or no-data situation. We 
> will transmit many small data sections, each containing a chunk of 
> bitmap data, one at a time, with no data interspersed -- which leaves 
> a high metadata-to-data ratio in these cases.
>
> Would it be an involved process to alter the nr_sectors count to be 
> something that isn't a single sector in the (dbm_enable && 
> !blk_enable) case? That way we can spin until we find data worth 
> transferring and transfer all of the bitmap metadata in a larger chunk 
> all at once.
>
> Whatever approach you choose, it might be nice to have a comment 
> in-line explaining the general approach.
>
>>           }
>>           qemu_mutex_unlock_iothread();
>>       }
>> @@ -309,7 +388,8 @@ static int mig_save_device_bulk(QEMUFile *f, 
>> BlkMigDevState *bmds)
>>           nr_sectors = total_sectors - cur_sector;
>>       }
>>
>> -    blk = blk_create(bmds, cur_sector, nr_sectors);
>> +    blk = blk_create(bmds, cur_sector, nr_sectors,
>> +                     skip_chunk || !block_mig_state.blk_enable);
>>
>>       blk_mig_lock();
>>       block_mig_state.submitted++;
>> @@ -317,8 +397,16 @@ static int mig_save_device_bulk(QEMUFile *f, 
>> BlkMigDevState *bmds)
>>
>>       bdrv_reset_dirty_bitmap(bs, bmds->dirty_bitmap, cur_sector, 
>> nr_sectors);
>>
>> -    blk->aiocb = bdrv_aio_readv(bs, cur_sector, &blk->qiov,
>> -                                nr_sectors, blk_mig_read_cb, blk);
>> +    if (block_mig_state.dbm_enable) {
>> +        blk_store_bitmaps(blk);
>> +    }
>> +
>> +    if (block_mig_state.blk_enable && !skip_chunk) {
>> +        blk->aiocb = bdrv_aio_readv(bs, cur_sector, &blk->qiov,
>> +                                    nr_sectors, blk_mig_read_cb, blk);
>> +    } else if (block_mig_state.dbm_enable) {
>> +        blk_mig_read_cb(blk, 0);
>> +    }
>>
>>       bmds->cur_sector = cur_sector + nr_sectors;
>>       return (bmds->cur_sector >= total_sectors);
>> @@ -403,6 +491,8 @@ static void init_blk_migration(QEMUFile *f)
>>               DPRINTF("Start full migration for %s\n", 
>> bdrv_get_device_name(bs));
>>           }
>>
>> +        bmds->dirty_bitmaps = bdrv_dbm_find_all_named(bs, 
>> &bmds->nr_bitmaps);
>> +
>
> At this point, we should have block_mig_state.dbm_enable set (or 
> disabled), so we can skip this initialization if it is not set.
Agree.
>
>> QSIMPLEQ_INSERT_TAIL(&block_mig_state.bmds_list, bmds, entry);
>>       }
>>   }
>> @@ -481,20 +571,32 @@ static int mig_save_device_dirty(QEMUFile *f, 
>> BlkMigDevState *bmds,
>>               } else {
>>                   nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>>               }
>> -            blk = blk_create(bmds, sector, nr_sectors);
>> +            blk = blk_create(bmds, sector, nr_sectors,
>> +                             !block_mig_state.blk_enable);
>> +
>> +            if (block_mig_state.dbm_enable) {
>> +                blk_store_bitmaps(blk);
>> +            }
>>
>>               if (is_async) {
>> -                blk->aiocb = bdrv_aio_readv(bmds->bs, sector, 
>> &blk->qiov,
>> -                                            nr_sectors, 
>> blk_mig_read_cb, blk);
>> +                if (block_mig_state.blk_enable) {
>> +                    blk->aiocb = bdrv_aio_readv(bmds->bs, sector, 
>> &blk->qiov,
>> +                                                nr_sectors, 
>> blk_mig_read_cb,
>> +                                                blk);
>> +                } else {
>> +                    blk_mig_read_cb(blk, 0);
>> +                }
>>
>>                   blk_mig_lock();
>>                   block_mig_state.submitted++;
>>                   bmds_set_aio_inflight(bmds, sector, nr_sectors, 1);
>>                   blk_mig_unlock();
>>               } else {
>> -                ret = bdrv_read(bmds->bs, sector, blk->buf, 
>> nr_sectors);
>> -                if (ret < 0) {
>> -                    goto error;
>> +                if (block_mig_state.blk_enable) {
>> +                    ret = bdrv_read(bmds->bs, sector, blk->buf, 
>> nr_sectors);
>> +                    if (ret < 0) {
>> +                        goto error;
>> +                    }
>>                   }
>>                   blk_send(f, blk);
>>
>> @@ -817,10 +919,39 @@ static int block_load(QEMUFile *f, void 
>> *opaque, int version_id)
>>                   nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>>               }
>>
>> +            /* load dirty bitmaps */
>> +            if (flags & BLK_MIG_FLAG_HAS_BITMAPS) {
>> +                int i, nr_bitmaps = qemu_get_be32(f);
>> +
>> +                for (i = 0; i < nr_bitmaps; ++i) {
>> +                    char bitmap_name[256];
>> +                    BdrvDirtyBitmap *bitmap;
>> +                    int buf_size, len;
>> +
>> +                    len = qemu_get_byte(f);
>> +                    qemu_get_buffer(f, (uint8_t *)bitmap_name, len);
>> +                    bitmap_name[len] = '\0';
>> +
>> +                    bitmap = bdrv_find_dirty_bitmap(bs, bitmap_name);
>> +                    if (!bitmap) {
>> +                        fprintf(stderr, "Error unknown dirty bitmap\
>> +                                %s for block device %s\n",
>> +                                bitmap_name, device_name);
>> +                        return -EINVAL;
>> +                    }
>> +
>> +                    buf_size = bdrv_dbm_data_size(bitmap, nr_sectors);
>
> Oh, this is why you didn't store the length... Still, since it seems 
> as if you expect the bitmap to already exist on the destination, what 
> if the granularity or size is different? It might be nice to 
> explicitly state the size of the buffer -- one user misconfiguration 
> and you might blow the rest of the migration.
>
> This way, if the calculated size doesn't match the received size, you 
> have the chance to throw a meaningful error.
>
>> +                    buf = g_malloc(buf_size);
>> +                    qemu_get_buffer(f, buf, buf_size);
>> +                    bdrv_dbm_restore_data(bitmap, buf, addr, nr_sectors);
>> +                    g_free(buf);
>> +                }
>> +            }
>> +
>
> So with this approach, the user needs to have created the bitmaps 
> already? Would it be possible to create a mode where they don't, or am 
> I misreading something?
>
> We could re-create them on the fly whenever the offset is 0 on the 
> receiving side, the only other piece of information we'd really need 
> at that point is the granularity.
Agree.
>
>>               if (flags & BLK_MIG_FLAG_ZERO_BLOCK) {
>>                   ret = bdrv_write_zeroes(bs, addr, nr_sectors,
>>                                           BDRV_REQ_MAY_UNMAP);
>> -            } else {
>> +            } else if (flags & BLK_MIG_FLAG_HAS_BLOCK) {
>>                   buf = g_malloc(BLOCK_SIZE);
>>                   qemu_get_buffer(f, buf, BLOCK_SIZE);
>>                   ret = bdrv_write(bs, addr, buf, nr_sectors);
>> @@ -855,6 +986,7 @@ static void block_set_params(const MigrationParams *params, void *opaque)
>>   {
>>       block_mig_state.blk_enable = params->blk;
>>       block_mig_state.shared_base = params->shared;
>> +    block_mig_state.dbm_enable = params->dirty;
>>
>>       /* shared base means that blk_enable = 1 */
>>       block_mig_state.blk_enable |= params->shared;
>> @@ -862,7 +994,8 @@ static void block_set_params(const MigrationParams *params, void *opaque)
>>
>>   static bool block_is_active(void *opaque)
>>   {
>> -    return block_mig_state.blk_enable == 1;
>> +    return block_mig_state.blk_enable == 1 ||
>> +           block_mig_state.dbm_enable == 1;
>>   }
>>
>>   static SaveVMHandlers savevm_block_handlers = {
>> diff --git a/savevm.c b/savevm.c
>> index a598d1d..1299faa 100644
>> --- a/savevm.c
>> +++ b/savevm.c
>> @@ -983,6 +983,7 @@ int qemu_loadvm_state(QEMUFile *f)
>>           }
>>       }
>>
>> +    bdrv_dbm_restore_finish();
>
> Do we not have a cleaner way to call this? I guess we have no explicit 
> callback mechanisms for when a particular BDS is completely done, 
> because of the multiple passes over the data that we perform --
>
> Still, it might be nicer to have a callback for the whole of block 
> migration, and call bdrv_dbm_restore_finish() from there instead of 
> putting this implementation detail all the way up inside of 
> qemu_loadvm_state.
>
> Can we migrate an empty-data sentinel section from block migration 
> upon completion that qemu_loadvm_state launches a post_load method for 
> that can invoke bdrv_dbm_restore_finish for us?
>
> Alternatively, can this not be safely done inside of block_load? 
> somewhere around here:
>
> printf("Completed %d %%%c", (int)addr,
>                    (addr == 100) ? '\n' : '\r');
>             fflush(stdout);
>
> The "100%" completion flag is only written in block_save_complete(),
> so once we read this flag we should have necessary and sufficient 
> information to conclude that it's safe to do the BdrvDirtyBitmap 
> migration post-load stuff.
Good point. I think I'll choose this approach.
>
>
> Of course, maybe generic section post-load callbacks are useful to 
> other devices, and I'd be fine with either approach.
>
> CCing migration people for opinions.
>
>>       cpu_synchronize_all_post_init();
>>
>>       ret = 0;
>>
>
>
> Thank you, and sorry it took me so long to digest the series!
> --John Snow
>
> P.S.: Happy New Year! I hope 2015 treats you well :~)


Given this whole story about inefficient migration when blk carries no 
block data, I think it may be better to drop the reuse of the block 
migration bitmap to migrate bitmaps, maintain additional dirty bitmaps 
for the dirty bitmaps themselves, and migrate the bitmaps in a separate 
migration/dirty-bitmaps.c file (sorry for the pun). This would 
additionally provide more precise migration when a bitmap is reset while 
migration is in progress.


* Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
  2015-01-17 17:17     ` Vladimir Sementsov-Ogievskiy
@ 2015-01-20 17:25       ` John Snow
  0 siblings, 0 replies; 47+ messages in thread
From: John Snow @ 2015-01-20 17:25 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, Juan Jose Quintela Carreira, Dr. David Alan Gilbert,
	stefanha, Amit Shah, den



On 01/17/2015 12:17 PM, Vladimir Sementsov-Ogievskiy wrote:
>
> Best regards,
> Vladimir
>
> On 09.01.2015 01:05, John Snow wrote:
>> CCing migration maintainers, feedback otherwise in-line.
>>
>> On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Just migrate parts of dirty bitmaps, corresponding to the block being
>>> migrated. Also, skip block migration if it is disabled (blk parameter
>>> of migrate command is false).
>>>
>>> Skipping shared sectors: bitmaps are migrated independently of this,
>>> just send blk without block data.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>>> ---
>>
>> In terms of general approach, migrating the dirty bitmap alongside the
>> blocks it describes makes sense when migrating both.
>>
>> Is this a lot of overhead when that's not the case, though? If we
>> utilize the "bitmap only" pathways added here, don't we iterate
>> through the savevm handlers a lot to only transfer very little data
>> per section?
>>
>> If we really need migration of bitmaps apart from the data they
>> describe, is it not worth developing a more condensed transfer
>> mechanism to get more bitmap data per iteration, instead of just one
>> block's worth?
> The first stage of migration is easy to optimize in this case: just
> take larger bitmap pieces. The second stage is not as simple: if we
> take larger pieces on each step, we just increase the amount of dirty
> data transferred. One approach is to maintain a "dirty region" (two
> numbers representing the span of set bits in the migration dirty
> bitmap) and send data only when this region is large enough.
>
>>
>>>   block-migration.c | 173 +++++++++++++++++++++++++++++++++++++++++++++++-------
>>>   savevm.c          |   1 +
>>>   2 files changed, 154 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/block-migration.c b/block-migration.c
>>> index 908a66d..95d54a1 100644
>>> --- a/block-migration.c
>>> +++ b/block-migration.c
>>> @@ -32,6 +32,8 @@
>>>   #define BLK_MIG_FLAG_EOS                0x02
>>>   #define BLK_MIG_FLAG_PROGRESS           0x04
>>>   #define BLK_MIG_FLAG_ZERO_BLOCK         0x08
>>> +#define BLK_MIG_FLAG_HAS_BLOCK          0x10
>>> +#define BLK_MIG_FLAG_HAS_BITMAPS        0x20
>>>
>>
>> OK: As a result of allowing bitmaps to be migrated without the block
>> data itself, we now must acknowledge the concept that we can send
>> either block data, bitmaps, both, or neither -- so new defines are
>> added here to indicate what data can be found in the section following.
>>
>>>   #define MAX_IS_ALLOCATED_SEARCH 65536
>>>
>>> @@ -51,6 +53,8 @@ typedef struct BlkMigDevState {
>>>       int shared_base;
>>>       int64_t total_sectors;
>>>       QSIMPLEQ_ENTRY(BlkMigDevState) entry;
>>> +    int nr_bitmaps;
>>> +    BdrvDirtyBitmap **dirty_bitmaps;
>>>
>>>       /* Only used by migration thread.  Does not need a lock. */
>>>       int bulk_completed;
>>> @@ -64,6 +68,11 @@ typedef struct BlkMigDevState {
>>>       Error *blocker;
>>>   } BlkMigDevState;
>>>
>>> +typedef struct BlkMigDirtyBitmap {
>>> +    BdrvDirtyBitmap *bitmap;
>>> +    uint8_t *buf;
>>> +} BlkMigDirtyBitmap;
>>> +
>>>   typedef struct BlkMigBlock {
>>>       /* Only used by migration thread.  */
>>>       uint8_t *buf;
>>> @@ -74,6 +83,9 @@ typedef struct BlkMigBlock {
>>>       QEMUIOVector qiov;
>>>       BlockAIOCB *aiocb;
>>>
>>> +    int nr_bitmaps;
>>> +    BlkMigDirtyBitmap *dirty_bitmaps;
>>> +
>>>       /* Protected by block migration lock.  */
>>>       int ret;
>>>       QSIMPLEQ_ENTRY(BlkMigBlock) entry;
>>> @@ -83,6 +95,7 @@ typedef struct BlkMigState {
>>>       /* Written during setup phase.  Can be read without a lock.  */
>>>       int blk_enable;
>>>       int shared_base;
>>> +    int dbm_enable;
>>
>> Similar to the feedback in a previous patch, we may not want to use
>> 'dbm' to mean "Dirty Bitmap" because we have not been applying the
>> abbreviation consistently.
>>
>> For now, the recommendation from stefan is to use the full
>> "bdrv_dirty_bitmap" or "dirty_bitmap" in function names.
>>
>> If we do want an acronym to refer to this particular type of dirty
>> bitmap, we should apply it consistently to all functions that work
>> with the BdrvDirtyBitmap type.
>>
>> For now, "bdrv_dirty_bitmap_enable" should suffice, even though it's a
>> bit wordy.
> It's a flag that enables the migration of dirty bitmaps, just as
> blk_enable enables the migration of blocks. Then, should it be
> "dirty_bitmap_migration_enable"? Isn't that too long for a simple flag
> variable?

Yeah, I see your point. I think the only difference is that "blk" is 
used very widely throughout the code, but this is the first occurrence 
of "dbm."

Even just "bitmap_enable" would be much more self-evident than 
"dbm_enable," unless we rework all of the BdrvDirtyBitmap functions to 
use the "dbm" abbreviation.

I'd also be happy with "migrate_bitmaps" which is fairly 
self-explanatory, too.

>>
>>>       QSIMPLEQ_HEAD(bmds_list, BlkMigDevState) bmds_list;
>>>       int64_t total_sector_sum;
>>>       bool zero_blocks;
>>> @@ -116,27 +129,63 @@ static void blk_mig_unlock(void)
>>>   /* Only allocating and initializing structure fields, not copying any data. */
>>>
>>>   static BlkMigBlock *blk_create(BlkMigDevState *bmds, int64_t sector,
>>> -                                int nr_sectors)
>>> +                               int nr_sectors, bool only_bitmaps)
>>>   {
>>> +    int i;
>>>       BlkMigBlock *blk = g_new(BlkMigBlock, 1);
>>> -    blk->buf = g_malloc(BLOCK_SIZE);
>>> +    blk->buf = only_bitmaps ? NULL : g_malloc(BLOCK_SIZE);
>>>       blk->bmds = bmds;
>>>       blk->sector = sector;
>>>       blk->nr_sectors = nr_sectors;
>>> +    blk->dirty_bitmaps = NULL;
>>> +    blk->nr_bitmaps = 0;
>>>
>>>       blk->iov.iov_base = blk->buf;
>>>       blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
>>>       qemu_iovec_init_external(&blk->qiov, &blk->iov, 1);
>>>
>>
>> We can probably skip this block if only_bitmaps is true.
> I agree.
>>
>>> +    if (block_mig_state.dbm_enable) {
>>> +        BlkMigDirtyBitmap *mig_bm;
>>> +
>>> +        blk->nr_bitmaps = bmds->nr_bitmaps;
>>> +        mig_bm = blk->dirty_bitmaps = g_new(BlkMigDirtyBitmap,
>>> +                                            bmds->nr_bitmaps);
>>> +        for (i = 0; i < bmds->nr_bitmaps; ++i) {
>>> +            BdrvDirtyBitmap *bm = bmds->dirty_bitmaps[i];
>>> +            mig_bm->buf = g_malloc(bdrv_dbm_data_size(bm, nr_sectors));
>>> +            mig_bm->bitmap = bm;
>>> +            mig_bm++;
>>> +        }
>>> +    }
>>> +
>>
>> It's strange to me that we'd give it an "only bitmaps" boolean and
>> then rely on a different condition to populate them. Is it not worthy
>> of an error if we specify only_bitmaps when block_mig_state.dbm_enable
>> is false?
> only_bitmaps means "no blocks, only bitmaps". But there is another
> case, blocks with bitmaps, and this "if ()" should execute then too.
>
> {only_bitmaps == true} while {block_mig_state.dbm_enable == false} is
> of course an error. I'll add an assert() for it.
>>
>>>       return blk;
>>>   }
>>>
>>>   static void blk_free(BlkMigBlock *blk)
>>>   {
>>> +    int i;
>>>       g_free(blk->buf);
>>> +
>>> +    if (blk->dirty_bitmaps) {
>>> +        for (i = 0; i < blk->nr_bitmaps; ++i) {
>>> +            g_free(blk->dirty_bitmaps[i].buf);
>>> +        }
>>> +        g_free(blk->dirty_bitmaps);
>>> +    }
>>> +
>>>       g_free(blk);
>>>   }
>>>
>>> +static void blk_store_bitmaps(BlkMigBlock *blk)
>>> +{
>>> +    int i;
>>> +    for (i = 0; i < blk->nr_bitmaps; ++i) {
>>> +        bdrv_dbm_store_data(blk->dirty_bitmaps[i].bitmap,
>>> +                            blk->dirty_bitmaps[i].buf,
>>> +                            blk->sector, blk->nr_sectors);
>>> +    }
>>> +}
>>> +
>>
>> I am worried about the case in which blk->nr_sectors is not an
>> integral multiple of the bitmap sector granularity, for reasons
>> discussed in the previous patches.
>>
>> I'm not positive it IS a problem either, but maybe you have already
>> thought about this.
> Only the allocated memory needs to be extended in the previous patch,
> as I've already proposed. The general approach is OK, because we always
> calculate the real bit number in the bitmap from the virtual one in the
> same way, so a stored region will be restored in the same place.

OK, just checking.

>>
>>>   /* Must run outside of the iothread lock during the bulk phase,
>>>    * or the VM will stall.
>>>    */
>>> @@ -146,9 +195,15 @@ static void blk_send(QEMUFile *f, BlkMigBlock *blk)
>>>       int len;
>>>       uint64_t flags = BLK_MIG_FLAG_DEVICE_BLOCK;
>>>
>>> -    if (block_mig_state.zero_blocks &&
>>> +    if (block_mig_state.zero_blocks && blk->buf &&
>>>           buffer_is_zero(blk->buf, BLOCK_SIZE)) {
>>>           flags |= BLK_MIG_FLAG_ZERO_BLOCK;
>>> +    } else if (blk->buf) {
>>> +        flags |= BLK_MIG_FLAG_HAS_BLOCK;
>>> +    }
>>> +
>>> +    if (block_mig_state.dbm_enable) {
>>> +        flags |= BLK_MIG_FLAG_HAS_BITMAPS;
>>>       }
>>>
>>>       /* sector number and flags */
>>> @@ -160,10 +215,27 @@ static void blk_send(QEMUFile *f, BlkMigBlock *blk)
>>>       qemu_put_byte(f, len);
>>>       qemu_put_buffer(f, (uint8_t *)bdrv_get_device_name(blk->bmds->bs), len);
>>>
>>> +    /* dirty bitmaps */
>>> +    if (flags & BLK_MIG_FLAG_HAS_BITMAPS) {
>>> +        int i;
>>> +        qemu_put_be32(f, blk->nr_bitmaps);
>>> +        for (i = 0; i < blk->nr_bitmaps; ++i) {
>>> +            BdrvDirtyBitmap *bm = blk->dirty_bitmaps[i].bitmap;
>>> +            int buf_size = bdrv_dbm_data_size(bm, blk->nr_sectors);
>>> +            const char *name = bdrv_dirty_bitmap_name(bm);
>>> +            int len = strlen(name);
>>> +
>>> +            qemu_put_byte(f, len);
>>> +            qemu_put_buffer(f, (const uint8_t *)name, len);
>>
>> No length in the stream for the size of the dirty bitmap buffer?
>>
>>> +            qemu_put_buffer(f, blk->dirty_bitmaps[i].buf, buf_size);
>>> +        }
>>> +    }
>>> +
>>
>> Future patch idea: We can probably optimize this block by testing a
>> range of sectors on the bitmap and skipping bitmaps that don't have
>> data for this block.
>>
>> Since we populated the bitmaps previously, we should already have the
>> chance to have counted how many of them are nonempty, so we can use
>> that number instead of blk->nr_bitmaps.
>>
>> Not important for the series right now.
>>
>>>       /* if a block is zero we need to flush here since the network
>>>        * bandwidth is now a lot higher than the storage device bandwidth.
>>> -     * thus if we queue zero blocks we slow down the migration */
>>> -    if (flags & BLK_MIG_FLAG_ZERO_BLOCK) {
>>> +     * thus if we queue zero blocks we slow down the migration.
>>> +     * also, skip writing block when migrate only dirty bitmaps. */
>>> +    if (!(flags & BLK_MIG_FLAG_HAS_BLOCK)) {
>>
>> I'm not completely clear on the reasoning behind the original
>> codeblock here, but changing it from "Only when zeroes" to "Not when
>> we have data: Zeroes and Bitmaps" makes sense, since bitmap-only
>> sections are going to be very sparse too.
>>
>>>           qemu_fflush(f);
>>>           return;
>>>       }
>>
>> A little after this, there's a call to qemu_put_buffer(f, blk->buf,
>> BLOCK_SIZE), but we have the case where blk->buf may be NULL in bitmap
>> only cases. I think we need to guard against that.
> In this case (flags & BLK_MIG_FLAG_HAS_BLOCK) should be zero. However,
> an assert here won't hurt.
>>
>>> @@ -282,13 +354,20 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>>>       BlockDriverState *bs = bmds->bs;
>>>       BlkMigBlock *blk;
>>>       int nr_sectors;
>>> +    bool skip_chunk = false;
>>>
>>>       if (bmds->shared_base) {
>>>           qemu_mutex_lock_iothread();
>>> -        while (cur_sector < total_sectors &&
>>> -               !bdrv_is_allocated(bs, cur_sector, MAX_IS_ALLOCATED_SEARCH,
>>> -                                  &nr_sectors)) {
>>> -            cur_sector += nr_sectors;
>>> +        if (block_mig_state.dbm_enable) {
>>> +            bdrv_is_allocated(bs, cur_sector, BDRV_SECTORS_PER_DIRTY_CHUNK,
>>> +                              &nr_sectors);
>>> +            skip_chunk = nr_sectors >= BDRV_SECTORS_PER_DIRTY_CHUNK;
>>> +        } else {
>>> +            while (cur_sector < total_sectors &&
>>> +                   !bdrv_is_allocated(bs, cur_sector, MAX_IS_ALLOCATED_SEARCH,
>>> +                                      &nr_sectors)) {
>>> +                cur_sector += nr_sectors;
>>> +            }
>>
>> OK, so the approach taken here is that if bitmaps are enabled, we
>> check only to see if this current chunk is allocated or not. If it
>> isn't, we declare this a "bitmap only" chunk and set skip_chunk to true.
>>
>> The else clause taken implies no bitmap data, because either
>> dbm_enable or blk_enable (or both) MUST be set for us to be here.
>>
>> (1) Why are you not checking the return value of bdrv_is_allocated? It
>> could return either true or false AND nr_sectors ==
>> BDRV_SECTORS_PER_DIRTY_CHUNK, provided it was entirely allocated or
>> entirely unallocated, so I think this condition is not working as you
>> intend it to.
> Yes, it's my mistake.
>>
>> (2) This seems slightly hackey in a sparse or no-data situation. We
>> will transmit many small data sections, each containing a chunk of
>> bitmap data, one at a time, with no data interspersed -- which leaves
>> a high metadata-to-data ratio in these cases.
>>
>> Would it be an involved process to alter the nr_sectors count to be
>> something that isn't a single sector in the (dbm_enable &&
>> !blk_enable) case? That way we can spin until we find data worth
>> transferring and transfer all of the bitmap metadata in a larger chunk
>> all at once.
>>
>> Whatever approach you choose, it might be nice to have a comment
>> in-line explaining the general approach.
>>
>>>           }
>>>           qemu_mutex_unlock_iothread();
>>>       }
>>> @@ -309,7 +388,8 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>>>           nr_sectors = total_sectors - cur_sector;
>>>       }
>>>
>>> -    blk = blk_create(bmds, cur_sector, nr_sectors);
>>> +    blk = blk_create(bmds, cur_sector, nr_sectors,
>>> +                     skip_chunk || !block_mig_state.blk_enable);
>>>
>>>       blk_mig_lock();
>>>       block_mig_state.submitted++;
>>> @@ -317,8 +397,16 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>>>
>>>       bdrv_reset_dirty_bitmap(bs, bmds->dirty_bitmap, cur_sector, nr_sectors);
>>>
>>> -    blk->aiocb = bdrv_aio_readv(bs, cur_sector, &blk->qiov,
>>> -                                nr_sectors, blk_mig_read_cb, blk);
>>> +    if (block_mig_state.dbm_enable) {
>>> +        blk_store_bitmaps(blk);
>>> +    }
>>> +
>>> +    if (block_mig_state.blk_enable && !skip_chunk) {
>>> +        blk->aiocb = bdrv_aio_readv(bs, cur_sector, &blk->qiov,
>>> +                                    nr_sectors, blk_mig_read_cb, blk);
>>> +    } else if (block_mig_state.dbm_enable) {
>>> +        blk_mig_read_cb(blk, 0);
>>> +    }
>>>
>>>       bmds->cur_sector = cur_sector + nr_sectors;
>>>       return (bmds->cur_sector >= total_sectors);
>>> @@ -403,6 +491,8 @@ static void init_blk_migration(QEMUFile *f)
>>>               DPRINTF("Start full migration for %s\n", bdrv_get_device_name(bs));
>>>           }
>>>
>>> +        bmds->dirty_bitmaps = bdrv_dbm_find_all_named(bs, &bmds->nr_bitmaps);
>>> +
>>
>> At this point, we should have block_mig_state.dbm_enable set (or
>> disabled), so we can skip this initialization if it is not set.
> Agree.
>>
>>> QSIMPLEQ_INSERT_TAIL(&block_mig_state.bmds_list, bmds, entry);
>>>       }
>>>   }
>>> @@ -481,20 +571,32 @@ static int mig_save_device_dirty(QEMUFile *f, BlkMigDevState *bmds,
>>>               } else {
>>>                   nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>>>               }
>>> -            blk = blk_create(bmds, sector, nr_sectors);
>>> +            blk = blk_create(bmds, sector, nr_sectors,
>>> +                             !block_mig_state.blk_enable);
>>> +
>>> +            if (block_mig_state.dbm_enable) {
>>> +                blk_store_bitmaps(blk);
>>> +            }
>>>
>>>               if (is_async) {
>>> -                blk->aiocb = bdrv_aio_readv(bmds->bs, sector, &blk->qiov,
>>> -                                            nr_sectors, blk_mig_read_cb, blk);
>>> +                if (block_mig_state.blk_enable) {
>>> +                    blk->aiocb = bdrv_aio_readv(bmds->bs, sector, &blk->qiov,
>>> +                                                nr_sectors, blk_mig_read_cb,
>>> +                                                blk);
>>> +                } else {
>>> +                    blk_mig_read_cb(blk, 0);
>>> +                }
>>>
>>>                   blk_mig_lock();
>>>                   block_mig_state.submitted++;
>>>                   bmds_set_aio_inflight(bmds, sector, nr_sectors, 1);
>>>                   blk_mig_unlock();
>>>               } else {
>>> -                ret = bdrv_read(bmds->bs, sector, blk->buf, nr_sectors);
>>> -                if (ret < 0) {
>>> -                    goto error;
>>> +                if (block_mig_state.blk_enable) {
>>> +                    ret = bdrv_read(bmds->bs, sector, blk->buf, nr_sectors);
>>> +                    if (ret < 0) {
>>> +                        goto error;
>>> +                    }
>>>                   }
>>>                   blk_send(f, blk);
>>>
>>> @@ -817,10 +919,39 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>>>                   nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>>>               }
>>>
>>> +            /* load dirty bitmaps */
>>> +            if (flags & BLK_MIG_FLAG_HAS_BITMAPS) {
>>> +                int i, nr_bitmaps = qemu_get_be32(f);
>>> +
>>> +                for (i = 0; i < nr_bitmaps; ++i) {
>>> +                    char bitmap_name[256];
>>> +                    BdrvDirtyBitmap *bitmap;
>>> +                    int buf_size, len;
>>> +
>>> +                    len = qemu_get_byte(f);
>>> +                    qemu_get_buffer(f, (uint8_t *)bitmap_name, len);
>>> +                    bitmap_name[len] = '\0';
>>> +
>>> +                    bitmap = bdrv_find_dirty_bitmap(bs, bitmap_name);
>>> +                    if (!bitmap) {
>>> +                        fprintf(stderr, "Error unknown dirty bitmap\
>>> +                                %s for block device %s\n",
>>> +                                bitmap_name, device_name);
>>> +                        return -EINVAL;
>>> +                    }
>>> +
>>> +                    buf_size = bdrv_dbm_data_size(bitmap, nr_sectors);
>>
>> Oh, this is why you didn't store the length... Still, since it seems
>> as if you expect the bitmap to already exist on the destination, what
>> if the granularity or size is different? It might be nice to
>> explicitly state the size of the buffer -- one user misconfiguration
>> and you might blow the rest of the migration.
>>
>> This way, if the calculated size doesn't match the received size, you
>> have the chance to throw a meaningful error.
>>
>>> +                    buf = g_malloc(buf_size);
>>> +                    qemu_get_buffer(f, buf, buf_size);
>>> +                    bdrv_dbm_restore_data(bitmap, buf, addr, nr_sectors);
>>> +                    g_free(buf);
>>> +                }
>>> +            }
>>> +
>>
>> So with this approach, the user needs to have created the bitmaps
>> already? Would it be possible to create a mode where they don't, or am
>> I misreading something?
>>
>> We could re-create them on the fly whenever the offset is 0 on the
>> receiving side, the only other piece of information we'd really need
>> at that point is the granularity.
> Agree.

The only problem is if we re-send the offset @ 0 a second time, or if we 
skip sending zero sectors and we never receive that block.

Still, there's probably an elegant way to achieve what we want. Maybe a 
preamble migration block, or the first sector we happen to transmit 
could carry an extra flag describing the granularity of the bitmap, or 
... something.

>>
>>>               if (flags & BLK_MIG_FLAG_ZERO_BLOCK) {
>>>                   ret = bdrv_write_zeroes(bs, addr, nr_sectors,
>>>                                           BDRV_REQ_MAY_UNMAP);
>>> -            } else {
>>> +            } else if (flags & BLK_MIG_FLAG_HAS_BLOCK) {
>>>                   buf = g_malloc(BLOCK_SIZE);
>>>                   qemu_get_buffer(f, buf, BLOCK_SIZE);
>>>                   ret = bdrv_write(bs, addr, buf, nr_sectors);
>>> @@ -855,6 +986,7 @@ static void block_set_params(const MigrationParams *params, void *opaque)
>>>   {
>>>       block_mig_state.blk_enable = params->blk;
>>>       block_mig_state.shared_base = params->shared;
>>> +    block_mig_state.dbm_enable = params->dirty;
>>>
>>>       /* shared base means that blk_enable = 1 */
>>>       block_mig_state.blk_enable |= params->shared;
>>> @@ -862,7 +994,8 @@ static void block_set_params(const MigrationParams *params, void *opaque)
>>>
>>>   static bool block_is_active(void *opaque)
>>>   {
>>> -    return block_mig_state.blk_enable == 1;
>>> +    return block_mig_state.blk_enable == 1 ||
>>> +           block_mig_state.dbm_enable == 1;
>>>   }
>>>
>>>   static SaveVMHandlers savevm_block_handlers = {
>>> diff --git a/savevm.c b/savevm.c
>>> index a598d1d..1299faa 100644
>>> --- a/savevm.c
>>> +++ b/savevm.c
>>> @@ -983,6 +983,7 @@ int qemu_loadvm_state(QEMUFile *f)
>>>           }
>>>       }
>>>
>>> +    bdrv_dbm_restore_finish();
>>
>> Do we not have a cleaner way to call this? I guess we have no explicit
>> callback mechanisms for when a particular BDS is completely done,
>> because of the multiple passes over the data that we perform --
>>
>> Still, it might be nicer to have a callback for the whole of block
>> migration, and call bdrv_dbm_restore_finish() from there instead of
>> putting this implementation detail all the way up inside of
>> qemu_loadvm_state.
>>
>> Can we migrate an empty-data sentinel section from block migration
>> upon completion that qemu_loadvm_state launches a post_load method for
>> that can invoke bdrv_dbm_restore_finish for us?
>>
>> Alternatively, can this not be safely done inside of block_load?
>> somewhere around here:
>>
>> printf("Completed %d %%%c", (int)addr,
>>                    (addr == 100) ? '\n' : '\r');
>>             fflush(stdout);
>>
>> The "100%" completion flag is only written in block_save_complete(),
>> so once we read this flag we should have necessary and sufficient
>> information to conclude that it's safe to do the BdrvDirtyBitmap
>> migration post-load stuff.
> Good point. I think I'll choose this approach.
>>
>>
>> Of course, maybe generic section post-load callbacks are useful to
>> other devices, and I'd be fine with either approach.
>>
>> CCing migration people for opinions.
>>
>>>       cpu_synchronize_all_post_init();
>>>
>>>       ret = 0;
>>>
>>
>>
>> Thank you, and sorry it took me so long to digest the series!
>> --John Snow
>>
>> P.S.: Happy New Year! I hope 2015 treats you well :~)
>
>
> Given this whole story about inefficient migration when blk carries no
> block data, I think it may be better to drop the reuse of the block
> migration bitmap to migrate bitmaps, maintain additional dirty bitmaps
> for the dirty bitmaps themselves, and migrate the bitmaps in a separate
> migration/dirty-bitmaps.c file (sorry for the pun). This would
> additionally provide more precise migration when a bitmap is reset
> while migration is in progress.
>

Yeah, I tried to review this patch assuming we'd use this technique, 
from a correctness standpoint. I am relying on other reviewers with more 
knowledge of the migration layer to critique the general approach.

I think Paolo had some feedback on that front, but I haven't looked at 
the code much to see how much I agree with his suggestion yet :)

Thanks!
--John


end of thread, other threads:[~2015-01-20 17:25 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
2014-12-11 14:17 [Qemu-devel] [PATCH 0/8] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
2014-12-11 14:17 ` [Qemu-devel] [PATCH 1/9] block: rename bdrv_reset_dirty_bitmap Vladimir Sementsov-Ogievskiy
2015-01-08 21:19   ` John Snow
2014-12-11 14:17 ` [Qemu-devel] [PATCH 2/9] block-migration: fix pending() return value Vladimir Sementsov-Ogievskiy
2015-01-08 21:20   ` John Snow
2015-01-09 19:01     ` Dr. David Alan Gilbert
2014-12-11 14:17 ` [Qemu-devel] [PATCH 3/9] block: fix spoiling all dirty bitmaps by mirror and migration Vladimir Sementsov-Ogievskiy
2015-01-08 21:20   ` John Snow
2014-12-11 14:17 ` [Qemu-devel] [PATCH 4/9] hbitmap: store / restore Vladimir Sementsov-Ogievskiy
2015-01-08 21:21   ` John Snow
2015-01-08 21:37     ` Paolo Bonzini
2015-01-13 12:59     ` Vladimir Sementsov-Ogievskiy
2015-01-13 17:08       ` John Snow
2015-01-14 10:29         ` Vladimir Sementsov-Ogievskiy
2014-12-11 14:17 ` [Qemu-devel] [PATCH 5/9] block: BdrvDirtyBitmap store/restore interface Vladimir Sementsov-Ogievskiy
2015-01-08 21:22   ` John Snow
2015-01-14 11:27   ` Vladimir Sementsov-Ogievskiy
2014-12-11 14:17 ` [Qemu-devel] [PATCH 6/9] block-migration: tiny refactoring Vladimir Sementsov-Ogievskiy
2015-01-08 21:23   ` John Snow
2015-01-14 12:26     ` Vladimir Sementsov-Ogievskiy
2015-01-14 16:53       ` John Snow
2014-12-11 14:17 ` [Qemu-devel] [PATCH 7/9] block-migration: remove not needed iothread lock Vladimir Sementsov-Ogievskiy
2015-01-08 21:24   ` John Snow
2015-01-16 12:54     ` Vladimir Sementsov-Ogievskiy
2015-01-08 22:28   ` Paolo Bonzini
2015-01-16 13:03     ` Vladimir Sementsov-Ogievskiy
2014-12-11 14:17 ` [Qemu-devel] [PATCH 8/9] migration: add dirty parameter Vladimir Sementsov-Ogievskiy
2014-12-11 15:18   ` Eric Blake
2014-12-15  8:33     ` Vladimir Sementsov-Ogievskiy
2015-01-08 21:51   ` John Snow
2015-01-08 22:29     ` Eric Blake
2015-01-08 22:31       ` John Snow
2015-01-08 22:37     ` Paolo Bonzini
2014-12-11 14:17 ` [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration Vladimir Sementsov-Ogievskiy
2015-01-08 22:05   ` John Snow
2015-01-17 17:17     ` Vladimir Sementsov-Ogievskiy
2015-01-20 17:25       ` John Snow
2015-01-08 22:36   ` Paolo Bonzini
2015-01-08 22:45     ` Eric Blake
2015-01-08 22:49       ` Paolo Bonzini
2015-01-12 14:20     ` Vladimir Sementsov-Ogievskiy
2015-01-12 14:42       ` Paolo Bonzini
2015-01-12 16:48   ` Paolo Bonzini
2015-01-12 17:31     ` John Snow
2015-01-12 19:09       ` Paolo Bonzini
2015-01-13  9:16         ` Vladimir Sementsov-Ogievskiy
2015-01-13 16:35           ` John Snow
