All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Snow <jsnow@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>,
	qemu-devel@nongnu.org
Cc: kwolf@redhat.com, Peter Maydell <peter.maydell@linaro.org>,
	"Juan quin >> Juan Jose Quintela Carreira" <quintela@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	stefanha@redhat.com, den@openvz.org,
	amit Shah <amit.shah@redhat.com>,
	pbonzini@redhat.com
Subject: Re: [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c
Date: Mon, 16 Feb 2015 13:18:31 -0500	[thread overview]
Message-ID: <54E23477.9040801@redhat.com> (raw)
In-Reply-To: <54E1DD49.3050102@parallels.com>



On 02/16/2015 07:06 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 13.02.2015 23:22, John Snow wrote:
>>
>>
>> On 02/13/2015 03:19 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> On 11.02.2015 00:33, John Snow wrote:
>>>> Peter Maydell: What's the right way to license a file as copied from a
>>>> previous version? See below, please;
>>>>
>>>> Max, Markus: ctrl+f "bdrv_get_device_name" and let me know what you
>>>> think, if you would.
>>>>
>>>> Juan, Amit, David: Copying migration maintainers.
>>>>
>>>> On 01/27/2015 05:56 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>>> Live migration of dirty bitmaps. Only named dirty bitmaps are
>>>>> migrated.
>>>>> If destination qemu is already containing a dirty bitmap with the same
>>>>> name as a migrated bitmap, then their granularities should be the
>>>>> same,
>>>>> otherwise the error will be generated. If destination qemu doesn't
>>>>> contain such bitmap it will be created.
>>>>>
>>>>> format:
>>>>>
>>>>> 1 byte: flags
>>>>>
>>>>> [ 1 byte: node name size ] \  flags & DEVICE_NAME
>>>>> [ n bytes: node name     ] /
>>>>>
>>>>> [ 1 byte: bitmap name size ]       \
>>>>> [ n bytes: bitmap name     ]       | flags & BITMAP_NAME
>>>>> [ [ be64: granularity    ] ]  flags & GRANULARITY
>>>>>
>>>>> [ 1 byte: bitmap enabled bit ] flags & ENABLED
>>>>>
>>>>> [ be64: start sector      ] \ flags & (NORMAL_CHUNK | ZERO_CHUNK)
>>>>> [ be32: number of sectors ] /
>>>>>
>>>>> [ be64: buffer size ] \ flags & NORMAL_CHUNK
>>>>> [ n bytes: buffer   ] /
>>>>>
>>>>> The last chunk should contain flags & EOS. The chunk may skip device
>>>>> and/or bitmap names, assuming them to be the same with the previous
>>>>> chunk. GRANULARITY is sent with the first chunk for the bitmap.
>>>>> ENABLED
>>>>> bit is sent in the end of "complete" stage of migration. So when
>>>>> destination gets ENABLED flag it should deserialize_finish the bitmap
>>>>> and set its enabled bit to corresponding value.
>>>>>
>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>>>>> ---
>>>>>   include/migration/block.h |   1 +
>>>>>   migration/Makefile.objs   |   2 +-
>>>>>   migration/dirty-bitmap.c  | 606
>>>>> ++++++++++++++++++++++++++++++++++++++++++++++
>>>>>   vl.c                      |   1 +
>>>>>   4 files changed, 609 insertions(+), 1 deletion(-)
>>>>>   create mode 100644 migration/dirty-bitmap.c
>>>>>
>>>>> diff --git a/include/migration/block.h b/include/migration/block.h
>>>>> index ffa8ac0..566bb9f 100644
>>>>> --- a/include/migration/block.h
>>>>> +++ b/include/migration/block.h
>>>>> @@ -14,6 +14,7 @@
>>>>>   #ifndef BLOCK_MIGRATION_H
>>>>>   #define BLOCK_MIGRATION_H
>>>>>
>>>>> +void dirty_bitmap_mig_init(void);
>>>>>   void blk_mig_init(void);
>>>>>   int blk_mig_active(void);
>>>>>   uint64_t blk_mig_bytes_transferred(void);
>>>>
>>>> OK.
>>>>
>>>>> diff --git a/migration/Makefile.objs b/migration/Makefile.objs
>>>>> index d929e96..9adfda9 100644
>>>>> --- a/migration/Makefile.objs
>>>>> +++ b/migration/Makefile.objs
>>>>> @@ -6,5 +6,5 @@ common-obj-y += xbzrle.o
>>>>>   common-obj-$(CONFIG_RDMA) += rdma.o
>>>>>   common-obj-$(CONFIG_POSIX) += exec.o unix.o fd.o
>>>>>
>>>>> -common-obj-y += block.o
>>>>> +common-obj-y += block.o dirty-bitmap.o
>>>>>
>>>>
>>>> OK.
>>>>
>>>>> diff --git a/migration/dirty-bitmap.c b/migration/dirty-bitmap.c
>>>>> new file mode 100644
>>>>> index 0000000..8621218
>>>>> --- /dev/null
>>>>> +++ b/migration/dirty-bitmap.c
>>>>> @@ -0,0 +1,606 @@
>>>>> +/*
>>>>> + * QEMU dirty bitmap migration
>>>>> + *
>>>>> + * derived from migration/block.c
>>>>> + *
>>>>> + * Author:
>>>>> + * Sementsov-Ogievskiy Vladimir <vsementsov@parallels.com>
>>>>> + *
>>>>> + * original copyright message:
>>>>> + *
>>>>> =====================================================================
>>>>> + * Copyright IBM, Corp. 2009
>>>>> + *
>>>>> + * Authors:
>>>>> + *  Liran Schour <lirans@il.ibm.com>
>>>>> + *
>>>>> + * This work is licensed under the terms of the GNU GPL, version
>>>>> 2. See
>>>>> + * the COPYING file in the top-level directory.
>>>>> + *
>>>>> + * Contributions after 2012-01-13 are licensed under the terms of the
>>>>> + * GNU GPL, version 2 or (at your option) any later version.
>>>>> + *
>>>>> =====================================================================
>>>>> + */
>>>>> +
>>>>
>>>> Not super familiar with the right way to do licensing here; it's
>>>> possible you may not need to copy the original here, but I'm not sure.
>>>> You will want to make it clear what license applies to /your/ work, I
>>>> think. Maybe Peter Maydell can clue us in.
>>>>
>>>>> +#include "block/block.h"
>>>>> +#include "qemu/main-loop.h"
>>>>> +#include "qemu/error-report.h"
>>>>> +#include "migration/block.h"
>>>>> +#include "migration/migration.h"
>>>>> +#include "qemu/hbitmap.h"
>>>>> +#include <assert.h>
>>>>> +
>>>>> +#define CHUNK_SIZE                       (1 << 20)
>>>>> +
>>>>> +#define DIRTY_BITMAP_MIG_FLAG_EOS           0x01
>>>>> +#define DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK  0x02
>>>>> +#define DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK    0x04
>>>>> +#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME   0x08
>>>>> +#define DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME   0x10
>>>>> +#define DIRTY_BITMAP_MIG_FLAG_GRANULARITY   0x20
>>>>> +#define DIRTY_BITMAP_MIG_FLAG_ENABLED       0x40
>>>>> +/* flags should be <= 0xff */
>>>>> +
>>>>
>>>> We should give ourselves a little breathing room with the flags, since
>>>> we've only got room for one more.
>>> Ok. Will one more byte be enough?
>>
>> I should hope so. If you do add a completion chunk and flag, that
>> fills up the first byte completely, so having a second byte is a good
>> idea.
>>
>> I might recommend reserving the last bit of the second byte to be a
>> flag such as DIRTY_BITMAP_EXTRA_FLAGS that indicates the presence of
>> additional byte(s) of flags, to be determined later, if we ever need
>> them, but two bytes for now should be sufficient.
> Ok.
>>
>>>>
>>>>> +/* #define DEBUG_DIRTY_BITMAP_MIGRATION */
>>>>> +
>>>>> +#ifdef DEBUG_DIRTY_BITMAP_MIGRATION
>>>>> +#define DPRINTF(fmt, ...) \
>>>>> +    do { printf("dirty_migration: " fmt, ## __VA_ARGS__); } while (0)
>>>>> +#else
>>>>> +#define DPRINTF(fmt, ...) \
>>>>> +    do { } while (0)
>>>>> +#endif
>>>>> +
>>>>> +typedef struct DirtyBitmapMigBitmapState {
>>>>> +    /* Written during setup phase. */
>>>>> +    BlockDriverState *bs;
>>>>> +    BdrvDirtyBitmap *bitmap;
>>>>> +    HBitmap *dirty_bitmap;
>>>>
>>>> For my own sanity, I'd really prefer "bitmap" and "meta_bitmap" here;
>>>> "dirty_bitmap" is often used as a synonym (outside of this file) to
>>>> refer to the BdrvDirtyBitmap in general, so it's usage here can be
>>>> somewhat confusing.
>>>>
>>>> I'd appreciate "dirty_dirty_bitmap" as in your previous patch for
>>>> consistency, or "meta_bitmap" as I recommend.
>>>>
>>> Ok
>>>>> +    int64_t total_sectors;
>>>>> +    uint64_t sectors_per_chunk;
>>>>> +    QSIMPLEQ_ENTRY(DirtyBitmapMigBitmapState) entry;
>>>>> +
>>>>> +    /* For bulk phase. */
>>>>> +    bool bulk_completed;
>>>>> +    int64_t cur_sector;
>>>>> +    bool granularity_sent;
>>>>> +
>>>>> +    /* For dirty phase. */
>>>>> +    int64_t cur_dirty;
>>>>> +} DirtyBitmapMigBitmapState;
>>>>> +
>>>>> +typedef struct DirtyBitmapMigState {
>>>>> +    int migration_enable;
>>>>> +    QSIMPLEQ_HEAD(dbms_list, DirtyBitmapMigBitmapState) dbms_list;
>>>>> +
>>>>> +    bool bulk_completed;
>>>>> +
>>>>> +    /* for send_bitmap() */
>>>>> +    BlockDriverState *prev_bs;
>>>>> +    BdrvDirtyBitmap *prev_bitmap;
>>>>> +} DirtyBitmapMigState;
>>>>> +
>>>>> +static DirtyBitmapMigState dirty_bitmap_mig_state;
>>>>> +
>>>>> +/* read name from qemu file:
>>>>> + * format:
>>>>> + * 1 byte : len = name length (<256)
>>>>> + * len bytes : name without last zero byte
>>>>> + *
>>>>> + * name should point to the buffer >= 256 bytes length
>>>>> + */
>>>>> +static char *qemu_get_name(QEMUFile *f, char *name)
>>>>> +{
>>>>> +    int len = qemu_get_byte(f);
>>>>> +    qemu_get_buffer(f, (uint8_t *)name, len);
>>>>> +    name[len] = '\0';
>>>>> +
>>>>> +    DPRINTF("get name: %d %s\n", len, name);
>>>>> +
>>>>> +    return name;
>>>>> +}
>>>>> +
>>>>
>>>> OK. Maybe these could be "qemu_put_string" or "qemu_get_string" and
>>>> added to qemu-file.c so others can use them.
>>> If no objections for sharing this format, I'll do it.
>>>>
>>>>> +/* write name to qemu file:
>>>>> + * format:
>>>>> + * same as for qemu_get_name
>>>>> + *
>>>>> + * maximum name length is 255
>>>>> + */
>>>>> +static void qemu_put_name(QEMUFile *f, const char *name)
>>>>> +{
>>>>> +    int len = strlen(name);
>>>>> +
>>>>> +    DPRINTF("put name: %d %s\n", len, name);
>>>>> +
>>>>> +    assert(len < 256);
>>>>> +    qemu_put_byte(f, len);
>>>>> +    qemu_put_buffer(f, (const uint8_t *)name, len);
>>>>> +}
>>>>> +
>>>>
>>>> OK.
>>>>
>>>>> +static void send_bitmap(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
>>>>> +                        uint64_t start_sector, uint32_t nr_sectors)
>>>>> +{
>>>>> +    BlockDriverState *bs = dbms->bs;
>>>>> +    BdrvDirtyBitmap *bitmap = dbms->bitmap;
>>>>> +    uint8_t flags = 0;
>>>>> +    /* align for buffer_is_zero() */
>>>>> +    uint64_t align = 4 * sizeof(long);
>>>>> +    uint64_t buf_size =
>>>>> +        (bdrv_dirty_bitmap_data_size(bitmap, nr_sectors) + align -
>>>>> 1) &
>>>>> +        ~(align - 1);
>>>>> +    uint8_t *buf = g_malloc0(buf_size);
>>>>> +
>>>>> +    bdrv_dirty_bitmap_serialize_part(bitmap, buf, start_sector,
>>>>> nr_sectors);
>>>>> +
>>>>> +    if (buffer_is_zero(buf, buf_size)) {
>>>>> +        g_free(buf);
>>>>> +        buf = NULL;
>>>>> +        flags |= DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK;
>>>>> +    } else {
>>>>> +        flags |= DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK;
>>>>> +    }
>>>>> +
>>>>> +    if (bs != dirty_bitmap_mig_state.prev_bs) {
>>>>> +        dirty_bitmap_mig_state.prev_bs = bs;
>>>>> +        flags |= DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME;
>>>>> +    }
>>>>> +
>>>>> +    if (bitmap != dirty_bitmap_mig_state.prev_bitmap) {
>>>>> +        dirty_bitmap_mig_state.prev_bitmap = bitmap;
>>>>> +        flags |= DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME;
>>>>> +    }
>>>>
>>>> OK, so we use the current bs/bitmap under consideration to detect if
>>>> we have switched context, and put the names in the stream when it
>>>> happens. OK.
>>>>
>>>>> +
>>>>> +    if (dbms->granularity_sent == 0) {
>>>>> +        dbms->granularity_sent = 1;
>>>>> +        flags |= DIRTY_BITMAP_MIG_FLAG_GRANULARITY;
>>>>> +    }
>>>>> +
>>>>> +    DPRINTF("Enter send_bitmap"
>>>>> +            "\n   flags:        %x"
>>>>> +            "\n   start_sector: %" PRIu64
>>>>> +            "\n   nr_sectors:   %" PRIu32
>>>>> +            "\n   data_size:    %" PRIu64 "\n",
>>>>> +            flags, start_sector, nr_sectors, buf_size);
>>>>> +
>>>>> +    qemu_put_byte(f, flags);
>>>>> +
>>>>> +    if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
>>>>> +        qemu_put_name(f, bdrv_get_device_name(bs));
>>>>> +    }
>>>>
>>>> I am still not fully clear on this myself, but I think we are phasing
>>>> out bdrv_get_device_name. In the context of bitmaps, we are mostly
>>>> likely using them one-per-tree, but they /can/ be attached
>>>> one-per-node, so we shouldn't be trying to get the device name here,
>>>> but rather, the node-name.
>>>>
>>>> I /think/ we may want to be using bdrv_get_node_name, but I think it
>>>> is not currently true that all nodes WILL be named ... I am CC'ing
>>>> Markus Armbruster and Max Reitz for (A) A refresher course and (B)
>>>> Opinions on what function call might make sense here, given that we
>>>> wish to migrate bitmaps attached to arbitrary nodes.
>>> Hmm.. I'm not familiar with hierarchy of nodes and devices. As I
>>> understand, both command_line- and qmp- created drives are created
>>> through blockdev_init, which creates both blk(device) and bs(node)
>>> through blk_new_with_bs.. Am I right? Also, bdrv_get_device_name is used
>>> in migration/block.c.
>>
>> Now that I'm more awake, here's a better rundown of what's going on:
>>
>> It's something that is a little bit in flux right now, unfortunately.
>> We're trying to transition to a format where we have arbitrarily
>> complex Block trees, where the root of the tree is always a
>> BlockBackend (See the big series by Max Reitz) and the configuration
>> of the tree may become arbitrarily complex.
>>
>> Simple trees may consist of just one BlockBackend and one
>> BlockDriverState node, where I think we can refer to this BDS as the
>> "root node," not to be confused with the BlockBackend "root." The
>> BlockBackend is a relatively new invention, so it isn't actually used
>> consistently everywhere yet.
>>
>> In the future, we may have commands that make distinctions based on if
>> you want to work on the BlockBackend, the root node under the
>> blockbackend associated with a BDS, only the explicit node/BDS you
>> identify, or some combination of the above semantics.
>>
>> As of right now, bitmaps can be *attached* to any arbitrary node,
>> though they are currently only *useful* when attached to the first
>> child of the BlockBackend, the root node. It's only useful currently
>> in cases where it is attached to the root because I've only proposed
>> patches for adding bitmap support to produce incrementals for Drive
>> Backup, which operates only on drives/devices (the root node of a tree.)
>>
>> However, in the context of migrating, it could be that we want to
>> migrate any bitmaps attached to /any/ nodes, so we should be careful
>> about what names we are pulling - we don't necessarily want the name
>> of the root node or BlockBackend, we may want the BDS and accompanying
>> name of strictly the node the bitmap is attached to.
>>
>> I know other areas of the code don't provide a good example for this
>> distinction, yet, but the block layer people are actively working on
>> fixing that. (See also the back-and-forth reviews for what to name my
>> QMP parameters in the incremental backup patches for some overview of
>> this semantic transition.)
>>
>> That said, We should think carefully about *which* name we want to put
>> in the stream and what implications it has for migration.
>>
>>
>> (1) bdrv_get_node_name and bdrv_find_node
>>
>> This would migrate bitmaps as attached to their specific BDS. This
>> would mean that the node layout on the destination is either
>> identical, or similar enough such that no named bitmaps are attached
>> to a node not present on the destination.
>>
>> This gives us precision: bitmaps may be attached lower in the tree and
>> can provide more fine-grained detail for which layers have been
>> changed or modified during runtime.
>>
>> This also gives us fragility: In cases where we transfer, say, a
>> complex tree of nodes and collapse it to a single destination drive,
>> we'd be unable to migrate bitmaps not attached to the root along with
>> it, because they'd have nowhere meaningful to attach.
>>
>> It is perhaps somewhat unneccessary at this exact moment in time, as
>> well, because bitmaps are currently only useful on root nodes.
>>
>> (2) bdrv_get_device_name and bdrv_lookup_bs(device_name, NULL, errp)
>>
>> This would migrate any bitmaps in a tree and attach them to the entire
>> drive on the destination.
>>
>> This is simpler: You just need to make sure that the root nodes have
>> the same names, which is a lot easier to manage.
>>
>> This matches how drive migration currently appears to work: The entire
>> tree appears to be generally squashed into a single node and
>> transferred cluster-by-cluster, without general consideration as to
>> the layout of the local block tree. As we both know by now, none of
>> the metadata is transferred, just the data.
>>
>> It prevents migration of just bitmaps where you WANT the extra
>> complexity: If a bitmap is attached lower in the tree, re-affixing it
>> to the root of a destination tree might invalidate the semantics of
>> what that bitmap was meant to track, and it may become useless.
>>
>>
>> So in summary:
>> using device names is probably fine for now, as it matches the current
>> use case of bitmaps as well as drive migration; but using node names
>> may give us more power and precision later.
>>
>> I talked to Max about it, and he is leaning towards using device names
>> for now and switching to node names if we decide we want that power.
>>
>> (...I wonder if we could use a flag, for now, that says we're
>> including DEVICE names. Later, we could add a flag that says we're
>> using NODE names and add an option to toggle as the usage case sees fit.)
>>
>>
>> Are you confused yet? :D
> O, thanks for the explanation). Are we really need this flag? As Markus
> wrote, nodes and devices are sharing namespaces.. We can use
> bdrv_lookup_bs(name, name, errp)..

what 'name' are you using here, though? It looked to me like in your 
backup routine we got a list of BDS entries and get the name *from* the 
BDS, so we still have to think about how we want to /get/ the name.

>
> Also, we can, for example, send bitmaps as follows:
>
> if node has name - send bitmap with this name
> if node is root, but hasn't name - send it with blk name
> otherwise - don't send the bitmap

The node a bitmap is attached to should always have a name -- it would 
not be possible via the existing interface to attach it to a node 
without a name.

I *think* the root node should always have a name, but I am actually 
less sure of that.

>>
>>
>>>>
>>>>> +
>>>>> +    if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
>>>>> +        qemu_put_name(f, bdrv_dirty_bitmap_name(bitmap));
>>>>> +
>>>>> +        if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
>>>>> +            qemu_put_be64(f, bdrv_dirty_bitmap_granularity(bs,
>>>>> bitmap));
>>>>> +        }
>>>>> +    } else {
>>>>> +        assert(!(flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY));
>>>>> +    }
>>>>> +
>>>>
>>>> I thought we were only migrating bitmaps with names?
>>>> I suppose the conditional can't hurt, but I am not clear on when we
>>>> won't have a bitmap name here.
>>> You are right, 'else' case is not possible.. Hmm. I've added it to be
>>> sure that format is not corrupted, when I decided to put granularity
>>> only with name. Wi won't have a bitmap name only when we send the same
>>> bitmap as on the previous send_bitmap() call. May be it will be better
>>> to use two separate if's without else and assert.
>>
>> It's okay if it is just "paranoia," but I was just checking. It would
>> make a decent assert().
>>
>>>>
>>>>> +    qemu_put_be64(f, start_sector);
>>>>> +    qemu_put_be32(f, nr_sectors);
>>>>> +
>>>>> +    /* if a block is zero we need to flush here since the network
>>>>> +     * bandwidth is now a lot higher than the storage device
>>>>> bandwidth.
>>>>> +     * thus if we queue zero blocks we slow down the migration.
>>>>> +     * also, skip writing block when migrate only dirty bitmaps. */
>>>>> +    if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) {
>>>>> +        qemu_fflush(f);
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>> +    qemu_put_be64(f, buf_size);
>>>>> +    qemu_put_buffer(f, buf, buf_size);
>>>>> +    g_free(buf);
>>>>> +}
>>>>> +
>>>>> +
>>>>> +/* Called with iothread lock taken.  */
>>>>> +
>>>>> +static void set_dirty_tracking(void)
>>>>> +{
>>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>>> +
>>>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list,
>>>>> entry) {
>>>>> +        dbms->dirty_bitmap =
>>>>> +            bdrv_create_dirty_dirty_bitmap(dbms->bitmap, CHUNK_SIZE);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>
>>>> OK: so we only have these dirty-dirty bitmaps when migration is
>>>> starting, which makes sense.
>>>>
>>>>> +static void unset_dirty_tracking(void)
>>>>> +{
>>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>>> +
>>>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list,
>>>>> entry) {
>>>>> +        bdrv_release_dirty_dirty_bitmap(dbms->bitmap);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>
>>>> OK.
>>>>
>>>>> +static void init_dirty_bitmap_migration(QEMUFile *f)
>>>>> +{
>>>>> +    BlockDriverState *bs;
>>>>> +    BdrvDirtyBitmap *bitmap;
>>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>>> +
>>>>> +    dirty_bitmap_mig_state.bulk_completed = false;
>>>>> +    dirty_bitmap_mig_state.prev_bs = NULL;
>>>>> +    dirty_bitmap_mig_state.prev_bitmap = NULL;
>>>>> +
>>>>> +    for (bs = bdrv_next(NULL); bs; bs = bdrv_next(bs)) {
>>>>> +        for (bitmap = bdrv_next_dirty_bitmap(bs, NULL); bitmap;
>>>>> +             bitmap = bdrv_next_dirty_bitmap(bs, bitmap)) {
>>>>> +            if (!bdrv_dirty_bitmap_name(bitmap)) {
>>>>> +                continue;
>>>>> +            }
>>>>> +
>>>>> +            dbms = g_new0(DirtyBitmapMigBitmapState, 1);
>>>>> +            dbms->bs = bs;
>>>>> +            dbms->bitmap = bitmap;
>>>>> +            dbms->total_sectors = bdrv_nb_sectors(bs);
>>>>> +            dbms->sectors_per_chunk = CHUNK_SIZE * 8 *
>>>>> +                bdrv_dirty_bitmap_granularity(dbms->bs, dbms->bitmap)
>>>>> +                >> BDRV_SECTOR_BITS;
>>>>> +
>>>>> + QSIMPLEQ_INSERT_TAIL(&dirty_bitmap_mig_state.dbms_list,
>>>>> +                                 dbms, entry);
>>>>> +        }
>>>>> +    }
>>>>> +}
>>>>> +
>>>>
>>>> OK, but see the note below for dirty_bitmap_mig_init.
>>> actually it is not 'init' but 'reinit' - called on every migration
>>> start.. Hmm. dbms_list should be cleared here before fill it again.
>>>>
>>>>> +/* Called with no lock taken.  */
>>>>> +static void bulk_phase_send_chunk(QEMUFile *f,
>>>>> DirtyBitmapMigBitmapState *dbms)
>>>>> +{
>>>>> +    uint32_t nr_sectors = MIN(dbms->total_sectors - dbms->cur_sector,
>>>>> +                             dbms->sectors_per_chunk);
>>>>
>>>> What about cases where nr_sectors will put us past the end of the
>>>> bitmap? The bitmap serialization implementation might need a touchup
>>>> with this in mind.
>>> I don't understand.. nr_sectors <=  dbms->total_sectors -
>>> dbms->cur_sector and it can't put us past the end...
>>
>> Oh, because you take the minimum, so we don't have to worry about
>> sectors_per_chunk eclipsing what we have.
>>
>> Nevermind, I can't read... :(
>>
>>>>
>>>>> +
>>>>> +    send_bitmap(f, dbms, dbms->cur_sector, nr_sectors);
>>>>> +
>>>>> +    dbms->cur_sector += nr_sectors;
>>>>> +    if (dbms->cur_sector >= dbms->total_sectors) {
>>>>> +        dbms->bulk_completed = true;
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +/* Called with no lock taken.  */
>>>>> +static void bulk_phase(QEMUFile *f, bool limit)
>>>>> +{
>>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>>> +
>>>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list,
>>>>> entry) {
>>>>> +        while (!dbms->bulk_completed) {
>>>>> +            bulk_phase_send_chunk(f, dbms);
>>>>> +            if (limit && qemu_file_rate_limit(f)) {
>>>>> +                return;
>>>>> +            }
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    dirty_bitmap_mig_state.bulk_completed = true;
>>>>> +}
>>>>
>>>> OK.
>>>>
>>>>> +
>>>>> +static void blk_mig_reset_dirty_cursor(void)
>>>>> +{
>>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>>> +
>>>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list,
>>>>> entry) {
>>>>> +        dbms->cur_dirty = 0;
>>>>> +    }
>>>>> +}
>>>>> +
>>>>
>>>> OK.
>>>>
>>>>> +/* Called with iothread lock taken. */
>>>>> +static void dirty_phase_send_chunk(QEMUFile *f,
>>>>> DirtyBitmapMigBitmapState *dbms)
>>>>> +{
>>>>> +    uint32_t nr_sectors;
>>>>> +
>>>>> +    while (dbms->cur_dirty < dbms->total_sectors &&
>>>>> +           !hbitmap_get(dbms->dirty_bitmap, dbms->cur_dirty)) {
>>>>> +        dbms->cur_dirty += dbms->sectors_per_chunk;
>>>>> +    }
>>>>
>>>> OK, so we fast forward the dirty cursor while the meta-bitmap is
>>>> empty. Is it not worth using the HBitmapIterator here? You can reset
>>>> them everywhere you reset the dirty cursor, and then just fast-seek to
>>>> the first dirty sector.
>>> Yes, I've thought about it, just used simpler way (copied from
>>> migration/block.c) for an early version of the patch set. I will do it.
>>
>> Only if it doesn't make things more complicated to look at.
>>
>>>>
>>>>> +
>>>>> +    if (dbms->cur_dirty >= dbms->total_sectors) {
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>> +    nr_sectors = MIN(dbms->total_sectors - dbms->cur_dirty,
>>>>> +                     dbms->sectors_per_chunk);
>>>>
>>>> What happens if nr_sectors goes past the end?
>>
>> Again, I misread.
>>
>>>>
>>>>> +    send_bitmap(f, dbms, dbms->cur_dirty, nr_sectors);
>>>>> +    hbitmap_reset(dbms->dirty_bitmap, dbms->cur_dirty,
>>>>> dbms->sectors_per_chunk);
>>>>> +    dbms->cur_dirty += nr_sectors;
>>>>> +}
>>>>> +
>>>>> +/* Called with iothread lock taken.
>>>>> + *
>>>>> + * return value:
>>>>> + * 0: too much data for max_downtime
>>>>> + * 1: few enough data for max_downtime
>>>>> +*/
>>>>
>>>> dirty_phase below doesn't have a return value.
>>> rudimentary comment.. thanks.
>>>>
>>>>> +static void dirty_phase(QEMUFile *f, bool limit)
>>>>> +{
>>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>>> +
>>>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list,
>>>>> entry) {
>>>>> +        while (dbms->cur_dirty < dbms->total_sectors) {
>>>>> +            dirty_phase_send_chunk(f, dbms);
>>>>> +            if (limit && qemu_file_rate_limit(f)) {
>>>>> +                return;
>>>>> +            }
>>>>> +        }
>>>>> +    }
>>>>> +}
>>>>> +
>>>>
>>>> OK.
>>>>
>>>>> +
>>>>> +/* Called with iothread lock taken.  */
>>>>> +static void dirty_bitmap_mig_cleanup(void)
>>>>> +{
>>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>>> +
>>>>> +    unset_dirty_tracking();
>>>>> +
>>>>> +    while ((dbms =
>>>>> QSIMPLEQ_FIRST(&dirty_bitmap_mig_state.dbms_list)) != NULL) {
>>>>> + QSIMPLEQ_REMOVE_HEAD(&dirty_bitmap_mig_state.dbms_list, entry);
>>>>> +        g_free(dbms);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>
>>>> OK.
>>>>
>>>>> +static void dirty_bitmap_migration_cancel(void *opaque)
>>>>> +{
>>>>> +    dirty_bitmap_mig_cleanup();
>>>>> +}
>>>>> +
>>>>
>>>> OK.
>>>>
>>>>> +static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
>>>>> +{
>>>>> +    DPRINTF("Enter save live iterate\n");
>>>>> +
>>>>> +    blk_mig_reset_dirty_cursor();
>>>>
>>>> I suppose this is because it's easier to check if we are finished by
>>>> starting from sector 0 every time.
>>>>
>>>> A harder, but faster method may be: Use HBitmapIterators, but don't
>>>> reset them every iteration: just iterate until the end, and check that
>>>> the bitmap is empty. If the meta bitmap is empty, the dirty phase is
>>>> complete. If the meta bitmap is NOT empty, reset the HBI and continue
>>>> allowing iterations over the dirty phase.
>>> Ok, will do.
>>>>
>>>>> +
>>>>> +    if (dirty_bitmap_mig_state.bulk_completed) {
>>>>> +        qemu_mutex_lock_iothread();
>>>>> +        dirty_phase(f, true);
>>>>> +        qemu_mutex_unlock_iothread();
>>>>> +    } else {
>>>>> +        bulk_phase(f, true);
>>>>> +    }
>>>>> +
>>>>> +    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>>>>> +
>>>>> +    return dirty_bitmap_mig_state.bulk_completed;
>>>>> +}
>>>>> +
>>>>> +/* Called with iothread lock taken.  */
>>>>> +
>>>>> +static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
>>>>> +{
>>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>>> +    DPRINTF("Enter save live complete\n");
>>>>> +
>>>>> +    if (!dirty_bitmap_mig_state.bulk_completed) {
>>>>> +        bulk_phase(f, false);
>>>>> +    }
>>>>
>>>> [Not expertly familiar with savevm:] Under what conditions can this
>>>> happen?
>>> This can happen. save_complete will happen when savevm decide that
>>> pending data size to send is small enough. It was the case for my bugfix
>>> for migration/block.c about pending. To prevent save_complete when bulk
>>> phase isn't completed, save_pending returns (in my bugfix for
>>> migration/block.c) big value. Here I decided to make more honest
>>> save_pending, so I need to complete (if it doesn't) bulk phase in
>>> save_complete.
>>
>> OK, Gotcha.
>>
>>>>
>>>>> +
>>>>> +    blk_mig_reset_dirty_cursor();
>>>>> +    dirty_phase(f, false);
>>>>> +
>>>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list,
>>>>> entry) {
>>>>> +        uint8_t flags = DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME |
>>>>> +                        DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME |
>>>>> +                        DIRTY_BITMAP_MIG_FLAG_ENABLED;
>>>>> +
>>>>> +        qemu_put_byte(f, flags);
>>>>> +        qemu_put_name(f, bdrv_get_device_name(dbms->bs));
>>>>> +        qemu_put_name(f, bdrv_dirty_bitmap_name(dbms->bitmap));
>>>>> +        qemu_put_byte(f, bdrv_dirty_bitmap_enabled(dbms->bitmap));
>>>>> +    }
>>>>> +
>>>>> +    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>>>>> +
>>>>> +    DPRINTF("Dirty bitmaps migration completed\n");
>>>>> +
>>>>> +    dirty_bitmap_mig_cleanup();
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>
>>>> I suppose we don't need a flag that distinctly SAYS this is the end
>>>> section, since we can tell by omission of
>>>> DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK or ZERO_CHUNK.
>>> Hmm. I think it simplifies the logic (to use EOS after each section).
>>> And the same approach is in migration/block.c.. It's a question about
>>> which format is better:  "Each section for dirty_bitmap_load ends with
>>> EOS" or "Each section for dirty_bitmap_load ends with EOS except the
>>> last one. The last one may be recognized by absent NORMAL_CHUNK and
>>> ZERO_CHUNK"
>>>>
>>>>> +static uint64_t dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
>>>>> +                                          uint64_t max_size)
>>>>> +{
>>>>> +    DirtyBitmapMigBitmapState *dbms;
>>>>> +    uint64_t pending = 0;
>>>>> +
>>>>> +    qemu_mutex_lock_iothread();
>>>>> +
>>>>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list,
>>>>> entry) {
>>>>> +        uint64_t sectors = hbitmap_count(dbms->dirty_bitmap);
>>>>> +        if (!dbms->bulk_completed) {
>>>>> +            sectors += dbms->total_sectors - dbms->cur_sector;
>>>>> +        }
>>>>> +        pending += bdrv_dirty_bitmap_data_size(dbms->bitmap,
>>>>> sectors);
>>>>> +    }
>>>>> +
>>>>> +    qemu_mutex_unlock_iothread();
>>>>> +
>>>>> +    DPRINTF("Enter save live pending %" PRIu64 ", max: %" PRIu64
>>>>> "\n",
>>>>> +            pending, max_size);
>>>>> +    return pending;
>>>>> +}
>>>>> +
>>>>
>>>> OK.
>>>>
>>>>> +static int dirty_bitmap_load(QEMUFile *f, void *opaque, int
>>>>> version_id)
>>>>> +{
>>>>> +    int flags;
>>>>> +
>>>>> +    static char device_name[256], bitmap_name[256];
>>>>> +    static BlockDriverState *bs;
>>>>> +    static BdrvDirtyBitmap *bitmap;
>>>>> +
>>>>> +    uint8_t *buf;
>>>>> +    uint64_t first_sector;
>>>>> +    uint32_t  nr_sectors;
>>>>> +    int ret;
>>>>> +
>>>>> +    DPRINTF("load start\n");
>>>>> +
>>>>> +    do {
>>>>> +        flags = qemu_get_byte(f);
>>>>> +        DPRINTF("flags: %x\n", flags);
>>>>> +
>>>>> +        if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
>>>>> +            qemu_get_name(f, device_name);
>>>>> +            bs = bdrv_find(device_name);
>>>>
>>>> Similar to the above confusion, you may want bdrv_lookup_bs or
>>>> similar, since we're going to be looking for BDS nodes instead of
>>>> "devices."
>>> In this case, should it be changed in migration/block.c too?
>>
>> [See discussion above!]
>>
>>>>
>>>>> +            if (!bs) {
>>>>> +                fprintf(stderr, "Error: unknown block device '%s'\n",
>>>>> +                        device_name);
>>>>> +                return -EINVAL;
>>>>> +            }
>>>>> +        }
>>>>> +
>>>>> +        if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
>>>>> +            if (!bs) {
>>>>> +                fprintf(stderr, "Error: block device name is not
>>>>> set\n");
>>>>> +                return -EINVAL;
>>>>> +            }
>>>>> +
>>>>> +            qemu_get_name(f, bitmap_name);
>>>>> +            bitmap = bdrv_find_dirty_bitmap(bs, bitmap_name);
>>>>> +            if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
>>>>> +                /* First chunk from this bitmap */
>>>>> +                uint64_t granularity = qemu_get_be64(f);
>>>>> +                if (!bitmap) {
>>>>> +                    Error *local_err = NULL;
>>>>> +                    bitmap = bdrv_create_dirty_bitmap(bs,
>>>>> granularity,
>>>>> + bitmap_name,
>>>>> + &local_err);
>>>>> +                    if (!bitmap) {
>>>>> +                        error_report("%s",
>>>>> error_get_pretty(local_err));
>>>>> +                        error_free(local_err);
>>>>> +                        return -EINVAL;
>>>>> +                    }
>>>>> +                } else {
>>>>> +                    uint64_t dest_granularity =
>>>>> +                        bdrv_dirty_bitmap_granularity(bs, bitmap);
>>>>> +                    if (dest_granularity != granularity) {
>>>>> +                        fprintf(stderr,
>>>>> +                                "Error: "
>>>>> +                                "Migrated bitmap granularity (%"
>>>>> PRIu64 ") "
>>>>> +                                "is not match with destination
>>>>> bitmap '%s' "
>>>>> +                                "granularity (%" PRIu64 ")\n",
>>>>> +                                granularity,
>>>>> +                                bitmap_name,
>>>>> +                                dest_granularity);
>>>>> +                        return -EINVAL;
>>>>> +                    }
>>>>> +                }
>>>>> +                bdrv_disable_dirty_bitmap(bitmap);
>>>>> +            }
>>>>> +            if (!bitmap) {
>>>>> +                fprintf(stderr, "Error: unknown dirty bitmap "
>>>>> +                        "'%s' for block device '%s'\n",
>>>>> +                        bitmap_name, device_name);
>>>>> +                return -EINVAL;
>>>>> +            }
>>>>> +        }
>>>>> +
>>>>> +        if (flags & DIRTY_BITMAP_MIG_FLAG_ENABLED) {
>>>>> +            bool enabled;
>>>>> +            if (!bitmap) {
>>>>> +                fprintf(stderr, "Error: dirty bitmap name is not
>>>>> set\n");
>>>>> +                return -EINVAL;
>>>>> +            }
>>>>> +            bdrv_dirty_bitmap_deserialize_finish(bitmap);
>>>>> +            /* complete migration */
>>>>> +            enabled = qemu_get_byte(f);
>>>>> +            if (enabled) {
>>>>> +                bdrv_enable_dirty_bitmap(bitmap);
>>>>> +            }
>>>>> +        }
>>>>
>>>> Oh, so you use the ENABLED flag to show that migration is over.
>>> Yes, it was bad idea..
>>>> If we are going to commit to a stream format for bitmaps, though,
>>>> maybe it's best to actually create a "COMPLETION BLOCK" flag and then
>>>> split this function into two pieces:
>>>>
>>>> (1) The part that receives regular / zero blocks, and
>>>> (2) The part that receives completion data.
>>>>
>>>> That way, if we change the properties that bitmaps have down the line,
>>>> we aren't reliant on literally the "enabled" flag to decide what to do.
>>>>
>>>> Also, it might help make this fairly long function a little smaller
>>>> and more readable.
>>> Ok.
>>>>
>>>>> +
>>>>> +        if (flags & (DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK |
>>>>> +                     DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK)) {
>>>>> +            if (!bs) {
>>>>> +                fprintf(stderr, "Error: block device name is not
>>>>> set\n");
>>>>> +                return -EINVAL;
>>>>> +            }
>>>>> +            if (!bitmap) {
>>>>> +                fprintf(stderr, "Error: dirty bitmap name is not
>>>>> set\n");
>>>>> +                return -EINVAL;
>>>>> +            }
>>>>> +
>>>>> +            first_sector = qemu_get_be64(f);
>>>>> +            nr_sectors = qemu_get_be32(f);
>>>>> +            DPRINTF("chunk: %lu %u\n", first_sector, nr_sectors);
>>>>> +
>>>>> +
>>>>> +            if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) {
>>>>> +                bdrv_dirty_bitmap_deserialize_part0(bitmap,
>>>>> first_sector,
>>>>> + nr_sectors);
>>>>> +            } else {
>>>>> +                uint64_t buf_size = qemu_get_be64(f);
>>>>> +                uint64_t needed_size =
>>>>> +                    bdrv_dirty_bitmap_data_size(bitmap, nr_sectors);
>>>>> +
>>>>> +                if (needed_size > buf_size) {
>>>>> +                    fprintf(stderr,
>>>>> +                            "Error: Migrated bitmap granularity is
>>>>> not "
>>>>> +                            "match with destination bitmap
>>>>> granularity\n");
>>>>> +                    return -EINVAL;
>>>>> +                }
>>>>> +
>>>>
>>>> "Migrated bitmap granularity doesn't match the destination bitmap
>>>> granularity" perhaps.
>>>>
>>>>> +                buf = g_malloc(buf_size);
>>>>> +                qemu_get_buffer(f, buf, buf_size);
>>>>> +                bdrv_dirty_bitmap_deserialize_part(bitmap, buf,
>>>>> + first_sector,
>>>>> + nr_sectors);
>>>>> +                g_free(buf);
>>>>> +            }
>>>>> +        }
>>>>> +
>>>>> +        ret = qemu_file_get_error(f);
>>>>> +        if (ret != 0) {
>>>>> +            return ret;
>>>>> +        }
>>>>> +    } while (!(flags & DIRTY_BITMAP_MIG_FLAG_EOS));
>>>>> +
>>>>> +    DPRINTF("load finish\n");
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +static void dirty_bitmap_set_params(const MigrationParams *params,
>>>>> void *opaque)
>>>>> +{
>>>>> +    dirty_bitmap_mig_state.migration_enable = params->dirty;
>>>>> +}
>>>>> +
>>>>
>>>> OK; though I am not immediately aware of what changes need to happen
>>>> to accommodate Eric's suggestions.
>>> This function will be dropped in v3.
>>>>
>>>>> +static bool dirty_bitmap_is_active(void *opaque)
>>>>> +{
>>>>> +    return dirty_bitmap_mig_state.migration_enable == 1;
>>>>> +}
>>>>> +
>>>>
>>>> OK.
>>>>
>>>>> +static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
>>>>> +{
>>>>> +    init_dirty_bitmap_migration(f);
>>>>> +
>>>>> +    qemu_mutex_lock_iothread();
>>>>> +    /* start track dirtyness of dirty bitmaps */
>>>>> +    set_dirty_tracking();
>>>>> +    qemu_mutex_unlock_iothread();
>>>>> +
>>>>> +    blk_mig_reset_dirty_cursor();
>>>>> +    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>
>>>> OK; see dirty_bitmap_mig_init below, though.
>>>>
>>>>> +static SaveVMHandlers savevm_block_handlers = {
>>>>> +    .set_params = dirty_bitmap_set_params,
>>>>> +    .save_live_setup = dirty_bitmap_save_setup,
>>>>> +    .save_live_iterate = dirty_bitmap_save_iterate,
>>>>> +    .save_live_complete = dirty_bitmap_save_complete,
>>>>> +    .save_live_pending = dirty_bitmap_save_pending,
>>>>> +    .load_state = dirty_bitmap_load,
>>>>> +    .cancel = dirty_bitmap_migration_cancel,
>>>>> +    .is_active = dirty_bitmap_is_active,
>>>>> +};
>>>>> +
>>>>> +void dirty_bitmap_mig_init(void)
>>>>> +{
>>>>> +    QSIMPLEQ_INIT(&dirty_bitmap_mig_state.dbms_list);
>>>>
>>>> Maybe I haven't looked thoroughly enough yet, but it's weird that part
>>>> of the dirty_bitmap_mig_state is initialized here, and the rest of it
>>>> in init_dirty_bitmap_migration. I'd prefer to keep it all together, if
>>>> possible.
>>> dirty_bitmap_mig_init is called one time when qemu starts. QSIMPLEQ_INIT
>>> should be called once. dirty_bitmap_save_setup is called on every
>>> migration start, it's like 'reinitialize'.
>>>>
>>>>> +
>>>>> +    register_savevm_live(NULL, "dirty-bitmap", 0, 1,
>>>>> &savevm_block_handlers,
>>>>> +                         &dirty_bitmap_mig_state);
>>>>> +}
>>>>
>>>> OK.
>>>>
>>>>> diff --git a/vl.c b/vl.c
>>>>> index a824a7d..dee7220 100644
>>>>> --- a/vl.c
>>>>> +++ b/vl.c
>>>>> @@ -4184,6 +4184,7 @@ int main(int argc, char **argv, char **envp)
>>>>>
>>>>>       blk_mig_init();
>>>>>       ram_mig_init();
>>>>> +    dirty_bitmap_mig_init();
>>>>>
>>>>>       /* If the currently selected machine wishes to override the
>>>>> units-per-bus
>>>>>        * property of its default HBA interface type, do so now. */
>>>>>
>>>>
>>>> Hm, since dirty bitmaps are a sub-component of the block layer, would
>>>> it not make sense to put this hook under blk_mig_init, perhaps?
>>> IMHO the reason to put it here is to keep all
>>> register_savevm_live-entities in one place.
>>
>> If you still feel that way I won't withhold my R-b, but there are
>> already other cases such as ppc_spapr_init which are not in this
>> general area of vl.c.
>>
>> Plus the dozens of devices that use register_savevm as a wrapper to
>> register_savevm_live, so maybe consolidating calls to this function
>> isn't that important.
> Hm, I've missed it, ppc_spapr_init is not here, yes.. Another thing
> here: dirty bitmaps migration are separate from blk migration. And it
> may be used without blk migration (nbd+mirrow migration may be used)..
> Is it ok to connect dirty bitmaps migration to blk_mig_init, which is
> located in migration/block.c, which may not be used at all when we
> bitmaps are migrated using migration/dirty-bitmap.c?
> In other words, yes, dirty bitmaps are a sub-component of the block
> layer, but dirty bitmap migration is not a sub-component of blk migration.
>>
>>>>
>>>>
>>>> Overall this looks very clean compared to the intermingled format in
>>>> V1, and the code is organized pretty well. Just a few minor comments,
>>>> and I'd like to get the opinion of the migration maintainers, but I am
>>>> happy. Sorry it took me so long to review, please feel free to let me
>>>> know if you disagree with any of my opinions :)
>>>>
>>>> Thank you,
>>>> --John
>>>
>>> Thank you for reviewing my series)
>>>
>>
>> Yup. Hopefully I didn't miss too much that will irritate the Migration
>> overlords.
>>
>> Once you respin on top of v12, I can run some thorough migration tests
>> on it (perhaps over a weekend) and verify that it survives a couple
>> hundred migrations without any kind of integrity loss.
> I hope I'll do it with all other things in about two days.
>>
>> This is what makes sense to me right now, anyway.
>>
>> Do you think you'll be including the bitmap checksum in the
>> BlockDirtyInfo command? That'd be convenient for iotests.
> Ok, will do. Good idea. Only two points:
> 1) Is it ok to include debug info into BlockDirtyInfo? Will users be
> happy with it?
> 2) When I was debugging my code, the information about dirty-regions was
> very useful. Now, all is working and checksums are enough for regression
> control.

I think leaving some tactical debug prints in the code, disabled, is 
perfectly fine from my personal standpoint. We're not really done 
implementing all of these features yet and they may yet be useful.

I'd vote for leaving in any non-QMP/QAPI debug information you wish, for 
now. We just have to be careful about interfaces -- no QMP/HMP commands.

As long as it looks reasonably tidy, I don't think it will be a problem.

-- 
—js

  reply	other threads:[~2015-02-16 18:18 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-27 10:56 [Qemu-devel] [PATCH RFC v2 0/8] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 1/8] qmp: print dirty bitmap Vladimir Sementsov-Ogievskiy
2015-01-27 16:17   ` Eric Blake
2015-01-27 16:23     ` Vladimir Sementsov-Ogievskiy
2015-02-10 21:28   ` John Snow
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 2/8] hbitmap: serialization Vladimir Sementsov-Ogievskiy
2015-02-10 21:29   ` John Snow
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 3/8] block: BdrvDirtyBitmap serialization interface Vladimir Sementsov-Ogievskiy
2015-02-10 21:29   ` John Snow
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 4/8] block: add dirty-dirty bitmaps Vladimir Sementsov-Ogievskiy
2015-02-10 21:30   ` John Snow
2015-02-12 10:51     ` Vladimir Sementsov-Ogievskiy
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 5/8] block: add bdrv_dirty_bitmap_enabled() Vladimir Sementsov-Ogievskiy
2015-02-10 21:30   ` John Snow
2015-02-12 10:54     ` Vladimir Sementsov-Ogievskiy
2015-02-12 16:22       ` John Snow
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 6/8] block: add bdrv_next_dirty_bitmap() Vladimir Sementsov-Ogievskiy
2015-02-10 21:31   ` John Snow
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 7/8] migration: add dirty parameter Vladimir Sementsov-Ogievskiy
2015-01-27 16:20   ` Eric Blake
2015-02-04 14:42     ` Vladimir Sementsov-Ogievskiy
2015-02-10 21:32   ` John Snow
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c Vladimir Sementsov-Ogievskiy
2015-02-10 21:33   ` John Snow
2015-02-13  8:19     ` Vladimir Sementsov-Ogievskiy
2015-02-13  9:06       ` Vladimir Sementsov-Ogievskiy
2015-02-13 17:32         ` John Snow
2015-02-13 17:41           ` Vladimir Sementsov-Ogievskiy
2015-02-13 20:22       ` John Snow
2015-02-16 12:06         ` Vladimir Sementsov-Ogievskiy
2015-02-16 18:18           ` John Snow [this message]
2015-02-16 18:22             ` Dr. David Alan Gilbert
2015-02-17  8:54             ` Vladimir Sementsov-Ogievskiy
2015-02-17 18:45               ` John Snow
2015-02-17 19:12                 ` Eric Blake

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54E23477.9040801@redhat.com \
    --to=jsnow@redhat.com \
    --cc=amit.shah@redhat.com \
    --cc=den@openvz.org \
    --cc=dgilbert@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=vsementsov@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.