* [PATCH v2 00/22] Fix error handling during bitmap postcopy
@ 2020-02-17 15:02 Vladimir Sementsov-Ogievskiy
  2020-02-17 15:02 ` [PATCH v2 01/22] migration/block-dirty-bitmap: fix dirty_bitmap_mig_before_vm_start Vladimir Sementsov-Ogievskiy
                   ` (24 more replies)
  0 siblings, 25 replies; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-17 15:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, vsementsov, Eduardo Habkost, qemu-block,
	quintela, qemu-stable, dgilbert, Max Reitz, Stefan Hajnoczi,
	Cleber Rosa, andrey.shinkevich, John Snow

The original idea of bitmap postcopy migration is that bitmaps are
non-critical data, and their loss is not a serious problem. So, when
using the postcopy method, on any failure we should simply drop the
unfinished bitmaps and continue guest execution.

However, that is not what happens. QEMU crashes, fails the migration,
or falls into the postcopy-recovery feature; anything except the
intended behavior. This series fixes at least some of the problems
with error handling during bitmap postcopy migration.

v1 was "[PATCH 0/7] Fix crashes on early shutdown during bitmaps postcopy"

v2:

Most of the patches are new or have changed significantly. Only patches
06 and 07 are mostly unchanged, just rebased on top of the refactorings.

Vladimir Sementsov-Ogievskiy (22):
  migration/block-dirty-bitmap: fix dirty_bitmap_mig_before_vm_start
  migration/block-dirty-bitmap: rename state structure types
  migration/block-dirty-bitmap: rename dirty_bitmap_mig_cleanup
  migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init
  migration/block-dirty-bitmap: refactor state global variables
  migration/block-dirty-bitmap: rename finish_lock to just lock
  migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete
  migration/block-dirty-bitmap: keep bitmap state for all bitmaps
  migration/block-dirty-bitmap: relax error handling in incoming part
  migration/block-dirty-bitmap: cancel migration on shutdown
  migration/savevm: don't worry if bitmap migration postcopy failed
  qemu-iotests/199: fix style
  qemu-iotests/199: drop extra constraints
  qemu-iotests/199: better catch postcopy time
  qemu-iotests/199: improve performance: set bitmap by discard
  qemu-iotests/199: change discard patterns
  qemu-iotests/199: increase postcopy period
  python/qemu/machine: add kill() method
  qemu-iotests/199: prepare for new test-cases addition
  qemu-iotests/199: check persistent bitmaps
  qemu-iotests/199: add early shutdown case to bitmaps postcopy
  qemu-iotests/199: add source-killed case to bitmaps postcopy

Cc: John Snow <jsnow@redhat.com>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Fam Zheng <fam@euphon.net>
Cc: Juan Quintela <quintela@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Cleber Rosa <crosa@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org
Cc: qemu-stable@nongnu.org # for patch 01

 migration/migration.h          |   3 +-
 migration/block-dirty-bitmap.c | 444 +++++++++++++++++++++------------
 migration/migration.c          |  15 +-
 migration/savevm.c             |  37 ++-
 python/qemu/machine.py         |  12 +-
 tests/qemu-iotests/199         | 244 ++++++++++++++----
 tests/qemu-iotests/199.out     |   4 +-
 7 files changed, 529 insertions(+), 230 deletions(-)

-- 
2.21.0



^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v2 01/22] migration/block-dirty-bitmap: fix dirty_bitmap_mig_before_vm_start
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
@ 2020-02-17 15:02 ` Vladimir Sementsov-Ogievskiy
  2020-02-18  9:44   ` Andrey Shinkevich
  2020-02-17 15:02 ` [PATCH v2 02/22] migration/block-dirty-bitmap: rename state structure types Vladimir Sementsov-Ogievskiy
                   ` (23 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-17 15:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, vsementsov, qemu-block, quintela, qemu-stable,
	dgilbert, Stefan Hajnoczi, andrey.shinkevich, John Snow

There is no reason to use the _locked version of
bdrv_enable_dirty_bitmap(), as we do not hold the mutex here. Moreover,
the adjacent bdrv_dirty_bitmap_enable_successor() does take the mutex
itself.
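
For illustration, a minimal sketch of the naming convention involved
(simplified for this note; the real implementation lives in QEMU's
block/dirty-bitmap.c and the field handling shown is an assumption):
a _locked variant expects the caller to already hold the mutex, while
the plain variant takes and releases it internally.

    /* Sketch, not part of the patch: the "_locked" suffix convention. */
    void bdrv_enable_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap)
    {
        /* Caller must already hold the bitmap mutex. */
        bitmap->disabled = false;  /* simplified body, for illustration */
    }

    void bdrv_enable_dirty_bitmap(BdrvDirtyBitmap *bitmap)
    {
        bdrv_dirty_bitmap_lock(bitmap);
        bdrv_enable_dirty_bitmap_locked(bitmap);
        bdrv_dirty_bitmap_unlock(bitmap);
    }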

Fixes: 58f72b965e9e1
Cc: qemu-stable@nongnu.org # v3.0
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 migration/block-dirty-bitmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 7eafface61..16f1793ee3 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -498,7 +498,7 @@ void dirty_bitmap_mig_before_vm_start(void)
         DirtyBitmapLoadBitmapState *b = item->data;
 
         if (b->migrated) {
-            bdrv_enable_dirty_bitmap_locked(b->bitmap);
+            bdrv_enable_dirty_bitmap(b->bitmap);
         } else {
             bdrv_dirty_bitmap_enable_successor(b->bitmap);
         }
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 02/22] migration/block-dirty-bitmap: rename state structure types
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
  2020-02-17 15:02 ` [PATCH v2 01/22] migration/block-dirty-bitmap: fix dirty_bitmap_mig_before_vm_start Vladimir Sementsov-Ogievskiy
@ 2020-02-17 15:02 ` Vladimir Sementsov-Ogievskiy
  2020-07-23 20:50   ` Eric Blake
  2020-02-17 15:02 ` [PATCH v2 03/22] migration/block-dirty-bitmap: rename dirty_bitmap_mig_cleanup Vladimir Sementsov-Ogievskiy
                   ` (22 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-17 15:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, vsementsov, qemu-block, quintela, dgilbert,
	Stefan Hajnoczi, andrey.shinkevich, John Snow

Rename the state structure types to be shorter and symmetrical between
the load and save parts.
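
For reference, the renames this patch performs (as seen in the diff
below):

    DirtyBitmapMigBitmapState  -> SaveBitmapState
    DirtyBitmapMigState        -> DBMSaveState
    DirtyBitmapLoadState       -> DBMLoadState
    DirtyBitmapLoadBitmapState -> LoadBitmapState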

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 migration/block-dirty-bitmap.c | 68 ++++++++++++++++++----------------
 1 file changed, 36 insertions(+), 32 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 16f1793ee3..73792ab005 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -100,23 +100,25 @@
 /* 0x04 was "AUTOLOAD" flags on elder versions, now it is ignored */
 #define DIRTY_BITMAP_MIG_START_FLAG_RESERVED_MASK    0xf8
 
-typedef struct DirtyBitmapMigBitmapState {
+/* State of one bitmap during save process */
+typedef struct SaveBitmapState {
     /* Written during setup phase. */
     BlockDriverState *bs;
     const char *node_name;
     BdrvDirtyBitmap *bitmap;
     uint64_t total_sectors;
     uint64_t sectors_per_chunk;
-    QSIMPLEQ_ENTRY(DirtyBitmapMigBitmapState) entry;
+    QSIMPLEQ_ENTRY(SaveBitmapState) entry;
     uint8_t flags;
 
     /* For bulk phase. */
     bool bulk_completed;
     uint64_t cur_sector;
-} DirtyBitmapMigBitmapState;
+} SaveBitmapState;
 
-typedef struct DirtyBitmapMigState {
-    QSIMPLEQ_HEAD(, DirtyBitmapMigBitmapState) dbms_list;
+/* State of the dirty bitmap migration (DBM) during save process */
+typedef struct DBMSaveState {
+    QSIMPLEQ_HEAD(, SaveBitmapState) dbms_list;
 
     bool bulk_completed;
     bool no_bitmaps;
@@ -124,23 +126,25 @@ typedef struct DirtyBitmapMigState {
     /* for send_bitmap_bits() */
     BlockDriverState *prev_bs;
     BdrvDirtyBitmap *prev_bitmap;
-} DirtyBitmapMigState;
+} DBMSaveState;
 
-typedef struct DirtyBitmapLoadState {
+/* State of the dirty bitmap migration (DBM) during load process */
+typedef struct DBMLoadState {
     uint32_t flags;
     char node_name[256];
     char bitmap_name[256];
     BlockDriverState *bs;
     BdrvDirtyBitmap *bitmap;
-} DirtyBitmapLoadState;
+} DBMLoadState;
 
-static DirtyBitmapMigState dirty_bitmap_mig_state;
+static DBMSaveState dirty_bitmap_mig_state;
 
-typedef struct DirtyBitmapLoadBitmapState {
+/* State of one bitmap during load process */
+typedef struct LoadBitmapState {
     BlockDriverState *bs;
     BdrvDirtyBitmap *bitmap;
     bool migrated;
-} DirtyBitmapLoadBitmapState;
+} LoadBitmapState;
 static GSList *enabled_bitmaps;
 QemuMutex finish_lock;
 
@@ -170,7 +174,7 @@ static void qemu_put_bitmap_flags(QEMUFile *f, uint32_t flags)
     qemu_put_byte(f, flags);
 }
 
-static void send_bitmap_header(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
+static void send_bitmap_header(QEMUFile *f, SaveBitmapState *dbms,
                                uint32_t additional_flags)
 {
     BlockDriverState *bs = dbms->bs;
@@ -199,19 +203,19 @@ static void send_bitmap_header(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
     }
 }
 
-static void send_bitmap_start(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
+static void send_bitmap_start(QEMUFile *f, SaveBitmapState *dbms)
 {
     send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_START);
     qemu_put_be32(f, bdrv_dirty_bitmap_granularity(dbms->bitmap));
     qemu_put_byte(f, dbms->flags);
 }
 
-static void send_bitmap_complete(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
+static void send_bitmap_complete(QEMUFile *f, SaveBitmapState *dbms)
 {
     send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_COMPLETE);
 }
 
-static void send_bitmap_bits(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
+static void send_bitmap_bits(QEMUFile *f, SaveBitmapState *dbms,
                              uint64_t start_sector, uint32_t nr_sectors)
 {
     /* align for buffer_is_zero() */
@@ -257,7 +261,7 @@ static void send_bitmap_bits(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
 /* Called with iothread lock taken.  */
 static void dirty_bitmap_mig_cleanup(void)
 {
-    DirtyBitmapMigBitmapState *dbms;
+    SaveBitmapState *dbms;
 
     while ((dbms = QSIMPLEQ_FIRST(&dirty_bitmap_mig_state.dbms_list)) != NULL) {
         QSIMPLEQ_REMOVE_HEAD(&dirty_bitmap_mig_state.dbms_list, entry);
@@ -272,7 +276,7 @@ static int init_dirty_bitmap_migration(void)
 {
     BlockDriverState *bs;
     BdrvDirtyBitmap *bitmap;
-    DirtyBitmapMigBitmapState *dbms;
+    SaveBitmapState *dbms;
     Error *local_err = NULL;
 
     dirty_bitmap_mig_state.bulk_completed = false;
@@ -303,7 +307,7 @@ static int init_dirty_bitmap_migration(void)
             bdrv_ref(bs);
             bdrv_dirty_bitmap_set_busy(bitmap, true);
 
-            dbms = g_new0(DirtyBitmapMigBitmapState, 1);
+            dbms = g_new0(SaveBitmapState, 1);
             dbms->bs = bs;
             dbms->node_name = name;
             dbms->bitmap = bitmap;
@@ -340,7 +344,7 @@ fail:
 }
 
 /* Called with no lock taken.  */
-static void bulk_phase_send_chunk(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
+static void bulk_phase_send_chunk(QEMUFile *f, SaveBitmapState *dbms)
 {
     uint32_t nr_sectors = MIN(dbms->total_sectors - dbms->cur_sector,
                              dbms->sectors_per_chunk);
@@ -356,7 +360,7 @@ static void bulk_phase_send_chunk(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
 /* Called with no lock taken.  */
 static void bulk_phase(QEMUFile *f, bool limit)
 {
-    DirtyBitmapMigBitmapState *dbms;
+    SaveBitmapState *dbms;
 
     QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
         while (!dbms->bulk_completed) {
@@ -393,7 +397,7 @@ static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
 
 static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
 {
-    DirtyBitmapMigBitmapState *dbms;
+    SaveBitmapState *dbms;
     trace_dirty_bitmap_save_complete_enter();
 
     if (!dirty_bitmap_mig_state.bulk_completed) {
@@ -418,7 +422,7 @@ static void dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
                                       uint64_t *res_compatible,
                                       uint64_t *res_postcopy_only)
 {
-    DirtyBitmapMigBitmapState *dbms;
+    SaveBitmapState *dbms;
     uint64_t pending = 0;
 
     qemu_mutex_lock_iothread();
@@ -439,7 +443,7 @@ static void dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
 }
 
 /* First occurrence of this bitmap. It should be created if it doesn't exist */
-static int dirty_bitmap_load_start(QEMUFile *f, DirtyBitmapLoadState *s)
+static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
 {
     Error *local_err = NULL;
     uint32_t granularity = qemu_get_be32(f);
@@ -470,7 +474,7 @@ static int dirty_bitmap_load_start(QEMUFile *f, DirtyBitmapLoadState *s)
 
     bdrv_disable_dirty_bitmap(s->bitmap);
     if (flags & DIRTY_BITMAP_MIG_START_FLAG_ENABLED) {
-        DirtyBitmapLoadBitmapState *b;
+        LoadBitmapState *b;
 
         bdrv_dirty_bitmap_create_successor(s->bitmap, &local_err);
         if (local_err) {
@@ -478,7 +482,7 @@ static int dirty_bitmap_load_start(QEMUFile *f, DirtyBitmapLoadState *s)
             return -EINVAL;
         }
 
-        b = g_new(DirtyBitmapLoadBitmapState, 1);
+        b = g_new(LoadBitmapState, 1);
         b->bs = s->bs;
         b->bitmap = s->bitmap;
         b->migrated = false;
@@ -495,7 +499,7 @@ void dirty_bitmap_mig_before_vm_start(void)
     qemu_mutex_lock(&finish_lock);
 
     for (item = enabled_bitmaps; item; item = g_slist_next(item)) {
-        DirtyBitmapLoadBitmapState *b = item->data;
+        LoadBitmapState *b = item->data;
 
         if (b->migrated) {
             bdrv_enable_dirty_bitmap(b->bitmap);
@@ -512,7 +516,7 @@ void dirty_bitmap_mig_before_vm_start(void)
     qemu_mutex_unlock(&finish_lock);
 }
 
-static void dirty_bitmap_load_complete(QEMUFile *f, DirtyBitmapLoadState *s)
+static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
 {
     GSList *item;
     trace_dirty_bitmap_load_complete();
@@ -521,7 +525,7 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DirtyBitmapLoadState *s)
     qemu_mutex_lock(&finish_lock);
 
     for (item = enabled_bitmaps; item; item = g_slist_next(item)) {
-        DirtyBitmapLoadBitmapState *b = item->data;
+        LoadBitmapState *b = item->data;
 
         if (b->bitmap == s->bitmap) {
             b->migrated = true;
@@ -553,7 +557,7 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DirtyBitmapLoadState *s)
     qemu_mutex_unlock(&finish_lock);
 }
 
-static int dirty_bitmap_load_bits(QEMUFile *f, DirtyBitmapLoadState *s)
+static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
 {
     uint64_t first_byte = qemu_get_be64(f) << BDRV_SECTOR_BITS;
     uint64_t nr_bytes = (uint64_t)qemu_get_be32(f) << BDRV_SECTOR_BITS;
@@ -598,7 +602,7 @@ static int dirty_bitmap_load_bits(QEMUFile *f, DirtyBitmapLoadState *s)
     return 0;
 }
 
-static int dirty_bitmap_load_header(QEMUFile *f, DirtyBitmapLoadState *s)
+static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s)
 {
     Error *local_err = NULL;
     bool nothing;
@@ -647,7 +651,7 @@ static int dirty_bitmap_load_header(QEMUFile *f, DirtyBitmapLoadState *s)
 
 static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
 {
-    static DirtyBitmapLoadState s;
+    static DBMLoadState s;
     int ret = 0;
 
     trace_dirty_bitmap_load_enter();
@@ -685,7 +689,7 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
 
 static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
 {
-    DirtyBitmapMigBitmapState *dbms = NULL;
+    SaveBitmapState *dbms = NULL;
     if (init_dirty_bitmap_migration() < 0) {
         return -1;
     }
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 03/22] migration/block-dirty-bitmap: rename dirty_bitmap_mig_cleanup
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
  2020-02-17 15:02 ` [PATCH v2 01/22] migration/block-dirty-bitmap: fix dirty_bitmap_mig_before_vm_start Vladimir Sementsov-Ogievskiy
  2020-02-17 15:02 ` [PATCH v2 02/22] migration/block-dirty-bitmap: rename state structure types Vladimir Sementsov-Ogievskiy
@ 2020-02-17 15:02 ` Vladimir Sementsov-Ogievskiy
  2020-02-18 11:00   ` Andrey Shinkevich
  2020-02-17 15:02 ` [PATCH v2 04/22] migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init Vladimir Sementsov-Ogievskiy
                   ` (21 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-17 15:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, vsementsov, qemu-block, quintela, dgilbert,
	Stefan Hajnoczi, andrey.shinkevich, John Snow

Rename dirty_bitmap_mig_cleanup to dirty_bitmap_do_save_cleanup, to
stress that it belongs to the save part.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 migration/block-dirty-bitmap.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 73792ab005..4e8959ae52 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -259,7 +259,7 @@ static void send_bitmap_bits(QEMUFile *f, SaveBitmapState *dbms,
 }
 
 /* Called with iothread lock taken.  */
-static void dirty_bitmap_mig_cleanup(void)
+static void dirty_bitmap_do_save_cleanup(void)
 {
     SaveBitmapState *dbms;
 
@@ -338,7 +338,7 @@ static int init_dirty_bitmap_migration(void)
     return 0;
 
 fail:
-    dirty_bitmap_mig_cleanup();
+    dirty_bitmap_do_save_cleanup();
 
     return -1;
 }
@@ -377,7 +377,7 @@ static void bulk_phase(QEMUFile *f, bool limit)
 /* for SaveVMHandlers */
 static void dirty_bitmap_save_cleanup(void *opaque)
 {
-    dirty_bitmap_mig_cleanup();
+    dirty_bitmap_do_save_cleanup();
 }
 
 static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
@@ -412,7 +412,7 @@ static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
 
     trace_dirty_bitmap_save_complete_finish();
 
-    dirty_bitmap_mig_cleanup();
+    dirty_bitmap_do_save_cleanup();
     return 0;
 }
 
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 04/22] migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2020-02-17 15:02 ` [PATCH v2 03/22] migration/block-dirty-bitmap: rename dirty_bitmap_mig_cleanup Vladimir Sementsov-Ogievskiy
@ 2020-02-17 15:02 ` Vladimir Sementsov-Ogievskiy
  2020-02-18 11:28   ` Andrey Shinkevich
  2020-02-17 15:02 ` [PATCH v2 05/22] migration/block-dirty-bitmap: refactor state global variables Vladimir Sementsov-Ogievskiy
                   ` (20 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-17 15:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, vsementsov, qemu-block, quintela, dgilbert,
	Stefan Hajnoczi, andrey.shinkevich, John Snow

There is no reason to keep two public init functions.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 migration/migration.h          | 1 -
 migration/block-dirty-bitmap.c | 6 +-----
 migration/migration.c          | 2 --
 3 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 8473ddfc88..2948f2387b 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -332,7 +332,6 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
 void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
 
 void dirty_bitmap_mig_before_vm_start(void);
-void init_dirty_bitmap_incoming_migration(void);
 void migrate_add_address(SocketAddress *address);
 
 int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 4e8959ae52..49d4cf8810 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -148,11 +148,6 @@ typedef struct LoadBitmapState {
 static GSList *enabled_bitmaps;
 QemuMutex finish_lock;
 
-void init_dirty_bitmap_incoming_migration(void)
-{
-    qemu_mutex_init(&finish_lock);
-}
-
 static uint32_t qemu_get_bitmap_flags(QEMUFile *f)
 {
     uint8_t flags = qemu_get_byte(f);
@@ -733,6 +728,7 @@ static SaveVMHandlers savevm_dirty_bitmap_handlers = {
 void dirty_bitmap_mig_init(void)
 {
     QSIMPLEQ_INIT(&dirty_bitmap_mig_state.dbms_list);
+    qemu_mutex_init(&finish_lock);
 
     register_savevm_live("dirty-bitmap", 0, 1,
                          &savevm_dirty_bitmap_handlers,
diff --git a/migration/migration.c b/migration/migration.c
index 8fb68795dc..515047932c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -158,8 +158,6 @@ void migration_object_init(void)
     qemu_sem_init(&current_incoming->postcopy_pause_sem_dst, 0);
     qemu_sem_init(&current_incoming->postcopy_pause_sem_fault, 0);
 
-    init_dirty_bitmap_incoming_migration();
-
     if (!migration_object_check(current_migration, &err)) {
         error_report_err(err);
         exit(1);
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 05/22] migration/block-dirty-bitmap: refactor state global variables
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2020-02-17 15:02 ` [PATCH v2 04/22] migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init Vladimir Sementsov-Ogievskiy
@ 2020-02-17 15:02 ` Vladimir Sementsov-Ogievskiy
  2020-02-18 13:05   ` Andrey Shinkevich
  2020-02-17 15:02 ` [PATCH v2 06/22] migration/block-dirty-bitmap: rename finish_lock to just lock Vladimir Sementsov-Ogievskiy
                   ` (19 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-17 15:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, vsementsov, qemu-block, quintela, dgilbert,
	Stefan Hajnoczi, andrey.shinkevich, John Snow

Move all state variables into one global struct, and reduce global
variable usage by passing the state through the opaque pointer where
possible.
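
Condensed shape of the refactoring (a sketch assembled from the diff
below, not the complete patch): the two per-direction states live in
one struct, which is registered once as the opaque pointer, and each
SaveVMHandlers callback recovers its half from that pointer.

    typedef struct DBMState {
        DBMSaveState save;
        DBMLoadState load;
    } DBMState;

    static DBMState dbm_state;

    static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
    {
        DBMSaveState *s = &((DBMState *)opaque)->save;
        /* ... use s instead of a global ... */
        return 0;
    }

    void dirty_bitmap_mig_init(void)
    {
        QSIMPLEQ_INIT(&dbm_state.save.dbms_list);
        qemu_mutex_init(&dbm_state.load.finish_lock);

        register_savevm_live("dirty-bitmap", 0, 1,
                             &savevm_dirty_bitmap_handlers, &dbm_state);
    }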

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 migration/block-dirty-bitmap.c | 171 ++++++++++++++++++---------------
 1 file changed, 95 insertions(+), 76 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 49d4cf8810..7a82b76809 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -128,6 +128,12 @@ typedef struct DBMSaveState {
     BdrvDirtyBitmap *prev_bitmap;
 } DBMSaveState;
 
+typedef struct LoadBitmapState {
+    BlockDriverState *bs;
+    BdrvDirtyBitmap *bitmap;
+    bool migrated;
+} LoadBitmapState;
+
 /* State of the dirty bitmap migration (DBM) during load process */
 typedef struct DBMLoadState {
     uint32_t flags;
@@ -135,18 +141,17 @@ typedef struct DBMLoadState {
     char bitmap_name[256];
     BlockDriverState *bs;
     BdrvDirtyBitmap *bitmap;
+
+    GSList *enabled_bitmaps;
+    QemuMutex finish_lock;
 } DBMLoadState;
 
-static DBMSaveState dirty_bitmap_mig_state;
+typedef struct DBMState {
+    DBMSaveState save;
+    DBMLoadState load;
+} DBMState;
 
-/* State of one bitmap during load process */
-typedef struct LoadBitmapState {
-    BlockDriverState *bs;
-    BdrvDirtyBitmap *bitmap;
-    bool migrated;
-} LoadBitmapState;
-static GSList *enabled_bitmaps;
-QemuMutex finish_lock;
+static DBMState dbm_state;
 
 static uint32_t qemu_get_bitmap_flags(QEMUFile *f)
 {
@@ -169,21 +174,21 @@ static void qemu_put_bitmap_flags(QEMUFile *f, uint32_t flags)
     qemu_put_byte(f, flags);
 }
 
-static void send_bitmap_header(QEMUFile *f, SaveBitmapState *dbms,
-                               uint32_t additional_flags)
+static void send_bitmap_header(QEMUFile *f, DBMSaveState *s,
+                               SaveBitmapState *dbms, uint32_t additional_flags)
 {
     BlockDriverState *bs = dbms->bs;
     BdrvDirtyBitmap *bitmap = dbms->bitmap;
     uint32_t flags = additional_flags;
     trace_send_bitmap_header_enter();
 
-    if (bs != dirty_bitmap_mig_state.prev_bs) {
-        dirty_bitmap_mig_state.prev_bs = bs;
+    if (bs != s->prev_bs) {
+        s->prev_bs = bs;
         flags |= DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME;
     }
 
-    if (bitmap != dirty_bitmap_mig_state.prev_bitmap) {
-        dirty_bitmap_mig_state.prev_bitmap = bitmap;
+    if (bitmap != s->prev_bitmap) {
+        s->prev_bitmap = bitmap;
         flags |= DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME;
     }
 
@@ -198,19 +203,22 @@ static void send_bitmap_header(QEMUFile *f, SaveBitmapState *dbms,
     }
 }
 
-static void send_bitmap_start(QEMUFile *f, SaveBitmapState *dbms)
+static void send_bitmap_start(QEMUFile *f, DBMSaveState *s,
+                              SaveBitmapState *dbms)
 {
-    send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_START);
+    send_bitmap_header(f, s, dbms, DIRTY_BITMAP_MIG_FLAG_START);
     qemu_put_be32(f, bdrv_dirty_bitmap_granularity(dbms->bitmap));
     qemu_put_byte(f, dbms->flags);
 }
 
-static void send_bitmap_complete(QEMUFile *f, SaveBitmapState *dbms)
+static void send_bitmap_complete(QEMUFile *f, DBMSaveState *s,
+                                 SaveBitmapState *dbms)
 {
-    send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_COMPLETE);
+    send_bitmap_header(f, s, dbms, DIRTY_BITMAP_MIG_FLAG_COMPLETE);
 }
 
-static void send_bitmap_bits(QEMUFile *f, SaveBitmapState *dbms,
+static void send_bitmap_bits(QEMUFile *f, DBMSaveState *s,
+                             SaveBitmapState *dbms,
                              uint64_t start_sector, uint32_t nr_sectors)
 {
     /* align for buffer_is_zero() */
@@ -235,7 +243,7 @@ static void send_bitmap_bits(QEMUFile *f, SaveBitmapState *dbms,
 
     trace_send_bitmap_bits(flags, start_sector, nr_sectors, buf_size);
 
-    send_bitmap_header(f, dbms, flags);
+    send_bitmap_header(f, s, dbms, flags);
 
     qemu_put_be64(f, start_sector);
     qemu_put_be32(f, nr_sectors);
@@ -254,12 +262,12 @@ static void send_bitmap_bits(QEMUFile *f, SaveBitmapState *dbms,
 }
 
 /* Called with iothread lock taken.  */
-static void dirty_bitmap_do_save_cleanup(void)
+static void dirty_bitmap_do_save_cleanup(DBMSaveState *s)
 {
     SaveBitmapState *dbms;
 
-    while ((dbms = QSIMPLEQ_FIRST(&dirty_bitmap_mig_state.dbms_list)) != NULL) {
-        QSIMPLEQ_REMOVE_HEAD(&dirty_bitmap_mig_state.dbms_list, entry);
+    while ((dbms = QSIMPLEQ_FIRST(&s->dbms_list)) != NULL) {
+        QSIMPLEQ_REMOVE_HEAD(&s->dbms_list, entry);
         bdrv_dirty_bitmap_set_busy(dbms->bitmap, false);
         bdrv_unref(dbms->bs);
         g_free(dbms);
@@ -267,17 +275,17 @@ static void dirty_bitmap_do_save_cleanup(void)
 }
 
 /* Called with iothread lock taken. */
-static int init_dirty_bitmap_migration(void)
+static int init_dirty_bitmap_migration(DBMSaveState *s)
 {
     BlockDriverState *bs;
     BdrvDirtyBitmap *bitmap;
     SaveBitmapState *dbms;
     Error *local_err = NULL;
 
-    dirty_bitmap_mig_state.bulk_completed = false;
-    dirty_bitmap_mig_state.prev_bs = NULL;
-    dirty_bitmap_mig_state.prev_bitmap = NULL;
-    dirty_bitmap_mig_state.no_bitmaps = false;
+    s->bulk_completed = false;
+    s->prev_bs = NULL;
+    s->prev_bitmap = NULL;
+    s->no_bitmaps = false;
 
     for (bs = bdrv_next_all_states(NULL); bs; bs = bdrv_next_all_states(bs)) {
         const char *name = bdrv_get_device_or_node_name(bs);
@@ -316,35 +324,36 @@ static int init_dirty_bitmap_migration(void)
                 dbms->flags |= DIRTY_BITMAP_MIG_START_FLAG_PERSISTENT;
             }
 
-            QSIMPLEQ_INSERT_TAIL(&dirty_bitmap_mig_state.dbms_list,
+            QSIMPLEQ_INSERT_TAIL(&s->dbms_list,
                                  dbms, entry);
         }
     }
 
     /* unset migration flags here, to not roll it back */
-    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+    QSIMPLEQ_FOREACH(dbms, &s->dbms_list, entry) {
         bdrv_dirty_bitmap_skip_store(dbms->bitmap, true);
     }
 
-    if (QSIMPLEQ_EMPTY(&dirty_bitmap_mig_state.dbms_list)) {
-        dirty_bitmap_mig_state.no_bitmaps = true;
+    if (QSIMPLEQ_EMPTY(&s->dbms_list)) {
+        s->no_bitmaps = true;
     }
 
     return 0;
 
 fail:
-    dirty_bitmap_do_save_cleanup();
+    dirty_bitmap_do_save_cleanup(s);
 
     return -1;
 }
 
 /* Called with no lock taken.  */
-static void bulk_phase_send_chunk(QEMUFile *f, SaveBitmapState *dbms)
+static void bulk_phase_send_chunk(QEMUFile *f, DBMSaveState *s,
+                                  SaveBitmapState *dbms)
 {
     uint32_t nr_sectors = MIN(dbms->total_sectors - dbms->cur_sector,
                              dbms->sectors_per_chunk);
 
-    send_bitmap_bits(f, dbms, dbms->cur_sector, nr_sectors);
+    send_bitmap_bits(f, s, dbms, dbms->cur_sector, nr_sectors);
 
     dbms->cur_sector += nr_sectors;
     if (dbms->cur_sector >= dbms->total_sectors) {
@@ -353,61 +362,66 @@ static void bulk_phase_send_chunk(QEMUFile *f, SaveBitmapState *dbms)
 }
 
 /* Called with no lock taken.  */
-static void bulk_phase(QEMUFile *f, bool limit)
+static void bulk_phase(QEMUFile *f, DBMSaveState *s, bool limit)
 {
     SaveBitmapState *dbms;
 
-    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+    QSIMPLEQ_FOREACH(dbms, &s->dbms_list, entry) {
         while (!dbms->bulk_completed) {
-            bulk_phase_send_chunk(f, dbms);
+            bulk_phase_send_chunk(f, s, dbms);
             if (limit && qemu_file_rate_limit(f)) {
                 return;
             }
         }
     }
 
-    dirty_bitmap_mig_state.bulk_completed = true;
+    s->bulk_completed = true;
 }
 
 /* for SaveVMHandlers */
 static void dirty_bitmap_save_cleanup(void *opaque)
 {
-    dirty_bitmap_do_save_cleanup();
+    DBMSaveState *s = &((DBMState *)opaque)->save;
+
+    dirty_bitmap_do_save_cleanup(s);
 }
 
 static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
 {
+    DBMSaveState *s = &((DBMState *)opaque)->save;
+
     trace_dirty_bitmap_save_iterate(migration_in_postcopy());
 
-    if (migration_in_postcopy() && !dirty_bitmap_mig_state.bulk_completed) {
-        bulk_phase(f, true);
+    if (migration_in_postcopy() && !s->bulk_completed) {
+        bulk_phase(f, s, true);
     }
 
     qemu_put_bitmap_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS);
 
-    return dirty_bitmap_mig_state.bulk_completed;
+    return s->bulk_completed;
 }
 
 /* Called with iothread lock taken.  */
 
 static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
 {
+    DBMSaveState *s = &((DBMState *)opaque)->save;
     SaveBitmapState *dbms;
     trace_dirty_bitmap_save_complete_enter();
 
-    if (!dirty_bitmap_mig_state.bulk_completed) {
-        bulk_phase(f, false);
+    if (!s->bulk_completed) {
+        bulk_phase(f, s, false);
     }
 
-    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
-        send_bitmap_complete(f, dbms);
+    QSIMPLEQ_FOREACH(dbms, &s->dbms_list, entry) {
+        send_bitmap_complete(f, s, dbms);
     }
 
     qemu_put_bitmap_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS);
 
     trace_dirty_bitmap_save_complete_finish();
 
-    dirty_bitmap_do_save_cleanup();
+    dirty_bitmap_save_cleanup(opaque);
     return 0;
 }
 
@@ -417,12 +431,13 @@ static void dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
                                       uint64_t *res_compatible,
                                       uint64_t *res_postcopy_only)
 {
+    DBMSaveState *s = &((DBMState *)opaque)->save;
     SaveBitmapState *dbms;
     uint64_t pending = 0;
 
     qemu_mutex_lock_iothread();
 
-    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+    QSIMPLEQ_FOREACH(dbms, &s->dbms_list, entry) {
         uint64_t gran = bdrv_dirty_bitmap_granularity(dbms->bitmap);
         uint64_t sectors = dbms->bulk_completed ? 0 :
                            dbms->total_sectors - dbms->cur_sector;
@@ -481,7 +496,7 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
         b->bs = s->bs;
         b->bitmap = s->bitmap;
         b->migrated = false;
-        enabled_bitmaps = g_slist_prepend(enabled_bitmaps, b);
+        s->enabled_bitmaps = g_slist_prepend(s->enabled_bitmaps, b);
     }
 
     return 0;
@@ -489,11 +504,12 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
 
 void dirty_bitmap_mig_before_vm_start(void)
 {
+    DBMLoadState *s = &dbm_state.load;
     GSList *item;
 
-    qemu_mutex_lock(&finish_lock);
+    qemu_mutex_lock(&s->finish_lock);
 
-    for (item = enabled_bitmaps; item; item = g_slist_next(item)) {
+    for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
         LoadBitmapState *b = item->data;
 
         if (b->migrated) {
@@ -505,10 +521,10 @@ void dirty_bitmap_mig_before_vm_start(void)
         g_free(b);
     }
 
-    g_slist_free(enabled_bitmaps);
-    enabled_bitmaps = NULL;
+    g_slist_free(s->enabled_bitmaps);
+    s->enabled_bitmaps = NULL;
 
-    qemu_mutex_unlock(&finish_lock);
+    qemu_mutex_unlock(&s->finish_lock);
 }
 
 static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
@@ -517,9 +533,9 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
     trace_dirty_bitmap_load_complete();
     bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
 
-    qemu_mutex_lock(&finish_lock);
+    qemu_mutex_lock(&s->finish_lock);
 
-    for (item = enabled_bitmaps; item; item = g_slist_next(item)) {
+    for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
         LoadBitmapState *b = item->data;
 
         if (b->bitmap == s->bitmap) {
@@ -530,7 +546,7 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
 
     if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
         bdrv_dirty_bitmap_lock(s->bitmap);
-        if (enabled_bitmaps == NULL) {
+        if (s->enabled_bitmaps == NULL) {
             /* in postcopy */
             bdrv_reclaim_dirty_bitmap_locked(s->bitmap, &error_abort);
             bdrv_enable_dirty_bitmap_locked(s->bitmap);
@@ -549,7 +565,7 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
         bdrv_dirty_bitmap_unlock(s->bitmap);
     }
 
-    qemu_mutex_unlock(&finish_lock);
+    qemu_mutex_unlock(&s->finish_lock);
 }
 
 static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
@@ -646,7 +662,7 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s)
 
 static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
 {
-    static DBMLoadState s;
+    DBMLoadState *s = &((DBMState *)opaque)->load;
     int ret = 0;
 
     trace_dirty_bitmap_load_enter();
@@ -656,17 +672,17 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
     }
 
     do {
-        ret = dirty_bitmap_load_header(f, &s);
+        ret = dirty_bitmap_load_header(f, s);
         if (ret < 0) {
             return ret;
         }
 
-        if (s.flags & DIRTY_BITMAP_MIG_FLAG_START) {
-            ret = dirty_bitmap_load_start(f, &s);
-        } else if (s.flags & DIRTY_BITMAP_MIG_FLAG_COMPLETE) {
-            dirty_bitmap_load_complete(f, &s);
-        } else if (s.flags & DIRTY_BITMAP_MIG_FLAG_BITS) {
-            ret = dirty_bitmap_load_bits(f, &s);
+        if (s->flags & DIRTY_BITMAP_MIG_FLAG_START) {
+            ret = dirty_bitmap_load_start(f, s);
+        } else if (s->flags & DIRTY_BITMAP_MIG_FLAG_COMPLETE) {
+            dirty_bitmap_load_complete(f, s);
+        } else if (s->flags & DIRTY_BITMAP_MIG_FLAG_BITS) {
+            ret = dirty_bitmap_load_bits(f, s);
         }
 
         if (!ret) {
@@ -676,7 +692,7 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
         if (ret) {
             return ret;
         }
-    } while (!(s.flags & DIRTY_BITMAP_MIG_FLAG_EOS));
+    } while (!(s->flags & DIRTY_BITMAP_MIG_FLAG_EOS));
 
     trace_dirty_bitmap_load_success();
     return 0;
@@ -684,13 +700,14 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
 
 static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
 {
+    DBMSaveState *s = &((DBMState *)opaque)->save;
     SaveBitmapState *dbms = NULL;
-    if (init_dirty_bitmap_migration() < 0) {
+    if (init_dirty_bitmap_migration(s) < 0) {
         return -1;
     }
 
-    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
-        send_bitmap_start(f, dbms);
+    QSIMPLEQ_FOREACH(dbms, &s->dbms_list, entry) {
+        send_bitmap_start(f, s, dbms);
     }
     qemu_put_bitmap_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS);
 
@@ -699,7 +716,9 @@ static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
 
 static bool dirty_bitmap_is_active(void *opaque)
 {
-    return migrate_dirty_bitmaps() && !dirty_bitmap_mig_state.no_bitmaps;
+    DBMSaveState *s = &((DBMState *)opaque)->save;
+
+    return migrate_dirty_bitmaps() && !s->no_bitmaps;
 }
 
 static bool dirty_bitmap_is_active_iterate(void *opaque)
@@ -727,10 +746,10 @@ static SaveVMHandlers savevm_dirty_bitmap_handlers = {
 
 void dirty_bitmap_mig_init(void)
 {
-    QSIMPLEQ_INIT(&dirty_bitmap_mig_state.dbms_list);
-    qemu_mutex_init(&finish_lock);
+    QSIMPLEQ_INIT(&dbm_state.save.dbms_list);
+    qemu_mutex_init(&dbm_state.load.finish_lock);
 
     register_savevm_live("dirty-bitmap", 0, 1,
                          &savevm_dirty_bitmap_handlers,
-                         &dirty_bitmap_mig_state);
+                         &dbm_state);
 }
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 06/22] migration/block-dirty-bitmap: rename finish_lock to just lock
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (4 preceding siblings ...)
  2020-02-17 15:02 ` [PATCH v2 05/22] migration/block-dirty-bitmap: refactor state global variables Vladimir Sementsov-Ogievskiy
@ 2020-02-17 15:02 ` Vladimir Sementsov-Ogievskiy
  2020-02-18 13:20   ` Andrey Shinkevich
  2020-02-17 15:02 ` [PATCH v2 07/22] migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete Vladimir Sementsov-Ogievskiy
                   ` (18 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-17 15:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, vsementsov, qemu-block, quintela, dgilbert,
	Stefan Hajnoczi, andrey.shinkevich, John Snow

finish_lock is a bad name, as the lock is used not only at the end of
the process.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 migration/block-dirty-bitmap.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 7a82b76809..440c41cfca 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -143,7 +143,7 @@ typedef struct DBMLoadState {
     BdrvDirtyBitmap *bitmap;
 
     GSList *enabled_bitmaps;
-    QemuMutex finish_lock;
+    QemuMutex lock; /* protect enabled_bitmaps */
 } DBMLoadState;
 
 typedef struct DBMState {
@@ -507,7 +507,7 @@ void dirty_bitmap_mig_before_vm_start(void)
     DBMLoadState *s = &dbm_state.load;
     GSList *item;
 
-    qemu_mutex_lock(&s->finish_lock);
+    qemu_mutex_lock(&s->lock);
 
     for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
         LoadBitmapState *b = item->data;
@@ -524,7 +524,7 @@ void dirty_bitmap_mig_before_vm_start(void)
     g_slist_free(s->enabled_bitmaps);
     s->enabled_bitmaps = NULL;
 
-    qemu_mutex_unlock(&s->finish_lock);
+    qemu_mutex_unlock(&s->lock);
 }
 
 static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
@@ -533,7 +533,7 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
     trace_dirty_bitmap_load_complete();
     bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
 
-    qemu_mutex_lock(&s->finish_lock);
+    qemu_mutex_lock(&s->lock);
 
     for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
         LoadBitmapState *b = item->data;
@@ -565,7 +565,7 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
         bdrv_dirty_bitmap_unlock(s->bitmap);
     }
 
-    qemu_mutex_unlock(&s->finish_lock);
+    qemu_mutex_unlock(&s->lock);
 }
 
 static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
@@ -747,7 +747,7 @@ static SaveVMHandlers savevm_dirty_bitmap_handlers = {
 void dirty_bitmap_mig_init(void)
 {
     QSIMPLEQ_INIT(&dbm_state.save.dbms_list);
-    qemu_mutex_init(&dbm_state.load.finish_lock);
+    qemu_mutex_init(&dbm_state.load.lock);
 
     register_savevm_live("dirty-bitmap", 0, 1,
                          &savevm_dirty_bitmap_handlers,
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 07/22] migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (5 preceding siblings ...)
  2020-02-17 15:02 ` [PATCH v2 06/22] migration/block-dirty-bitmap: rename finish_lock to just lock Vladimir Sementsov-Ogievskiy
@ 2020-02-17 15:02 ` Vladimir Sementsov-Ogievskiy
  2020-02-18 14:26   ` Andrey Shinkevich
  2020-02-17 15:02 ` [PATCH v2 08/22] migration/block-dirty-bitmap: keep bitmap state for all bitmaps Vladimir Sementsov-Ogievskiy
                   ` (17 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-17 15:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, vsementsov, qemu-block, quintela, dgilbert,
	Stefan Hajnoczi, andrey.shinkevich, John Snow

The bdrv_enable_dirty_bitmap_locked() call does nothing: if we are in
postcopy, the bitmap successor must be enabled, and the reclaim
operation will enable the bitmap.

So we actually just need to call _reclaim_ in both branches of the if
statement, and keeping the branches different only to add an assertion
does not seem worthwhile. The logic becomes simple: on load completion
we reclaim the bitmap, and that's all.
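
Condensed result (taken from the diff below): the reclaim merges the
successor back into the bitmap and, per the reasoning above, leaves it
enabled in the postcopy case, so no explicit enable call is needed
afterwards.

    if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
        bdrv_reclaim_dirty_bitmap(s->bitmap, &error_abort);
    }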

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 migration/block-dirty-bitmap.c | 25 ++++---------------------
 1 file changed, 4 insertions(+), 21 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 440c41cfca..9cc750d93b 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -535,6 +535,10 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
 
     qemu_mutex_lock(&s->lock);
 
+    if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
+        bdrv_reclaim_dirty_bitmap(s->bitmap, &error_abort);
+    }
+
     for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
         LoadBitmapState *b = item->data;
 
@@ -544,27 +548,6 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
         }
     }
 
-    if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
-        bdrv_dirty_bitmap_lock(s->bitmap);
-        if (s->enabled_bitmaps == NULL) {
-            /* in postcopy */
-            bdrv_reclaim_dirty_bitmap_locked(s->bitmap, &error_abort);
-            bdrv_enable_dirty_bitmap_locked(s->bitmap);
-        } else {
-            /* target not started, successor must be empty */
-            int64_t count = bdrv_get_dirty_count(s->bitmap);
-            BdrvDirtyBitmap *ret = bdrv_reclaim_dirty_bitmap_locked(s->bitmap,
-                                                                    NULL);
-            /* bdrv_reclaim_dirty_bitmap can fail only on no successor (it
-             * must be) or on merge fail, but merge can't fail when second
-             * bitmap is empty
-             */
-            assert(ret == s->bitmap &&
-                   count == bdrv_get_dirty_count(s->bitmap));
-        }
-        bdrv_dirty_bitmap_unlock(s->bitmap);
-    }
-
     qemu_mutex_unlock(&s->lock);
 }
 
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 08/22] migration/block-dirty-bitmap: keep bitmap state for all bitmaps
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (6 preceding siblings ...)
  2020-02-17 15:02 ` [PATCH v2 07/22] migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete Vladimir Sementsov-Ogievskiy
@ 2020-02-17 15:02 ` Vladimir Sementsov-Ogievskiy
  2020-02-18 17:07   ` Andrey Shinkevich
  2020-07-23 21:30   ` Eric Blake
  2020-02-17 15:02 ` [PATCH v2 09/22] migration/block-dirty-bitmap: relax error handling in incoming part Vladimir Sementsov-Ogievskiy
                   ` (16 subsequent siblings)
  24 siblings, 2 replies; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-17 15:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, vsementsov, qemu-block, quintela, dgilbert,
	Stefan Hajnoczi, andrey.shinkevich, John Snow

Keep bitmap state for disabled bitmaps too, and keep the state until
the end of the process. This is needed for the following commit, which
implements bitmap postcopy cancelation.

To clean up the new list, the following logic is used. We need two
events to consider a bitmap's migration finished:
1. a chunk with the DIRTY_BITMAP_MIG_FLAG_COMPLETE flag is received
2. dirty_bitmap_mig_before_vm_start is called
These two events may come in either order, so we track which one comes
last, and on the last of them we remove the bitmap migration state from
the list (a condensed sketch of this protocol follows).
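
A condensed sketch of the "whichever event comes last cleans up"
protocol (both fragments appear in the diff below):

    /* In dirty_bitmap_load_complete(): COMPLETE chunk received (event 1) */
    if (b->bitmap == s->bitmap) {
        b->migrated = true;
        if (s->before_vm_start_handled) {
            /* event 2 already happened: we are last, drop the state */
            s->bitmaps = g_slist_remove(s->bitmaps, b);
            g_free(b);
        }
    }

    /* In before_vm_start_handle_item(): vm start (event 2) */
    if (b->migrated) {
        /* event 1 already happened: we are last, drop the state */
        s->bitmaps = g_slist_remove(s->bitmaps, b);
        g_free(b);
    }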

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 migration/block-dirty-bitmap.c | 64 +++++++++++++++++++++++-----------
 1 file changed, 43 insertions(+), 21 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 9cc750d93b..1329db8d7d 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -132,6 +132,7 @@ typedef struct LoadBitmapState {
     BlockDriverState *bs;
     BdrvDirtyBitmap *bitmap;
     bool migrated;
+    bool enabled;
 } LoadBitmapState;
 
 /* State of the dirty bitmap migration (DBM) during load process */
@@ -142,8 +143,10 @@ typedef struct DBMLoadState {
     BlockDriverState *bs;
     BdrvDirtyBitmap *bitmap;
 
-    GSList *enabled_bitmaps;
-    QemuMutex lock; /* protect enabled_bitmaps */
+    bool before_vm_start_handled; /* set in dirty_bitmap_mig_before_vm_start */
+
+    GSList *bitmaps;
+    QemuMutex lock; /* protect bitmaps */
 } DBMLoadState;
 
 typedef struct DBMState {
@@ -458,6 +461,7 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
     Error *local_err = NULL;
     uint32_t granularity = qemu_get_be32(f);
     uint8_t flags = qemu_get_byte(f);
+    LoadBitmapState *b;
 
     if (s->bitmap) {
         error_report("Bitmap with the same name ('%s') already exists on "
@@ -484,45 +488,59 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
 
     bdrv_disable_dirty_bitmap(s->bitmap);
     if (flags & DIRTY_BITMAP_MIG_START_FLAG_ENABLED) {
-        LoadBitmapState *b;
-
         bdrv_dirty_bitmap_create_successor(s->bitmap, &local_err);
         if (local_err) {
             error_report_err(local_err);
             return -EINVAL;
         }
-
-        b = g_new(LoadBitmapState, 1);
-        b->bs = s->bs;
-        b->bitmap = s->bitmap;
-        b->migrated = false;
-        s->enabled_bitmaps = g_slist_prepend(s->enabled_bitmaps, b);
     }
 
+    b = g_new(LoadBitmapState, 1);
+    b->bs = s->bs;
+    b->bitmap = s->bitmap;
+    b->migrated = false;
+    b->enabled = flags & DIRTY_BITMAP_MIG_START_FLAG_ENABLED;
+
+    s->bitmaps = g_slist_prepend(s->bitmaps, b);
+
     return 0;
 }
 
-void dirty_bitmap_mig_before_vm_start(void)
+/*
+ * before_vm_start_handle_item
+ *
+ * g_slist_foreach helper
+ *
+ * item is LoadBitmapState*
+ * opaque is DBMLoadState*
+ */
+static void before_vm_start_handle_item(void *item, void *opaque)
 {
-    DBMLoadState *s = &dbm_state.load;
-    GSList *item;
-
-    qemu_mutex_lock(&s->lock);
-
-    for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
-        LoadBitmapState *b = item->data;
+    DBMLoadState *s = opaque;
+    LoadBitmapState *b = item;
 
+    if (b->enabled) {
         if (b->migrated) {
             bdrv_enable_dirty_bitmap(b->bitmap);
         } else {
             bdrv_dirty_bitmap_enable_successor(b->bitmap);
         }
+    }
 
+    if (b->migrated) {
+        s->bitmaps = g_slist_remove(s->bitmaps, b);
         g_free(b);
     }
+}
 
-    g_slist_free(s->enabled_bitmaps);
-    s->enabled_bitmaps = NULL;
+void dirty_bitmap_mig_before_vm_start(void)
+{
+    DBMLoadState *s = &dbm_state.load;
+    qemu_mutex_lock(&s->lock);
+
+    assert(!s->before_vm_start_handled);
+    g_slist_foreach(s->bitmaps, before_vm_start_handle_item, s);
+    s->before_vm_start_handled = true;
 
     qemu_mutex_unlock(&s->lock);
 }
@@ -539,11 +557,15 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
         bdrv_reclaim_dirty_bitmap(s->bitmap, &error_abort);
     }
 
-    for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
+    for (item = s->bitmaps; item; item = g_slist_next(item)) {
         LoadBitmapState *b = item->data;
 
         if (b->bitmap == s->bitmap) {
             b->migrated = true;
+            if (s->before_vm_start_handled) {
+                s->bitmaps = g_slist_remove(s->bitmaps, b);
+                g_free(b);
+            }
             break;
         }
     }
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 09/22] migration/block-dirty-bitmap: relax error handling in incoming part
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (7 preceding siblings ...)
  2020-02-17 15:02 ` [PATCH v2 08/22] migration/block-dirty-bitmap: keep bitmap state for all bitmaps Vladimir Sementsov-Ogievskiy
@ 2020-02-17 15:02 ` Vladimir Sementsov-Ogievskiy
  2020-02-18 18:54   ` Andrey Shinkevich
  2020-02-17 15:02 ` [PATCH v2 10/22] migration/block-dirty-bitmap: cancel migration on shutdown Vladimir Sementsov-Ogievskiy
                   ` (15 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-17 15:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, vsementsov, qemu-block, quintela, dgilbert,
	Stefan Hajnoczi, andrey.shinkevich, John Snow

Bitmap data is not critical, and we should not fail the migration (or
trigger postcopy recovery) because of a dirty-bitmap migration failure.
Instead we should just lose the unfinished bitmaps.

We still have to report I/O stream violation errors, as they affect the
whole migration stream.
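
The resulting policy, condensed from the diff below: bitmap-local
failures cancel the incoming bitmap migration but keep draining the
stream, while stream-level violations still fail hard.

    /* Bitmap-local failure: cancel, but do not return an error. */
    if (!s->bitmap && !(s->flags & DIRTY_BITMAP_MIG_FLAG_START)) {
        error_report("Error: unknown dirty bitmap "
                     "'%s' for block device '%s'",
                     s->bitmap_name, s->node_name);
        cancel_incoming_locked(s);
    }

    /* Stream violation: the rest of the stream is unusable, fail hard. */
    ret = qemu_get_buffer(f, buf, buf_size);
    if (ret != buf_size) {
        error_report("Failed to read bitmap bits");
        g_free(buf);
        return -EIO;
    }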

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 migration/block-dirty-bitmap.c | 148 +++++++++++++++++++++++++--------
 1 file changed, 113 insertions(+), 35 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 1329db8d7d..aea5326804 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -145,6 +145,15 @@ typedef struct DBMLoadState {
 
     bool before_vm_start_handled; /* set in dirty_bitmap_mig_before_vm_start */
 
+    /*
+     * cancelled
+     * Incoming migration is cancelled for some reason. That means that we
+     * must still read our chunks from the migration stream, so as not to
+     * affect other migration objects (like RAM), but we just ignore them
+     * and do not touch any bitmaps or nodes.
+     */
+    bool cancelled;
+
     GSList *bitmaps;
     QemuMutex lock; /* protect bitmaps */
 } DBMLoadState;
@@ -545,13 +554,47 @@ void dirty_bitmap_mig_before_vm_start(void)
     qemu_mutex_unlock(&s->lock);
 }
 
+static void cancel_incoming_locked(DBMLoadState *s)
+{
+    GSList *item;
+
+    if (s->cancelled) {
+        return;
+    }
+
+    s->cancelled = true;
+    s->bs = NULL;
+    s->bitmap = NULL;
+
+    /* Drop all unfinished bitmaps */
+    for (item = s->bitmaps; item; item = g_slist_next(item)) {
+        LoadBitmapState *b = item->data;
+
+        /*
+         * Bitmap must be unfinished, as finished bitmaps should already be
+         * removed from the list.
+         */
+        assert(!s->before_vm_start_handled || !b->migrated);
+        if (bdrv_dirty_bitmap_has_successor(b->bitmap)) {
+            bdrv_reclaim_dirty_bitmap(b->bitmap, &error_abort);
+        }
+        bdrv_release_dirty_bitmap(b->bitmap);
+    }
+
+    g_slist_free_full(s->bitmaps, g_free);
+    s->bitmaps = NULL;
+}
+
 static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
 {
     GSList *item;
     trace_dirty_bitmap_load_complete();
-    bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
 
-    qemu_mutex_lock(&s->lock);
+    if (s->cancelled) {
+        return;
+    }
+
+    bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
 
     if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
         bdrv_reclaim_dirty_bitmap(s->bitmap, &error_abort);
@@ -569,8 +612,6 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
             break;
         }
     }
-
-    qemu_mutex_unlock(&s->lock);
 }
 
 static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
@@ -582,15 +623,32 @@ static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
 
     if (s->flags & DIRTY_BITMAP_MIG_FLAG_ZEROES) {
         trace_dirty_bitmap_load_bits_zeroes();
-        bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte, nr_bytes,
-                                             false);
+        if (!s->cancelled) {
+            bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte,
+                                                 nr_bytes, false);
+        }
     } else {
         size_t ret;
         uint8_t *buf;
         uint64_t buf_size = qemu_get_be64(f);
-        uint64_t needed_size =
-            bdrv_dirty_bitmap_serialization_size(s->bitmap,
-                                                 first_byte, nr_bytes);
+        uint64_t needed_size;
+
+        buf = g_malloc(buf_size);
+        ret = qemu_get_buffer(f, buf, buf_size);
+        if (ret != buf_size) {
+            error_report("Failed to read bitmap bits");
+            g_free(buf);
+            return -EIO;
+        }
+
+        if (s->cancelled) {
+            g_free(buf);
+            return 0;
+        }
+
+        needed_size = bdrv_dirty_bitmap_serialization_size(s->bitmap,
+                                                           first_byte,
+                                                           nr_bytes);
 
         if (needed_size > buf_size ||
             buf_size > QEMU_ALIGN_UP(needed_size, 4 * sizeof(long))
@@ -599,15 +657,8 @@ static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
             error_report("Migrated bitmap granularity doesn't "
                          "match the destination bitmap '%s' granularity",
                          bdrv_dirty_bitmap_name(s->bitmap));
-            return -EINVAL;
-        }
-
-        buf = g_malloc(buf_size);
-        ret = qemu_get_buffer(f, buf, buf_size);
-        if (ret != buf_size) {
-            error_report("Failed to read bitmap bits");
-            g_free(buf);
-            return -EIO;
+            cancel_incoming_locked(s);
+            return 0;
         }
 
         bdrv_dirty_bitmap_deserialize_part(s->bitmap, buf, first_byte, nr_bytes,
@@ -632,14 +683,16 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s)
             error_report("Unable to read node name string");
             return -EINVAL;
         }
-        s->bs = bdrv_lookup_bs(s->node_name, s->node_name, &local_err);
-        if (!s->bs) {
-            error_report_err(local_err);
-            return -EINVAL;
+        if (!s->cancelled) {
+            s->bs = bdrv_lookup_bs(s->node_name, s->node_name, &local_err);
+            if (!s->bs) {
+                error_report_err(local_err);
+                cancel_incoming_locked(s);
+            }
         }
-    } else if (!s->bs && !nothing) {
+    } else if (!s->bs && !nothing && !s->cancelled) {
         error_report("Error: block device name is not set");
-        return -EINVAL;
+        cancel_incoming_locked(s);
     }
 
     if (s->flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
@@ -647,24 +700,38 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s)
             error_report("Unable to read bitmap name string");
             return -EINVAL;
         }
-        s->bitmap = bdrv_find_dirty_bitmap(s->bs, s->bitmap_name);
-
-        /* bitmap may be NULL here, it wouldn't be an error if it is the
-         * first occurrence of the bitmap */
-        if (!s->bitmap && !(s->flags & DIRTY_BITMAP_MIG_FLAG_START)) {
-            error_report("Error: unknown dirty bitmap "
-                         "'%s' for block device '%s'",
-                         s->bitmap_name, s->node_name);
-            return -EINVAL;
+        if (!s->cancelled) {
+            s->bitmap = bdrv_find_dirty_bitmap(s->bs, s->bitmap_name);
+
+            /*
+             * bitmap may be NULL here, it wouldn't be an error if it is the
+             * first occurrence of the bitmap
+             */
+            if (!s->bitmap && !(s->flags & DIRTY_BITMAP_MIG_FLAG_START)) {
+                error_report("Error: unknown dirty bitmap "
+                             "'%s' for block device '%s'",
+                             s->bitmap_name, s->node_name);
+                cancel_incoming_locked(s);
+            }
         }
-    } else if (!s->bitmap && !nothing) {
+    } else if (!s->bitmap && !nothing && !s->cancelled) {
         error_report("Error: block device name is not set");
-        return -EINVAL;
+        cancel_incoming_locked(s);
     }
 
     return 0;
 }
 
+/*
+ * dirty_bitmap_load
+ *
+ * Load a sequence of dirty bitmap chunks. Return an error only on fatal I/O
+ * stream violations. On other errors just cancel the incoming bitmap
+ * migration and return 0.
+ *
+ * Note that when incoming bitmap migration is canceled, we still must read
+ * all our chunks (and just ignore them), so as not to affect other migration
+ * objects.
+ */
 static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
 {
     DBMLoadState *s = &((DBMState *)opaque)->load;
@@ -673,12 +740,19 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
     trace_dirty_bitmap_load_enter();
 
     if (version_id != 1) {
+        qemu_mutex_lock(&s->lock);
+        cancel_incoming_locked(s);
+        qemu_mutex_unlock(&s->lock);
         return -EINVAL;
     }
 
     do {
+        qemu_mutex_lock(&s->lock);
+
         ret = dirty_bitmap_load_header(f, s);
         if (ret < 0) {
+            cancel_incoming_locked(s);
+            qemu_mutex_unlock(&s->lock);
             return ret;
         }
 
@@ -695,8 +769,12 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
         }
 
         if (ret) {
+            cancel_incoming_locked(s);
+            qemu_mutex_unlock(&s->lock);
             return ret;
         }
+
+        qemu_mutex_unlock(&s->lock);
     } while (!(s->flags & DIRTY_BITMAP_MIG_FLAG_EOS));
 
     trace_dirty_bitmap_load_success();
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 10/22] migration/block-dirty-bitmap: cancel migration on shutdown
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (8 preceding siblings ...)
  2020-02-17 15:02 ` [PATCH v2 09/22] migration/block-dirty-bitmap: relax error handling in incoming part Vladimir Sementsov-Ogievskiy
@ 2020-02-17 15:02 ` Vladimir Sementsov-Ogievskiy
  2020-02-18 19:11   ` Andrey Shinkevich
  2020-07-23 21:04   ` Eric Blake
  2020-02-17 15:02 ` [PATCH v2 11/22] migration/savevm: don't worry if bitmap migration postcopy failed Vladimir Sementsov-Ogievskiy
                   ` (14 subsequent siblings)
  24 siblings, 2 replies; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-17 15:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, vsementsov, qemu-block, quintela, dgilbert,
	Stefan Hajnoczi, andrey.shinkevich, John Snow

If the target is turned off prior to postcopy finishing, the target
crashes because busy bitmaps are found at shutdown.
Canceling the incoming migration helps, as it removes all unfinished
(and therefore busy) bitmaps.

Similarly, on the source we crash in bdrv_close_all(), which asserts
that all bdrv states are removed, because the bdrv states involved in
dirty bitmap migration are still referenced by it. So, we need to
cancel the outgoing migration as well.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
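To illustrate the failure this fixes (a sketch, not a transcript of a
real session): with bitmap postcopy still in flight, a plain QMP

    {"execute": "quit"}

previously aborted on the destination, because the unfinished bitmaps
are still busy at shutdown, and on the source, because bdrv_close_all()
still sees referenced bdrv states. With this patch, migration_shutdown()
drops the unfinished bitmap state first, so shutdown completes cleanly.
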
 migration/migration.h          |  2 ++
 migration/block-dirty-bitmap.c | 16 ++++++++++++++++
 migration/migration.c          | 13 +++++++++++++
 3 files changed, 31 insertions(+)

diff --git a/migration/migration.h b/migration/migration.h
index 2948f2387b..2de6b8bbe2 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -332,6 +332,8 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
 void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
 
 void dirty_bitmap_mig_before_vm_start(void);
+void dirty_bitmap_mig_cancel_outgoing(void);
+void dirty_bitmap_mig_cancel_incoming(void);
 void migrate_add_address(SocketAddress *address);
 
 int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index aea5326804..3ca425d95e 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -585,6 +585,22 @@ static void cancel_incoming_locked(DBMLoadState *s)
     s->bitmaps = NULL;
 }
 
+void dirty_bitmap_mig_cancel_outgoing(void)
+{
+    dirty_bitmap_do_save_cleanup(&dbm_state.save);
+}
+
+void dirty_bitmap_mig_cancel_incoming(void)
+{
+    DBMLoadState *s = &dbm_state.load;
+
+    qemu_mutex_lock(&s->lock);
+
+    cancel_incoming_locked(s);
+
+    qemu_mutex_unlock(&s->lock);
+}
+
 static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
 {
     GSList *item;
diff --git a/migration/migration.c b/migration/migration.c
index 515047932c..7c605ba218 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -181,6 +181,19 @@ void migration_shutdown(void)
      */
     migrate_fd_cancel(current_migration);
     object_unref(OBJECT(current_migration));
+
+    /*
+     * Cancel outgoing migration of dirty bitmaps. It should
+     * at least unref used block nodes.
+     */
+    dirty_bitmap_mig_cancel_outgoing();
+
+    /*
+     * Cancel incoming migration of dirty bitmaps. Dirty bitmaps
+     * are non-critical data, and their loss is never considered
+     * serious.
+     */
+    dirty_bitmap_mig_cancel_incoming();
 }
 
 /* For outgoing */
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 11/22] migration/savevm: don't worry if bitmap migration postcopy failed
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (9 preceding siblings ...)
  2020-02-17 15:02 ` [PATCH v2 10/22] migration/block-dirty-bitmap: cancel migration on shutdown Vladimir Sementsov-Ogievskiy
@ 2020-02-17 15:02 ` Vladimir Sementsov-Ogievskiy
  2020-02-17 16:57   ` Dr. David Alan Gilbert
  2020-02-18 19:44   ` Andrey Shinkevich
  2020-02-17 15:02 ` [PATCH v2 12/22] qemu-iotests/199: fix style Vladimir Sementsov-Ogievskiy
                   ` (13 subsequent siblings)
  24 siblings, 2 replies; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-17 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: andrey.shinkevich, vsementsov, dgilbert, quintela

First, if only bitmaps postcopy is enabled (not RAM postcopy),
postcopy_pause_incoming() crashes on the assertion
assert(mis->to_src_file).

And anyway, bitmaps postcopy is not prepared to be recovered in any
way. The original idea is instead that if bitmaps postcopy fails, we
just lose some bitmaps, which is not critical. So, on failure we just
need to remove the unfinished bitmaps, and the guest should continue
execution on the destination.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
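Restating the logic of the diff below: on a load failure the incoming
bitmap state is cancelled unconditionally; if we are in the postcopy
phase and only bitmaps postcopy is enabled (no RAM postcopy), the error
is reported and then dropped (load_res is reset to 0), since all guest
state has already been migrated; otherwise the migration is marked
FAILED as before. Consequently, postcopy_pause_incoming() can now only
be reached with RAM postcopy enabled, which the new assertion documents.
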
 migration/savevm.c | 37 ++++++++++++++++++++++++++++++++-----
 1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 1d4220ece8..7e9dd58ccb 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1812,6 +1812,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
     MigrationIncomingState *mis = migration_incoming_get_current();
     QEMUFile *f = mis->from_src_file;
     int load_res;
+    MigrationState *migr = migrate_get_current();
+
+    object_ref(OBJECT(migr));
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                                    MIGRATION_STATUS_POSTCOPY_ACTIVE);
@@ -1838,11 +1841,24 @@ static void *postcopy_ram_listen_thread(void *opaque)
 
     trace_postcopy_ram_listen_thread_exit();
     if (load_res < 0) {
-        error_report("%s: loadvm failed: %d", __func__, load_res);
         qemu_file_set_error(f, load_res);
-        migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
-                                       MIGRATION_STATUS_FAILED);
-    } else {
+        dirty_bitmap_mig_cancel_incoming();
+        if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
+            !migrate_postcopy_ram() && migrate_dirty_bitmaps())
+        {
+            error_report("%s: loadvm failed during postcopy: %d. All state is "
+                         "migrated except for dirty bitmaps. Some dirty "
+                         "bitmaps may be lost, and present migrated dirty "
+                         "bitmaps are correctly migrated and valid.",
+                         __func__, load_res);
+            load_res = 0; /* prevent further exit() */
+        } else {
+            error_report("%s: loadvm failed: %d", __func__, load_res);
+            migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+                                           MIGRATION_STATUS_FAILED);
+        }
+    }
+    if (load_res >= 0) {
         /*
          * This looks good, but it's possible that the device loading in the
          * main thread hasn't finished yet, and so we might not be in 'RUN'
@@ -1878,6 +1894,8 @@ static void *postcopy_ram_listen_thread(void *opaque)
     mis->have_listen_thread = false;
     postcopy_state_set(POSTCOPY_INCOMING_END);
 
+    object_unref(OBJECT(migr));
+
     return NULL;
 }
 
@@ -2429,6 +2447,8 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
 {
     trace_postcopy_pause_incoming();
 
+    assert(migrate_postcopy_ram());
+
     /* Clear the triggered bit to allow one recovery */
     mis->postcopy_recover_triggered = false;
 
@@ -2513,15 +2533,22 @@ out:
     if (ret < 0) {
         qemu_file_set_error(f, ret);
 
+        /* Cancel bitmaps incoming regardless of recovery */
+        dirty_bitmap_mig_cancel_incoming();
+
         /*
          * If we are during an active postcopy, then we pause instead
          * of bail out to at least keep the VM's dirty data.  Note
          * that POSTCOPY_INCOMING_LISTENING stage is still not enough,
          * during which we're still receiving device states and we
          * still haven't yet started the VM on destination.
+         *
+         * Only RAM postcopy supports recovery. Still, if RAM postcopy is
+         * enabled, canceled bitmaps postcopy will not affect RAM postcopy
+         * recovering.
          */
         if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
-            postcopy_pause_incoming(mis)) {
+            migrate_postcopy_ram() && postcopy_pause_incoming(mis)) {
             /* Reset f to point to the newly created channel */
             f = mis->from_src_file;
             goto retry;
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 12/22] qemu-iotests/199: fix style
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (10 preceding siblings ...)
  2020-02-17 15:02 ` [PATCH v2 11/22] migration/savevm: don't worry if bitmap migration postcopy failed Vladimir Sementsov-Ogievskiy
@ 2020-02-17 15:02 ` Vladimir Sementsov-Ogievskiy
  2020-02-19  7:04   ` Andrey Shinkevich
  2020-07-23 22:03   ` Eric Blake
  2020-02-17 15:02 ` [PATCH v2 13/22] qemu-iotests/199: drop extra constraints Vladimir Sementsov-Ogievskiy
                   ` (12 subsequent siblings)
  24 siblings, 2 replies; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-17 15:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, vsementsov, qemu-block, quintela, dgilbert,
	Max Reitz, andrey.shinkevich

Mostly, satisfy pep8 complaints.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
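For reference, one way to reproduce the complaints being fixed
(assuming the pycodestyle checker, the renamed pep8 tool, is installed;
the exact invocation is an example, not part of this series):

    $ cd tests/qemu-iotests
    $ python3 -m pycodestyle 199

This flags, among other things, the statement-ending semicolons and the
blank-line placement corrected below.
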
 tests/qemu-iotests/199 | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index 40774eed74..de9ba8d94c 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -28,8 +28,8 @@ disk_b = os.path.join(iotests.test_dir, 'disk_b')
 size = '256G'
 fifo = os.path.join(iotests.test_dir, 'mig_fifo')
 
-class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
+class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
     def tearDown(self):
         self.vm_a.shutdown()
         self.vm_b.shutdown()
@@ -54,7 +54,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
         result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
                                name='bitmap', granularity=granularity)
-        self.assert_qmp(result, 'return', {});
+        self.assert_qmp(result, 'return', {})
 
         s = 0
         while s < write_size:
@@ -71,7 +71,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
         result = self.vm_a.qmp('block-dirty-bitmap-clear', node='drive0',
                                name='bitmap')
-        self.assert_qmp(result, 'return', {});
+        self.assert_qmp(result, 'return', {})
         s = 0
         while s < write_size:
             self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
@@ -104,15 +104,16 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
             self.vm_b.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
             s += 0x10000
 
-        result = self.vm_b.qmp('query-block');
+        result = self.vm_b.qmp('query-block')
         while len(result['return'][0]['dirty-bitmaps']) > 1:
             time.sleep(2)
-            result = self.vm_b.qmp('query-block');
+            result = self.vm_b.qmp('query-block')
 
         result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
                                node='drive0', name='bitmap')
 
-        self.assert_qmp(result, 'return/sha256', sha256);
+        self.assert_qmp(result, 'return/sha256', sha256)
+
 
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2'], supported_cache_modes=['none'],
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 13/22] qemu-iotests/199: drop extra constraints
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (11 preceding siblings ...)
  2020-02-17 15:02 ` [PATCH v2 12/22] qemu-iotests/199: fix style Vladimir Sementsov-Ogievskiy
@ 2020-02-17 15:02 ` Vladimir Sementsov-Ogievskiy
  2020-02-19  8:02   ` Andrey Shinkevich
  2020-02-17 15:02 ` [PATCH v2 14/22] qemu-iotests/199: better catch postcopy time Vladimir Sementsov-Ogievskiy
                   ` (11 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-17 15:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, vsementsov, qemu-block, quintela, dgilbert,
	Max Reitz, andrey.shinkevich

We don't need any specific format constraints here. Still, keep qcow2
for two reasons:
1. Avoid extra runs of this format-unrelated test.
2. We'll add some checks around persistent bitmaps in the future
   (which require qcow2).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
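With the constraints dropped, the test still runs as usual (a sketch;
option spelling depends on the local ./check setup):

    $ cd tests/qemu-iotests
    $ ./check -qcow2 199

but no longer refuses to start for cache modes other than none or for
non-file protocols.
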
 tests/qemu-iotests/199 | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index de9ba8d94c..dda918450a 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -116,5 +116,4 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
 
 if __name__ == '__main__':
-    iotests.main(supported_fmts=['qcow2'], supported_cache_modes=['none'],
-                 supported_protocols=['file'])
+    iotests.main(supported_fmts=['qcow2'])
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 14/22] qemu-iotests/199: better catch postcopy time
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (12 preceding siblings ...)
  2020-02-17 15:02 ` [PATCH v2 13/22] qemu-iotests/199: drop extra constraints Vladimir Sementsov-Ogievskiy
@ 2020-02-17 15:02 ` Vladimir Sementsov-Ogievskiy
  2020-02-19 13:16   ` Andrey Shinkevich
  2020-02-17 15:02 ` [PATCH v2 15/22] qemu-iotests/199: improve performance: set bitmap by discard Vladimir Sementsov-Ogievskiy
                   ` (10 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-17 15:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, vsementsov, qemu-block, quintela, dgilbert,
	Max Reitz, andrey.shinkevich

The test aims to test _postcopy_ migration, and wants to do some write
operations during the postcopy phase.

The test considers the migration status=completed event on the source
as the start of postcopy. This is completely wrong: that completion
marks the end of the whole migration process. Let's instead consider
the destination start as the start of postcopy, and use the RESUME
event for it.

Next, as the migration finish, let's use the migration status=completed
event on the target, as this method is closer to what libvirt or
another user will do than tracking the number of dirty bitmaps.

Finally, add a possibility to dump events for debugging. With debug set
to True, we see that the actual postcopy period is very small relative
to the whole test duration (~0.2 seconds vs >40 seconds for me). This
means the test is very inefficient at what it is supposed to do. Let's
improve it in the following commits.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
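For clarity, the new helpers operate on plain QMP events; a small
sketch of what they compute (timestamp values are made up):

    event = {'event': 'RESUME',
             'timestamp': {'seconds': 1581951000, 'microseconds': 125000}}
    event_seconds(event)     # -> 1581951000.125
    # event_dist(e1, e2) is just the difference of two such values
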
 tests/qemu-iotests/199 | 72 +++++++++++++++++++++++++++++++++---------
 1 file changed, 57 insertions(+), 15 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index dda918450a..6599fc6fb4 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -20,17 +20,43 @@
 
 import os
 import iotests
-import time
 from iotests import qemu_img
 
+debug = False
+
 disk_a = os.path.join(iotests.test_dir, 'disk_a')
 disk_b = os.path.join(iotests.test_dir, 'disk_b')
 size = '256G'
 fifo = os.path.join(iotests.test_dir, 'mig_fifo')
 
 
+def event_seconds(event):
+    return event['timestamp']['seconds'] + \
+        event['timestamp']['microseconds'] / 1000000.0
+
+
+def event_dist(e1, e2):
+    return event_seconds(e2) - event_seconds(e1)
+
+
 class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
     def tearDown(self):
+        if debug:
+            self.vm_a_events += self.vm_a.get_qmp_events()
+            self.vm_b_events += self.vm_b.get_qmp_events()
+            for e in self.vm_a_events:
+                e['vm'] = 'SRC'
+            for e in self.vm_b_events:
+                e['vm'] = 'DST'
+            events = (self.vm_a_events + self.vm_b_events)
+            events = [(e['timestamp']['seconds'],
+                       e['timestamp']['microseconds'],
+                       e['vm'],
+                       e['event'],
+                       e.get('data', '')) for e in events]
+            for e in sorted(events):
+                print('{}.{:06} {} {} {}'.format(*e))
+
         self.vm_a.shutdown()
         self.vm_b.shutdown()
         os.remove(disk_a)
@@ -47,6 +73,10 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         self.vm_a.launch()
         self.vm_b.launch()
 
+        # collect received events for debug
+        self.vm_a_events = []
+        self.vm_b_events = []
+
     def test_postcopy(self):
         write_size = 0x40000000
         granularity = 512
@@ -77,15 +107,13 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
             self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
             s += 0x10000
 
-        bitmaps_cap = {'capability': 'dirty-bitmaps', 'state': True}
-        events_cap = {'capability': 'events', 'state': True}
+        caps = [{'capability': 'dirty-bitmaps', 'state': True},
+                {'capability': 'events', 'state': True}]
 
-        result = self.vm_a.qmp('migrate-set-capabilities',
-                               capabilities=[bitmaps_cap, events_cap])
+        result = self.vm_a.qmp('migrate-set-capabilities', capabilities=caps)
         self.assert_qmp(result, 'return', {})
 
-        result = self.vm_b.qmp('migrate-set-capabilities',
-                               capabilities=[bitmaps_cap])
+        result = self.vm_b.qmp('migrate-set-capabilities', capabilities=caps)
         self.assert_qmp(result, 'return', {})
 
         result = self.vm_a.qmp('migrate', uri='exec:cat>' + fifo)
@@ -94,24 +122,38 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         result = self.vm_a.qmp('migrate-start-postcopy')
         self.assert_qmp(result, 'return', {})
 
-        while True:
-            event = self.vm_a.event_wait('MIGRATION')
-            if event['data']['status'] == 'completed':
-                break
+        e_resume = self.vm_b.event_wait('RESUME')
+        self.vm_b_events.append(e_resume)
 
         s = 0x8000
         while s < write_size:
             self.vm_b.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
             s += 0x10000
 
+        match = {'data': {'status': 'completed'}}
+        e_complete = self.vm_b.event_wait('MIGRATION', match=match)
+        self.vm_b_events.append(e_complete)
+
+        # take the queued event; it should have already happened
+        e_stop = self.vm_a.event_wait('STOP')
+        self.vm_a_events.append(e_stop)
+
+        downtime = event_dist(e_stop, e_resume)
+        postcopy_time = event_dist(e_resume, e_complete)
+
+        # TODO: assert downtime * 10 < postcopy_time
+        if debug:
+            print('downtime:', downtime)
+            print('postcopy_time:', postcopy_time)
+
+        # Assert that bitmap migration is finished (check that successor bitmap
+        # is removed)
         result = self.vm_b.qmp('query-block')
-        while len(result['return'][0]['dirty-bitmaps']) > 1:
-            time.sleep(2)
-            result = self.vm_b.qmp('query-block')
+        assert len(result['return'][0]['dirty-bitmaps']) == 1
 
+        # Check content of migrated (and updated by new writes) bitmap
         result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
                                node='drive0', name='bitmap')
-
         self.assert_qmp(result, 'return/sha256', sha256)
 
 
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 15/22] qemu-iotests/199: improve performance: set bitmap by discard
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (13 preceding siblings ...)
  2020-02-17 15:02 ` [PATCH v2 14/22] qemu-iotests/199: better catch postcopy time Vladimir Sementsov-Ogievskiy
@ 2020-02-17 15:02 ` Vladimir Sementsov-Ogievskiy
  2020-02-19 14:17   ` Andrey Shinkevich
  2020-02-17 15:02 ` [PATCH v2 16/22] qemu-iotests/199: change discard patterns Vladimir Sementsov-Ogievskiy
                   ` (9 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-17 15:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, vsementsov, qemu-block, quintela, dgilbert,
	Max Reitz, andrey.shinkevich

Discard dirties the dirty bitmap just like write does, but works
faster. Let's use it instead.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
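The mechanism relied on below: a discard issued through the HMP qemu-io
wrapper dirties the covered range in every enabled dirty bitmap of the
node without transferring data, provided the drive is opened with
discard=unmap (which this patch adds for that purpose). A minimal
sketch (offset and length are arbitrary examples):

    self.vm_a.hmp_qemu_io('drive0', 'discard 0 65536')
    # marks bytes [0, 64KiB) dirty in 'bitmap', as a write would
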
 tests/qemu-iotests/199 | 31 ++++++++++++++++++++-----------
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index 6599fc6fb4..d78f81b71c 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -67,8 +67,10 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         os.mkfifo(fifo)
         qemu_img('create', '-f', iotests.imgfmt, disk_a, size)
         qemu_img('create', '-f', iotests.imgfmt, disk_b, size)
-        self.vm_a = iotests.VM(path_suffix='a').add_drive(disk_a)
-        self.vm_b = iotests.VM(path_suffix='b').add_drive(disk_b)
+        self.vm_a = iotests.VM(path_suffix='a').add_drive(disk_a,
+                                                          'discard=unmap')
+        self.vm_b = iotests.VM(path_suffix='b').add_drive(disk_b,
+                                                          'discard=unmap')
         self.vm_b.add_incoming("exec: cat '" + fifo + "'")
         self.vm_a.launch()
         self.vm_b.launch()
@@ -78,7 +80,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         self.vm_b_events = []
 
     def test_postcopy(self):
-        write_size = 0x40000000
+        discard_size = 0x40000000
         granularity = 512
         chunk = 4096
 
@@ -86,25 +88,32 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
                                name='bitmap', granularity=granularity)
         self.assert_qmp(result, 'return', {})
 
+        result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
+                               node='drive0', name='bitmap')
+        empty_sha256 = result['return']['sha256']
+
         s = 0
-        while s < write_size:
-            self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
+        while s < discard_size:
+            self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
             s += 0x10000
         s = 0x8000
-        while s < write_size:
-            self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
+        while s < discard_size:
+            self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
             s += 0x10000
 
         result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
                                node='drive0', name='bitmap')
         sha256 = result['return']['sha256']
 
+        # Check that updating the bitmap by discards works
+        assert sha256 != empty_sha256
+
         result = self.vm_a.qmp('block-dirty-bitmap-clear', node='drive0',
                                name='bitmap')
         self.assert_qmp(result, 'return', {})
         s = 0
-        while s < write_size:
-            self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
+        while s < discard_size:
+            self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
             s += 0x10000
 
         caps = [{'capability': 'dirty-bitmaps', 'state': True},
@@ -126,8 +135,8 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         self.vm_b_events.append(e_resume)
 
         s = 0x8000
-        while s < write_size:
-            self.vm_b.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
+        while s < discard_size:
+            self.vm_b.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
             s += 0x10000
 
         match = {'data': {'status': 'completed'}}
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 16/22] qemu-iotests/199: change discard patterns
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (14 preceding siblings ...)
  2020-02-17 15:02 ` [PATCH v2 15/22] qemu-iotests/199: improve performance: set bitmap by discard Vladimir Sementsov-Ogievskiy
@ 2020-02-17 15:02 ` Vladimir Sementsov-Ogievskiy
  2020-02-19 14:33   ` Andrey Shinkevich
  2020-07-24  0:23   ` Eric Blake
  2020-02-17 15:02 ` [PATCH v2 17/22] qemu-iotests/199: increase postcopy period Vladimir Sementsov-Ogievskiy
                   ` (8 subsequent siblings)
  24 siblings, 2 replies; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-17 15:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, vsementsov, qemu-block, quintela, dgilbert,
	Max Reitz, andrey.shinkevich

iotest 199 takes too long because of its many discard operations. At
the same time, the postcopy period is very short, in spite of all these
efforts.

So, let's use fewer discards (and with more interesting patterns) to
reduce the test run time. In the next commit we'll increase the
postcopy period.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
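A sketch of what the new helper expands to (numbers taken from the
first discards1 entries below):

    apply_discards(vm, ((0, GiB), (2 * GiB + 512 * 5, 512)))
    # issues, via HMP qemu-io on drive0:
    #   discard 0 1073741824
    #   discard 2147486208 512
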
 tests/qemu-iotests/199 | 44 +++++++++++++++++++++++++-----------------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index d78f81b71c..7914fd0b2b 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -30,6 +30,28 @@ size = '256G'
 fifo = os.path.join(iotests.test_dir, 'mig_fifo')
 
 
+GiB = 1024 * 1024 * 1024
+
+discards1 = (
+    (0, GiB),
+    (2 * GiB + 512 * 5, 512),
+    (3 * GiB + 512 * 5, 512),
+    (100 * GiB, GiB)
+)
+
+discards2 = (
+    (3 * GiB + 512 * 8, 512),
+    (4 * GiB + 512 * 8, 512),
+    (50 * GiB, GiB),
+    (100 * GiB + GiB // 2, GiB)
+)
+
+
+def apply_discards(vm, discards):
+    for d in discards:
+        vm.hmp_qemu_io('drive0', 'discard {} {}'.format(*d))
+
+
 def event_seconds(event):
     return event['timestamp']['seconds'] + \
         event['timestamp']['microseconds'] / 1000000.0
@@ -80,9 +102,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         self.vm_b_events = []
 
     def test_postcopy(self):
-        discard_size = 0x40000000
         granularity = 512
-        chunk = 4096
 
         result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
                                name='bitmap', granularity=granularity)
@@ -92,14 +112,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
                                node='drive0', name='bitmap')
         empty_sha256 = result['return']['sha256']
 
-        s = 0
-        while s < discard_size:
-            self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
-            s += 0x10000
-        s = 0x8000
-        while s < discard_size:
-            self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
-            s += 0x10000
+        apply_discards(self.vm_a, discards1 + discards2)
 
         result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
                                node='drive0', name='bitmap')
@@ -111,10 +124,8 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         result = self.vm_a.qmp('block-dirty-bitmap-clear', node='drive0',
                                name='bitmap')
         self.assert_qmp(result, 'return', {})
-        s = 0
-        while s < discard_size:
-            self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
-            s += 0x10000
+
+        apply_discards(self.vm_a, discards1)
 
         caps = [{'capability': 'dirty-bitmaps', 'state': True},
                 {'capability': 'events', 'state': True}]
@@ -134,10 +145,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         e_resume = self.vm_b.event_wait('RESUME')
         self.vm_b_events.append(e_resume)
 
-        s = 0x8000
-        while s < discard_size:
-            self.vm_b.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
-            s += 0x10000
+        apply_discards(self.vm_b, discards2)
 
         match = {'data': {'status': 'completed'}}
         e_complete = self.vm_b.event_wait('MIGRATION', match=match)
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 17/22] qemu-iotests/199: increase postcopy period
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (15 preceding siblings ...)
  2020-02-17 15:02 ` [PATCH v2 16/22] qemu-iotests/199: change discard patterns Vladimir Sementsov-Ogievskiy
@ 2020-02-17 15:02 ` Vladimir Sementsov-Ogievskiy
  2020-02-19 14:56   ` Andrey Shinkevich
  2020-07-24  0:14   ` Eric Blake
  2020-02-17 15:02 ` [PATCH v2 18/22] python/qemu/machine: add kill() method Vladimir Sementsov-Ogievskiy
                   ` (7 subsequent siblings)
  24 siblings, 2 replies; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-17 15:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, vsementsov, qemu-block, quintela, dgilbert,
	Max Reitz, andrey.shinkevich

The test wants to force bitmap postcopy. Still, the resulting postcopy
period is very small. Let's increase it by adding more bitmaps to
migrate. Also, test migration of disabled bitmaps.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
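To keep the final sha256 checks straight, the expected content of each
bitmap after migration, as derived from the enable/disable sequence in
this patch, is:

    bitmap0              enabled throughout        -> discards1 + discards2
    odd ones (1,3,...)   disabled after discards1  -> discards1 only
    even ones (2,4,...)  re-enabled, updated by
                         target-side discards2     -> discards1 + discards2
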
 tests/qemu-iotests/199 | 58 ++++++++++++++++++++++++++++--------------
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index 7914fd0b2b..9a6e8dcb9d 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -103,29 +103,45 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
     def test_postcopy(self):
         granularity = 512
+        nb_bitmaps = 15
 
-        result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
-                               name='bitmap', granularity=granularity)
-        self.assert_qmp(result, 'return', {})
+        for i in range(nb_bitmaps):
+            result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
+                                   name='bitmap{}'.format(i),
+                                   granularity=granularity)
+            self.assert_qmp(result, 'return', {})
 
         result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
-                               node='drive0', name='bitmap')
+                               node='drive0', name='bitmap0')
         empty_sha256 = result['return']['sha256']
 
-        apply_discards(self.vm_a, discards1 + discards2)
+        apply_discards(self.vm_a, discards1)
 
         result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
-                               node='drive0', name='bitmap')
-        sha256 = result['return']['sha256']
+                               node='drive0', name='bitmap0')
+        discards1_sha256 = result['return']['sha256']
 
         # Check that updating the bitmap by discards works
-        assert sha256 != empty_sha256
+        assert discards1_sha256 != empty_sha256
 
-        result = self.vm_a.qmp('block-dirty-bitmap-clear', node='drive0',
-                               name='bitmap')
-        self.assert_qmp(result, 'return', {})
+        # We want to calculate the resulting sha256. Do it in bitmap0, so
+        # disable the other bitmaps
+        for i in range(1, nb_bitmaps):
+            result = self.vm_a.qmp('block-dirty-bitmap-disable', node='drive0',
+                                   name='bitmap{}'.format(i))
+            self.assert_qmp(result, 'return', {})
 
-        apply_discards(self.vm_a, discards1)
+        apply_discards(self.vm_a, discards2)
+
+        result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
+                               node='drive0', name='bitmap0')
+        all_discards_sha256 = result['return']['sha256']
+
+        # Now, enable some bitmaps, to be updated during migration
+        for i in range(2, nb_bitmaps, 2):
+            result = self.vm_a.qmp('block-dirty-bitmap-enable', node='drive0',
+                                   name='bitmap{}'.format(i))
+            self.assert_qmp(result, 'return', {})
 
         caps = [{'capability': 'dirty-bitmaps', 'state': True},
                 {'capability': 'events', 'state': True}]
@@ -145,6 +161,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         e_resume = self.vm_b.event_wait('RESUME')
         self.vm_b_events.append(e_resume)
 
+        # enabled bitmaps should be updated
         apply_discards(self.vm_b, discards2)
 
         match = {'data': {'status': 'completed'}}
@@ -158,7 +175,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         downtime = event_dist(e_stop, e_resume)
         postcopy_time = event_dist(e_resume, e_complete)
 
-        # TODO: assert downtime * 10 < postcopy_time
+        assert downtime * 10 < postcopy_time
         if debug:
             print('downtime:', downtime)
             print('postcopy_time:', postcopy_time)
@@ -166,12 +183,15 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         # Assert that bitmap migration is finished (check that successor bitmap
         # is removed)
         result = self.vm_b.qmp('query-block')
-        assert len(result['return'][0]['dirty-bitmaps']) == 1
-
-        # Check content of migrated (and updated by new writes) bitmap
-        result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
-                               node='drive0', name='bitmap')
-        self.assert_qmp(result, 'return/sha256', sha256)
+        assert len(result['return'][0]['dirty-bitmaps']) == nb_bitmaps
+
+        # Check content of migrated bitmaps. Still, don't waste time checking
+        # every bitmap
+        for i in range(0, nb_bitmaps, 5):
+            result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
+                                   node='drive0', name='bitmap{}'.format(i))
+            sha256 = discards1_sha256 if i % 2 else all_discards_sha256
+            self.assert_qmp(result, 'return/sha256', sha256)
 
 
 if __name__ == '__main__':
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 18/22] python/qemu/machine: add kill() method
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (16 preceding siblings ...)
  2020-02-17 15:02 ` [PATCH v2 17/22] qemu-iotests/199: increase postcopy period Vladimir Sementsov-Ogievskiy
@ 2020-02-17 15:02 ` Vladimir Sementsov-Ogievskiy
  2020-02-19 17:00   ` Andrey Shinkevich
  2020-05-29 10:09   ` Philippe Mathieu-Daudé
  2020-02-17 15:02 ` [PATCH v2 19/22] qemu-iotests/199: prepare for new test-cases addition Vladimir Sementsov-Ogievskiy
                   ` (6 subsequent siblings)
  24 siblings, 2 replies; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-17 15:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: vsementsov, Eduardo Habkost, quintela, dgilbert, Cleber Rosa,
	andrey.shinkevich

Add a method to hard-kill the VM, without any quit commands.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
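A minimal usage sketch (the binary path is a placeholder):

    vm = QEMUMachine('/path/to/qemu-system-x86_64')
    vm.launch()
    # ... bring the VM into the state under test ...
    vm.kill()  # SIGKILL via Popen.kill(): no QMP 'quit' is attempted,
               # and the resulting exit code of -9 is not reported

The last patch of the series uses this to simulate a source that dies
mid-postcopy.
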
 python/qemu/machine.py | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/python/qemu/machine.py b/python/qemu/machine.py
index 183d8f3d38..9918e0d8aa 100644
--- a/python/qemu/machine.py
+++ b/python/qemu/machine.py
@@ -341,7 +341,7 @@ class QEMUMachine(object):
         self._load_io_log()
         self._post_shutdown()
 
-    def shutdown(self, has_quit=False):
+    def shutdown(self, has_quit=False, hard=False):
         """
         Terminate the VM and clean up
         """
@@ -353,7 +353,9 @@ class QEMUMachine(object):
             self._console_socket = None
 
         if self.is_running():
-            if self._qmp:
+            if hard:
+                self._popen.kill()
+            elif self._qmp:
                 try:
                     if not has_quit:
                         self._qmp.cmd('quit')
@@ -366,7 +368,8 @@ class QEMUMachine(object):
         self._post_shutdown()
 
         exitcode = self.exitcode()
-        if exitcode is not None and exitcode < 0:
+        if exitcode is not None and exitcode < 0 and \
+                not (exitcode == -9 and hard):
             msg = 'qemu received signal %i: %s'
             if self._qemu_full_args:
                 command = ' '.join(self._qemu_full_args)
@@ -376,6 +379,9 @@ class QEMUMachine(object):
 
         self._launched = False
 
+    def kill(self):
+        self.shutdown(hard=True)
+
     def set_qmp_monitor(self, enabled=True):
         """
         Set the QMP monitor.
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 19/22] qemu-iotests/199: prepare for new test-cases addition
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (17 preceding siblings ...)
  2020-02-17 15:02 ` [PATCH v2 18/22] python/qemu/machine: add kill() method Vladimir Sementsov-Ogievskiy
@ 2020-02-17 15:02 ` Vladimir Sementsov-Ogievskiy
  2020-02-19 16:10   ` Andrey Shinkevich
  2020-02-17 15:02 ` [PATCH v2 20/22] qemu-iotests/199: check persistent bitmaps Vladimir Sementsov-Ogievskiy
                   ` (5 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-17 15:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, vsementsov, qemu-block, quintela, dgilbert,
	Max Reitz, andrey.shinkevich

Move the future common part into a start_postcopy() method. Move the
check of the number of bitmaps into check_bitmaps().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
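The intended shape of the coming cases (a sketch of a hypothetical
future test, not code from this patch):

    def test_some_interruption(self):
        self.start_postcopy()
        # ... interrupt the migration in some way ...
        check_bitmaps(self.vm_b, 0)  # unfinished bitmaps must be gone

The final two patches add real cases of exactly this shape.
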
 tests/qemu-iotests/199 | 36 +++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index 9a6e8dcb9d..969620b103 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -29,6 +29,8 @@ disk_b = os.path.join(iotests.test_dir, 'disk_b')
 size = '256G'
 fifo = os.path.join(iotests.test_dir, 'mig_fifo')
 
+granularity = 512
+nb_bitmaps = 15
 
 GiB = 1024 * 1024 * 1024
 
@@ -61,6 +63,15 @@ def event_dist(e1, e2):
     return event_seconds(e2) - event_seconds(e1)
 
 
+def check_bitmaps(vm, count):
+    result = vm.qmp('query-block')
+
+    if count == 0:
+        assert 'dirty-bitmaps' not in result['return'][0]
+    else:
+        assert len(result['return'][0]['dirty-bitmaps']) == count
+
+
 class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
     def tearDown(self):
         if debug:
@@ -101,10 +112,8 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         self.vm_a_events = []
         self.vm_b_events = []
 
-    def test_postcopy(self):
-        granularity = 512
-        nb_bitmaps = 15
-
+    def start_postcopy(self):
+        """ Run migration until RESUME event on target. Return this event. """
         for i in range(nb_bitmaps):
             result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
                                    name='bitmap{}'.format(i),
@@ -119,10 +128,10 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
         result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
                                node='drive0', name='bitmap0')
-        discards1_sha256 = result['return']['sha256']
+        self.discards1_sha256 = result['return']['sha256']
 
         # Check that updating the bitmap by discards works
-        assert discards1_sha256 != empty_sha256
+        assert self.discards1_sha256 != empty_sha256
 
         # We want to calculate the resulting sha256. Do it in bitmap0, so
         # disable the other bitmaps
@@ -135,7 +144,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
         result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
                                node='drive0', name='bitmap0')
-        all_discards_sha256 = result['return']['sha256']
+        self.all_discards_sha256 = result['return']['sha256']
 
         # Now, enable some bitmaps, to be updated during migration
         for i in range(2, nb_bitmaps, 2):
@@ -160,6 +169,10 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
         e_resume = self.vm_b.event_wait('RESUME')
         self.vm_b_events.append(e_resume)
+        return e_resume
+
+    def test_postcopy_success(self):
+        e_resume = self.start_postcopy()
 
         # enabled bitmaps should be updated
         apply_discards(self.vm_b, discards2)
@@ -180,18 +193,15 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
             print('downtime:', downtime)
             print('postcopy_time:', postcopy_time)
 
-        # Assert that bitmap migration is finished (check that successor bitmap
-        # is removed)
-        result = self.vm_b.qmp('query-block')
-        assert len(result['return'][0]['dirty-bitmaps']) == nb_bitmaps
+        check_bitmaps(self.vm_b, nb_bitmaps)
 
         # Check content of migrated bitmaps. Still, don't waste time checking
         # every bitmap
         for i in range(0, nb_bitmaps, 5):
             result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
                                    node='drive0', name='bitmap{}'.format(i))
-            sha256 = discards1_sha256 if i % 2 else all_discards_sha256
-            self.assert_qmp(result, 'return/sha256', sha256)
+            sha = self.discards1_sha256 if i % 2 else self.all_discards_sha256
+            self.assert_qmp(result, 'return/sha256', sha)
 
 
 if __name__ == '__main__':
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 20/22] qemu-iotests/199: check persistent bitmaps
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (18 preceding siblings ...)
  2020-02-17 15:02 ` [PATCH v2 19/22] qemu-iotests/199: prepare for new test-cases addition Vladimir Sementsov-Ogievskiy
@ 2020-02-17 15:02 ` Vladimir Sementsov-Ogievskiy
  2020-02-19 16:28   ` Andrey Shinkevich
  2020-02-17 15:02 ` [PATCH v2 21/22] qemu-iotests/199: add early shutdown case to bitmaps postcopy Vladimir Sementsov-Ogievskiy
                   ` (4 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-17 15:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, vsementsov, qemu-block, quintela, dgilbert,
	Max Reitz, andrey.shinkevich

Check that persistent bitmaps are not stored on the source and that the
bitmaps are persistent on the destination.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/199 | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index 969620b103..8baa078151 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -117,7 +117,8 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         for i in range(nb_bitmaps):
             result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
                                    name='bitmap{}'.format(i),
-                                   granularity=granularity)
+                                   granularity=granularity,
+                                   persistent=True)
             self.assert_qmp(result, 'return', {})
 
         result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
@@ -193,6 +194,19 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
             print('downtime:', downtime)
             print('postcopy_time:', postcopy_time)
 
+        # check that there are no bitmaps stored on source
+        self.vm_a_events += self.vm_a.get_qmp_events()
+        self.vm_a.shutdown()
+        self.vm_a.launch()
+        check_bitmaps(self.vm_a, 0)
+
+        # check that bitmaps are migrated and persistence works
+        check_bitmaps(self.vm_b, nb_bitmaps)
+        self.vm_b.shutdown()
+        # recreate vm_b, so there is no incoming option, which prevents
+        # loading bitmaps from disk
+        self.vm_b = iotests.VM(path_suffix='b').add_drive(disk_b)
+        self.vm_b.launch()
         check_bitmaps(self.vm_b, nb_bitmaps)
 
         # Check content of migrated bitmaps. Still, don't waste time checking
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 21/22] qemu-iotests/199: add early shutdown case to bitmaps postcopy
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (19 preceding siblings ...)
  2020-02-17 15:02 ` [PATCH v2 20/22] qemu-iotests/199: check persistent bitmaps Vladimir Sementsov-Ogievskiy
@ 2020-02-17 15:02 ` Vladimir Sementsov-Ogievskiy
  2020-02-19 16:48   ` Andrey Shinkevich
  2020-02-19 16:50   ` Andrey Shinkevich
  2020-02-17 15:02 ` [PATCH v2 22/22] qemu-iotests/199: add source-killed " Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  24 siblings, 2 replies; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-17 15:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, vsementsov, qemu-block, quintela, dgilbert,
	Max Reitz, andrey.shinkevich

Previous patches fixed two crashes which may occur on shutdown prior to
bitmaps postcopy finishing. Check that it works now.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/199     | 18 ++++++++++++++++++
 tests/qemu-iotests/199.out |  4 ++--
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index 8baa078151..0d12e6b1ae 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -217,6 +217,24 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
             sha = self.discards1_sha256 if i % 2 else self.all_discards_sha256
             self.assert_qmp(result, 'return/sha256', sha)
 
+    def test_early_shutdown_destination(self):
+        self.start_postcopy()
+
+        self.vm_b_events += self.vm_b.get_qmp_events()
+        self.vm_b.shutdown()
+        # recreate vm_b, so there is no incoming option, which prevents
+        # loading bitmaps from disk
+        self.vm_b = iotests.VM(path_suffix='b').add_drive(disk_b)
+        self.vm_b.launch()
+        check_bitmaps(self.vm_b, 0)
+
+        result = self.vm_a.qmp('query-status')
+        assert not result['return']['running']
+        self.vm_a_events += self.vm_a.get_qmp_events()
+        self.vm_a.shutdown()
+        self.vm_a.launch()
+        check_bitmaps(self.vm_a, 0)
+
 
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2'])
diff --git a/tests/qemu-iotests/199.out b/tests/qemu-iotests/199.out
index ae1213e6f8..fbc63e62f8 100644
--- a/tests/qemu-iotests/199.out
+++ b/tests/qemu-iotests/199.out
@@ -1,5 +1,5 @@
-.
+..
 ----------------------------------------------------------------------
-Ran 1 tests
+Ran 2 tests
 
 OK
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 22/22] qemu-iotests/199: add source-killed case to bitmaps postcopy
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (20 preceding siblings ...)
  2020-02-17 15:02 ` [PATCH v2 21/22] qemu-iotests/199: add early shutdown case to bitmaps postcopy Vladimir Sementsov-Ogievskiy
@ 2020-02-17 15:02 ` Vladimir Sementsov-Ogievskiy
  2020-02-19 17:15   ` Andrey Shinkevich
  2020-02-17 19:31 ` [PATCH v2 00/22] Fix error handling during bitmap postcopy no-reply
                   ` (2 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-17 15:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, vsementsov, qemu-block, quintela, dgilbert,
	Max Reitz, andrey.shinkevich

Previous patches fixed the behavior of bitmaps migration, so that
errors are handled by just removing the unfinished bitmaps, rather than
failing or trying to recover the postcopy migration. Add a
corresponding test.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/199     | 15 +++++++++++++++
 tests/qemu-iotests/199.out |  4 ++--
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index 0d12e6b1ae..d38913fa44 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -235,6 +235,21 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         self.vm_a.launch()
         check_bitmaps(self.vm_a, 0)
 
+    def test_early_kill_source(self):
+        self.start_postcopy()
+
+        self.vm_a_events = self.vm_a.get_qmp_events()
+        self.vm_a.kill()
+
+        self.vm_a.launch()
+
+        match = {'data': {'status': 'completed'}}
+        e_complete = self.vm_b.event_wait('MIGRATION', match=match)
+        self.vm_b_events.append(e_complete)
+
+        check_bitmaps(self.vm_a, 0)
+        check_bitmaps(self.vm_b, 0)
+
 
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2'])
diff --git a/tests/qemu-iotests/199.out b/tests/qemu-iotests/199.out
index fbc63e62f8..8d7e996700 100644
--- a/tests/qemu-iotests/199.out
+++ b/tests/qemu-iotests/199.out
@@ -1,5 +1,5 @@
-..
+...
 ----------------------------------------------------------------------
-Ran 2 tests
+Ran 3 tests
 
 OK
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 11/22] migration/savevm: don't worry if bitmap migration postcopy failed
  2020-02-17 15:02 ` [PATCH v2 11/22] migration/savevm: don't worry if bitmap migration postcopy failed Vladimir Sementsov-Ogievskiy
@ 2020-02-17 16:57   ` Dr. David Alan Gilbert
  2020-02-18 19:44   ` Andrey Shinkevich
  1 sibling, 0 replies; 80+ messages in thread
From: Dr. David Alan Gilbert @ 2020-02-17 16:57 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: andrey.shinkevich, qemu-devel, quintela

* Vladimir Sementsov-Ogievskiy (vsementsov@virtuozzo.com) wrote:
> First, if only bitmaps postcopy is enabled (not RAM postcopy),
> postcopy_pause_incoming() crashes on the assertion
> assert(mis->to_src_file).
> 
> And anyway, bitmaps postcopy is not prepared to be recovered in any
> way. The original idea is instead that if bitmaps postcopy fails, we
> just lose some bitmaps, which is not critical. So, on failure we just
> need to remove the unfinished bitmaps, and the guest should continue
> execution on the destination.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/savevm.c | 37 ++++++++++++++++++++++++++++++++-----
>  1 file changed, 32 insertions(+), 5 deletions(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 1d4220ece8..7e9dd58ccb 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1812,6 +1812,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
>      MigrationIncomingState *mis = migration_incoming_get_current();
>      QEMUFile *f = mis->from_src_file;
>      int load_res;
> +    MigrationState *migr = migrate_get_current();
> +
> +    object_ref(OBJECT(migr));
>  
>      migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>                                     MIGRATION_STATUS_POSTCOPY_ACTIVE);
> @@ -1838,11 +1841,24 @@ static void *postcopy_ram_listen_thread(void *opaque)
>  
>      trace_postcopy_ram_listen_thread_exit();
>      if (load_res < 0) {
> -        error_report("%s: loadvm failed: %d", __func__, load_res);
>          qemu_file_set_error(f, load_res);
> -        migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> -                                       MIGRATION_STATUS_FAILED);
> -    } else {
> +        dirty_bitmap_mig_cancel_incoming();
> +        if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
> +            !migrate_postcopy_ram() && migrate_dirty_bitmaps())
> +        {
> +            error_report("%s: loadvm failed during postcopy: %d. All state is "
> +                         "migrated except for dirty bitmaps. Some dirty "
> +                         "bitmaps may be lost, and present migrated dirty "
> +                         "bitmaps are correctly migrated and valid.",
> +                         __func__, load_res);
> +            load_res = 0; /* prevent further exit() */
> +        } else {
> +            error_report("%s: loadvm failed: %d", __func__, load_res);
> +            migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> +                                           MIGRATION_STATUS_FAILED);
> +        }
> +    }
> +    if (load_res >= 0) {
>          /*
>           * This looks good, but it's possible that the device loading in the
>           * main thread hasn't finished yet, and so we might not be in 'RUN'
> @@ -1878,6 +1894,8 @@ static void *postcopy_ram_listen_thread(void *opaque)
>      mis->have_listen_thread = false;
>      postcopy_state_set(POSTCOPY_INCOMING_END);
>  
> +    object_unref(OBJECT(migr));
> +
>      return NULL;
>  }
>  
> @@ -2429,6 +2447,8 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
>  {
>      trace_postcopy_pause_incoming();
>  
> +    assert(migrate_postcopy_ram());
> +
>      /* Clear the triggered bit to allow one recovery */
>      mis->postcopy_recover_triggered = false;
>  
> @@ -2513,15 +2533,22 @@ out:
>      if (ret < 0) {
>          qemu_file_set_error(f, ret);
>  
> +        /* Cancel bitmaps incoming regardless of recovery */
> +        dirty_bitmap_mig_cancel_incoming();
> +
>          /*
>           * If we are during an active postcopy, then we pause instead
>           * of bail out to at least keep the VM's dirty data.  Note
>           * that POSTCOPY_INCOMING_LISTENING stage is still not enough,
>           * during which we're still receiving device states and we
>           * still haven't yet started the VM on destination.
> +         *
> +         * Only RAM postcopy supports recovery. Still, if RAM postcopy is
> +         * enabled, canceled bitmaps postcopy will not affect RAM postcopy
> +         * recovering.
>           */
>          if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
> -            postcopy_pause_incoming(mis)) {
> +            migrate_postcopy_ram() && postcopy_pause_incoming(mis)) {
>              /* Reset f to point to the newly created channel */
>              f = mis->from_src_file;
>              goto retry;
> -- 
> 2.21.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 00/22] Fix error handling during bitmap postcopy
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (21 preceding siblings ...)
  2020-02-17 15:02 ` [PATCH v2 22/22] qemu-iotests/199: add source-killed " Vladimir Sementsov-Ogievskiy
@ 2020-02-17 19:31 ` no-reply
  2020-02-18 20:02 ` Andrey Shinkevich
  2020-04-02  7:42 ` Vladimir Sementsov-Ogievskiy
  24 siblings, 0 replies; 80+ messages in thread
From: no-reply @ 2020-02-17 19:31 UTC (permalink / raw)
  To: vsementsov
  Cc: fam, kwolf, vsementsov, ehabkost, qemu-block, quintela,
	qemu-devel, qemu-stable, stefanha, crosa, andrey.shinkevich,
	mreitz, jsnow, dgilbert

Patchew URL: https://patchew.org/QEMU/20200217150246.29180-1-vsementsov@virtuozzo.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH v2 00/22] Fix error handling during bitmap postcopy
Message-id: 20200217150246.29180-1-vsementsov@virtuozzo.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
fatal: git fetch_pack: expected ACK/NAK, got 'ERR upload-pack: not our ref 247b588c357694c896d056836da2341d75451c4f'
fatal: The remote end hung up unexpectedly
error: Could not fetch 3c8cf5a9c21ff8782164d1def7f44bd888713384
Traceback (most recent call last):
  File "patchew-tester/src/patchew-cli", line 521, in test_one
    git_clone_repo(clone, r["repo"], r["head"], logf, True)
  File "patchew-tester/src/patchew-cli", line 48, in git_clone_repo
    stdout=logf, stderr=logf)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['git', 'remote', 'add', '-f', '--mirror=fetch', '3c8cf5a9c21ff8782164d1def7f44bd888713384', 'https://github.com/patchew-project/qemu']' returned non-zero exit status 1.



The full log is available at
http://patchew.org/logs/20200217150246.29180-1-vsementsov@virtuozzo.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 01/22] migration/block-dirty-bitmap: fix dirty_bitmap_mig_before_vm_start
  2020-02-17 15:02 ` [PATCH v2 01/22] migration/block-dirty-bitmap: fix dirty_bitmap_mig_before_vm_start Vladimir Sementsov-Ogievskiy
@ 2020-02-18  9:44   ` Andrey Shinkevich
  0 siblings, 0 replies; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-18  9:44 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Fam Zheng, qemu-block, quintela, qemu-stable, dgilbert,
	Stefan Hajnoczi, John Snow

On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
> There is no reason to use the _locked version of bdrv_enable_dirty_bitmap,
> as we don't hold the mutex at this point. Moreover, the adjacent
> bdrv_dirty_bitmap_enable_successor does lock the mutex.
> 
> Fixes: 58f72b965e9e1q
> Cc: qemu-stable@nongnu.org # v3.0
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   migration/block-dirty-bitmap.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index 7eafface61..16f1793ee3 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -498,7 +498,7 @@ void dirty_bitmap_mig_before_vm_start(void)
>           DirtyBitmapLoadBitmapState *b = item->data;
>   
>           if (b->migrated) {
> -            bdrv_enable_dirty_bitmap_locked(b->bitmap);
> +            bdrv_enable_dirty_bitmap(b->bitmap);
>           } else {
>               bdrv_dirty_bitmap_enable_successor(b->bitmap);
>           }
> 

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-- 
With the best regards,
Andrey Shinkevich


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 03/22] migration/block-dirty-bitmap: rename dirty_bitmap_mig_cleanup
  2020-02-17 15:02 ` [PATCH v2 03/22] migration/block-dirty-bitmap: rename dirty_bitmap_mig_cleanup Vladimir Sementsov-Ogievskiy
@ 2020-02-18 11:00   ` Andrey Shinkevich
  2020-02-19 14:20     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-18 11:00 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Fam Zheng, qemu-block, quintela, dgilbert, Stefan Hajnoczi, John Snow

On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
> Rename dirty_bitmap_mig_cleanup to dirty_bitmap_do_save_cleanup, to
> stress that it belongs to the save part.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   migration/block-dirty-bitmap.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index 73792ab005..4e8959ae52 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -259,7 +259,7 @@ static void send_bitmap_bits(QEMUFile *f, SaveBitmapState *dbms,
>   }
>   
>   /* Called with iothread lock taken.  */
> -static void dirty_bitmap_mig_cleanup(void)
> +static void dirty_bitmap_do_save_cleanup(void)
>   {
>       SaveBitmapState *dbms;
>   
> @@ -338,7 +338,7 @@ static int init_dirty_bitmap_migration(void)
>       return 0;
>   
>   fail:
> -    dirty_bitmap_mig_cleanup();
> +    dirty_bitmap_do_save_cleanup();
>   
>       return -1;
>   }
> @@ -377,7 +377,7 @@ static void bulk_phase(QEMUFile *f, bool limit)
>   /* for SaveVMHandlers */
>   static void dirty_bitmap_save_cleanup(void *opaque)
>   {
> -    dirty_bitmap_mig_cleanup();
> +    dirty_bitmap_do_save_cleanup();
>   }
>   
>   static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
> @@ -412,7 +412,7 @@ static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
>   
>       trace_dirty_bitmap_save_complete_finish();
>   
> -    dirty_bitmap_mig_cleanup();
> +    dirty_bitmap_do_save_cleanup();
>       return 0;
>   }
>   
> 

At the next opportunity, I would suggest a name like
"dirty_bitmap_do_clean_after_saving()", and similarly
"dirty_bitmap_clean_after_saving()" for dirty_bitmap_save_cleanup().

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-- 
With the best regards,
Andrey Shinkevich


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 04/22] migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init
  2020-02-17 15:02 ` [PATCH v2 04/22] migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init Vladimir Sementsov-Ogievskiy
@ 2020-02-18 11:28   ` Andrey Shinkevich
  0 siblings, 0 replies; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-18 11:28 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Fam Zheng, qemu-block, quintela, dgilbert, Stefan Hajnoczi, John Snow

On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
> There is no reason to keep two public init functions.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   migration/migration.h          | 1 -
>   migration/block-dirty-bitmap.c | 6 +-----
>   migration/migration.c          | 2 --
>   3 files changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/migration/migration.h b/migration/migration.h
> index 8473ddfc88..2948f2387b 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -332,7 +332,6 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
>   void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
>   
>   void dirty_bitmap_mig_before_vm_start(void);
> -void init_dirty_bitmap_incoming_migration(void);
>   void migrate_add_address(SocketAddress *address);
>   
>   int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index 4e8959ae52..49d4cf8810 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -148,11 +148,6 @@ typedef struct LoadBitmapState {
>   static GSList *enabled_bitmaps;
>   QemuMutex finish_lock;
>   
> -void init_dirty_bitmap_incoming_migration(void)
> -{
> -    qemu_mutex_init(&finish_lock);
> -}
> -
>   static uint32_t qemu_get_bitmap_flags(QEMUFile *f)
>   {
>       uint8_t flags = qemu_get_byte(f);
> @@ -733,6 +728,7 @@ static SaveVMHandlers savevm_dirty_bitmap_handlers = {
>   void dirty_bitmap_mig_init(void)
>   {
>       QSIMPLEQ_INIT(&dirty_bitmap_mig_state.dbms_list);
> +    qemu_mutex_init(&finish_lock);
>   
>       register_savevm_live("dirty-bitmap", 0, 1,
>                            &savevm_dirty_bitmap_handlers,
> diff --git a/migration/migration.c b/migration/migration.c
> index 8fb68795dc..515047932c 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -158,8 +158,6 @@ void migration_object_init(void)
>       qemu_sem_init(&current_incoming->postcopy_pause_sem_dst, 0);
>       qemu_sem_init(&current_incoming->postcopy_pause_sem_fault, 0);
>   
> -    init_dirty_bitmap_incoming_migration();
> -
>       if (!migration_object_check(current_migration, &err)) {
>           error_report_err(err);
>           exit(1);
> 

I am relying on the mutex initialization still being done at the proper time.

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-- 
With the best regards,
Andrey Shinkevich


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 05/22] migration/block-dirty-bitmap: refactor state global variables
  2020-02-17 15:02 ` [PATCH v2 05/22] migration/block-dirty-bitmap: refactor state global variables Vladimir Sementsov-Ogievskiy
@ 2020-02-18 13:05   ` Andrey Shinkevich
  2020-02-19 15:29     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-18 13:05 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Fam Zheng, qemu-block, quintela, dgilbert, Stefan Hajnoczi, John Snow



On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
> Move all state variables into one global struct. Reduce global
> variable usage, using the opaque pointer where possible.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   migration/block-dirty-bitmap.c | 171 ++++++++++++++++++---------------
>   1 file changed, 95 insertions(+), 76 deletions(-)
> 
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index 49d4cf8810..7a82b76809 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -128,6 +128,12 @@ typedef struct DBMSaveState {
>       BdrvDirtyBitmap *prev_bitmap;
>   } DBMSaveState;
>   
> +typedef struct LoadBitmapState {
> +    BlockDriverState *bs;
> +    BdrvDirtyBitmap *bitmap;
> +    bool migrated;
> +} LoadBitmapState;
> +
>   /* State of the dirty bitmap migration (DBM) during load process */
>   typedef struct DBMLoadState {
>       uint32_t flags;
> @@ -135,18 +141,17 @@ typedef struct DBMLoadState {
>       char bitmap_name[256];
>       BlockDriverState *bs;
>       BdrvDirtyBitmap *bitmap;
> +
> +    GSList *enabled_bitmaps;
> +    QemuMutex finish_lock;
>   } DBMLoadState;
>   
> -static DBMSaveState dirty_bitmap_mig_state;
> +typedef struct DBMState {
> +    DBMSaveState save;
> +    DBMLoadState load;
> +} DBMState;
>   
> -/* State of one bitmap during load process */
> -typedef struct LoadBitmapState {
> -    BlockDriverState *bs;
> -    BdrvDirtyBitmap *bitmap;
> -    bool migrated;
> -} LoadBitmapState;
> -static GSList *enabled_bitmaps;
> -QemuMutex finish_lock;
> +static DBMState dbm_state;
>   
>   static uint32_t qemu_get_bitmap_flags(QEMUFile *f)
>   {
> @@ -169,21 +174,21 @@ static void qemu_put_bitmap_flags(QEMUFile *f, uint32_t flags)
>       qemu_put_byte(f, flags);
>   }
>   
> -static void send_bitmap_header(QEMUFile *f, SaveBitmapState *dbms,
> -                               uint32_t additional_flags)
> +static void send_bitmap_header(QEMUFile *f, DBMSaveState *s,
> +                               SaveBitmapState *dbms, uint32_t additional_flags)
>   {
>       BlockDriverState *bs = dbms->bs;
>       BdrvDirtyBitmap *bitmap = dbms->bitmap;
>       uint32_t flags = additional_flags;
>       trace_send_bitmap_header_enter();
>   
> -    if (bs != dirty_bitmap_mig_state.prev_bs) {
> -        dirty_bitmap_mig_state.prev_bs = bs;
> +    if (bs != s->prev_bs) {
> +        s->prev_bs = bs;
>           flags |= DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME;
>       }
>   
> -    if (bitmap != dirty_bitmap_mig_state.prev_bitmap) {
> -        dirty_bitmap_mig_state.prev_bitmap = bitmap;
> +    if (bitmap != s->prev_bitmap) {
> +        s->prev_bitmap = bitmap;
>           flags |= DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME;
>       }
>   
> @@ -198,19 +203,22 @@ static void send_bitmap_header(QEMUFile *f, SaveBitmapState *dbms,
>       }
>   }
>   
> -static void send_bitmap_start(QEMUFile *f, SaveBitmapState *dbms)
> +static void send_bitmap_start(QEMUFile *f, DBMSaveState *s,
> +                              SaveBitmapState *dbms)
>   {
> -    send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_START);
> +    send_bitmap_header(f, s, dbms, DIRTY_BITMAP_MIG_FLAG_START);
>       qemu_put_be32(f, bdrv_dirty_bitmap_granularity(dbms->bitmap));
>       qemu_put_byte(f, dbms->flags);
>   }
>   
> -static void send_bitmap_complete(QEMUFile *f, SaveBitmapState *dbms)
> +static void send_bitmap_complete(QEMUFile *f, DBMSaveState *s,
> +                                 SaveBitmapState *dbms)
>   {
> -    send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_COMPLETE);
> +    send_bitmap_header(f, s, dbms, DIRTY_BITMAP_MIG_FLAG_COMPLETE);
>   }
>   
> -static void send_bitmap_bits(QEMUFile *f, SaveBitmapState *dbms,
> +static void send_bitmap_bits(QEMUFile *f, DBMSaveState *s,
> +                             SaveBitmapState *dbms,
>                                uint64_t start_sector, uint32_t nr_sectors)
>   {
>       /* align for buffer_is_zero() */
> @@ -235,7 +243,7 @@ static void send_bitmap_bits(QEMUFile *f, SaveBitmapState *dbms,
>   
>       trace_send_bitmap_bits(flags, start_sector, nr_sectors, buf_size);
>   
> -    send_bitmap_header(f, dbms, flags);
> +    send_bitmap_header(f, s, dbms, flags);
>   
>       qemu_put_be64(f, start_sector);
>       qemu_put_be32(f, nr_sectors);
> @@ -254,12 +262,12 @@ static void send_bitmap_bits(QEMUFile *f, SaveBitmapState *dbms,
>   }
>   
>   /* Called with iothread lock taken.  */
> -static void dirty_bitmap_do_save_cleanup(void)
> +static void dirty_bitmap_do_save_cleanup(DBMSaveState *s)
>   {
>       SaveBitmapState *dbms;
>   
> -    while ((dbms = QSIMPLEQ_FIRST(&dirty_bitmap_mig_state.dbms_list)) != NULL) {
> -        QSIMPLEQ_REMOVE_HEAD(&dirty_bitmap_mig_state.dbms_list, entry);
> +    while ((dbms = QSIMPLEQ_FIRST(&s->dbms_list)) != NULL) {
> +        QSIMPLEQ_REMOVE_HEAD(&s->dbms_list, entry);
>           bdrv_dirty_bitmap_set_busy(dbms->bitmap, false);
>           bdrv_unref(dbms->bs);
>           g_free(dbms);
> @@ -267,17 +275,17 @@ static void dirty_bitmap_do_save_cleanup(void)
>   }
>   
>   /* Called with iothread lock taken. */
> -static int init_dirty_bitmap_migration(void)
> +static int init_dirty_bitmap_migration(DBMSaveState *s)
>   {
>       BlockDriverState *bs;
>       BdrvDirtyBitmap *bitmap;
>       SaveBitmapState *dbms;
>       Error *local_err = NULL;
>   
> -    dirty_bitmap_mig_state.bulk_completed = false;
> -    dirty_bitmap_mig_state.prev_bs = NULL;
> -    dirty_bitmap_mig_state.prev_bitmap = NULL;
> -    dirty_bitmap_mig_state.no_bitmaps = false;
> +    s->bulk_completed = false;
> +    s->prev_bs = NULL;
> +    s->prev_bitmap = NULL;
> +    s->no_bitmaps = false;
>   
>       for (bs = bdrv_next_all_states(NULL); bs; bs = bdrv_next_all_states(bs)) {
>           const char *name = bdrv_get_device_or_node_name(bs);
> @@ -316,35 +324,36 @@ static int init_dirty_bitmap_migration(void)
>                   dbms->flags |= DIRTY_BITMAP_MIG_START_FLAG_PERSISTENT;
>               }
>   
> -            QSIMPLEQ_INSERT_TAIL(&dirty_bitmap_mig_state.dbms_list,
> +            QSIMPLEQ_INSERT_TAIL(&s->dbms_list,
>                                    dbms, entry);
>           }
>       }
>   
>       /* unset migration flags here, to not roll back it */
> -    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
> +    QSIMPLEQ_FOREACH(dbms, &s->dbms_list, entry) {
>           bdrv_dirty_bitmap_skip_store(dbms->bitmap, true);
>       }
>   
> -    if (QSIMPLEQ_EMPTY(&dirty_bitmap_mig_state.dbms_list)) {
> -        dirty_bitmap_mig_state.no_bitmaps = true;
> +    if (QSIMPLEQ_EMPTY(&s->dbms_list)) {
> +        s->no_bitmaps = true;
>       }
>   
>       return 0;
>   
>   fail:
> -    dirty_bitmap_do_save_cleanup();
> +    dirty_bitmap_do_save_cleanup(s);
>   
>       return -1;
>   }
>   
>   /* Called with no lock taken.  */
> -static void bulk_phase_send_chunk(QEMUFile *f, SaveBitmapState *dbms)
> +static void bulk_phase_send_chunk(QEMUFile *f, DBMSaveState *s,
> +                                  SaveBitmapState *dbms)
>   {
>       uint32_t nr_sectors = MIN(dbms->total_sectors - dbms->cur_sector,
>                                dbms->sectors_per_chunk);
>   
> -    send_bitmap_bits(f, dbms, dbms->cur_sector, nr_sectors);
> +    send_bitmap_bits(f, s, dbms, dbms->cur_sector, nr_sectors);
>   
>       dbms->cur_sector += nr_sectors;
>       if (dbms->cur_sector >= dbms->total_sectors) {
> @@ -353,61 +362,66 @@ static void bulk_phase_send_chunk(QEMUFile *f, SaveBitmapState *dbms)
>   }
>   
>   /* Called with no lock taken.  */
> -static void bulk_phase(QEMUFile *f, bool limit)
> +static void bulk_phase(QEMUFile *f, DBMSaveState *s, bool limit)
>   {
>       SaveBitmapState *dbms;
>   
> -    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
> +    QSIMPLEQ_FOREACH(dbms, &s->dbms_list, entry) {
>           while (!dbms->bulk_completed) {
> -            bulk_phase_send_chunk(f, dbms);
> +            bulk_phase_send_chunk(f, s, dbms);
>               if (limit && qemu_file_rate_limit(f)) {
>                   return;
>               }
>           }
>       }
>   
> -    dirty_bitmap_mig_state.bulk_completed = true;
> +    s->bulk_completed = true;
>   }
>   
>   /* for SaveVMHandlers */
>   static void dirty_bitmap_save_cleanup(void *opaque)
>   {
> -    dirty_bitmap_do_save_cleanup();
> +    DBMSaveState *s = &((DBMState *)opaque)->save;
> +
> +    dirty_bitmap_do_save_cleanup(s);
>   }

Why does one need the extra nested "do" function?
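(My guess at the intent, judging only from this diff: the "do" variant
takes the explicit state so that internal callers, like the failure
path of init_dirty_bitmap_migration(), can reuse it, while the
SaveVMHandlers callback merely unwraps the opaque pointer. A sketch of
how I read it:

    /* internal helper, takes the explicit save state */
    static void dirty_bitmap_do_save_cleanup(DBMSaveState *s);

    /* SaveVMHandlers callback: just unwrap opaque and delegate */
    static void dirty_bitmap_save_cleanup(void *opaque)
    {
        dirty_bitmap_do_save_cleanup(&((DBMState *)opaque)->save);
    }

If that is the idea, a one-line comment above the helper would answer
this question in the code itself.)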

>   
>   static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
>   {
> +    DBMSaveState *s = &((DBMState *)opaque)->save;
> +
>       trace_dirty_bitmap_save_iterate(migration_in_postcopy());
>   
> -    if (migration_in_postcopy() && !dirty_bitmap_mig_state.bulk_completed) {
> -        bulk_phase(f, true);
> +    if (migration_in_postcopy() && !s->bulk_completed) {
> +        bulk_phase(f, s, true);
>       }
>   
>       qemu_put_bitmap_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>   
> -    return dirty_bitmap_mig_state.bulk_completed;
> +    return s->bulk_completed;
>   }
>   
>   /* Called with iothread lock taken.  */
>   
>   static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
>   {
> +    DBMSaveState *s = &((DBMState *)opaque)->save;
>       SaveBitmapState *dbms;
>       trace_dirty_bitmap_save_complete_enter();
>   
> -    if (!dirty_bitmap_mig_state.bulk_completed) {
> -        bulk_phase(f, false);
> +    if (!s->bulk_completed) {
> +        bulk_phase(f, s, false);
>       }
>   
> -    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
> -        send_bitmap_complete(f, dbms);
> +    QSIMPLEQ_FOREACH(dbms, &s->dbms_list, entry) {
> +        send_bitmap_complete(f, s, dbms);
>       }
>   
>       qemu_put_bitmap_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>   
>       trace_dirty_bitmap_save_complete_finish();
>   
> -    dirty_bitmap_do_save_cleanup();
> +    dirty_bitmap_save_cleanup(opaque);
>       return 0;
>   }
>   
> @@ -417,12 +431,13 @@ static void dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
>                                         uint64_t *res_compatible,
>                                         uint64_t *res_postcopy_only)
>   {
> +    DBMSaveState *s = &((DBMState *)opaque)->save;
>       SaveBitmapState *dbms;
>       uint64_t pending = 0;
>   
>       qemu_mutex_lock_iothread();
>   
> -    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
> +    QSIMPLEQ_FOREACH(dbms, &s->dbms_list, entry) {
>           uint64_t gran = bdrv_dirty_bitmap_granularity(dbms->bitmap);
>           uint64_t sectors = dbms->bulk_completed ? 0 :
>                              dbms->total_sectors - dbms->cur_sector;
> @@ -481,7 +496,7 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
>           b->bs = s->bs;
>           b->bitmap = s->bitmap;
>           b->migrated = false;
> -        enabled_bitmaps = g_slist_prepend(enabled_bitmaps, b);
> +        s->enabled_bitmaps = g_slist_prepend(s->enabled_bitmaps, b);
>       }
>   
>       return 0;
> @@ -489,11 +504,12 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
>   
>   void dirty_bitmap_mig_before_vm_start(void)
>   {
> +    DBMLoadState *s = &dbm_state.load;
>       GSList *item;
>   
> -    qemu_mutex_lock(&finish_lock);
> +    qemu_mutex_lock(&s->finish_lock);
>   
> -    for (item = enabled_bitmaps; item; item = g_slist_next(item)) {
> +    for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
>           LoadBitmapState *b = item->data;
>   
>           if (b->migrated) {
> @@ -505,10 +521,10 @@ void dirty_bitmap_mig_before_vm_start(void)
>           g_free(b);
>       }
>   
> -    g_slist_free(enabled_bitmaps);
> -    enabled_bitmaps = NULL;
> +    g_slist_free(s->enabled_bitmaps);
> +    s->enabled_bitmaps = NULL;
>   
> -    qemu_mutex_unlock(&finish_lock);
> +    qemu_mutex_unlock(&s->finish_lock);
>   }
>   
>   static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
> @@ -517,9 +533,9 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>       trace_dirty_bitmap_load_complete();
>       bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
>   
> -    qemu_mutex_lock(&finish_lock);
> +    qemu_mutex_lock(&s->finish_lock);
>   
> -    for (item = enabled_bitmaps; item; item = g_slist_next(item)) {
> +    for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
>           LoadBitmapState *b = item->data;
>   
>           if (b->bitmap == s->bitmap) {
> @@ -530,7 +546,7 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>   
>       if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
>           bdrv_dirty_bitmap_lock(s->bitmap);
> -        if (enabled_bitmaps == NULL) {
> +        if (s->enabled_bitmaps == NULL) {
>               /* in postcopy */
>               bdrv_reclaim_dirty_bitmap_locked(s->bitmap, &error_abort);
>               bdrv_enable_dirty_bitmap_locked(s->bitmap);
> @@ -549,7 +565,7 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>           bdrv_dirty_bitmap_unlock(s->bitmap);
>       }
>   
> -    qemu_mutex_unlock(&finish_lock);
> +    qemu_mutex_unlock(&s->finish_lock);
>   }
>   
>   static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
> @@ -646,7 +662,7 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s)
>   
>   static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>   {
> -    static DBMLoadState s;
> +    DBMLoadState *s = &((DBMState *)opaque)->load;
>       int ret = 0;
>   
>       trace_dirty_bitmap_load_enter();
> @@ -656,17 +672,17 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>       }
>   
>       do {
> -        ret = dirty_bitmap_load_header(f, &s);
> +        ret = dirty_bitmap_load_header(f, s);
>           if (ret < 0) {
>               return ret;
>           }
>   
> -        if (s.flags & DIRTY_BITMAP_MIG_FLAG_START) {
> -            ret = dirty_bitmap_load_start(f, &s);
> -        } else if (s.flags & DIRTY_BITMAP_MIG_FLAG_COMPLETE) {
> -            dirty_bitmap_load_complete(f, &s);
> -        } else if (s.flags & DIRTY_BITMAP_MIG_FLAG_BITS) {
> -            ret = dirty_bitmap_load_bits(f, &s);
> +        if (s->flags & DIRTY_BITMAP_MIG_FLAG_START) {
> +            ret = dirty_bitmap_load_start(f, s);
> +        } else if (s->flags & DIRTY_BITMAP_MIG_FLAG_COMPLETE) {
> +            dirty_bitmap_load_complete(f, s);
> +        } else if (s->flags & DIRTY_BITMAP_MIG_FLAG_BITS) {
> +            ret = dirty_bitmap_load_bits(f, s);
>           }
>   
>           if (!ret) {
> @@ -676,7 +692,7 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>           if (ret) {
>               return ret;
>           }
> -    } while (!(s.flags & DIRTY_BITMAP_MIG_FLAG_EOS));
> +    } while (!(s->flags & DIRTY_BITMAP_MIG_FLAG_EOS));
>   
>       trace_dirty_bitmap_load_success();
>       return 0;
> @@ -684,13 +700,14 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>   
>   static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
>   {
> +    DBMSaveState *s = &((DBMState *)opaque)->save;
>       SaveBitmapState *dbms = NULL;
> -    if (init_dirty_bitmap_migration() < 0) {
> +    if (init_dirty_bitmap_migration(s) < 0) {
>           return -1;
>       }
>   
> -    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
> -        send_bitmap_start(f, dbms);
> +    QSIMPLEQ_FOREACH(dbms, &s->dbms_list, entry) {
> +        send_bitmap_start(f, s, dbms);
>       }
>       qemu_put_bitmap_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>   
> @@ -699,7 +716,9 @@ static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
>   
>   static bool dirty_bitmap_is_active(void *opaque)
>   {
> -    return migrate_dirty_bitmaps() && !dirty_bitmap_mig_state.no_bitmaps;
> +    DBMSaveState *s = &((DBMState *)opaque)->save;
> +
> +    return migrate_dirty_bitmaps() && !s->no_bitmaps;
>   }
>   
>   static bool dirty_bitmap_is_active_iterate(void *opaque)
> @@ -727,10 +746,10 @@ static SaveVMHandlers savevm_dirty_bitmap_handlers = {
>   
>   void dirty_bitmap_mig_init(void)
>   {
> -    QSIMPLEQ_INIT(&dirty_bitmap_mig_state.dbms_list);
> -    qemu_mutex_init(&finish_lock);
> +    QSIMPLEQ_INIT(&dbm_state.save.dbms_list);
> +    qemu_mutex_init(&dbm_state.load.finish_lock);
>   
>       register_savevm_live("dirty-bitmap", 0, 1,
>                            &savevm_dirty_bitmap_handlers,
> -                         &dirty_bitmap_mig_state);
> +                         &dbm_state);
>   }
> 

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-- 
With the best regards,
Andrey Shinkevich


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 06/22] migration/block-dirty-bitmap: rename finish_lock to just lock
  2020-02-17 15:02 ` [PATCH v2 06/22] migration/block-dirty-bitmap: rename finish_lock to just lock Vladimir Sementsov-Ogievskiy
@ 2020-02-18 13:20   ` Andrey Shinkevich
  0 siblings, 0 replies; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-18 13:20 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Fam Zheng, qemu-block, quintela, dgilbert, Stefan Hajnoczi, John Snow

On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
> finish_lock is a bad name, as the lock is used not only at the end of
> the process.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   migration/block-dirty-bitmap.c | 12 ++++++------
>   1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index 7a82b76809..440c41cfca 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -143,7 +143,7 @@ typedef struct DBMLoadState {
>       BdrvDirtyBitmap *bitmap;
>   
>       GSList *enabled_bitmaps;
> -    QemuMutex finish_lock;
> +    QemuMutex lock; /* protect enabled_bitmaps */
>   } DBMLoadState;
>   
>   typedef struct DBMState {
> @@ -507,7 +507,7 @@ void dirty_bitmap_mig_before_vm_start(void)
>       DBMLoadState *s = &dbm_state.load;
>       GSList *item;
>   
> -    qemu_mutex_lock(&s->finish_lock);
> +    qemu_mutex_lock(&s->lock);
>   
>       for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
>           LoadBitmapState *b = item->data;
> @@ -524,7 +524,7 @@ void dirty_bitmap_mig_before_vm_start(void)
>       g_slist_free(s->enabled_bitmaps);
>       s->enabled_bitmaps = NULL;
>   
> -    qemu_mutex_unlock(&s->finish_lock);
> +    qemu_mutex_unlock(&s->lock);
>   }
>   
>   static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
> @@ -533,7 +533,7 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>       trace_dirty_bitmap_load_complete();
>       bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
>   
> -    qemu_mutex_lock(&s->finish_lock);
> +    qemu_mutex_lock(&s->lock);
>   
>       for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
>           LoadBitmapState *b = item->data;
> @@ -565,7 +565,7 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>           bdrv_dirty_bitmap_unlock(s->bitmap);
>       }
>   
> -    qemu_mutex_unlock(&s->finish_lock);
> +    qemu_mutex_unlock(&s->lock);
>   }
>   
>   static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
> @@ -747,7 +747,7 @@ static SaveVMHandlers savevm_dirty_bitmap_handlers = {
>   void dirty_bitmap_mig_init(void)
>   {
>       QSIMPLEQ_INIT(&dbm_state.save.dbms_list);
> -    qemu_mutex_init(&dbm_state.load.finish_lock);
> +    qemu_mutex_init(&dbm_state.load.lock);
>   
>       register_savevm_live("dirty-bitmap", 0, 1,
>                            &savevm_dirty_bitmap_handlers,
> 

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-- 
With the best regards,
Andrey Shinkevich


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 07/22] migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete
  2020-02-17 15:02 ` [PATCH v2 07/22] migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete Vladimir Sementsov-Ogievskiy
@ 2020-02-18 14:26   ` Andrey Shinkevich
  2020-02-19 15:30     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-18 14:26 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Fam Zheng, qemu-block, quintela, dgilbert, Stefan Hajnoczi, John Snow

On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
> The bdrv_enable_dirty_bitmap_locked() call does nothing: if we are in
> postcopy, the bitmap successor must be enabled, and the reclaim
> operation will enable the bitmap.
> 
> So, actually we just need to call _reclaim_ in both if branches, and
> differing only to add an assertion does not seem worthwhile. The
> logic becomes simple: on load complete we reclaim, and that's all.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   migration/block-dirty-bitmap.c | 25 ++++---------------------
>   1 file changed, 4 insertions(+), 21 deletions(-)
> 
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index 440c41cfca..9cc750d93b 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -535,6 +535,10 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>   
>       qemu_mutex_lock(&s->lock);
>   
> +    if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
What about making sure of it here?
           assert(!s->bitmap->successor->disabled);
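(Though BdrvDirtyBitmap is opaque in migration code, so the check would
need an exported helper. Just a sketch, assuming some way to reach the
successor exists; I have not checked what block/dirty-bitmap.h exports:

    assert(bdrv_dirty_bitmap_enabled(successor));

where "successor" stands for whatever accessor is available.)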

> +        bdrv_reclaim_dirty_bitmap(s->bitmap, &error_abort);
> +    }
> +
>       for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
>           LoadBitmapState *b = item->data;
>   
> @@ -544,27 +548,6 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>           }
>       }
>   
> -    if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
> -        bdrv_dirty_bitmap_lock(s->bitmap);
> -        if (s->enabled_bitmaps == NULL) {
> -            /* in postcopy */
> -            bdrv_reclaim_dirty_bitmap_locked(s->bitmap, &error_abort);
> -            bdrv_enable_dirty_bitmap_locked(s->bitmap);
> -        } else {
> -            /* target not started, successor must be empty */
> -            int64_t count = bdrv_get_dirty_count(s->bitmap);
> -            BdrvDirtyBitmap *ret = bdrv_reclaim_dirty_bitmap_locked(s->bitmap,
> -                                                                    NULL);
> -            /* bdrv_reclaim_dirty_bitmap can fail only on no successor (it
> -             * must be) or on merge fail, but merge can't fail when second
> -             * bitmap is empty
> -             */
> -            assert(ret == s->bitmap &&
> -                   count == bdrv_get_dirty_count(s->bitmap));
> -        }
> -        bdrv_dirty_bitmap_unlock(s->bitmap);
> -    }
> -
>       qemu_mutex_unlock(&s->lock);
>   }
>   
> 

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-- 
With the best regards,
Andrey Shinkevich


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 08/22] migration/block-dirty-bitmap: keep bitmap state for all bitmaps
  2020-02-17 15:02 ` [PATCH v2 08/22] migration/block-dirty-bitmap: keep bitmap state for all bitmaps Vladimir Sementsov-Ogievskiy
@ 2020-02-18 17:07   ` Andrey Shinkevich
  2020-07-23 21:30   ` Eric Blake
  1 sibling, 0 replies; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-18 17:07 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Fam Zheng, qemu-block, quintela, dgilbert, Stefan Hajnoczi, John Snow



On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
> Keep bitmap state for disabled bitmaps too. Keep the state until the
> end of the process. It is needed by the following commit, which
> implements bitmap postcopy canceling.
> 
> To clean up the new list, the following logic is used:
> We need two events to consider bitmap migration finished:
> 1. a chunk with the DIRTY_BITMAP_MIG_FLAG_COMPLETE flag should be received
> 2. dirty_bitmap_mig_before_vm_start should be called
> These two events may come in any order, so we track which one comes
> last, and on the last of them we remove the bitmap migration state from
> the list.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   migration/block-dirty-bitmap.c | 64 +++++++++++++++++++++++-----------
>   1 file changed, 43 insertions(+), 21 deletions(-)
> 
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index 9cc750d93b..1329db8d7d 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -132,6 +132,7 @@ typedef struct LoadBitmapState {
>       BlockDriverState *bs;
>       BdrvDirtyBitmap *bitmap;
>       bool migrated;
> +    bool enabled;
>   } LoadBitmapState;
>   
>   /* State of the dirty bitmap migration (DBM) during load process */
> @@ -142,8 +143,10 @@ typedef struct DBMLoadState {
>       BlockDriverState *bs;
>       BdrvDirtyBitmap *bitmap;
>   
> -    GSList *enabled_bitmaps;
> -    QemuMutex lock; /* protect enabled_bitmaps */
> +    bool before_vm_start_handled; /* set in dirty_bitmap_mig_before_vm_start */
> +
> +    GSList *bitmaps;
> +    QemuMutex lock; /* protect bitmaps */
>   } DBMLoadState;
>   
>   typedef struct DBMState {
> @@ -458,6 +461,7 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
>       Error *local_err = NULL;
>       uint32_t granularity = qemu_get_be32(f);
>       uint8_t flags = qemu_get_byte(f);
> +    LoadBitmapState *b;
>   
>       if (s->bitmap) {
>           error_report("Bitmap with the same name ('%s') already exists on "
> @@ -484,45 +488,59 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
>   
>       bdrv_disable_dirty_bitmap(s->bitmap);
>       if (flags & DIRTY_BITMAP_MIG_START_FLAG_ENABLED) {
> -        LoadBitmapState *b;
> -
>           bdrv_dirty_bitmap_create_successor(s->bitmap, &local_err);
>           if (local_err) {
>               error_report_err(local_err);
>               return -EINVAL;
>           }
> -
> -        b = g_new(LoadBitmapState, 1);
> -        b->bs = s->bs;
> -        b->bitmap = s->bitmap;
> -        b->migrated = false;
> -        s->enabled_bitmaps = g_slist_prepend(s->enabled_bitmaps, b);
>       }
>   
> +    b = g_new(LoadBitmapState, 1);
> +    b->bs = s->bs;
> +    b->bitmap = s->bitmap;
> +    b->migrated = false;
> +    b->enabled = flags & DIRTY_BITMAP_MIG_START_FLAG_ENABLED,
> +
> +    s->bitmaps = g_slist_prepend(s->bitmaps, b);
> +
>       return 0;
>   }
>   
> -void dirty_bitmap_mig_before_vm_start(void)
> +/*
> + * before_vm_start_handle_item
> + *
> + * g_slist_foreach helper
> + *
> + * item is LoadBitmapState*
> + * opaque is DBMLoadState*
> + */
> +static void before_vm_start_handle_item(void *item, void *opaque)
>   {
> -    DBMLoadState *s = &dbm_state.load;
> -    GSList *item;
> -
> -    qemu_mutex_lock(&s->lock);
> -
> -    for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
> -        LoadBitmapState *b = item->data;
> +    DBMLoadState *s = opaque;
> +    LoadBitmapState *b = item;
>   
> +    if (b->enabled) {
>           if (b->migrated) {
>               bdrv_enable_dirty_bitmap(b->bitmap);
>           } else {
>               bdrv_dirty_bitmap_enable_successor(b->bitmap);
>           }
> +    }
>   
> +    if (b->migrated) {
> +        s->bitmaps = g_slist_remove(s->bitmaps, b);
>           g_free(b);
>       }
> +}
>   
> -    g_slist_free(s->enabled_bitmaps);
> -    s->enabled_bitmaps = NULL;
> +void dirty_bitmap_mig_before_vm_start(void)
> +{
> +    DBMLoadState *s = &dbm_state.load;
> +    qemu_mutex_lock(&s->lock);
> +
> +    assert(!s->before_vm_start_handled);
> +    g_slist_foreach(s->bitmaps, before_vm_start_handle_item, s);
> +    s->before_vm_start_handled = true;
>   
>       qemu_mutex_unlock(&s->lock);
>   }
> @@ -539,11 +557,15 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>           bdrv_reclaim_dirty_bitmap(s->bitmap, &error_abort);
>       }
>   
> -    for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
> +    for (item = s->bitmaps; item; item = g_slist_next(item)) {
>           LoadBitmapState *b = item->data;
>   
>           if (b->bitmap == s->bitmap) {
>               b->migrated = true;
> +            if (s->before_vm_start_handled) {
> +                s->bitmaps = g_slist_remove(s->bitmaps, b);
> +                g_free(b);
> +            }
>               break;
>           }
>       }
> 

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-- 
With the best regards,
Andrey Shinkevich


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 09/22] migration/block-dirty-bitmap: relax error handling in incoming part
  2020-02-17 15:02 ` [PATCH v2 09/22] migration/block-dirty-bitmap: relax error handling in incoming part Vladimir Sementsov-Ogievskiy
@ 2020-02-18 18:54   ` Andrey Shinkevich
  2020-02-19 15:34     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-18 18:54 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Fam Zheng, qemu-block, quintela, dgilbert, Stefan Hajnoczi, John Snow



On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
> Bitmap data is not critical, and we should not fail the migration (or
> use postcopy recovery) because of a dirty-bitmaps migration failure.
> Instead we should just lose the unfinished bitmaps.
> 
> Still, we have to report I/O stream violation errors, as they affect
> the whole migration stream.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   migration/block-dirty-bitmap.c | 148 +++++++++++++++++++++++++--------
>   1 file changed, 113 insertions(+), 35 deletions(-)
> 
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index 1329db8d7d..aea5326804 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -145,6 +145,15 @@ typedef struct DBMLoadState {
>   
>       bool before_vm_start_handled; /* set in dirty_bitmap_mig_before_vm_start */
>   
> +    /*
> +     * cancelled
> +     * Incoming migration is cancelled for some reason. That means that we
> +     * still should read our chunks from migration stream, to not affect other
> +     * migration objects (like RAM), but just ignore them and do not touch any
> +     * bitmaps or nodes.
> +     */
> +    bool cancelled;
> +
>       GSList *bitmaps;
>       QemuMutex lock; /* protect bitmaps */
>   } DBMLoadState;
> @@ -545,13 +554,47 @@ void dirty_bitmap_mig_before_vm_start(void)
>       qemu_mutex_unlock(&s->lock);
>   }
>   
> +static void cancel_incoming_locked(DBMLoadState *s)
> +{
> +    GSList *item;
> +
> +    if (s->cancelled) {
> +        return;
> +    }
> +
> +    s->cancelled = true;
> +    s->bs = NULL;
> +    s->bitmap = NULL;
> +
> +    /* Drop all unfinished bitmaps */
> +    for (item = s->bitmaps; item; item = g_slist_next(item)) {
> +        LoadBitmapState *b = item->data;
> +
> +        /*
> +         * Bitmap must be unfinished, as finished bitmaps should already be
> +         * removed from the list.
> +         */
> +        assert(!s->before_vm_start_handled || !b->migrated);
> +        if (bdrv_dirty_bitmap_has_successor(b->bitmap)) {
> +            bdrv_reclaim_dirty_bitmap(b->bitmap, &error_abort);
> +        }
> +        bdrv_release_dirty_bitmap(b->bitmap);
> +    }
> +
> +    g_slist_free_full(s->bitmaps, g_free);
> +    s->bitmaps = NULL;
> +}
> +
>   static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>   {
>       GSList *item;
>       trace_dirty_bitmap_load_complete();
> -    bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
>   
> -    qemu_mutex_lock(&s->lock);

Why is it safe to remove the critical section here? Is it because
dirty_bitmap_load() now takes s->lock around the whole chunk, as in the
hunk further below? If so, a short comment at this spot would help.

> +    if (s->cancelled) {
> +        return;
> +    }
> +
> +    bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
>   
>       if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
>           bdrv_reclaim_dirty_bitmap(s->bitmap, &error_abort);
> @@ -569,8 +612,6 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>               break;
>           }
>       }
> -
> -    qemu_mutex_unlock(&s->lock);
>   }
>   
>   static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
> @@ -582,15 +623,32 @@ static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
>   
>       if (s->flags & DIRTY_BITMAP_MIG_FLAG_ZEROES) {
>           trace_dirty_bitmap_load_bits_zeroes();
> -        bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte, nr_bytes,
> -                                             false);
> +        if (!s->cancelled) {
> +            bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte,
> +                                                 nr_bytes, false);
> +        }
>       } else {
>           size_t ret;
>           uint8_t *buf;
>           uint64_t buf_size = qemu_get_be64(f);
> -        uint64_t needed_size =
> -            bdrv_dirty_bitmap_serialization_size(s->bitmap,
> -                                                 first_byte, nr_bytes);
> +        uint64_t needed_size;
> +
> +        buf = g_malloc(buf_size);
> +        ret = qemu_get_buffer(f, buf, buf_size);
> +        if (ret != buf_size) {
> +            error_report("Failed to read bitmap bits");
> +            g_free(buf);
> +            return -EIO;
> +        }
> +
> +        if (s->cancelled) {
> +            g_free(buf);
> +            return 0;
> +        }
> +
> +        needed_size = bdrv_dirty_bitmap_serialization_size(s->bitmap,
> +                                                           first_byte,
> +                                                           nr_bytes);
>   
>           if (needed_size > buf_size ||
>               buf_size > QEMU_ALIGN_UP(needed_size, 4 * sizeof(long))
> @@ -599,15 +657,8 @@ static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
>               error_report("Migrated bitmap granularity doesn't "
>                            "match the destination bitmap '%s' granularity",
>                            bdrv_dirty_bitmap_name(s->bitmap));
> -            return -EINVAL;
> -        }
> -
> -        buf = g_malloc(buf_size);
> -        ret = qemu_get_buffer(f, buf, buf_size);
> -        if (ret != buf_size) {
> -            error_report("Failed to read bitmap bits");
> -            g_free(buf);
> -            return -EIO;
> +            cancel_incoming_locked(s);

                /* Continue the VM migration, as bitmap data is not critical */

> +            return 0;
>           }
>   
>           bdrv_dirty_bitmap_deserialize_part(s->bitmap, buf, first_byte, nr_bytes,
> @@ -632,14 +683,16 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s)
>               error_report("Unable to read node name string");
>               return -EINVAL;
>           }
> -        s->bs = bdrv_lookup_bs(s->node_name, s->node_name, &local_err);
> -        if (!s->bs) {
> -            error_report_err(local_err);
> -            return -EINVAL;
> +        if (!s->cancelled) {
> +            s->bs = bdrv_lookup_bs(s->node_name, s->node_name, &local_err);
> +            if (!s->bs) {
> +                error_report_err(local_err);

The error message could be supplemented with a report that the bitmap
migration has been cancelled. The same applies to the other
cancel_incoming_locked(s) call sites below.
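Something like this, just a sketch:

    error_report_err(local_err);
    error_report("%s: cancelling incoming bitmap migration", __func__);
    cancel_incoming_locked(s);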

> +                cancel_incoming_locked(s);
> +            }
>           }
> -    } else if (!s->bs && !nothing) {
> +    } else if (!s->bs && !nothing && !s->cancelled) {
>           error_report("Error: block device name is not set");
> -        return -EINVAL;
> +        cancel_incoming_locked(s);
>       }
>   
>       if (s->flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
> @@ -647,24 +700,38 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s)
>               error_report("Unable to read bitmap name string");
>               return -EINVAL;
>           }
> -        s->bitmap = bdrv_find_dirty_bitmap(s->bs, s->bitmap_name);
> -
> -        /* bitmap may be NULL here, it wouldn't be an error if it is the
> -         * first occurrence of the bitmap */
> -        if (!s->bitmap && !(s->flags & DIRTY_BITMAP_MIG_FLAG_START)) {
> -            error_report("Error: unknown dirty bitmap "
> -                         "'%s' for block device '%s'",
> -                         s->bitmap_name, s->node_name);
> -            return -EINVAL;
> +        if (!s->cancelled) {
> +            s->bitmap = bdrv_find_dirty_bitmap(s->bs, s->bitmap_name);
> +
> +            /*
> +             * bitmap may be NULL here, it wouldn't be an error if it is the
> +             * first occurrence of the bitmap
> +             */
> +            if (!s->bitmap && !(s->flags & DIRTY_BITMAP_MIG_FLAG_START)) {
> +                error_report("Error: unknown dirty bitmap "
> +                             "'%s' for block device '%s'",
> +                             s->bitmap_name, s->node_name);
> +                cancel_incoming_locked(s);
> +            }
>           }
> -    } else if (!s->bitmap && !nothing) {
> +    } else if (!s->bitmap && !nothing && !s->cancelled) {
>           error_report("Error: block device name is not set");
> -        return -EINVAL;
> +        cancel_incoming_locked(s);
>       }
>   
>       return 0;
>   }
>   
> +/*
> + * dirty_bitmap_load
> + *
> + * Load sequence of dirty bitmap chunks. Return error only on fatal io stream
> + * violations. On other errors just cancel bitmaps incoming migration and return
> + * 0.
> + *
> + * Note, than when incoming bitmap migration is canceled, we still must read all
"than (that)" may be omitted

> + * our chunks (and just ignore them), to not affect other migration objects.
> + */
>   static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>   {
>       DBMLoadState *s = &((DBMState *)opaque)->load;
> @@ -673,12 +740,19 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>       trace_dirty_bitmap_load_enter();
>   
>       if (version_id != 1) {
> +        qemu_mutex_lock(&s->lock);
> +        cancel_incoming_locked(s);
> +        qemu_mutex_unlock(&s->lock);
>           return -EINVAL;
>       }
>   
>       do {
> +        qemu_mutex_lock(&s->lock);
> +
>           ret = dirty_bitmap_load_header(f, s);
>           if (ret < 0) {
> +            cancel_incoming_locked(s);
> +            qemu_mutex_unlock(&s->lock);
>               return ret;
>           }
>   
> @@ -695,8 +769,12 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>           }
>   
>           if (ret) {
> +            cancel_incoming_locked(s);
> +            qemu_mutex_unlock(&s->lock);
>               return ret;
>           }
> +
> +        qemu_mutex_unlock(&s->lock);
>       } while (!(s->flags & DIRTY_BITMAP_MIG_FLAG_EOS));
>   
>       trace_dirty_bitmap_load_success();
> 

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-- 
With the best regards,
Andrey Shinkevich


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 10/22] migration/block-dirty-bitmap: cancel migration on shutdown
  2020-02-17 15:02 ` [PATCH v2 10/22] migration/block-dirty-bitmap: cancel migration on shutdown Vladimir Sementsov-Ogievskiy
@ 2020-02-18 19:11   ` Andrey Shinkevich
  2020-07-23 21:04   ` Eric Blake
  1 sibling, 0 replies; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-18 19:11 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Fam Zheng, qemu-block, quintela, dgilbert, Stefan Hajnoczi, John Snow



On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
> If the target is turned off before postcopy has finished, the target
> crashes because busy bitmaps are found at shutdown.
> Canceling the incoming migration helps, as it removes all unfinished
> (and therefore busy) bitmaps.
> 
> Similarly, on the source we crash in bdrv_close_all(), which asserts
> that all bdrv states are removed, because the bdrv states involved in
> dirty bitmap migration are referenced by it. So, we need to cancel the
> outgoing migration as well.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   migration/migration.h          |  2 ++
>   migration/block-dirty-bitmap.c | 16 ++++++++++++++++
>   migration/migration.c          | 13 +++++++++++++
>   3 files changed, 31 insertions(+)
> 
> diff --git a/migration/migration.h b/migration/migration.h
> index 2948f2387b..2de6b8bbe2 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -332,6 +332,8 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
>   void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
>   
>   void dirty_bitmap_mig_before_vm_start(void);
> +void dirty_bitmap_mig_cancel_outgoing(void);
> +void dirty_bitmap_mig_cancel_incoming(void);
>   void migrate_add_address(SocketAddress *address);
>   
>   int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index aea5326804..3ca425d95e 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -585,6 +585,22 @@ static void cancel_incoming_locked(DBMLoadState *s)
>       s->bitmaps = NULL;
>   }
>   
> +void dirty_bitmap_mig_cancel_outgoing(void)
> +{
> +    dirty_bitmap_do_save_cleanup(&dbm_state.save);

The comment above the dirty_bitmap_do_save_cleanup() says:
"Called with iothread lock taken"

> +}
> +
> +void dirty_bitmap_mig_cancel_incoming(void)
> +{
> +    DBMLoadState *s = &dbm_state.load;
> +
> +    qemu_mutex_lock(&s->lock);
> +
> +    cancel_incoming_locked(s);
> +
> +    qemu_mutex_unlock(&s->lock);
> +}
> +
>   static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>   {
>       GSList *item;
> diff --git a/migration/migration.c b/migration/migration.c
> index 515047932c..7c605ba218 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -181,6 +181,19 @@ void migration_shutdown(void)
>        */
>       migrate_fd_cancel(current_migration);
>       object_unref(OBJECT(current_migration));
> +
> +    /*
> +     * Cancel outgoing migration of dirty bitmaps. It should
> +     * at least unref used block nodes.
> +     */
> +    dirty_bitmap_mig_cancel_outgoing();
> +
> +    /*
> +     * Cancel incoming migration of dirty bitmaps. Dirty bitmaps
> +     * are non-critical data, and their loss never considered as
> +     * something serious.
> +     */
> +    dirty_bitmap_mig_cancel_incoming();
>   }
>   
>   /* For outgoing */
> 

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-- 
With the best regards,
Andrey Shinkevich


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 11/22] migration/savevm: don't worry if bitmap migration postcopy failed
  2020-02-17 15:02 ` [PATCH v2 11/22] migration/savevm: don't worry if bitmap migration postcopy failed Vladimir Sementsov-Ogievskiy
  2020-02-17 16:57   ` Dr. David Alan Gilbert
@ 2020-02-18 19:44   ` Andrey Shinkevich
  1 sibling, 0 replies; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-18 19:44 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel; +Cc: dgilbert, quintela

On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
> First, if only bitmaps postcopy is enabled (not RAM postcopy),
> postcopy_pause_incoming crashes on the assertion assert(mis->to_src_file).
> 
> In any case, bitmaps postcopy is not prepared to be recovered. The
> original idea instead is that if bitmaps postcopy fails, we just
> lose some bitmaps, which is not critical. So, on failure we just need
> to remove the unfinished bitmaps, and the guest should continue
> execution on the destination.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   migration/savevm.c | 37 ++++++++++++++++++++++++++++++++-----
>   1 file changed, 32 insertions(+), 5 deletions(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 1d4220ece8..7e9dd58ccb 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1812,6 +1812,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
>       MigrationIncomingState *mis = migration_incoming_get_current();
>       QEMUFile *f = mis->from_src_file;
>       int load_res;
> +    MigrationState *migr = migrate_get_current();
> +
> +    object_ref(OBJECT(migr));
>   
>       migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>                                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
> @@ -1838,11 +1841,24 @@ static void *postcopy_ram_listen_thread(void *opaque)
>   
>       trace_postcopy_ram_listen_thread_exit();
>       if (load_res < 0) {
> -        error_report("%s: loadvm failed: %d", __func__, load_res);
>           qemu_file_set_error(f, load_res);
> -        migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> -                                       MIGRATION_STATUS_FAILED);
> -    } else {
> +        dirty_bitmap_mig_cancel_incoming();
> +        if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
> +            !migrate_postcopy_ram() && migrate_dirty_bitmaps())
> +        {
> +            error_report("%s: loadvm failed during postcopy: %d. All state is "
> +                         "migrated except for dirty bitmaps. Some dirty "

"All states migrated except dirty bitmaps"

> +                         "bitmaps may be lost, and present migrated dirty "
> +                         "bitmaps are correctly migrated and valid.",
> +                         __func__, load_res);
> +            load_res = 0; /* prevent further exit() */
> +        } else {
> +            error_report("%s: loadvm failed: %d", __func__, load_res);
> +            migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> +                                           MIGRATION_STATUS_FAILED);
> +        }
> +    }
> +    if (load_res >= 0) {
>           /*
>            * This looks good, but it's possible that the device loading in the
>            * main thread hasn't finished yet, and so we might not be in 'RUN'
> @@ -1878,6 +1894,8 @@ static void *postcopy_ram_listen_thread(void *opaque)
>       mis->have_listen_thread = false;
>       postcopy_state_set(POSTCOPY_INCOMING_END);
>   
> +    object_unref(OBJECT(migr));
> +
>       return NULL;
>   }
>   
> @@ -2429,6 +2447,8 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
>   {
>       trace_postcopy_pause_incoming();
>   
> +    assert(migrate_postcopy_ram());
> +
>       /* Clear the triggered bit to allow one recovery */
>       mis->postcopy_recover_triggered = false;
>   
> @@ -2513,15 +2533,22 @@ out:
>       if (ret < 0) {
>           qemu_file_set_error(f, ret);
>   
> +        /* Cancel bitmaps incoming regardless of recovery */
> +        dirty_bitmap_mig_cancel_incoming();
> +
>           /*
>            * If we are during an active postcopy, then we pause instead
>            * of bail out to at least keep the VM's dirty data.  Note
>            * that POSTCOPY_INCOMING_LISTENING stage is still not enough,
>            * during which we're still receiving device states and we
>            * still haven't yet started the VM on destination.
> +         *
> +         * Only RAM postcopy supports recovery. Still, if RAM postcopy is
> +         * enabled, canceled bitmaps postcopy will not affect RAM postcopy
> +         * recovering.
>            */
>           if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
> -            postcopy_pause_incoming(mis)) {
> +            migrate_postcopy_ram() && postcopy_pause_incoming(mis)) {
>               /* Reset f to point to the newly created channel */
>               f = mis->from_src_file;
>               goto retry;
> 

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-- 
With the best regards,
Andrey Shinkevich



* Re: [PATCH v2 00/22] Fix error handling during bitmap postcopy
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (22 preceding siblings ...)
  2020-02-17 19:31 ` [PATCH v2 00/22] Fix error handling during bitmap postcopy no-reply
@ 2020-02-18 20:02 ` Andrey Shinkevich
  2020-02-18 20:57   ` Eric Blake
  2020-04-02  7:42 ` Vladimir Sementsov-Ogievskiy
  24 siblings, 1 reply; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-18 20:02 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Eduardo Habkost, qemu-block, quintela,
	qemu-stable, dgilbert, Max Reitz, Stefan Hajnoczi, Cleber Rosa,
	John Snow

qemu-iotests:$ ./check -qcow2
PASSED
(except always failed 261 and 272)

Andrey


-- 
With the best regards,
Andrey Shinkevich



* Re: [PATCH v2 00/22] Fix error handling during bitmap postcopy
  2020-02-18 20:02 ` Andrey Shinkevich
@ 2020-02-18 20:57   ` Eric Blake
  2020-02-19 13:25     ` Andrey Shinkevich
  0 siblings, 1 reply; 80+ messages in thread
From: Eric Blake @ 2020-02-18 20:57 UTC (permalink / raw)
  To: Andrey Shinkevich, Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Eduardo Habkost, qemu-block, quintela,
	qemu-stable, dgilbert, Stefan Hajnoczi, Cleber Rosa, Max Reitz,
	John Snow

On 2/18/20 2:02 PM, Andrey Shinkevich wrote:
> qemu-iotests:$ ./check -qcow2
> PASSED
> (except always failed 261 and 272)

Have you reported those failures on the threads that introduced those tests?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v2 12/22] qemu-iotests/199: fix style
  2020-02-17 15:02 ` [PATCH v2 12/22] qemu-iotests/199: fix style Vladimir Sementsov-Ogievskiy
@ 2020-02-19  7:04   ` Andrey Shinkevich
  2020-07-23 22:03   ` Eric Blake
  1 sibling, 0 replies; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-19  7:04 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Max Reitz, dgilbert, qemu-block, quintela

On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
> Mostly, satisfy pep8 complains.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   tests/qemu-iotests/199 | 13 +++++++------
>   1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
> index 40774eed74..de9ba8d94c 100755
> --- a/tests/qemu-iotests/199
> +++ b/tests/qemu-iotests/199
> @@ -28,8 +28,8 @@ disk_b = os.path.join(iotests.test_dir, 'disk_b')
>   size = '256G'
>   fifo = os.path.join(iotests.test_dir, 'mig_fifo')
>   
> -class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>   
> +class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>       def tearDown(self):
>           self.vm_a.shutdown()
>           self.vm_b.shutdown()
> @@ -54,7 +54,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>   
>           result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
>                                  name='bitmap', granularity=granularity)
> -        self.assert_qmp(result, 'return', {});
> +        self.assert_qmp(result, 'return', {})
>   
>           s = 0
>           while s < write_size:
> @@ -71,7 +71,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>   
>           result = self.vm_a.qmp('block-dirty-bitmap-clear', node='drive0',
>                                  name='bitmap')
> -        self.assert_qmp(result, 'return', {});
> +        self.assert_qmp(result, 'return', {})
>           s = 0
>           while s < write_size:
>               self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
> @@ -104,15 +104,16 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>               self.vm_b.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
>               s += 0x10000
>   
> -        result = self.vm_b.qmp('query-block');
> +        result = self.vm_b.qmp('query-block')
>           while len(result['return'][0]['dirty-bitmaps']) > 1:
>               time.sleep(2)
> -            result = self.vm_b.qmp('query-block');
> +            result = self.vm_b.qmp('query-block')
>   
>           result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
>                                  node='drive0', name='bitmap')
>   
> -        self.assert_qmp(result, 'return/sha256', sha256);
> +        self.assert_qmp(result, 'return/sha256', sha256)
> +
>   
>   if __name__ == '__main__':
>       iotests.main(supported_fmts=['qcow2'], supported_cache_modes=['none'],
> 

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-- 
With the best regards,
Andrey Shinkevich



* Re: [PATCH v2 13/22] qemu-iotests/199: drop extra constraints
  2020-02-17 15:02 ` [PATCH v2 13/22] qemu-iotests/199: drop extra constraints Vladimir Sementsov-Ogievskiy
@ 2020-02-19  8:02   ` Andrey Shinkevich
  0 siblings, 0 replies; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-19  8:02 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Max Reitz, dgilbert, qemu-block, quintela



On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
> We don't need any specific format constraints here. Still keep qcow2
> for two reasons:
> 1. No extra calls of format-unrelated test
> 2. Add some check around persistent bitmap in future (require qcow2)
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   tests/qemu-iotests/199 | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
> index de9ba8d94c..dda918450a 100755
> --- a/tests/qemu-iotests/199
> +++ b/tests/qemu-iotests/199
> @@ -116,5 +116,4 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>   
>   
>   if __name__ == '__main__':
> -    iotests.main(supported_fmts=['qcow2'], supported_cache_modes=['none'],
> -                 supported_protocols=['file'])
> +    iotests.main(supported_fmts=['qcow2'])
> 

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-- 
With the best regards,
Andrey Shinkevich



* Re: [PATCH v2 14/22] qemu-iotests/199: better catch postcopy time
  2020-02-17 15:02 ` [PATCH v2 14/22] qemu-iotests/199: better catch postcopy time Vladimir Sementsov-Ogievskiy
@ 2020-02-19 13:16   ` Andrey Shinkevich
  2020-02-19 15:44     ` Vladimir Sementsov-Ogievskiy
  2020-07-24  6:50     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-19 13:16 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Max Reitz, dgilbert, qemu-block, quintela

On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
> The test aims to test _postcopy_ migration, and wants to do some write
> operations during postcopy time.
> 
> The test considers the migrate status=complete event on the source as the
> start of postcopy. This is completely wrong: completion is completion of
> the whole migration process. Let's instead consider the destination start
> as the start of postcopy, and use the RESUME event for it.
> 
> Next, as the migration finish, let's use the migration status=complete
> event on the target, as such a method is closer to what libvirt or
> another user will do than tracking the number of dirty bitmaps.
> 
> Finally, add a possibility to dump events for debugging. And if we set
> debug to True, we see that the actual postcopy period is very small
> relative to the whole test duration (~0.2 seconds vs. >40 seconds for
> me). This means that the test is very inefficient at what it is supposed
> to do. Let's improve it in the following commits.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   tests/qemu-iotests/199 | 72 +++++++++++++++++++++++++++++++++---------
>   1 file changed, 57 insertions(+), 15 deletions(-)
> 
> diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
> index dda918450a..6599fc6fb4 100755
> --- a/tests/qemu-iotests/199
> +++ b/tests/qemu-iotests/199
> @@ -20,17 +20,43 @@
>   
>   import os
>   import iotests
> -import time
>   from iotests import qemu_img
>   
> +debug = False
> +
>   disk_a = os.path.join(iotests.test_dir, 'disk_a')
>   disk_b = os.path.join(iotests.test_dir, 'disk_b')
>   size = '256G'
>   fifo = os.path.join(iotests.test_dir, 'mig_fifo')
>   
>   
> +def event_seconds(event):
> +    return event['timestamp']['seconds'] + \
> +        event['timestamp']['microseconds'] / 1000000.0
> +
> +
> +def event_dist(e1, e2):
> +    return event_seconds(e2) - event_seconds(e1)
> +
> +
>   class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>       def tearDown(self):
It is common to put the definition of setUp() ahead of tearDown().
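E.g., a minimal sketch of the conventional ordering only (hypothetical,
method bodies elided):

    import iotests

    class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
        def setUp(self):
            ...  # create images, launch both VMs

        def tearDown(self):
            ...  # dump events if debugging, shut down VMs, remove images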

> +        if debug:
> +            self.vm_a_events += self.vm_a.get_qmp_events()
> +            self.vm_b_events += self.vm_b.get_qmp_events()
> +            for e in self.vm_a_events:
> +                e['vm'] = 'SRC'
> +            for e in self.vm_b_events:
> +                e['vm'] = 'DST'
> +            events = (self.vm_a_events + self.vm_b_events)
> +            events = [(e['timestamp']['seconds'],
> +                       e['timestamp']['microseconds'],
> +                       e['vm'],
> +                       e['event'],
> +                       e.get('data', '')) for e in events]
> +            for e in sorted(events):
> +                print('{}.{:06} {} {} {}'.format(*e))
> +
>           self.vm_a.shutdown()
>           self.vm_b.shutdown()
>           os.remove(disk_a)
> @@ -47,6 +73,10 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>           self.vm_a.launch()
>           self.vm_b.launch()
>   
> +        # collect received events for debug
> +        self.vm_a_events = []
> +        self.vm_b_events = []
> +
>       def test_postcopy(self):
>           write_size = 0x40000000
>           granularity = 512
> @@ -77,15 +107,13 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>               self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
>               s += 0x10000
>   
> -        bitmaps_cap = {'capability': 'dirty-bitmaps', 'state': True}
> -        events_cap = {'capability': 'events', 'state': True}
> +        caps = [{'capability': 'dirty-bitmaps', 'state': True},
The name "capabilities" would be an appropriate identifier.

> +                {'capability': 'events', 'state': True}]
>   
> -        result = self.vm_a.qmp('migrate-set-capabilities',
> -                               capabilities=[bitmaps_cap, events_cap])
> +        result = self.vm_a.qmp('migrate-set-capabilities', capabilities=caps)
>           self.assert_qmp(result, 'return', {})
>   
> -        result = self.vm_b.qmp('migrate-set-capabilities',
> -                               capabilities=[bitmaps_cap])
> +        result = self.vm_b.qmp('migrate-set-capabilities', capabilities=caps)
>           self.assert_qmp(result, 'return', {})
>   
>           result = self.vm_a.qmp('migrate', uri='exec:cat>' + fifo)
> @@ -94,24 +122,38 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>           result = self.vm_a.qmp('migrate-start-postcopy')
>           self.assert_qmp(result, 'return', {})
>   
> -        while True:
> -            event = self.vm_a.event_wait('MIGRATION')
> -            if event['data']['status'] == 'completed':
> -                break
> +        e_resume = self.vm_b.event_wait('RESUME')
"event_resume" gives a faster understanding

> +        self.vm_b_events.append(e_resume)
>   
>           s = 0x8000
>           while s < write_size:
>               self.vm_b.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
>               s += 0x10000
>   
> +        match = {'data': {'status': 'completed'}}
> +        e_complete = self.vm_b.event_wait('MIGRATION', match=match)
"event_complete" also

> +        self.vm_b_events.append(e_complete)
> +
> +        # take queued event, should already been happened
> +        e_stop = self.vm_a.event_wait('STOP')
"event_stop"

> +        self.vm_a_events.append(e_stop)
> +
> +        downtime = event_dist(e_stop, e_resume)
> +        postcopy_time = event_dist(e_resume, e_complete)
> +
> +        # TODO: assert downtime * 10 < postcopy_time

I got the results below in debug mode:

downtime: 6.194924831390381
postcopy_time: 0.1592559814453125
1582102669.764919 SRC MIGRATION {'status': 'setup'}
1582102669.766179 SRC MIGRATION_PASS {'pass': 1}
1582102669.766234 SRC MIGRATION {'status': 'active'}
1582102669.768058 DST MIGRATION {'status': 'active'}
1582102669.801422 SRC MIGRATION {'status': 'postcopy-active'}
1582102669.801510 SRC STOP
1582102675.990041 DST MIGRATION {'status': 'postcopy-active'}
1582102675.996435 DST RESUME
1582102676.111313 SRC MIGRATION {'status': 'completed'}
1582102676.155691 DST MIGRATION {'status': 'completed'}
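
For reference, event_dist() is just a subtraction of the two float
timestamps. A standalone sketch, with the STOP/RESUME values copied from
the dump above, reproduces the downtime number:

    def event_seconds(event):
        return event['timestamp']['seconds'] + \
            event['timestamp']['microseconds'] / 1000000.0

    def event_dist(e1, e2):
        return event_seconds(e2) - event_seconds(e1)

    # SRC STOP and DST RESUME events, values taken from the dump above
    e_stop = {'timestamp': {'seconds': 1582102669, 'microseconds': 801510}}
    e_resume = {'timestamp': {'seconds': 1582102675, 'microseconds': 996435}}

    # prints ~6.194925: downtime far exceeds postcopy_time (~0.159),
    # so the TODO assert above would currently fail
    print(event_dist(e_stop, e_resume))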

> +        if debug:
Since these timings have no other usage in the following patches, you
could put the whole block of related code above under the "if debug:"
section.
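Something like this untested sketch, assuming the TODO assert above is
not meant to use the values unconditionally:

    # e_stop / e_resume / e_complete as collected above via event_wait()
    if debug:
        downtime = event_dist(e_stop, e_resume)
        postcopy_time = event_dist(e_resume, e_complete)
        print('downtime:', downtime)
        print('postcopy_time:', postcopy_time)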

> +            print('downtime:', downtime)
> +            print('postcopy_time:', postcopy_time)
> +
> +        # Assert that bitmap migration is finished (check that successor bitmap
> +        # is removed)
>           result = self.vm_b.qmp('query-block')
> -        while len(result['return'][0]['dirty-bitmaps']) > 1:
> -            time.sleep(2)
> -            result = self.vm_b.qmp('query-block')
> +        assert len(result['return'][0]['dirty-bitmaps']) == 1
>   
> +        # Check content of migrated (and updated by new writes) bitmap
>           result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
>                                  node='drive0', name='bitmap')
> -
>           self.assert_qmp(result, 'return/sha256', sha256)
>   
>   
> 

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-- 
With the best regards,
Andrey Shinkevich



* Re: [PATCH v2 00/22] Fix error handling during bitmap postcopy
  2020-02-18 20:57   ` Eric Blake
@ 2020-02-19 13:25     ` Andrey Shinkevich
  2020-02-19 13:36       ` Eric Blake
  0 siblings, 1 reply; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-19 13:25 UTC (permalink / raw)
  To: Eric Blake, Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Eduardo Habkost, qemu-block, quintela,
	qemu-stable, dgilbert, Stefan Hajnoczi, Cleber Rosa, Max Reitz,
	John Snow



On 18/02/2020 23:57, Eric Blake wrote:
> On 2/18/20 2:02 PM, Andrey Shinkevich wrote:
>> qemu-iotests:$ ./check -qcow2
>> PASSED
>> (except always failed 261 and 272)
> 
> Have you reported those failures on the threads that introduced those 
> tests?
> 

Not yet, unfortunately. I have not investigated the case.
"$ ./check -qcow2 261" dumps

+od: unrecognized option '--endian=big'
+Try 'od --help' for more information.
+od: invalid -N argument '--endian=big'
+qemu-img: Could not open 'TEST_DIR/t.IMGFMT': IMGFMT header exceeds 
cluster size

and "$ ./check -qcow2 272" dumps

+od: unrecognized option '--endian=big'
+Try 'od --help' for more information.
+od: invalid -N argument '--endian=big'
+qemu-io: can't open device .../qemu/tests/qemu-iotests/scratch/t.qcow2: 
Image is not in qcow2 format

-- 
With the best regards,
Andrey Shinkevich



* Re: [PATCH v2 00/22] Fix error handling during bitmap postcopy
  2020-02-19 13:25     ` Andrey Shinkevich
@ 2020-02-19 13:36       ` Eric Blake
  2020-02-19 13:52         ` Andrey Shinkevich
  2020-02-19 14:00         ` Eric Blake
  0 siblings, 2 replies; 80+ messages in thread
From: Eric Blake @ 2020-02-19 13:36 UTC (permalink / raw)
  To: Andrey Shinkevich, Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Eduardo Habkost, qemu-block, quintela,
	qemu-stable, dgilbert, Stefan Hajnoczi, Cleber Rosa, Max Reitz,
	John Snow

On 2/19/20 7:25 AM, Andrey Shinkevich wrote:
> 
> 
> On 18/02/2020 23:57, Eric Blake wrote:
>> On 2/18/20 2:02 PM, Andrey Shinkevich wrote:
>>> qemu-iotests:$ ./check -qcow2
>>> PASSED
>>> (except always failed 261 and 272)
>>
>> Have you reported those failures on the threads that introduced those 
>> tests?
>>
> 
> Not yet, unfortunately. I have not investigated the case.
> "$ ./check -qcow2 261" dumps
> 
> +od: unrecognized option '--endian=big'
> +Try 'od --help' for more information.
> +od: invalid -N argument '--endian=big'
> +qemu-img: Could not open 'TEST_DIR/t.IMGFMT': IMGFMT header exceeds 
> cluster size

Which version of od are you using?  I do recall wondering whether 
reliance on the GNU coreutils extension --endian=big was going to cause 
problems later - well, here we are, it's later :)

https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg06781.html

> 
> and "$ ./check -qcow2 272" dumps
> 
> +od: unrecognized option '--endian=big'
> +Try 'od --help' for more information.
> +od: invalid -N argument '--endian=big'

Yay, same problem for both tests.  Fix common.rc once, and both tests 
should start working for you.

> +qemu-io: can't open device .../qemu/tests/qemu-iotests/scratch/t.qcow2: 
> Image is not in qcow2 format
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v2 00/22] Fix error handling during bitmap postcopy
  2020-02-19 13:36       ` Eric Blake
@ 2020-02-19 13:52         ` Andrey Shinkevich
  2020-02-19 14:58           ` Eric Blake
  2020-02-19 14:00         ` Eric Blake
  1 sibling, 1 reply; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-19 13:52 UTC (permalink / raw)
  To: Eric Blake, Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Eduardo Habkost, qemu-block, quintela,
	qemu-stable, dgilbert, Stefan Hajnoczi, Cleber Rosa, Max Reitz,
	John Snow



On 19/02/2020 16:36, Eric Blake wrote:
> On 2/19/20 7:25 AM, Andrey Shinkevich wrote:
>>
>>
>> On 18/02/2020 23:57, Eric Blake wrote:
>>> On 2/18/20 2:02 PM, Andrey Shinkevich wrote:
>>>> qemu-iotests:$ ./check -qcow2
>>>> PASSED
>>>> (except always failed 261 and 272)
>>>
>>> Have you reported those failures on the threads that introduced those 
>>> tests?
>>>
>>
>> Not yet, unfortunately. I have not investigated the case.
>> "$ ./check -qcow2 261" dumps
>>
>> +od: unrecognized option '--endian=big'
>> +Try 'od --help' for more information.
>> +od: invalid -N argument '--endian=big'
>> +qemu-img: Could not open 'TEST_DIR/t.IMGFMT': IMGFMT header exceeds 
>> cluster size
> 
> Which version of od are you using?  I do recall wondering whether 

$ od --version
od (GNU coreutils) 8.22
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later...

> reliance on the GNU coreutils extension --endian=big was going to cause 
> problems later - well, here we are, it's later :)
> 
> https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg06781.html
> 
>>
>> and "$ ./check -qcow2 272" dumps
>>
>> +od: unrecognized option '--endian=big'
>> +Try 'od --help' for more information.
>> +od: invalid -N argument '--endian=big'
> 
> Yay, same problem for both tests.  Fix common.rc once, and both tests 
> should start working for you.

Thank you, Eric! I want to sort it out later...

> 
>> +qemu-io: can't open device 
>> .../qemu/tests/qemu-iotests/scratch/t.qcow2: Image is not in qcow2 format
>>
> 

-- 
With the best regards,
Andrey Shinkevich




* Re: [PATCH v2 00/22] Fix error handling during bitmap postcopy
  2020-02-19 13:36       ` Eric Blake
  2020-02-19 13:52         ` Andrey Shinkevich
@ 2020-02-19 14:00         ` Eric Blake
  1 sibling, 0 replies; 80+ messages in thread
From: Eric Blake @ 2020-02-19 14:00 UTC (permalink / raw)
  To: Andrey Shinkevich, Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Eduardo Habkost, qemu-block, quintela,
	qemu-stable, dgilbert, Stefan Hajnoczi, Cleber Rosa, Max Reitz,
	John Snow

On 2/19/20 7:36 AM, Eric Blake wrote:

>> +od: unrecognized option '--endian=big'
>> +Try 'od --help' for more information.
>> +od: invalid -N argument '--endian=big'
>> +qemu-img: Could not open 'TEST_DIR/t.IMGFMT': IMGFMT header exceeds 
>> cluster size
> 
> Which version of od are you using?  I do recall wondering whether 
> reliance on the GNU coreutils extension --endian=big was going to cause 
> problems later - well, here we are, it's later :)
> 
> https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg06781.html

coreutils documents that od --endian was added in 8.23, released on 
2014-07-18.  Per https://wiki.qemu.org/Supported_Build_Platforms, we 
still have support for RHEL 7 through 2022, and RHEL 7 was first 
released on 2014-06-09 (all other supported distros have newer 
releases, but I didn't check which coreutils version they include, or 
even whether the BSD builds, which don't use coreutils, would also be 
impacted by this problem).  Still, I'd like to know your specific 
setup, and why the CI tools have not flagged it.

But even one counterexample within the bounds of our supported distro 
page is a good argument that use of od --endian is not yet portable. 
Or, if your setup is not on the supported page, it becomes a question of 
whether it should be added or whether you should upgrade to something 
that is supported.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v2 15/22] qemu-iotests/199: improve performance: set bitmap by discard
  2020-02-17 15:02 ` [PATCH v2 15/22] qemu-iotests/199: improve performance: set bitmap by discard Vladimir Sementsov-Ogievskiy
@ 2020-02-19 14:17   ` Andrey Shinkevich
  0 siblings, 0 replies; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-19 14:17 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Max Reitz, dgilbert, qemu-block, quintela

On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
> Discard dirties the dirty bitmap just as write does, but works faster.
> Let's use it instead.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   tests/qemu-iotests/199 | 31 ++++++++++++++++++++-----------
>   1 file changed, 20 insertions(+), 11 deletions(-)
> 
> diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
> index 6599fc6fb4..d78f81b71c 100755
> --- a/tests/qemu-iotests/199
> +++ b/tests/qemu-iotests/199
> @@ -67,8 +67,10 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>           os.mkfifo(fifo)
>           qemu_img('create', '-f', iotests.imgfmt, disk_a, size)
>           qemu_img('create', '-f', iotests.imgfmt, disk_b, size)
> -        self.vm_a = iotests.VM(path_suffix='a').add_drive(disk_a)
> -        self.vm_b = iotests.VM(path_suffix='b').add_drive(disk_b)
> +        self.vm_a = iotests.VM(path_suffix='a').add_drive(disk_a,
> +                                                          'discard=unmap')
> +        self.vm_b = iotests.VM(path_suffix='b').add_drive(disk_b,
> +                                                          'discard=unmap')
>           self.vm_b.add_incoming("exec: cat '" + fifo + "'")
>           self.vm_a.launch()
>           self.vm_b.launch()
> @@ -78,7 +80,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>           self.vm_b_events = []
>   
>       def test_postcopy(self):
> -        write_size = 0x40000000
> +        discard_size = 0x40000000
>           granularity = 512
>           chunk = 4096
>   
> @@ -86,25 +88,32 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>                                  name='bitmap', granularity=granularity)
>           self.assert_qmp(result, 'return', {})
>   
> +        result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
> +                               node='drive0', name='bitmap')
> +        empty_sha256 = result['return']['sha256']
> +
>           s = 0
> -        while s < write_size:
> -            self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
> +        while s < discard_size:
> +            self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
>               s += 0x10000
>           s = 0x8000
> -        while s < write_size:
> -            self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
> +        while s < discard_size:
> +            self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
>               s += 0x10000
>   
>           result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
>                                  node='drive0', name='bitmap')
>           sha256 = result['return']['sha256']
>   
> +        # Check, that updating the bitmap by discards works
> +        assert sha256 != empty_sha256
> +
>           result = self.vm_a.qmp('block-dirty-bitmap-clear', node='drive0',
>                                  name='bitmap')
>           self.assert_qmp(result, 'return', {})
>           s = 0
> -        while s < write_size:
> -            self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
> +        while s < discard_size:
> +            self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
>               s += 0x10000
>   
>           caps = [{'capability': 'dirty-bitmaps', 'state': True},
> @@ -126,8 +135,8 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>           self.vm_b_events.append(e_resume)
>   
>           s = 0x8000
> -        while s < write_size:
> -            self.vm_b.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
> +        while s < discard_size:
> +            self.vm_b.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
>               s += 0x10000
>   
>           match = {'data': {'status': 'completed'}}
> 

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-- 
With the best regards,
Andrey Shinkevich



* Re: [PATCH v2 03/22] migration/block-dirty-bitmap: rename dirty_bitmap_mig_cleanup
  2020-02-18 11:00   ` Andrey Shinkevich
@ 2020-02-19 14:20     ` Vladimir Sementsov-Ogievskiy
  2020-07-23 20:54       ` Eric Blake
  0 siblings, 1 reply; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-19 14:20 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-devel
  Cc: Fam Zheng, qemu-block, quintela, dgilbert, Stefan Hajnoczi, John Snow

18.02.2020 14:00, Andrey Shinkevich wrote:
> On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
>> Rename dirty_bitmap_mig_cleanup to dirty_bitmap_do_save_cleanup, to
>> stress that it is on save part.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   migration/block-dirty-bitmap.c | 8 ++++----
>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
>> index 73792ab005..4e8959ae52 100644
>> --- a/migration/block-dirty-bitmap.c
>> +++ b/migration/block-dirty-bitmap.c
>> @@ -259,7 +259,7 @@ static void send_bitmap_bits(QEMUFile *f, SaveBitmapState *dbms,
>>   }
>>   /* Called with iothread lock taken.  */
>> -static void dirty_bitmap_mig_cleanup(void)
>> +static void dirty_bitmap_do_save_cleanup(void)
>>   {
>>       SaveBitmapState *dbms;
>> @@ -338,7 +338,7 @@ static int init_dirty_bitmap_migration(void)
>>       return 0;
>>   fail:
>> -    dirty_bitmap_mig_cleanup();
>> +    dirty_bitmap_do_save_cleanup();
>>       return -1;
>>   }
>> @@ -377,7 +377,7 @@ static void bulk_phase(QEMUFile *f, bool limit)
>>   /* for SaveVMHandlers */
>>   static void dirty_bitmap_save_cleanup(void *opaque)
>>   {
>> -    dirty_bitmap_mig_cleanup();
>> +    dirty_bitmap_do_save_cleanup();
>>   }
>>   static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
>> @@ -412,7 +412,7 @@ static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
>>       trace_dirty_bitmap_save_complete_finish();
>> -    dirty_bitmap_mig_cleanup();
>> +    dirty_bitmap_do_save_cleanup();
>>       return 0;
>>   }
>>
> 
> At the next opportunity, I would suggest a name like
> "dirty_bitmap_do_clean_after_saving()", and similarly
> "dirty_bitmap_clean_after_saving()" for dirty_bitmap_save_cleanup().

I'd keep my naming; it corresponds to the .save_cleanup handler name.

> 
> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>


-- 
Best regards,
Vladimir



* Re: [PATCH v2 16/22] qemu-iotests/199: change discard patterns
  2020-02-17 15:02 ` [PATCH v2 16/22] qemu-iotests/199: change discard patterns Vladimir Sementsov-Ogievskiy
@ 2020-02-19 14:33   ` Andrey Shinkevich
  2020-02-19 14:44     ` Andrey Shinkevich
  2020-02-19 15:46     ` Vladimir Sementsov-Ogievskiy
  2020-07-24  0:23   ` Eric Blake
  1 sibling, 2 replies; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-19 14:33 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Max Reitz, dgilbert, qemu-block, quintela

On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
> iotest 199 works too long because of many discard opertion. On the same

operations
At the same time

> time, postcopy period is very short, in spite of all these efforts.
> 
> So, let's use less discards (and with more interesting patterns) to
> reduce test timing. In the next commit we'll increase postcopy period.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   tests/qemu-iotests/199 | 44 +++++++++++++++++++++++++-----------------
>   1 file changed, 26 insertions(+), 18 deletions(-)
> 
> diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
> index d78f81b71c..7914fd0b2b 100755
> --- a/tests/qemu-iotests/199
> +++ b/tests/qemu-iotests/199
> @@ -30,6 +30,28 @@ size = '256G'
>   fifo = os.path.join(iotests.test_dir, 'mig_fifo')
>   
>   
> +GiB = 1024 * 1024 * 1024
> +
> +discards1 = (
> +    (0, GiB),
> +    (2 * GiB + 512 * 5, 512),
> +    (3 * GiB + 512 * 5, 512),
> +    (100 * GiB, GiB)
> +)
> +
> +discards2 = (
> +    (3 * GiB + 512 * 8, 512),
> +    (4 * GiB + 512 * 8, 512),
> +    (50 * GiB, GiB),
> +    (100 * GiB + GiB // 2, GiB)
> +)
> +
> +
> +def apply_discards(vm, discards):
> +    for d in discards:

If we ran qemu-io only once, it would update the bitmap state as well
and would speed the test up. Is that a wrong idea?

> +        vm.hmp_qemu_io('drive0', 'discard {} {}'.format(*d))
> +
> +
>   def event_seconds(event):
>       return event['timestamp']['seconds'] + \
>           event['timestamp']['microseconds'] / 1000000.0
> @@ -80,9 +102,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>           self.vm_b_events = []
>   
>       def test_postcopy(self):
> -        discard_size = 0x40000000
>           granularity = 512
> -        chunk = 4096
>   
>           result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
>                                  name='bitmap', granularity=granularity)
> @@ -92,14 +112,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>                                  node='drive0', name='bitmap')
>           empty_sha256 = result['return']['sha256']
>   
> -        s = 0
> -        while s < discard_size:
> -            self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
> -            s += 0x10000
> -        s = 0x8000
> -        while s < discard_size:
> -            self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
> -            s += 0x10000
> +        apply_discards(self.vm_a, discards1 + discards2)
>   
>           result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
>                                  node='drive0', name='bitmap')
> @@ -111,10 +124,8 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>           result = self.vm_a.qmp('block-dirty-bitmap-clear', node='drive0',
>                                  name='bitmap')
>           self.assert_qmp(result, 'return', {})
> -        s = 0
> -        while s < discard_size:
> -            self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
> -            s += 0x10000
> +
> +        apply_discards(self.vm_a, discards1)
>   
>           caps = [{'capability': 'dirty-bitmaps', 'state': True},
>                   {'capability': 'events', 'state': True}]
> @@ -134,10 +145,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>           e_resume = self.vm_b.event_wait('RESUME')
>           self.vm_b_events.append(e_resume)
>   
> -        s = 0x8000
> -        while s < discard_size:
> -            self.vm_b.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
> -            s += 0x10000
> +        apply_discards(self.vm_b, discards2)
>   
>           match = {'data': {'status': 'completed'}}
>           e_complete = self.vm_b.event_wait('MIGRATION', match=match)
> 

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-- 
With the best regards,
Andrey Shinkevich



* Re: [PATCH v2 16/22] qemu-iotests/199: change discard patterns
  2020-02-19 14:33   ` Andrey Shinkevich
@ 2020-02-19 14:44     ` Andrey Shinkevich
  2020-02-19 15:46     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-19 14:44 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Max Reitz, dgilbert, qemu-block, quintela



On 19/02/2020 17:33, Andrey Shinkevich wrote:
> On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
>> iotest 199 works too long because of many discard opertion. On the same
> 
> operations
> At the same time
> 
>> time, postcopy period is very short, in spite of all these efforts.
>>
>> So, let's use less discards (and with more interesting patterns) to
>> reduce test timing. In the next commit we'll increase postcopy period.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   tests/qemu-iotests/199 | 44 +++++++++++++++++++++++++-----------------
>>   1 file changed, 26 insertions(+), 18 deletions(-)
>>
>> diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
>> index d78f81b71c..7914fd0b2b 100755
>> --- a/tests/qemu-iotests/199
>> +++ b/tests/qemu-iotests/199
>> @@ -30,6 +30,28 @@ size = '256G'
>>   fifo = os.path.join(iotests.test_dir, 'mig_fifo')
>> +GiB = 1024 * 1024 * 1024
>> +
>> +discards1 = (
>> +    (0, GiB),
>> +    (2 * GiB + 512 * 5, 512),
>> +    (3 * GiB + 512 * 5, 512),
>> +    (100 * GiB, GiB)
>> +)
>> +
>> +discards2 = (
>> +    (3 * GiB + 512 * 8, 512),
>> +    (4 * GiB + 512 * 8, 512),
>> +    (50 * GiB, GiB),
>> +    (100 * GiB + GiB // 2, GiB)
>> +)
>> +
>> +
>> +def apply_discards(vm, discards):
>> +    for d in discards:
> 
> If we ran qemu-io only once, it would update the bitmap state as well
> and would speed the test up. Is that a wrong idea?

Yes, it is. I saw that while reviewing the later patches.

Andrey


-- 
With the best regards,
Andrey Shinkevich




* Re: [PATCH v2 17/22] qemu-iotests/199: increase postcopy period
  2020-02-17 15:02 ` [PATCH v2 17/22] qemu-iotests/199: increase postcopy period Vladimir Sementsov-Ogievskiy
@ 2020-02-19 14:56   ` Andrey Shinkevich
  2020-07-24  0:14   ` Eric Blake
  1 sibling, 0 replies; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-19 14:56 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Max Reitz, dgilbert, qemu-block, quintela

On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
> The test wants to force bitmap postcopy. Still, the resulting postcopy
> period is very small. Let's increase it by adding more bitmaps to
> migrate. Also, test migration of disabled bitmaps.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   tests/qemu-iotests/199 | 58 ++++++++++++++++++++++++++++--------------
>   1 file changed, 39 insertions(+), 19 deletions(-)
> 
> diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
> index 7914fd0b2b..9a6e8dcb9d 100755
> --- a/tests/qemu-iotests/199
> +++ b/tests/qemu-iotests/199
> @@ -103,29 +103,45 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>   
>       def test_postcopy(self):
>           granularity = 512
> +        nb_bitmaps = 15
>   
> -        result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
> -                               name='bitmap', granularity=granularity)
> -        self.assert_qmp(result, 'return', {})
> +        for i in range(nb_bitmaps):
> +            result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
> +                                   name='bitmap{}'.format(i),
> +                                   granularity=granularity)
> +            self.assert_qmp(result, 'return', {})
>   
>           result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
> -                               node='drive0', name='bitmap')
> +                               node='drive0', name='bitmap0')
>           empty_sha256 = result['return']['sha256']
>   
> -        apply_discards(self.vm_a, discards1 + discards2)
> +        apply_discards(self.vm_a, discards1)
>   
>           result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
> -                               node='drive0', name='bitmap')
> -        sha256 = result['return']['sha256']
> +                               node='drive0', name='bitmap0')
> +        discards1_sha256 = result['return']['sha256']
>   
>           # Check, that updating the bitmap by discards works
> -        assert sha256 != empty_sha256
> +        assert discards1_sha256 != empty_sha256
>   
> -        result = self.vm_a.qmp('block-dirty-bitmap-clear', node='drive0',
> -                               name='bitmap')
> -        self.assert_qmp(result, 'return', {})
> +        # We want to calculate resulting sha256. Do it in bitmap0, so, disable
> +        # other bitmaps
> +        for i in range(1, nb_bitmaps):
> +            result = self.vm_a.qmp('block-dirty-bitmap-disable', node='drive0',
> +                                   name='bitmap{}'.format(i))
> +            self.assert_qmp(result, 'return', {})
>   
> -        apply_discards(self.vm_a, discards1)
> +        apply_discards(self.vm_a, discards2)
> +
> +        result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
> +                               node='drive0', name='bitmap0')
> +        all_discards_sha256 = result['return']['sha256']
> +
> +        # Now, enable some bitmaps, to be updated during migration
> +        for i in range(2, nb_bitmaps, 2):
> +            result = self.vm_a.qmp('block-dirty-bitmap-enable', node='drive0',
> +                                   name='bitmap{}'.format(i))
> +            self.assert_qmp(result, 'return', {})
>   
>           caps = [{'capability': 'dirty-bitmaps', 'state': True},
>                   {'capability': 'events', 'state': True}]
> @@ -145,6 +161,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>           e_resume = self.vm_b.event_wait('RESUME')
>           self.vm_b_events.append(e_resume)
>   
> +        # enabled bitmaps should be updated
>           apply_discards(self.vm_b, discards2)
>   
>           match = {'data': {'status': 'completed'}}
> @@ -158,7 +175,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>           downtime = event_dist(e_stop, e_resume)
>           postcopy_time = event_dist(e_resume, e_complete)
>   
> -        # TODO: assert downtime * 10 < postcopy_time
> +        assert downtime * 10 < postcopy_time
>           if debug:
>               print('downtime:', downtime)
>               print('postcopy_time:', postcopy_time)
> @@ -166,12 +183,15 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>           # Assert that bitmap migration is finished (check that successor bitmap
>           # is removed)
>           result = self.vm_b.qmp('query-block')
> -        assert len(result['return'][0]['dirty-bitmaps']) == 1
> -
> -        # Check content of migrated (and updated by new writes) bitmap
> -        result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
> -                               node='drive0', name='bitmap')
> -        self.assert_qmp(result, 'return/sha256', sha256)
> +        assert len(result['return'][0]['dirty-bitmaps']) == nb_bitmaps
> +
> +        # Check content of migrated bitmaps. Still, don't waste time checking
> +        # every bitmap
> +        for i in range(0, nb_bitmaps, 5):
> +            result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
> +                                   node='drive0', name='bitmap{}'.format(i))
> +            sha256 = discards1_sha256 if i % 2 else all_discards_sha256
> +            self.assert_qmp(result, 'return/sha256', sha256)
>   
>   
>   if __name__ == '__main__':
> 

The updated test passed.

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-- 
With the best regards,
Andrey Shinkevich



* Re: [PATCH v2 00/22] Fix error handling during bitmap postcopy
  2020-02-19 13:52         ` Andrey Shinkevich
@ 2020-02-19 14:58           ` Eric Blake
  2020-02-19 17:22             ` Andrey Shinkevich
  0 siblings, 1 reply; 80+ messages in thread
From: Eric Blake @ 2020-02-19 14:58 UTC (permalink / raw)
  To: Andrey Shinkevich, Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Eduardo Habkost, qemu-block, quintela,
	qemu-stable, dgilbert, Stefan Hajnoczi, Cleber Rosa, Max Reitz,
	John Snow

On 2/19/20 7:52 AM, Andrey Shinkevich wrote:

>>> +od: unrecognized option '--endian=big'
>>> +Try 'od --help' for more information.
>>> +od: invalid -N argument '--endian=big'
>>
>> Yay, same problem for both tests.  Fix common.rc once, and both tests 
>> should start working for you.
> 
> Thank you, Eric! I want to sort it out later...

Patch proposed:
https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg05188.html


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v2 05/22] migration/block-dirty-bitmap: refactor state global variables
  2020-02-18 13:05   ` Andrey Shinkevich
@ 2020-02-19 15:29     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-19 15:29 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-devel
  Cc: Fam Zheng, qemu-block, quintela, dgilbert, Stefan Hajnoczi, John Snow

18.02.2020 16:05, Andrey Shinkevich wrote:
> 
> 
> On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
>> Move all state variables into one global struct. Reduce global
>> variable usage, utilizing opaque pointer where possible.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   migration/block-dirty-bitmap.c | 171 ++++++++++++++++++---------------
>>   1 file changed, 95 insertions(+), 76 deletions(-)
>>
>> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
>> index 49d4cf8810..7a82b76809 100644
>> --- a/migration/block-dirty-bitmap.c
>> +++ b/migration/block-dirty-bitmap.c
>> @@ -128,6 +128,12 @@ typedef struct DBMSaveState {
>>       BdrvDirtyBitmap *prev_bitmap;
>>   } DBMSaveState;
>> +typedef struct LoadBitmapState {
>> +    BlockDriverState *bs;
>> +    BdrvDirtyBitmap *bitmap;
>> +    bool migrated;
>> +} LoadBitmapState;
>> +
>>   /* State of the dirty bitmap migration (DBM) during load process */
>>   typedef struct DBMLoadState {
>>       uint32_t flags;
>> @@ -135,18 +141,17 @@ typedef struct DBMLoadState {
>>       char bitmap_name[256];
>>       BlockDriverState *bs;
>>       BdrvDirtyBitmap *bitmap;
>> +
>> +    GSList *enabled_bitmaps;
>> +    QemuMutex finish_lock;
>>   } DBMLoadState;
>> -static DBMSaveState dirty_bitmap_mig_state;
>> +typedef struct DBMState {
>> +    DBMSaveState save;
>> +    DBMLoadState load;
>> +} DBMState;
>> -/* State of one bitmap during load process */
>> -typedef struct LoadBitmapState {
>> -    BlockDriverState *bs;
>> -    BdrvDirtyBitmap *bitmap;
>> -    bool migrated;
>> -} LoadBitmapState;
>> -static GSList *enabled_bitmaps;
>> -QemuMutex finish_lock;
>> +static DBMState dbm_state;
>>   static uint32_t qemu_get_bitmap_flags(QEMUFile *f)
>>   {
>> @@ -169,21 +174,21 @@ static void qemu_put_bitmap_flags(QEMUFile *f, uint32_t flags)
>>       qemu_put_byte(f, flags);
>>   }
>> -static void send_bitmap_header(QEMUFile *f, SaveBitmapState *dbms,
>> -                               uint32_t additional_flags)
>> +static void send_bitmap_header(QEMUFile *f, DBMSaveState *s,
>> +                               SaveBitmapState *dbms, uint32_t additional_flags)
>>   {
>>       BlockDriverState *bs = dbms->bs;
>>       BdrvDirtyBitmap *bitmap = dbms->bitmap;
>>       uint32_t flags = additional_flags;
>>       trace_send_bitmap_header_enter();
>> -    if (bs != dirty_bitmap_mig_state.prev_bs) {
>> -        dirty_bitmap_mig_state.prev_bs = bs;
>> +    if (bs != s->prev_bs) {
>> +        s->prev_bs = bs;
>>           flags |= DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME;
>>       }
>> -    if (bitmap != dirty_bitmap_mig_state.prev_bitmap) {
>> -        dirty_bitmap_mig_state.prev_bitmap = bitmap;
>> +    if (bitmap != s->prev_bitmap) {
>> +        s->prev_bitmap = bitmap;
>>           flags |= DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME;
>>       }
>> @@ -198,19 +203,22 @@ static void send_bitmap_header(QEMUFile *f, SaveBitmapState *dbms,
>>       }
>>   }
>> -static void send_bitmap_start(QEMUFile *f, SaveBitmapState *dbms)
>> +static void send_bitmap_start(QEMUFile *f, DBMSaveState *s,
>> +                              SaveBitmapState *dbms)
>>   {
>> -    send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_START);
>> +    send_bitmap_header(f, s, dbms, DIRTY_BITMAP_MIG_FLAG_START);
>>       qemu_put_be32(f, bdrv_dirty_bitmap_granularity(dbms->bitmap));
>>       qemu_put_byte(f, dbms->flags);
>>   }
>> -static void send_bitmap_complete(QEMUFile *f, SaveBitmapState *dbms)
>> +static void send_bitmap_complete(QEMUFile *f, DBMSaveState *s,
>> +                                 SaveBitmapState *dbms)
>>   {
>> -    send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_COMPLETE);
>> +    send_bitmap_header(f, s, dbms, DIRTY_BITMAP_MIG_FLAG_COMPLETE);
>>   }
>> -static void send_bitmap_bits(QEMUFile *f, SaveBitmapState *dbms,
>> +static void send_bitmap_bits(QEMUFile *f, DBMSaveState *s,
>> +                             SaveBitmapState *dbms,
>>                                uint64_t start_sector, uint32_t nr_sectors)
>>   {
>>       /* align for buffer_is_zero() */
>> @@ -235,7 +243,7 @@ static void send_bitmap_bits(QEMUFile *f, SaveBitmapState *dbms,
>>       trace_send_bitmap_bits(flags, start_sector, nr_sectors, buf_size);
>> -    send_bitmap_header(f, dbms, flags);
>> +    send_bitmap_header(f, s, dbms, flags);
>>       qemu_put_be64(f, start_sector);
>>       qemu_put_be32(f, nr_sectors);
>> @@ -254,12 +262,12 @@ static void send_bitmap_bits(QEMUFile *f, SaveBitmapState *dbms,
>>   }
>>   /* Called with iothread lock taken.  */
>> -static void dirty_bitmap_do_save_cleanup(void)
>> +static void dirty_bitmap_do_save_cleanup(DBMSaveState *s)
>>   {
>>       SaveBitmapState *dbms;
>> -    while ((dbms = QSIMPLEQ_FIRST(&dirty_bitmap_mig_state.dbms_list)) != NULL) {
>> -        QSIMPLEQ_REMOVE_HEAD(&dirty_bitmap_mig_state.dbms_list, entry);
>> +    while ((dbms = QSIMPLEQ_FIRST(&s->dbms_list)) != NULL) {
>> +        QSIMPLEQ_REMOVE_HEAD(&s->dbms_list, entry);
>>           bdrv_dirty_bitmap_set_busy(dbms->bitmap, false);
>>           bdrv_unref(dbms->bs);
>>           g_free(dbms);
>> @@ -267,17 +275,17 @@ static void dirty_bitmap_do_save_cleanup(void)
>>   }
>>   /* Called with iothread lock taken. */
>> -static int init_dirty_bitmap_migration(void)
>> +static int init_dirty_bitmap_migration(DBMSaveState *s)
>>   {
>>       BlockDriverState *bs;
>>       BdrvDirtyBitmap *bitmap;
>>       SaveBitmapState *dbms;
>>       Error *local_err = NULL;
>> -    dirty_bitmap_mig_state.bulk_completed = false;
>> -    dirty_bitmap_mig_state.prev_bs = NULL;
>> -    dirty_bitmap_mig_state.prev_bitmap = NULL;
>> -    dirty_bitmap_mig_state.no_bitmaps = false;
>> +    s->bulk_completed = false;
>> +    s->prev_bs = NULL;
>> +    s->prev_bitmap = NULL;
>> +    s->no_bitmaps = false;
>>       for (bs = bdrv_next_all_states(NULL); bs; bs = bdrv_next_all_states(bs)) {
>>           const char *name = bdrv_get_device_or_node_name(bs);
>> @@ -316,35 +324,36 @@ static int init_dirty_bitmap_migration(void)
>>                   dbms->flags |= DIRTY_BITMAP_MIG_START_FLAG_PERSISTENT;
>>               }
>> -            QSIMPLEQ_INSERT_TAIL(&dirty_bitmap_mig_state.dbms_list,
>> +            QSIMPLEQ_INSERT_TAIL(&s->dbms_list,
>>                                    dbms, entry);
>>           }
>>       }
>>       /* unset migration flags here, to not roll back it */
>> -    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +    QSIMPLEQ_FOREACH(dbms, &s->dbms_list, entry) {
>>           bdrv_dirty_bitmap_skip_store(dbms->bitmap, true);
>>       }
>> -    if (QSIMPLEQ_EMPTY(&dirty_bitmap_mig_state.dbms_list)) {
>> -        dirty_bitmap_mig_state.no_bitmaps = true;
>> +    if (QSIMPLEQ_EMPTY(&s->dbms_list)) {
>> +        s->no_bitmaps = true;
>>       }
>>       return 0;
>>   fail:
>> -    dirty_bitmap_do_save_cleanup();
>> +    dirty_bitmap_do_save_cleanup(s);
>>       return -1;
>>   }
>>   /* Called with no lock taken.  */
>> -static void bulk_phase_send_chunk(QEMUFile *f, SaveBitmapState *dbms)
>> +static void bulk_phase_send_chunk(QEMUFile *f, DBMSaveState *s,
>> +                                  SaveBitmapState *dbms)
>>   {
>>       uint32_t nr_sectors = MIN(dbms->total_sectors - dbms->cur_sector,
>>                                dbms->sectors_per_chunk);
>> -    send_bitmap_bits(f, dbms, dbms->cur_sector, nr_sectors);
>> +    send_bitmap_bits(f, s, dbms, dbms->cur_sector, nr_sectors);
>>       dbms->cur_sector += nr_sectors;
>>       if (dbms->cur_sector >= dbms->total_sectors) {
>> @@ -353,61 +362,66 @@ static void bulk_phase_send_chunk(QEMUFile *f, SaveBitmapState *dbms)
>>   }
>>   /* Called with no lock taken.  */
>> -static void bulk_phase(QEMUFile *f, bool limit)
>> +static void bulk_phase(QEMUFile *f, DBMSaveState *s, bool limit)
>>   {
>>       SaveBitmapState *dbms;
>> -    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +    QSIMPLEQ_FOREACH(dbms, &s->dbms_list, entry) {
>>           while (!dbms->bulk_completed) {
>> -            bulk_phase_send_chunk(f, dbms);
>> +            bulk_phase_send_chunk(f, s, dbms);
>>               if (limit && qemu_file_rate_limit(f)) {
>>                   return;
>>               }
>>           }
>>       }
>> -    dirty_bitmap_mig_state.bulk_completed = true;
>> +    s->bulk_completed = true;
>>   }
>>   /* for SaveVMHandlers */
>>   static void dirty_bitmap_save_cleanup(void *opaque)
>>   {
>> -    dirty_bitmap_do_save_cleanup();
>> +    DBMSaveState *s = &((DBMState *)opaque)->save;
>> +
>> +    dirty_bitmap_do_save_cleanup(s);
>>   }
> 
> Why does one need the extra nested "do" function?

Because "_do_" has sub-structure argument, and I don't have pointer to the
whole structure in another "_do_" caller..
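To restate the pattern as a stripped-down sketch (names taken from the hunks
above, struct contents elided): the SaveVMHandlers callback is the only caller
that sees the whole DBMState through void *opaque, while the fail path in
init_dirty_bitmap_migration() already works on the DBMSaveState sub-structure
alone, so the shared body lives in the "_do_" helper:

    typedef struct DBMSaveState {
        int placeholder; /* real fields (dbms_list, prev_bs, ...) elided */
    } DBMSaveState;

    typedef struct DBMState {
        DBMSaveState save;
        /* DBMLoadState load; */
    } DBMState;

    /* shared body: needs only the save sub-structure */
    static void dirty_bitmap_do_save_cleanup(DBMSaveState *s)
    {
        /* walk s->dbms_list and free the entries */
    }

    /* SaveVMHandlers callback: receives the whole DBMState as opaque */
    static void dirty_bitmap_save_cleanup(void *opaque)
    {
        DBMSaveState *s = &((DBMState *)opaque)->save;

        dirty_bitmap_do_save_cleanup(s);
    }

    /* the other caller already holds only DBMSaveState *s, so it could
     * not call a cleanup function that expects the whole DBMState */
    static int init_dirty_bitmap_migration(DBMSaveState *s)
    {
        /* ... on failure: */
        dirty_bitmap_do_save_cleanup(s);
        return -1;
    }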

> 
>>   static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
>>   {
>> +    DBMSaveState *s = &((DBMState *)opaque)->save;
>> +
>>       trace_dirty_bitmap_save_iterate(migration_in_postcopy());
>> -    if (migration_in_postcopy() && !dirty_bitmap_mig_state.bulk_completed) {
>> -        bulk_phase(f, true);
>> +    if (migration_in_postcopy() && !s->bulk_completed) {
>> +        bulk_phase(f, s, true);
>>       }
>>       qemu_put_bitmap_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>> -    return dirty_bitmap_mig_state.bulk_completed;
>> +    return s->bulk_completed;
>>   }
>>   /* Called with iothread lock taken.  */
>>   static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
>>   {
>> +    DBMSaveState *s = &((DBMState *)opaque)->save;
>>       SaveBitmapState *dbms;
>>       trace_dirty_bitmap_save_complete_enter();
>> -    if (!dirty_bitmap_mig_state.bulk_completed) {
>> -        bulk_phase(f, false);
>> +    if (!s->bulk_completed) {
>> +        bulk_phase(f, s, false);
>>       }
>> -    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> -        send_bitmap_complete(f, dbms);
>> +    QSIMPLEQ_FOREACH(dbms, &s->dbms_list, entry) {
>> +        send_bitmap_complete(f, s, dbms);
>>       }
>>       qemu_put_bitmap_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>>       trace_dirty_bitmap_save_complete_finish();
>> -    dirty_bitmap_do_save_cleanup();
>> +    dirty_bitmap_save_cleanup(opaque);
>>       return 0;
>>   }
>> @@ -417,12 +431,13 @@ static void dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
>>                                         uint64_t *res_compatible,
>>                                         uint64_t *res_postcopy_only)
>>   {
>> +    DBMSaveState *s = &((DBMState *)opaque)->save;
>>       SaveBitmapState *dbms;
>>       uint64_t pending = 0;
>>       qemu_mutex_lock_iothread();
>> -    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +    QSIMPLEQ_FOREACH(dbms, &s->dbms_list, entry) {
>>           uint64_t gran = bdrv_dirty_bitmap_granularity(dbms->bitmap);
>>           uint64_t sectors = dbms->bulk_completed ? 0 :
>>                              dbms->total_sectors - dbms->cur_sector;
>> @@ -481,7 +496,7 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
>>           b->bs = s->bs;
>>           b->bitmap = s->bitmap;
>>           b->migrated = false;
>> -        enabled_bitmaps = g_slist_prepend(enabled_bitmaps, b);
>> +        s->enabled_bitmaps = g_slist_prepend(s->enabled_bitmaps, b);
>>       }
>>       return 0;
>> @@ -489,11 +504,12 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
>>   void dirty_bitmap_mig_before_vm_start(void)
>>   {
>> +    DBMLoadState *s = &dbm_state.load;
>>       GSList *item;
>> -    qemu_mutex_lock(&finish_lock);
>> +    qemu_mutex_lock(&s->finish_lock);
>> -    for (item = enabled_bitmaps; item; item = g_slist_next(item)) {
>> +    for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
>>           LoadBitmapState *b = item->data;
>>           if (b->migrated) {
>> @@ -505,10 +521,10 @@ void dirty_bitmap_mig_before_vm_start(void)
>>           g_free(b);
>>       }
>> -    g_slist_free(enabled_bitmaps);
>> -    enabled_bitmaps = NULL;
>> +    g_slist_free(s->enabled_bitmaps);
>> +    s->enabled_bitmaps = NULL;
>> -    qemu_mutex_unlock(&finish_lock);
>> +    qemu_mutex_unlock(&s->finish_lock);
>>   }
>>   static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>> @@ -517,9 +533,9 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>>       trace_dirty_bitmap_load_complete();
>>       bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
>> -    qemu_mutex_lock(&finish_lock);
>> +    qemu_mutex_lock(&s->finish_lock);
>> -    for (item = enabled_bitmaps; item; item = g_slist_next(item)) {
>> +    for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
>>           LoadBitmapState *b = item->data;
>>           if (b->bitmap == s->bitmap) {
>> @@ -530,7 +546,7 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>>       if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
>>           bdrv_dirty_bitmap_lock(s->bitmap);
>> -        if (enabled_bitmaps == NULL) {
>> +        if (s->enabled_bitmaps == NULL) {
>>               /* in postcopy */
>>               bdrv_reclaim_dirty_bitmap_locked(s->bitmap, &error_abort);
>>               bdrv_enable_dirty_bitmap_locked(s->bitmap);
>> @@ -549,7 +565,7 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>>           bdrv_dirty_bitmap_unlock(s->bitmap);
>>       }
>> -    qemu_mutex_unlock(&finish_lock);
>> +    qemu_mutex_unlock(&s->finish_lock);
>>   }
>>   static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
>> @@ -646,7 +662,7 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s)
>>   static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>>   {
>> -    static DBMLoadState s;
>> +    DBMLoadState *s = &((DBMState *)opaque)->load;
>>       int ret = 0;
>>       trace_dirty_bitmap_load_enter();
>> @@ -656,17 +672,17 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>>       }
>>       do {
>> -        ret = dirty_bitmap_load_header(f, &s);
>> +        ret = dirty_bitmap_load_header(f, s);
>>           if (ret < 0) {
>>               return ret;
>>           }
>> -        if (s.flags & DIRTY_BITMAP_MIG_FLAG_START) {
>> -            ret = dirty_bitmap_load_start(f, &s);
>> -        } else if (s.flags & DIRTY_BITMAP_MIG_FLAG_COMPLETE) {
>> -            dirty_bitmap_load_complete(f, &s);
>> -        } else if (s.flags & DIRTY_BITMAP_MIG_FLAG_BITS) {
>> -            ret = dirty_bitmap_load_bits(f, &s);
>> +        if (s->flags & DIRTY_BITMAP_MIG_FLAG_START) {
>> +            ret = dirty_bitmap_load_start(f, s);
>> +        } else if (s->flags & DIRTY_BITMAP_MIG_FLAG_COMPLETE) {
>> +            dirty_bitmap_load_complete(f, s);
>> +        } else if (s->flags & DIRTY_BITMAP_MIG_FLAG_BITS) {
>> +            ret = dirty_bitmap_load_bits(f, s);
>>           }
>>           if (!ret) {
>> @@ -676,7 +692,7 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>>           if (ret) {
>>               return ret;
>>           }
>> -    } while (!(s.flags & DIRTY_BITMAP_MIG_FLAG_EOS));
>> +    } while (!(s->flags & DIRTY_BITMAP_MIG_FLAG_EOS));
>>       trace_dirty_bitmap_load_success();
>>       return 0;
>> @@ -684,13 +700,14 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>>   static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
>>   {
>> +    DBMSaveState *s = &((DBMState *)opaque)->save;
>>       SaveBitmapState *dbms = NULL;
>> -    if (init_dirty_bitmap_migration() < 0) {
>> +    if (init_dirty_bitmap_migration(s) < 0) {
>>           return -1;
>>       }
>> -    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> -        send_bitmap_start(f, dbms);
>> +    QSIMPLEQ_FOREACH(dbms, &s->dbms_list, entry) {
>> +        send_bitmap_start(f, s, dbms);
>>       }
>>       qemu_put_bitmap_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>> @@ -699,7 +716,9 @@ static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
>>   static bool dirty_bitmap_is_active(void *opaque)
>>   {
>> -    return migrate_dirty_bitmaps() && !dirty_bitmap_mig_state.no_bitmaps;
>> +    DBMSaveState *s = &((DBMState *)opaque)->save;
>> +
>> +    return migrate_dirty_bitmaps() && !s->no_bitmaps;
>>   }
>>   static bool dirty_bitmap_is_active_iterate(void *opaque)
>> @@ -727,10 +746,10 @@ static SaveVMHandlers savevm_dirty_bitmap_handlers = {
>>   void dirty_bitmap_mig_init(void)
>>   {
>> -    QSIMPLEQ_INIT(&dirty_bitmap_mig_state.dbms_list);
>> -    qemu_mutex_init(&finish_lock);
>> +    QSIMPLEQ_INIT(&dbm_state.save.dbms_list);
>> +    qemu_mutex_init(&dbm_state.load.finish_lock);
>>       register_savevm_live("dirty-bitmap", 0, 1,
>>                            &savevm_dirty_bitmap_handlers,
>> -                         &dirty_bitmap_mig_state);
>> +                         &dbm_state);
>>   }
>>
> 
> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 07/22] migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete
  2020-02-18 14:26   ` Andrey Shinkevich
@ 2020-02-19 15:30     ` Vladimir Sementsov-Ogievskiy
  2020-02-19 16:14       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-19 15:30 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-devel
  Cc: Fam Zheng, qemu-block, quintela, dgilbert, Stefan Hajnoczi, John Snow

18.02.2020 17:26, Andrey Shinkevich wrote:
> On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
>> The bdrv_enable_dirty_bitmap_locked() call does nothing: if we are in
>> postcopy, the bitmap successor must be enabled, and the reclaim operation
>> will enable the bitmap.
>>
>> So we actually just need to call _reclaim_ in both if branches, and
>> differentiating them only to add an assertion does not seem worthwhile.
>> The logic becomes simple: on load completion we reclaim, and that's all.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   migration/block-dirty-bitmap.c | 25 ++++---------------------
>>   1 file changed, 4 insertions(+), 21 deletions(-)
>>
>> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
>> index 440c41cfca..9cc750d93b 100644
>> --- a/migration/block-dirty-bitmap.c
>> +++ b/migration/block-dirty-bitmap.c
>> @@ -535,6 +535,10 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>>       qemu_mutex_lock(&s->lock);
>> +    if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
> What about making sure of it?
>             assert(!s->bitmap->successor->disabled);

I'm afraid we can't, as BdrvDirtyBitmap is not a public structure.

> 
>> +        bdrv_reclaim_dirty_bitmap(s->bitmap, &error_abort);

But we can assert that the resulting bitmap is enabled.
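Something along these lines (a sketch only, on the assumption that reclaim is
expected to leave the bitmap enabled; bdrv_dirty_bitmap_enabled() is the
existing accessor):

    if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
        bdrv_reclaim_dirty_bitmap(s->bitmap, &error_abort);
        assert(bdrv_dirty_bitmap_enabled(s->bitmap));
    }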

>> +    }
>> +
>>       for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
>>           LoadBitmapState *b = item->data;
>> @@ -544,27 +548,6 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>>           }
>>       }
>> -    if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
>> -        bdrv_dirty_bitmap_lock(s->bitmap);
>> -        if (s->enabled_bitmaps == NULL) {
>> -            /* in postcopy */
>> -            bdrv_reclaim_dirty_bitmap_locked(s->bitmap, &error_abort);
>> -            bdrv_enable_dirty_bitmap_locked(s->bitmap);
>> -        } else {
>> -            /* target not started, successor must be empty */
>> -            int64_t count = bdrv_get_dirty_count(s->bitmap);
>> -            BdrvDirtyBitmap *ret = bdrv_reclaim_dirty_bitmap_locked(s->bitmap,
>> -                                                                    NULL);
>> -            /* bdrv_reclaim_dirty_bitmap can fail only on no successor (it
>> -             * must be) or on merge fail, but merge can't fail when second
>> -             * bitmap is empty
>> -             */
>> -            assert(ret == s->bitmap &&
>> -                   count == bdrv_get_dirty_count(s->bitmap));
>> -        }
>> -        bdrv_dirty_bitmap_unlock(s->bitmap);
>> -    }
>> -
>>       qemu_mutex_unlock(&s->lock);
>>   }
>>
> 
> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 09/22] migration/block-dirty-bitmap: relax error handling in incoming part
  2020-02-18 18:54   ` Andrey Shinkevich
@ 2020-02-19 15:34     ` Vladimir Sementsov-Ogievskiy
  2020-07-24  7:23       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-19 15:34 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-devel
  Cc: Fam Zheng, qemu-block, quintela, dgilbert, Stefan Hajnoczi, John Snow

18.02.2020 21:54, Andrey Shinkevich wrote:
> 
> 
> On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
>> Bitmap data is not critical, and we should not fail the migration (or
>> use postcopy recovery) because of a dirty-bitmap migration failure.
>> Instead we should just lose the unfinished bitmaps.
>>
>> Still, we have to report I/O stream violation errors, as they affect the
>> whole migration stream.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   migration/block-dirty-bitmap.c | 148 +++++++++++++++++++++++++--------
>>   1 file changed, 113 insertions(+), 35 deletions(-)
>>
>> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
>> index 1329db8d7d..aea5326804 100644
>> --- a/migration/block-dirty-bitmap.c
>> +++ b/migration/block-dirty-bitmap.c
>> @@ -145,6 +145,15 @@ typedef struct DBMLoadState {
>>       bool before_vm_start_handled; /* set in dirty_bitmap_mig_before_vm_start */
>> +    /*
>> +     * cancelled
>> +     * Incoming migration is cancelled for some reason. That means that we
>> +     * still should read our chunks from migration stream, to not affect other
>> +     * migration objects (like RAM), but just ignore them and do not touch any
>> +     * bitmaps or nodes.
>> +     */
>> +    bool cancelled;
>> +
>>       GSList *bitmaps;
>>       QemuMutex lock; /* protect bitmaps */
>>   } DBMLoadState;
>> @@ -545,13 +554,47 @@ void dirty_bitmap_mig_before_vm_start(void)
>>       qemu_mutex_unlock(&s->lock);
>>   }
>> +static void cancel_incoming_locked(DBMLoadState *s)
>> +{
>> +    GSList *item;
>> +
>> +    if (s->cancelled) {
>> +        return;
>> +    }
>> +
>> +    s->cancelled = true;
>> +    s->bs = NULL;
>> +    s->bitmap = NULL;
>> +
>> +    /* Drop all unfinished bitmaps */
>> +    for (item = s->bitmaps; item; item = g_slist_next(item)) {
>> +        LoadBitmapState *b = item->data;
>> +
>> +        /*
>> +         * Bitmap must be unfinished, as finished bitmaps should already be
>> +         * removed from the list.
>> +         */
>> +        assert(!s->before_vm_start_handled || !b->migrated);
>> +        if (bdrv_dirty_bitmap_has_successor(b->bitmap)) {
>> +            bdrv_reclaim_dirty_bitmap(b->bitmap, &error_abort);
>> +        }
>> +        bdrv_release_dirty_bitmap(b->bitmap);
>> +    }
>> +
>> +    g_slist_free_full(s->bitmaps, g_free);
>> +    s->bitmaps = NULL;
>> +}
>> +
>>   static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>>   {
>>       GSList *item;
>>       trace_dirty_bitmap_load_complete();
>> -    bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
>> -    qemu_mutex_lock(&s->lock);
> 
> Why is it safe to remove the critical section?

It's not removed; it becomes wider in this patch.
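For reference, a condensed sketch of where the lock ends up after this patch
(taken from the dirty_bitmap_load() hunk at the end of this mail): it now
spans the whole per-chunk processing instead of only the completion step:

    do {
        qemu_mutex_lock(&s->lock);

        ret = dirty_bitmap_load_header(f, s);
        /* ... dispatch START / COMPLETE / BITS handling ... */

        qemu_mutex_unlock(&s->lock);
    } while (!(s->flags & DIRTY_BITMAP_MIG_FLAG_EOS));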

> 
>> +    if (s->cancelled) {
>> +        return;
>> +    }
>> +
>> +    bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
>>       if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
>>           bdrv_reclaim_dirty_bitmap(s->bitmap, &error_abort);
>> @@ -569,8 +612,6 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>>               break;
>>           }
>>       }
>> -
>> -    qemu_mutex_unlock(&s->lock);
>>   }
>>   static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
>> @@ -582,15 +623,32 @@ static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
>>       if (s->flags & DIRTY_BITMAP_MIG_FLAG_ZEROES) {
>>           trace_dirty_bitmap_load_bits_zeroes();
>> -        bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte, nr_bytes,
>> -                                             false);
>> +        if (!s->cancelled) {
>> +            bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte,
>> +                                                 nr_bytes, false);
>> +        }
>>       } else {
>>           size_t ret;
>>           uint8_t *buf;
>>           uint64_t buf_size = qemu_get_be64(f);
>> -        uint64_t needed_size =
>> -            bdrv_dirty_bitmap_serialization_size(s->bitmap,
>> -                                                 first_byte, nr_bytes);
>> +        uint64_t needed_size;
>> +
>> +        buf = g_malloc(buf_size);
>> +        ret = qemu_get_buffer(f, buf, buf_size);
>> +        if (ret != buf_size) {
>> +            error_report("Failed to read bitmap bits");
>> +            g_free(buf);
>> +            return -EIO;
>> +        }
>> +
>> +        if (s->cancelled) {
>> +            g_free(buf);
>> +            return 0;
>> +        }
>> +
>> +        needed_size = bdrv_dirty_bitmap_serialization_size(s->bitmap,
>> +                                                           first_byte,
>> +                                                           nr_bytes);
>>           if (needed_size > buf_size ||
>>               buf_size > QEMU_ALIGN_UP(needed_size, 4 * sizeof(long))
>> @@ -599,15 +657,8 @@ static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
>>               error_report("Migrated bitmap granularity doesn't "
>>                            "match the destination bitmap '%s' granularity",
>>                            bdrv_dirty_bitmap_name(s->bitmap));
>> -            return -EINVAL;
>> -        }
>> -
>> -        buf = g_malloc(buf_size);
>> -        ret = qemu_get_buffer(f, buf, buf_size);
>> -        if (ret != buf_size) {
>> -            error_report("Failed to read bitmap bits");
>> -            g_free(buf);
>> -            return -EIO;
>> +            cancel_incoming_locked(s);
> 
>                 /* Continue the VM migration as bitmaps data are not critical */

Hmm, yes, that's what this patch does. But I don't think we should add a comment at each call of cancel_..()

> 
>> +            return 0;
>>           }
>>           bdrv_dirty_bitmap_deserialize_part(s->bitmap, buf, first_byte, nr_bytes,
>> @@ -632,14 +683,16 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s)
>>               error_report("Unable to read node name string");
>>               return -EINVAL;
>>           }
>> -        s->bs = bdrv_lookup_bs(s->node_name, s->node_name, &local_err);
>> -        if (!s->bs) {
>> -            error_report_err(local_err);
>> -            return -EINVAL;
>> +        if (!s->cancelled) {
>> +            s->bs = bdrv_lookup_bs(s->node_name, s->node_name, &local_err);
>> +            if (!s->bs) {
>> +                error_report_err(local_err);
> 
> The error message may be supplemented with a report about the canceled bitmap migration. The same applies further down, at cancel_incoming_locked(s).
> 
>> +                cancel_incoming_locked(s);
>> +            }
>>           }
>> -    } else if (!s->bs && !nothing) {
>> +    } else if (!s->bs && !nothing && !s->cancelled) {
>>           error_report("Error: block device name is not set");
>> -        return -EINVAL;
>> +        cancel_incoming_locked(s);
>>       }
>>       if (s->flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
>> @@ -647,24 +700,38 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s)
>>               error_report("Unable to read bitmap name string");
>>               return -EINVAL;
>>           }
>> -        s->bitmap = bdrv_find_dirty_bitmap(s->bs, s->bitmap_name);
>> -
>> -        /* bitmap may be NULL here, it wouldn't be an error if it is the
>> -         * first occurrence of the bitmap */
>> -        if (!s->bitmap && !(s->flags & DIRTY_BITMAP_MIG_FLAG_START)) {
>> -            error_report("Error: unknown dirty bitmap "
>> -                         "'%s' for block device '%s'",
>> -                         s->bitmap_name, s->node_name);
>> -            return -EINVAL;
>> +        if (!s->cancelled) {
>> +            s->bitmap = bdrv_find_dirty_bitmap(s->bs, s->bitmap_name);
>> +
>> +            /*
>> +             * bitmap may be NULL here, it wouldn't be an error if it is the
>> +             * first occurrence of the bitmap
>> +             */
>> +            if (!s->bitmap && !(s->flags & DIRTY_BITMAP_MIG_FLAG_START)) {
>> +                error_report("Error: unknown dirty bitmap "
>> +                             "'%s' for block device '%s'",
>> +                             s->bitmap_name, s->node_name);
>> +                cancel_incoming_locked(s);
>> +            }
>>           }
>> -    } else if (!s->bitmap && !nothing) {
>> +    } else if (!s->bitmap && !nothing && !s->cancelled) {
>>           error_report("Error: block device name is not set");
>> -        return -EINVAL;
>> +        cancel_incoming_locked(s);
>>       }
>>       return 0;
>>   }
>> +/*
>> + * dirty_bitmap_load
>> + *
>> + * Load sequence of dirty bitmap chunks. Return error only on fatal io stream
>> + * violations. On other errors just cancel bitmaps incoming migration and return
>> + * 0.
>> + *
>> + * Note, than when incoming bitmap migration is canceled, we still must read all
> "than (that)" may be omitted
> 
>> + * our chunks (and just ignore them), to not affect other migration objects.
>> + */
>>   static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>>   {
>>       DBMLoadState *s = &((DBMState *)opaque)->load;
>> @@ -673,12 +740,19 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>>       trace_dirty_bitmap_load_enter();
>>       if (version_id != 1) {
>> +        qemu_mutex_lock(&s->lock);
>> +        cancel_incoming_locked(s);
>> +        qemu_mutex_unlock(&s->lock);
>>           return -EINVAL;
>>       }
>>       do {
>> +        qemu_mutex_lock(&s->lock);
>> +
>>           ret = dirty_bitmap_load_header(f, s);
>>           if (ret < 0) {
>> +            cancel_incoming_locked(s);
>> +            qemu_mutex_unlock(&s->lock);
>>               return ret;
>>           }
>> @@ -695,8 +769,12 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>>           }
>>           if (ret) {
>> +            cancel_incoming_locked(s);
>> +            qemu_mutex_unlock(&s->lock);
>>               return ret;
>>           }
>> +
>> +        qemu_mutex_unlock(&s->lock);
>>       } while (!(s->flags & DIRTY_BITMAP_MIG_FLAG_EOS));
>>       trace_dirty_bitmap_load_success();
>>
> 
> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 14/22] qemu-iotests/199: better catch postcopy time
  2020-02-19 13:16   ` Andrey Shinkevich
@ 2020-02-19 15:44     ` Vladimir Sementsov-Ogievskiy
  2020-07-24  6:50     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-19 15:44 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-devel
  Cc: Kevin Wolf, Max Reitz, dgilbert, qemu-block, quintela

19.02.2020 16:16, Andrey Shinkevich wrote:
> On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
>> The test aims to test _postcopy_ migration, and wants to do some write
>> operations during the postcopy phase.
>>
>> The test considers the migrate status=complete event on the source as
>> the start of postcopy. This is completely wrong: completion means
>> completion of the whole migration process. Let's instead consider the
>> destination start as the start of postcopy, and use the RESUME event
>> for it.
>>
>> Next, as the migration finish, let's use the migration status=complete
>> event on the target, as this method is closer to what libvirt or another
>> user would do than tracking the number of dirty bitmaps.
>>
>> Finally, add a possibility to dump events for debugging. If we set debug
>> to True, we see that the actual postcopy period is very small relative
>> to the whole test duration (~0.2 seconds out of >40 seconds for me).
>> This means the test is very inefficient at what it is supposed to do.
>> Let's improve it in the following commits.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   tests/qemu-iotests/199 | 72 +++++++++++++++++++++++++++++++++---------
>>   1 file changed, 57 insertions(+), 15 deletions(-)
>>
>> diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
>> index dda918450a..6599fc6fb4 100755
>> --- a/tests/qemu-iotests/199
>> +++ b/tests/qemu-iotests/199
>> @@ -20,17 +20,43 @@
>>   import os
>>   import iotests
>> -import time
>>   from iotests import qemu_img
>> +debug = False
>> +
>>   disk_a = os.path.join(iotests.test_dir, 'disk_a')
>>   disk_b = os.path.join(iotests.test_dir, 'disk_b')
>>   size = '256G'
>>   fifo = os.path.join(iotests.test_dir, 'mig_fifo')
>> +def event_seconds(event):
>> +    return event['timestamp']['seconds'] + \
>> +        event['timestamp']['microseconds'] / 1000000.0
>> +
>> +
>> +def event_dist(e1, e2):
>> +    return event_seconds(e2) - event_seconds(e1)
>> +
>> +
>>   class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>>       def tearDown(self):
> It's common to put the definition of setUp() first
> 
>> +        if debug:
>> +            self.vm_a_events += self.vm_a.get_qmp_events()
>> +            self.vm_b_events += self.vm_b.get_qmp_events()
>> +            for e in self.vm_a_events:
>> +                e['vm'] = 'SRC'
>> +            for e in self.vm_b_events:
>> +                e['vm'] = 'DST'
>> +            events = (self.vm_a_events + self.vm_b_events)
>> +            events = [(e['timestamp']['seconds'],
>> +                       e['timestamp']['microseconds'],
>> +                       e['vm'],
>> +                       e['event'],
>> +                       e.get('data', '')) for e in events]
>> +            for e in sorted(events):
>> +                print('{}.{:06} {} {} {}'.format(*e))
>> +
>>           self.vm_a.shutdown()
>>           self.vm_b.shutdown()
>>           os.remove(disk_a)
>> @@ -47,6 +73,10 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>>           self.vm_a.launch()
>>           self.vm_b.launch()
>> +        # collect received events for debug
>> +        self.vm_a_events = []
>> +        self.vm_b_events = []
>> +
>>       def test_postcopy(self):
>>           write_size = 0x40000000
>>           granularity = 512
>> @@ -77,15 +107,13 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>>               self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
>>               s += 0x10000
>> -        bitmaps_cap = {'capability': 'dirty-bitmaps', 'state': True}
>> -        events_cap = {'capability': 'events', 'state': True}
>> +        caps = [{'capability': 'dirty-bitmaps', 'state': True},
> The name "capabilities" would be an appropriate identifier.
> 
>> +                {'capability': 'events', 'state': True}]
>> -        result = self.vm_a.qmp('migrate-set-capabilities',
>> -                               capabilities=[bitmaps_cap, events_cap])
>> +        result = self.vm_a.qmp('migrate-set-capabilities', capabilities=caps)
>>           self.assert_qmp(result, 'return', {})
>> -        result = self.vm_b.qmp('migrate-set-capabilities',
>> -                               capabilities=[bitmaps_cap])
>> +        result = self.vm_b.qmp('migrate-set-capabilities', capabilities=caps)
>>           self.assert_qmp(result, 'return', {})
>>           result = self.vm_a.qmp('migrate', uri='exec:cat>' + fifo)
>> @@ -94,24 +122,38 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>>           result = self.vm_a.qmp('migrate-start-postcopy')
>>           self.assert_qmp(result, 'return', {})
>> -        while True:
>> -            event = self.vm_a.event_wait('MIGRATION')
>> -            if event['data']['status'] == 'completed':
>> -                break
>> +        e_resume = self.vm_b.event_wait('RESUME')
> "event_resume" gives a faster understanding
> 
>> +        self.vm_b_events.append(e_resume)
>>           s = 0x8000
>>           while s < write_size:
>>               self.vm_b.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
>>               s += 0x10000
>> +        match = {'data': {'status': 'completed'}}
>> +        e_complete = self.vm_b.event_wait('MIGRATION', match=match)
> "event_complete" also
> 
>> +        self.vm_b_events.append(e_complete)
>> +
>> +        # take queued event, should already been happened
>> +        e_stop = self.vm_a.event_wait('STOP')
> "event_stop"
> 
>> +        self.vm_a_events.append(e_stop)
>> +
>> +        downtime = event_dist(e_stop, e_resume)
>> +        postcopy_time = event_dist(e_resume, e_complete)
>> +
>> +        # TODO: assert downtime * 10 < postcopy_time
> 
> I got the results below in debug mode:

That's why it's a TODO

> 
> downtime: 6.194924831390381
> postcopy_time: 0.1592559814453125
> 1582102669.764919 SRC MIGRATION {'status': 'setup'}
> 1582102669.766179 SRC MIGRATION_PASS {'pass': 1}
> 1582102669.766234 SRC MIGRATION {'status': 'active'}
> 1582102669.768058 DST MIGRATION {'status': 'active'}
> 1582102669.801422 SRC MIGRATION {'status': 'postcopy-active'}
> 1582102669.801510 SRC STOP
> 1582102675.990041 DST MIGRATION {'status': 'postcopy-active'}
> 1582102675.996435 DST RESUME
> 1582102676.111313 SRC MIGRATION {'status': 'completed'}
> 1582102676.155691 DST MIGRATION {'status': 'completed'}
> 
>> +        if debug:
> With no usage in the following patches, you could put the whole block of related code above under the "if debug:" section

The TODO will be uncommented soon.

> 
>> +            print('downtime:', downtime)
>> +            print('postcopy_time:', postcopy_time)
>> +
>> +        # Assert that bitmap migration is finished (check that successor bitmap
>> +        # is removed)
>>           result = self.vm_b.qmp('query-block')
>> -        while len(result['return'][0]['dirty-bitmaps']) > 1:
>> -            time.sleep(2)
>> -            result = self.vm_b.qmp('query-block')
>> +        assert len(result['return'][0]['dirty-bitmaps']) == 1
>> +        # Check content of migrated (and updated by new writes) bitmap
>>           result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
>>                                  node='drive0', name='bitmap')
>> -
>>           self.assert_qmp(result, 'return/sha256', sha256)
>>
> 
> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 16/22] qemu-iotests/199: change discard patterns
  2020-02-19 14:33   ` Andrey Shinkevich
  2020-02-19 14:44     ` Andrey Shinkevich
@ 2020-02-19 15:46     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-19 15:46 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-devel
  Cc: Kevin Wolf, Max Reitz, dgilbert, qemu-block, quintela

19.02.2020 17:33, Andrey Shinkevich wrote:
> On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
>> iotest 40 works too long because of many discard opertion. On the same
> 
> operations
> At the same time
> 
>> time, postcopy period is very short, in spite of all these efforts.
>>
>> So, let's use fewer discards (and with more interesting patterns) to
>> reduce the test's run time. In the next commit we'll increase the
>> postcopy period.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   tests/qemu-iotests/199 | 44 +++++++++++++++++++++++++-----------------
>>   1 file changed, 26 insertions(+), 18 deletions(-)
>>
>> diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
>> index d78f81b71c..7914fd0b2b 100755
>> --- a/tests/qemu-iotests/199
>> +++ b/tests/qemu-iotests/199
>> @@ -30,6 +30,28 @@ size = '256G'
>>   fifo = os.path.join(iotests.test_dir, 'mig_fifo')
>> +GiB = 1024 * 1024 * 1024
>> +
>> +discards1 = (
>> +    (0, GiB),
>> +    (2 * GiB + 512 * 5, 512),
>> +    (3 * GiB + 512 * 5, 512),
>> +    (100 * GiB, GiB)
>> +)
>> +
>> +discards2 = (
>> +    (3 * GiB + 512 * 8, 512),
>> +    (4 * GiB + 512 * 8, 512),
>> +    (50 * GiB, GiB),
>> +    (100 * GiB + GiB // 2, GiB)
>> +)
>> +
>> +
>> +def apply_discards(vm, discards):
>> +    for d in discards:
> 
> If we run qemu-io only once, it will update the bitmap state and speed the test up. Is that a wrong idea?

But it would be a less interesting test. Now we have several regions, which
stand in different relations to each other in discards1 and discards2.

And four elements are handled quickly enough.

> 
>> +        vm.hmp_qemu_io('drive0', 'discard {} {}'.format(*d))
>> +
>> +
>>   def event_seconds(event):
>>       return event['timestamp']['seconds'] + \
>>           event['timestamp']['microseconds'] / 1000000.0
>> @@ -80,9 +102,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>>           self.vm_b_events = []
>>       def test_postcopy(self):
>> -        discard_size = 0x40000000
>>           granularity = 512
>> -        chunk = 4096
>>           result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
>>                                  name='bitmap', granularity=granularity)
>> @@ -92,14 +112,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>>                                  node='drive0', name='bitmap')
>>           empty_sha256 = result['return']['sha256']
>> -        s = 0
>> -        while s < discard_size:
>> -            self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
>> -            s += 0x10000
>> -        s = 0x8000
>> -        while s < discard_size:
>> -            self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
>> -            s += 0x10000
>> +        apply_discards(self.vm_a, discards1 + discards2)
>>           result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
>>                                  node='drive0', name='bitmap')
>> @@ -111,10 +124,8 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>>           result = self.vm_a.qmp('block-dirty-bitmap-clear', node='drive0',
>>                                  name='bitmap')
>>           self.assert_qmp(result, 'return', {})
>> -        s = 0
>> -        while s < discard_size:
>> -            self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
>> -            s += 0x10000
>> +
>> +        apply_discards(self.vm_a, discards1)
>>           caps = [{'capability': 'dirty-bitmaps', 'state': True},
>>                   {'capability': 'events', 'state': True}]
>> @@ -134,10 +145,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>>           e_resume = self.vm_b.event_wait('RESUME')
>>           self.vm_b_events.append(e_resume)
>> -        s = 0x8000
>> -        while s < discard_size:
>> -            self.vm_b.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
>> -            s += 0x10000
>> +        apply_discards(self.vm_b, discards2)
>>           match = {'data': {'status': 'completed'}}
>>           e_complete = self.vm_b.event_wait('MIGRATION', match=match)
>>
> 
> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 19/22] qemu-iotests/199: prepare for new test-cases addition
  2020-02-17 15:02 ` [PATCH v2 19/22] qemu-iotests/199: prepare for new test-cases addition Vladimir Sementsov-Ogievskiy
@ 2020-02-19 16:10   ` Andrey Shinkevich
  0 siblings, 0 replies; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-19 16:10 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Max Reitz, dgilbert, qemu-block, quintela

On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
> Move the part that future test cases will have in common into the
> start_postcopy() method. Move the bitmap-count check into check_bitmaps().
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   tests/qemu-iotests/199 | 36 +++++++++++++++++++++++-------------
>   1 file changed, 23 insertions(+), 13 deletions(-)
> 
> diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
> index 9a6e8dcb9d..969620b103 100755
> --- a/tests/qemu-iotests/199
> +++ b/tests/qemu-iotests/199
> @@ -29,6 +29,8 @@ disk_b = os.path.join(iotests.test_dir, 'disk_b')
>   size = '256G'
>   fifo = os.path.join(iotests.test_dir, 'mig_fifo')
>   
> +granularity = 512
> +nb_bitmaps = 15
>   
>   GiB = 1024 * 1024 * 1024
>   
> @@ -61,6 +63,15 @@ def event_dist(e1, e2):
>       return event_seconds(e2) - event_seconds(e1)
>   
>   
> +def check_bitmaps(vm, count):
> +    result = vm.qmp('query-block')
> +
> +    if count == 0:
> +        assert 'dirty-bitmaps' not in result['return'][0]
> +    else:
> +        assert len(result['return'][0]['dirty-bitmaps']) == count
> +
> +
>   class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>       def tearDown(self):
>           if debug:
> @@ -101,10 +112,8 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>           self.vm_a_events = []
>           self.vm_b_events = []
>   
> -    def test_postcopy(self):
> -        granularity = 512
> -        nb_bitmaps = 15
> -
> +    def start_postcopy(self):
> +        """ Run migration until RESUME event on target. Return this event. """
>           for i in range(nb_bitmaps):
>               result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
>                                      name='bitmap{}'.format(i),
> @@ -119,10 +128,10 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>   
>           result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
>                                  node='drive0', name='bitmap0')
> -        discards1_sha256 = result['return']['sha256']
> +        self.discards1_sha256 = result['return']['sha256']
>   
>           # Check, that updating the bitmap by discards works
> -        assert discards1_sha256 != empty_sha256
> +        assert self.discards1_sha256 != empty_sha256
>   
>           # We want to calculate resulting sha256. Do it in bitmap0, so, disable
>           # other bitmaps
> @@ -135,7 +144,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>   
>           result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
>                                  node='drive0', name='bitmap0')
> -        all_discards_sha256 = result['return']['sha256']
> +        self.all_discards_sha256 = result['return']['sha256']
>   
>           # Now, enable some bitmaps, to be updated during migration
>           for i in range(2, nb_bitmaps, 2):
> @@ -160,6 +169,10 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>   
>           e_resume = self.vm_b.event_wait('RESUME')
>           self.vm_b_events.append(e_resume)
> +        return e_resume
> +
> +    def test_postcopy_success(self):
> +        e_resume = self.start_postcopy()
>   
>           # enabled bitmaps should be updated
>           apply_discards(self.vm_b, discards2)
> @@ -180,18 +193,15 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>               print('downtime:', downtime)
>               print('postcopy_time:', postcopy_time)
>   
> -        # Assert that bitmap migration is finished (check that successor bitmap
> -        # is removed)
> -        result = self.vm_b.qmp('query-block')
> -        assert len(result['return'][0]['dirty-bitmaps']) == nb_bitmaps
> +        check_bitmaps(self.vm_b, nb_bitmaps)
>   
>           # Check content of migrated bitmaps. Still, don't waste time checking
>           # every bitmap
>           for i in range(0, nb_bitmaps, 5):
>               result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
>                                      node='drive0', name='bitmap{}'.format(i))
> -            sha256 = discards1_sha256 if i % 2 else all_discards_sha256
> -            self.assert_qmp(result, 'return/sha256', sha256)
> +            sha = self.discards1_sha256 if i % 2 else self.all_discards_sha256
> +            self.assert_qmp(result, 'return/sha256', sha)
>   
>   
>   if __name__ == '__main__':
> 

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-- 
With the best regards,
Andrey Shinkevich


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 07/22] migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete
  2020-02-19 15:30     ` Vladimir Sementsov-Ogievskiy
@ 2020-02-19 16:14       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-19 16:14 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-devel
  Cc: Fam Zheng, qemu-block, quintela, dgilbert, Stefan Hajnoczi, John Snow

19.02.2020 18:30, Vladimir Sementsov-Ogievskiy wrote:
> 18.02.2020 17:26, Andrey Shinkevich wrote:
>> On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
>>> The bdrv_enable_dirty_bitmap_locked() call does nothing: if we are in
>>> postcopy, the bitmap successor must be enabled, and the reclaim operation
>>> will enable the bitmap.
>>>
>>> So we actually just need to call _reclaim_ in both if branches, and
>>> differentiating them only to add an assertion does not seem worthwhile.
>>> The logic becomes simple: on load completion we reclaim, and that's all.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   migration/block-dirty-bitmap.c | 25 ++++---------------------
>>>   1 file changed, 4 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
>>> index 440c41cfca..9cc750d93b 100644
>>> --- a/migration/block-dirty-bitmap.c
>>> +++ b/migration/block-dirty-bitmap.c
>>> @@ -535,6 +535,10 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>>>       qemu_mutex_lock(&s->lock);
>>> +    if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
>> What about making sure of it?
>>             assert(!s->bitmap->successor->disabled);
> 
> I'm afraid we can't, as BdrvDirtyBitmap is not a public structure.
> 
>>
>>> +        bdrv_reclaim_dirty_bitmap(s->bitmap, &error_abort);
> 
> But we can assert that the resulting bitmap is enabled.

Or not, as the bitmap may not be enabled yet, if the guest has not yet started.

> 
>>> +    }
>>> +
>>>       for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
>>>           LoadBitmapState *b = item->data;
>>> @@ -544,27 +548,6 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>>>           }
>>>       }
>>> -    if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
>>> -        bdrv_dirty_bitmap_lock(s->bitmap);
>>> -        if (s->enabled_bitmaps == NULL) {
>>> -            /* in postcopy */
>>> -            bdrv_reclaim_dirty_bitmap_locked(s->bitmap, &error_abort);
>>> -            bdrv_enable_dirty_bitmap_locked(s->bitmap);
>>> -        } else {
>>> -            /* target not started, successor must be empty */
>>> -            int64_t count = bdrv_get_dirty_count(s->bitmap);
>>> -            BdrvDirtyBitmap *ret = bdrv_reclaim_dirty_bitmap_locked(s->bitmap,
>>> -                                                                    NULL);
>>> -            /* bdrv_reclaim_dirty_bitmap can fail only on no successor (it
>>> -             * must be) or on merge fail, but merge can't fail when second
>>> -             * bitmap is empty
>>> -             */
>>> -            assert(ret == s->bitmap &&
>>> -                   count == bdrv_get_dirty_count(s->bitmap));
>>> -        }
>>> -        bdrv_dirty_bitmap_unlock(s->bitmap);
>>> -    }
>>> -
>>>       qemu_mutex_unlock(&s->lock);
>>>   }
>>>
>>
>> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> 
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 20/22] qemu-iotests/199: check persistent bitmaps
  2020-02-17 15:02 ` [PATCH v2 20/22] qemu-iotests/199: check persistent bitmaps Vladimir Sementsov-Ogievskiy
@ 2020-02-19 16:28   ` Andrey Shinkevich
  0 siblings, 0 replies; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-19 16:28 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Max Reitz, dgilbert, qemu-block, quintela



On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
> Check that persistent bitmaps are not stored on the source and that
> bitmaps are persistent on the destination.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   tests/qemu-iotests/199 | 16 +++++++++++++++-
>   1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
> index 969620b103..8baa078151 100755
> --- a/tests/qemu-iotests/199
> +++ b/tests/qemu-iotests/199
> @@ -117,7 +117,8 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>           for i in range(nb_bitmaps):
>               result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
>                                      name='bitmap{}'.format(i),
> -                                   granularity=granularity)
> +                                   granularity=granularity,
> +                                   persistent=True)
>               self.assert_qmp(result, 'return', {})
>   
>           result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
> @@ -193,6 +194,19 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>               print('downtime:', downtime)
>               print('postcopy_time:', postcopy_time)
>   
> +        # check that there are no bitmaps stored on source
> +        self.vm_a_events += self.vm_a.get_qmp_events()
> +        self.vm_a.shutdown()
> +        self.vm_a.launch()
> +        check_bitmaps(self.vm_a, 0)
> +
> +        # check that bitmaps are migrated and persistence works
> +        check_bitmaps(self.vm_b, nb_bitmaps)
> +        self.vm_b.shutdown()
> +        # recreate vm_b, so there is no incoming option, which prevents
> +        # loading bitmaps from disk
> +        self.vm_b = iotests.VM(path_suffix='b').add_drive(disk_b)
> +        self.vm_b.launch()
>           check_bitmaps(self.vm_b, nb_bitmaps)
>   
>           # Check content of migrated bitmaps. Still, don't waste time checking
> 

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-- 
With the best regards,
Andrey Shinkevich


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 21/22] qemu-iotests/199: add early shutdown case to bitmaps postcopy
  2020-02-17 15:02 ` [PATCH v2 21/22] qemu-iotests/199: add early shutdown case to bitmaps postcopy Vladimir Sementsov-Ogievskiy
@ 2020-02-19 16:48   ` Andrey Shinkevich
  2020-02-19 16:50   ` Andrey Shinkevich
  1 sibling, 0 replies; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-19 16:48 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Max Reitz, dgilbert, qemu-block, quintela



On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
> Previous patches fixed two crashes which may occur on shutdown before
> bitmap postcopy has finished. Check that it works now.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   tests/qemu-iotests/199     | 18 ++++++++++++++++++
>   tests/qemu-iotests/199.out |  4 ++--
>   2 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
> index 8baa078151..0d12e6b1ae 100755
> --- a/tests/qemu-iotests/199
> +++ b/tests/qemu-iotests/199
> @@ -217,6 +217,24 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>               sha = self.discards1_sha256 if i % 2 else self.all_discards_sha256
>               self.assert_qmp(result, 'return/sha256', sha)
>   
> +    def test_early_shutdown_destination(self):
> +        self.start_postcopy()
> +
> +        self.vm_b_events += self.vm_b.get_qmp_events()
> +        self.vm_b.shutdown()
> +        # recreate vm_b, so there is no incoming option, which prevents
> +        # loading bitmaps from disk
> +        self.vm_b = iotests.VM(path_suffix='b').add_drive(disk_b)
> +        self.vm_b.launch()
> +        check_bitmaps(self.vm_b, 0)
> +
> +        result = self.vm_a.qmp('query-status')
> +        assert not result['return']['running']
> +        self.vm_a_events += self.vm_a.get_qmp_events()
> +        self.vm_a.shutdown()
> +        self.vm_a.launch()
> +        check_bitmaps(self.vm_a, 0)
> +
>   
>   if __name__ == '__main__':
>       iotests.main(supported_fmts=['qcow2'])
> diff --git a/tests/qemu-iotests/199.out b/tests/qemu-iotests/199.out
> index ae1213e6f8..fbc63e62f8 100644
> --- a/tests/qemu-iotests/199.out
> +++ b/tests/qemu-iotests/199.out
> @@ -1,5 +1,5 @@
> -.
> +..
>   ----------------------------------------------------------------------
> -Ran 1 tests
> +Ran 2 tests
>   
>   OK
> 

The updated test passed.

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-- 
With the best regards,
Andrey Shinkevich


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 21/22] qemu-iotests/199: add early shutdown case to bitmaps postcopy
  2020-02-17 15:02 ` [PATCH v2 21/22] qemu-iotests/199: add early shutdown case to bitmaps postcopy Vladimir Sementsov-Ogievskiy
  2020-02-19 16:48   ` Andrey Shinkevich
@ 2020-02-19 16:50   ` Andrey Shinkevich
  1 sibling, 0 replies; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-19 16:50 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Max Reitz, dgilbert, qemu-block, quintela



On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
> Previous patches fixed two crashes which may occur on shutdown before
> bitmap postcopy has finished. Check that it works now.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   tests/qemu-iotests/199     | 18 ++++++++++++++++++
>   tests/qemu-iotests/199.out |  4 ++--
>   2 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
> index 8baa078151..0d12e6b1ae 100755
> --- a/tests/qemu-iotests/199
> +++ b/tests/qemu-iotests/199
> @@ -217,6 +217,24 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>               sha = self.discards1_sha256 if i % 2 else self.all_discards_sha256
>               self.assert_qmp(result, 'return/sha256', sha)
>   
> +    def test_early_shutdown_destination(self):
> +        self.start_postcopy()
> +
> +        self.vm_b_events += self.vm_b.get_qmp_events()
> +        self.vm_b.shutdown()
> +        # recreate vm_b, so there is no incoming option, which prevents
> +        # loading bitmaps from disk
> +        self.vm_b = iotests.VM(path_suffix='b').add_drive(disk_b)
> +        self.vm_b.launch()
> +        check_bitmaps(self.vm_b, 0)
> +

Comments would help to understand the idea behind this.

> +        result = self.vm_a.qmp('query-status')
> +        assert not result['return']['running']
> +        self.vm_a_events += self.vm_a.get_qmp_events()
> +        self.vm_a.shutdown()
> +        self.vm_a.launch()
> +        check_bitmaps(self.vm_a, 0)
> +
>   
>   if __name__ == '__main__':
>       iotests.main(supported_fmts=['qcow2'])
> diff --git a/tests/qemu-iotests/199.out b/tests/qemu-iotests/199.out
> index ae1213e6f8..fbc63e62f8 100644
> --- a/tests/qemu-iotests/199.out
> +++ b/tests/qemu-iotests/199.out
> @@ -1,5 +1,5 @@
> -.
> +..
>   ----------------------------------------------------------------------
> -Ran 1 tests
> +Ran 2 tests
>   
>   OK
> 

-- 
With the best regards,
Andrey Shinkevich


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 18/22] python/qemu/machine: add kill() method
  2020-02-17 15:02 ` [PATCH v2 18/22] python/qemu/machine: add kill() method Vladimir Sementsov-Ogievskiy
@ 2020-02-19 17:00   ` Andrey Shinkevich
  2020-05-29 10:09   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-19 17:00 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Cleber Rosa, dgilbert, Eduardo Habkost, quintela

On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
> Add a method to hard-kill the VM, without any quit commands.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   python/qemu/machine.py | 12 +++++++++---
>   1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/python/qemu/machine.py b/python/qemu/machine.py
> index 183d8f3d38..9918e0d8aa 100644
> --- a/python/qemu/machine.py
> +++ b/python/qemu/machine.py
> @@ -341,7 +341,7 @@ class QEMUMachine(object):
>           self._load_io_log()
>           self._post_shutdown()
>   
> -    def shutdown(self, has_quit=False):
> +    def shutdown(self, has_quit=False, hard=False):
>           """
>           Terminate the VM and clean up
>           """
> @@ -353,7 +353,9 @@ class QEMUMachine(object):
>               self._console_socket = None
>   
>           if self.is_running():
> -            if self._qmp:
> +            if hard:
> +                self._popen.kill()
> +            elif self._qmp:
>                   try:
>                       if not has_quit:
>                           self._qmp.cmd('quit')
> @@ -366,7 +368,8 @@ class QEMUMachine(object):
>           self._post_shutdown()
>   
>           exitcode = self.exitcode()
> -        if exitcode is not None and exitcode < 0:
> +        if exitcode is not None and exitcode < 0 and \
> +                not (exitcode == -9 and hard):
>               msg = 'qemu received signal %i: %s'
>               if self._qemu_full_args:
>                   command = ' '.join(self._qemu_full_args)
> @@ -376,6 +379,9 @@ class QEMUMachine(object):
>   
>           self._launched = False
>   
> +    def kill(self):
> +        self.shutdown(hard=True)
> +
>       def set_qmp_monitor(self, enabled=True):
>           """
>           Set the QMP monitor.
> 

It would be reasonable to make this patch the last but one, that is,
right before its usage in the patch that follows.
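
For readers of the test code, the intended difference in use would be
something like (a sketch; "vm" stands for any launched QEMUMachine):

    vm.shutdown()   # graceful: issues QMP 'quit' and waits for exit
    vm.kill()       # hard: SIGKILL via Popen.kill(), no QMP involved;
                    # the resulting exitcode of -9 is expected and is
                    # not reported as an error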

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-- 
With the best regards,
Andrey Shinkevich


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 22/22] qemu-iotests/199: add source-killed case to bitmaps postcopy
  2020-02-17 15:02 ` [PATCH v2 22/22] qemu-iotests/199: add source-killed " Vladimir Sementsov-Ogievskiy
@ 2020-02-19 17:15   ` Andrey Shinkevich
  2020-07-24  7:50     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-19 17:15 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Max Reitz, dgilbert, qemu-block, quintela

On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
> Previous patches fix the behavior of bitmaps migration, so that errors
> are handled by just removing unfinished bitmaps, without failing or trying
> to recover postcopy migration. Add a corresponding test.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   tests/qemu-iotests/199     | 15 +++++++++++++++
>   tests/qemu-iotests/199.out |  4 ++--
>   2 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
> index 0d12e6b1ae..d38913fa44 100755
> --- a/tests/qemu-iotests/199
> +++ b/tests/qemu-iotests/199
> @@ -235,6 +235,21 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>           self.vm_a.launch()
>           check_bitmaps(self.vm_a, 0)
>   
> +    def test_early_kill_source(self):
> +        self.start_postcopy()
> +
> +        self.vm_a_events = self.vm_a.get_qmp_events()
> +        self.vm_a.kill()
> +
> +        self.vm_a.launch()
> +
> +        match = {'data': {'status': 'completed'}}
> +        e_complete = self.vm_b.event_wait('MIGRATION', match=match)

A failed migration gets the status 'completed'. That may mislead a user, but
it is not in the scope of this series, I guess.
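
I.e., on the destination something like this would pass even though the
bitmaps were thrown away (a sketch; query-migrate reports the overall
migration status):

    result = self.vm_b.qmp('query-migrate')
    assert result['return']['status'] == 'completed'  # bitmaps dropped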

> +        self.vm_b_events.append(e_complete)
> +
> +        check_bitmaps(self.vm_a, 0)
> +        check_bitmaps(self.vm_b, 0)
> +
>   
>   if __name__ == '__main__':
>       iotests.main(supported_fmts=['qcow2'])
> diff --git a/tests/qemu-iotests/199.out b/tests/qemu-iotests/199.out
> index fbc63e62f8..8d7e996700 100644
> --- a/tests/qemu-iotests/199.out
> +++ b/tests/qemu-iotests/199.out
> @@ -1,5 +1,5 @@
> -..
> +...
>   ----------------------------------------------------------------------
> -Ran 2 tests
> +Ran 3 tests
>   
>   OK
> 

The updated test passed.

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-- 
With the best regards,
Andrey Shinkevich


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 00/22] Fix error handling during bitmap postcopy
  2020-02-19 14:58           ` Eric Blake
@ 2020-02-19 17:22             ` Andrey Shinkevich
  0 siblings, 0 replies; 80+ messages in thread
From: Andrey Shinkevich @ 2020-02-19 17:22 UTC (permalink / raw)
  To: Eric Blake, Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Eduardo Habkost, qemu-block, quintela,
	qemu-stable, dgilbert, Stefan Hajnoczi, Cleber Rosa, Max Reitz,
	John Snow



On 19/02/2020 17:58, Eric Blake wrote:
> On 2/19/20 7:52 AM, Andrey Shinkevich wrote:
> 
>>>> +od: unrecognized option '--endian=big'
>>>> +Try 'od --help' for more information.
>>>> +od: invalid -N argument '--endian=big'
>>>
>>> Yay, same problem for both tests.  Fix common.rc once, and both tests 
>>> should start working for you.
>>
>> Thank you Eric! I want to sort it out later...
> 
> Patch proposed:
> https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg05188.html
> 
> 

Thank you, Eric, I appreciate it.
-- 
With the best regards,
Andrey Shinkevich



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 00/22] Fix error handling during bitmap postcopy
  2020-02-17 15:02 [PATCH v2 00/22] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (23 preceding siblings ...)
  2020-02-18 20:02 ` Andrey Shinkevich
@ 2020-04-02  7:42 ` Vladimir Sementsov-Ogievskiy
  2020-05-29 11:58   ` Eric Blake
  24 siblings, 1 reply; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-02  7:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Eduardo Habkost, qemu-block, quintela,
	qemu-stable, dgilbert, Max Reitz, Stefan Hajnoczi, Cleber Rosa,
	andrey.shinkevich, John Snow

Ping!

It's a fix, but not for a regression, and I'm afraid it's too big for 5.0.

Still, I think I should ping it anyway. John, I'm afraid that this is all for your branch :)


17.02.2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
> Original idea of bitmaps postcopy migration is that bitmaps are non
> critical data, and their loss is not serious problem. So, using postcopy
> method on any failure we should just drop unfinished bitmaps and
> continue guest execution.
> 
> However, it doesn't work so. It crashes, fails, it goes to
> postcopy-recovery feature. It does anything except for behavior we want.
> These series fixes at least some problems with error handling during
> bitmaps migration postcopy.
> 
> v1 was "[PATCH 0/7] Fix crashes on early shutdown during bitmaps postcopy"
> 
> v2:
> 
> Most of patches are new or changed a lot.
> Only patches 06,07 mostly unchanged, just rebased on refactorings.
> 
> Vladimir Sementsov-Ogievskiy (22):
>    migration/block-dirty-bitmap: fix dirty_bitmap_mig_before_vm_start
>    migration/block-dirty-bitmap: rename state structure types
>    migration/block-dirty-bitmap: rename dirty_bitmap_mig_cleanup
>    migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init
>    migration/block-dirty-bitmap: refactor state global variables
>    migration/block-dirty-bitmap: rename finish_lock to just lock
>    migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete
>    migration/block-dirty-bitmap: keep bitmap state for all bitmaps
>    migration/block-dirty-bitmap: relax error handling in incoming part
>    migration/block-dirty-bitmap: cancel migration on shutdown
>    migration/savevm: don't worry if bitmap migration postcopy failed
>    qemu-iotests/199: fix style
>    qemu-iotests/199: drop extra constraints
>    qemu-iotests/199: better catch postcopy time
>    qemu-iotests/199: improve performance: set bitmap by discard
>    qemu-iotests/199: change discard patterns
>    qemu-iotests/199: increase postcopy period
>    python/qemu/machine: add kill() method
>    qemu-iotests/199: prepare for new test-cases addition
>    qemu-iotests/199: check persistent bitmaps
>    qemu-iotests/199: add early shutdown case to bitmaps postcopy
>    qemu-iotests/199: add source-killed case to bitmaps postcopy
> 
> Cc: John Snow <jsnow@redhat.com>
> Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Cc: Fam Zheng <fam@euphon.net>
> Cc: Juan Quintela <quintela@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Cleber Rosa <crosa@redhat.com>
> Cc: Kevin Wolf <kwolf@redhat.com>
> Cc: Max Reitz <mreitz@redhat.com>
> Cc: qemu-block@nongnu.org
> Cc: qemu-devel@nongnu.org
> Cc: qemu-stable@nongnu.org # for patch 01
> 
>   migration/migration.h          |   3 +-
>   migration/block-dirty-bitmap.c | 444 +++++++++++++++++++++------------
>   migration/migration.c          |  15 +-
>   migration/savevm.c             |  37 ++-
>   python/qemu/machine.py         |  12 +-
>   tests/qemu-iotests/199         | 244 ++++++++++++++----
>   tests/qemu-iotests/199.out     |   4 +-
>   7 files changed, 529 insertions(+), 230 deletions(-)
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 18/22] python/qemu/machine: add kill() method
  2020-02-17 15:02 ` [PATCH v2 18/22] python/qemu/machine: add kill() method Vladimir Sementsov-Ogievskiy
  2020-02-19 17:00   ` Andrey Shinkevich
@ 2020-05-29 10:09   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 80+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-05-29 10:09 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: andrey.shinkevich, Cleber Rosa, Eduardo Habkost, dgilbert, quintela

On 2/17/20 4:02 PM, Vladimir Sementsov-Ogievskiy wrote:
> Add a method to hard-kill the VM, without any quit commands.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  python/qemu/machine.py | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)

Thanks, applied to my python-next tree:
https://gitlab.com/philmd/qemu/commits/python-next



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 00/22] Fix error handling during bitmap postcopy
  2020-04-02  7:42 ` Vladimir Sementsov-Ogievskiy
@ 2020-05-29 11:58   ` Eric Blake
  2020-05-29 12:16     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 80+ messages in thread
From: Eric Blake @ 2020-05-29 11:58 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Eduardo Habkost, qemu-block, quintela,
	qemu-stable, dgilbert, Stefan Hajnoczi, Cleber Rosa,
	andrey.shinkevich, Max Reitz

On 4/2/20 2:42 AM, Vladimir Sementsov-Ogievskiy wrote:
> Ping!
> 
> It's a fix, but not for a regression, and I'm afraid it's too big for 5.0.
> 
> Still, I think I should ping it anyway. John, I'm afraid that this is all
> for your branch :)

Just noticing this thread, now that we've shuffled bitmaps maintainers. 
Is there anything here that we still need to include in 5.1?

> 
> 
> 17.02.2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
>> Original idea of bitmaps postcopy migration is that bitmaps are non
>> critical data, and their loss is not serious problem. So, using postcopy
>> method on any failure we should just drop unfinished bitmaps and
>> continue guest execution.
>>
>> However, it doesn't work so. It crashes, fails, it goes to
>> postcopy-recovery feature. It does anything except for behavior we want.
>> These series fixes at least some problems with error handling during
>> bitmaps migration postcopy.
>>
>> v1 was "[PATCH 0/7] Fix crashes on early shutdown during bitmaps 
>> postcopy"
>>
>> v2:
>>
>> Most of patches are new or changed a lot.
>> Only patches 06,07 mostly unchanged, just rebased on refactorings.
>>
>> Vladimir Sementsov-Ogievskiy (22):
>>    migration/block-dirty-bitmap: fix dirty_bitmap_mig_before_vm_start
>>    migration/block-dirty-bitmap: rename state structure types
>>    migration/block-dirty-bitmap: rename dirty_bitmap_mig_cleanup
>>    migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init
>>    migration/block-dirty-bitmap: refactor state global variables
>>    migration/block-dirty-bitmap: rename finish_lock to just lock
>>    migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete
>>    migration/block-dirty-bitmap: keep bitmap state for all bitmaps
>>    migration/block-dirty-bitmap: relax error handling in incoming part
>>    migration/block-dirty-bitmap: cancel migration on shutdown
>>    migration/savevm: don't worry if bitmap migration postcopy failed
>>    qemu-iotests/199: fix style
>>    qemu-iotests/199: drop extra constraints
>>    qemu-iotests/199: better catch postcopy time
>>    qemu-iotests/199: improve performance: set bitmap by discard
>>    qemu-iotests/199: change discard patterns
>>    qemu-iotests/199: increase postcopy period
>>    python/qemu/machine: add kill() method
>>    qemu-iotests/199: prepare for new test-cases addition
>>    qemu-iotests/199: check persistent bitmaps
>>    qemu-iotests/199: add early shutdown case to bitmaps postcopy
>>    qemu-iotests/199: add source-killed case to bitmaps postcopy
>>
>> Cc: John Snow <jsnow@redhat.com>
>> Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
>> Cc: Fam Zheng <fam@euphon.net>
>> Cc: Juan Quintela <quintela@redhat.com>
>> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> Cc: Eduardo Habkost <ehabkost@redhat.com>
>> Cc: Cleber Rosa <crosa@redhat.com>
>> Cc: Kevin Wolf <kwolf@redhat.com>
>> Cc: Max Reitz <mreitz@redhat.com>
>> Cc: qemu-block@nongnu.org
>> Cc: qemu-devel@nongnu.org
>> Cc: qemu-stable@nongnu.org # for patch 01
>>
>>   migration/migration.h          |   3 +-
>>   migration/block-dirty-bitmap.c | 444 +++++++++++++++++++++------------
>>   migration/migration.c          |  15 +-
>>   migration/savevm.c             |  37 ++-
>>   python/qemu/machine.py         |  12 +-
>>   tests/qemu-iotests/199         | 244 ++++++++++++++----
>>   tests/qemu-iotests/199.out     |   4 +-
>>   7 files changed, 529 insertions(+), 230 deletions(-)
>>
> 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 00/22] Fix error handling during bitmap postcopy
  2020-05-29 11:58   ` Eric Blake
@ 2020-05-29 12:16     ` Vladimir Sementsov-Ogievskiy
  2020-07-23 20:39       ` Eric Blake
  0 siblings, 1 reply; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-05-29 12:16 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Eduardo Habkost, qemu-block, quintela,
	qemu-stable, dgilbert, Stefan Hajnoczi, Cleber Rosa,
	andrey.shinkevich, Max Reitz

29.05.2020 14:58, Eric Blake wrote:
> On 4/2/20 2:42 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Ping!
>>
>> It's a fix, but not for a regression, and I'm afraid it's too big for 5.0.
>>
>> Still, I think I should ping it anyway. John, I'm afraid that this is all for your branch :)
> 
> Just noticing this thread, now that we've shuffled bitmaps maintainers. Is there anything here that we still need to include in 5.1?

Yes, we need the whole series.

> 
>>
>>
>> 17.02.2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
>>> Original idea of bitmaps postcopy migration is that bitmaps are non
>>> critical data, and their loss is not serious problem. So, using postcopy
>>> method on any failure we should just drop unfinished bitmaps and
>>> continue guest execution.
>>>
>>> However, it doesn't work so. It crashes, fails, it goes to
>>> postcopy-recovery feature. It does anything except for behavior we want.
>>> These series fixes at least some problems with error handling during
>>> bitmaps migration postcopy.
>>>
>>> v1 was "[PATCH 0/7] Fix crashes on early shutdown during bitmaps postcopy"
>>>
>>> v2:
>>>
>>> Most of patches are new or changed a lot.
>>> Only patches 06,07 mostly unchanged, just rebased on refactorings.
>>>
>>> Vladimir Sementsov-Ogievskiy (22):
>>>    migration/block-dirty-bitmap: fix dirty_bitmap_mig_before_vm_start
>>>    migration/block-dirty-bitmap: rename state structure types
>>>    migration/block-dirty-bitmap: rename dirty_bitmap_mig_cleanup
>>>    migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init
>>>    migration/block-dirty-bitmap: refactor state global variables
>>>    migration/block-dirty-bitmap: rename finish_lock to just lock
>>>    migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete
>>>    migration/block-dirty-bitmap: keep bitmap state for all bitmaps
>>>    migration/block-dirty-bitmap: relax error handling in incoming part
>>>    migration/block-dirty-bitmap: cancel migration on shutdown
>>>    migration/savevm: don't worry if bitmap migration postcopy failed
>>>    qemu-iotests/199: fix style
>>>    qemu-iotests/199: drop extra constraints
>>>    qemu-iotests/199: better catch postcopy time
>>>    qemu-iotests/199: improve performance: set bitmap by discard
>>>    qemu-iotests/199: change discard patterns
>>>    qemu-iotests/199: increase postcopy period
>>>    python/qemu/machine: add kill() method
>>>    qemu-iotests/199: prepare for new test-cases addition
>>>    qemu-iotests/199: check persistent bitmaps
>>>    qemu-iotests/199: add early shutdown case to bitmaps postcopy
>>>    qemu-iotests/199: add source-killed case to bitmaps postcopy
>>>
>>> Cc: John Snow <jsnow@redhat.com>
>>> Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
>>> Cc: Fam Zheng <fam@euphon.net>
>>> Cc: Juan Quintela <quintela@redhat.com>
>>> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>> Cc: Eduardo Habkost <ehabkost@redhat.com>
>>> Cc: Cleber Rosa <crosa@redhat.com>
>>> Cc: Kevin Wolf <kwolf@redhat.com>
>>> Cc: Max Reitz <mreitz@redhat.com>
>>> Cc: qemu-block@nongnu.org
>>> Cc: qemu-devel@nongnu.org
>>> Cc: qemu-stable@nongnu.org # for patch 01
>>>
>>>   migration/migration.h          |   3 +-
>>>   migration/block-dirty-bitmap.c | 444 +++++++++++++++++++++------------
>>>   migration/migration.c          |  15 +-
>>>   migration/savevm.c             |  37 ++-
>>>   python/qemu/machine.py         |  12 +-
>>>   tests/qemu-iotests/199         | 244 ++++++++++++++----
>>>   tests/qemu-iotests/199.out     |   4 +-
>>>   7 files changed, 529 insertions(+), 230 deletions(-)
>>>
>>
>>
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 00/22] Fix error handling during bitmap postcopy
  2020-05-29 12:16     ` Vladimir Sementsov-Ogievskiy
@ 2020-07-23 20:39       ` Eric Blake
  0 siblings, 0 replies; 80+ messages in thread
From: Eric Blake @ 2020-07-23 20:39 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Eduardo Habkost, qemu-block, quintela,
	qemu-stable, dgilbert, Stefan Hajnoczi, Cleber Rosa,
	andrey.shinkevich, Max Reitz

On 5/29/20 7:16 AM, Vladimir Sementsov-Ogievskiy wrote:
> 29.05.2020 14:58, Eric Blake wrote:
>> On 4/2/20 2:42 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Ping!
>>>
>>> It's a fix, but not for a regression, and I'm afraid it's too big for 5.0.
>>>
>>> Still, I think I should ping it anyway. John, I'm afraid that this is
>>> all for your branch :)
>>
>> Just noticing this thread, now that we've shuffled bitmaps 
>> maintainers. Is there anything here that we still need to include in 5.1?
> 
> Yes, we need the whole series.

I'm starting to go through it now, to see what is still worth getting
into 5.1-rc2, but no promises as it is a long series and I don't want to 
introduce last-minute regressions (the fact that this missed 5.0 says 
that 5.1 will be no worse than 5.0 if we don't get this in until 5.2).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 02/22] migration/block-dirty-bitmap: rename state structure types
  2020-02-17 15:02 ` [PATCH v2 02/22] migration/block-dirty-bitmap: rename state structure types Vladimir Sementsov-Ogievskiy
@ 2020-07-23 20:50   ` Eric Blake
  0 siblings, 0 replies; 80+ messages in thread
From: Eric Blake @ 2020-07-23 20:50 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Fam Zheng, qemu-block, quintela, dgilbert, Stefan Hajnoczi,
	andrey.shinkevich, John Snow

On 2/17/20 9:02 AM, Vladimir Sementsov-Ogievskiy wrote:
> Rename the types to be shorter and symmetrical between the load and save parts.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   migration/block-dirty-bitmap.c | 68 ++++++++++++++++++----------------
>   1 file changed, 36 insertions(+), 32 deletions(-)

No longer applies to master, but the mechanical aspect of the change 
makes sense. If you rebase the series to make review easier,

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 03/22] migration/block-dirty-bitmap: rename dirty_bitmap_mig_cleanup
  2020-02-19 14:20     ` Vladimir Sementsov-Ogievskiy
@ 2020-07-23 20:54       ` Eric Blake
  0 siblings, 0 replies; 80+ messages in thread
From: Eric Blake @ 2020-07-23 20:54 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Andrey Shinkevich, qemu-devel
  Cc: Fam Zheng, Stefan Hajnoczi, dgilbert, qemu-block, quintela

On 2/19/20 8:20 AM, Vladimir Sementsov-Ogievskiy wrote:
> 18.02.2020 14:00, Andrey Shinkevich wrote:
>> On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
>>> Rename dirty_bitmap_mig_cleanup to dirty_bitmap_do_save_cleanup, to
>>> stress that it is on the save part.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   migration/block-dirty-bitmap.c | 8 ++++----
>>>   1 file changed, 4 insertions(+), 4 deletions(-)

>>
>> At the next opportunity, I would suggest a name like
>> "dirty_bitmap_do_clean_after_saving()"
>> and similar for dirty_bitmap_save_cleanup()
>> "dirty_bitmap_clean_after_saving()".
> 
> I'd keep my naming; it corresponds to the .save_cleanup handler name.

I'm fine with that explanation, so no need to rename again.

Reviewed-by: Eric Blake <eblake@redhat.com>

> 
>>
>> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 10/22] migration/block-dirty-bitmap: cancel migration on shutdown
  2020-02-17 15:02 ` [PATCH v2 10/22] migration/block-dirty-bitmap: cancel migration on shutdown Vladimir Sementsov-Ogievskiy
  2020-02-18 19:11   ` Andrey Shinkevich
@ 2020-07-23 21:04   ` Eric Blake
  1 sibling, 0 replies; 80+ messages in thread
From: Eric Blake @ 2020-07-23 21:04 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Fam Zheng, qemu-block, quintela, dgilbert, Stefan Hajnoczi,
	andrey.shinkevich

On 2/17/20 9:02 AM, Vladimir Sementsov-Ogievskiy wrote:
> If target is turned of prior to postcopy finished, target crashes

s/of/off/

> because busy bitmaps are found at shutdown.
> Canceling incoming migration helps, as it removes all unfinished (and
> therefore busy) bitmaps.
> 
> Similarly on source we crash in bdrv_close_all which asserts that all
> bdrv states are removed, because bdrv states involved into dirty bitmap
> migration are referenced by it. So, we need to cancel outgoing
> migration as well.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 08/22] migration/block-dirty-bitmap: keep bitmap state for all bitmaps
  2020-02-17 15:02 ` [PATCH v2 08/22] migration/block-dirty-bitmap: keep bitmap state for all bitmaps Vladimir Sementsov-Ogievskiy
  2020-02-18 17:07   ` Andrey Shinkevich
@ 2020-07-23 21:30   ` Eric Blake
  2020-07-24  5:18     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 80+ messages in thread
From: Eric Blake @ 2020-07-23 21:30 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Fam Zheng, qemu-block, quintela, dgilbert, Stefan Hajnoczi,
	andrey.shinkevich

On 2/17/20 9:02 AM, Vladimir Sementsov-Ogievskiy wrote:
> Keep bitmap state for disabled bitmaps too. Keep the state until the
> end of the process. It's needed for the following commit to implement
> bitmap postcopy canceling.
> 
> To clean up the new list, the following logic is used:
> We need two events to consider bitmap migration finished:
> 1. chunk with DIRTY_BITMAP_MIG_FLAG_COMPLETE flag should be received
> 2. dirty_bitmap_mig_before_vm_start should be called
> These two events may come in any order, so we track which one comes
> last, and on the last of them we remove the bitmap migration state from
> the list.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   migration/block-dirty-bitmap.c | 64 +++++++++++++++++++++++-----------
>   1 file changed, 43 insertions(+), 21 deletions(-)
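
If I follow the commit message, the rule is "whichever of the two events
fires last frees the state", i.e. roughly (a pseudo-Python sketch of the
described logic with made-up names, not the actual C code):

    def on_complete_chunk(s, b):         # event 1, per bitmap
        b.migrated = True
        if s.before_vm_start_handled:    # event 2 already happened
            s.bitmaps.remove(b)

    def on_before_vm_start(s):           # event 2, happens once
        for b in list(s.bitmaps):
            if b.migrated:               # event 1 already happened
                s.bitmaps.remove(b)
        s.before_vm_start_handled = True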

> @@ -484,45 +488,59 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
>   
>       bdrv_disable_dirty_bitmap(s->bitmap);
>       if (flags & DIRTY_BITMAP_MIG_START_FLAG_ENABLED) {
> -        LoadBitmapState *b;
> -
>           bdrv_dirty_bitmap_create_successor(s->bitmap, &local_err);
>           if (local_err) {
>               error_report_err(local_err);
>               return -EINVAL;
>           }
> -
> -        b = g_new(LoadBitmapState, 1);
> -        b->bs = s->bs;
> -        b->bitmap = s->bitmap;
> -        b->migrated = false;
> -        s->enabled_bitmaps = g_slist_prepend(s->enabled_bitmaps, b);
>       }
>   
> +    b = g_new(LoadBitmapState, 1);
> +    b->bs = s->bs;
> +    b->bitmap = s->bitmap;
> +    b->migrated = false;
> +    b->enabled = flags & DIRTY_BITMAP_MIG_START_FLAG_ENABLED,
> +
> +    s->bitmaps = g_slist_prepend(s->bitmaps, b);

Did you really mean to use a comma operator there, or should that be ';'?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 12/22] qemu-iotests/199: fix style
  2020-02-17 15:02 ` [PATCH v2 12/22] qemu-iotests/199: fix style Vladimir Sementsov-Ogievskiy
  2020-02-19  7:04   ` Andrey Shinkevich
@ 2020-07-23 22:03   ` Eric Blake
  2020-07-24  6:32     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 80+ messages in thread
From: Eric Blake @ 2020-07-23 22:03 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, qemu-block, quintela, dgilbert, Max Reitz, andrey.shinkevich

On 2/17/20 9:02 AM, Vladimir Sementsov-Ogievskiy wrote:
> Mostly, satisfy pep8 complaints.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   tests/qemu-iotests/199 | 13 +++++++------
>   1 file changed, 7 insertions(+), 6 deletions(-)

With none of your series applied, I get:

$ ./check -qcow2 199
...
199      not run    [16:52:34] [16:52:34]                    not 
suitable for this cache mode: writeback
Not run: 199
Passed all 0 iotests
199      fail       [16:53:37] [16:53:37]                    output 
mismatch (see 199.out.bad)
--- /home/eblake/qemu/tests/qemu-iotests/199.out	2020-07-23 
16:48:56.275529368 -0500
+++ /home/eblake/qemu/build/tests/qemu-iotests/199.out.bad	2020-07-23 
16:53:37.728416207 -0500
@@ -1,5 +1,13 @@
-.
+E
+======================================================================
+ERROR: test_postcopy (__main__.TestDirtyBitmapPostcopyMigration)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "199", line 41, in setUp
+    os.mkfifo(fifo)
+FileExistsError: [Errno 17] File exists
+
  ----------------------------------------------------------------------
  Ran 1 tests

-OK
+FAILED (errors=1)
Failures: 199
Failed 1 of 1 iotests

Ah, 'scratch/mig_fifo' was left over from some other aborted run of the 
test. I removed that file (which implies it might be nice if the test 
handled that automatically, instead of making me do it), and tried 
again; now I got the desired:

199      pass       [17:00:34] [17:01:48]  74s
Passed all 1 iotests
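
Maybe setUp() could simply tolerate such leftovers, e.g. (a sketch using
the fifo path the test already defines):

    if os.path.exists(fifo):
        os.remove(fifo)  # drop remnants of an earlier aborted run
    os.mkfifo(fifo)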


After trying to rebase your series, I once again got failures, but that 
could mean I botched the rebase (since quite a few of the code patches 
earlier in the series were non-trivially changed).  If you send a v3 
(which would be really nice!), I'd hoist this and 13/22 first in the 
series, to get to a point where testing 199 works, to then make it 
easier to demonstrate what the rest of the 199 enhancements do in 
relation to the non-iotest patches.  But I like that you separated the 
199 improvements from the code - testing-wise, it's easy to apply the 
iotests patches first, make sure it fails, then apply the code patches, 
and make sure it passes, to prove that the enhanced test now covers what 
the code fixes did.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 17/22] qemu-iotests/199: increase postcopy period
  2020-02-17 15:02 ` [PATCH v2 17/22] qemu-iotests/199: increase postcopy period Vladimir Sementsov-Ogievskiy
  2020-02-19 14:56   ` Andrey Shinkevich
@ 2020-07-24  0:14   ` Eric Blake
  1 sibling, 0 replies; 80+ messages in thread
From: Eric Blake @ 2020-07-24  0:14 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, qemu-block, quintela, dgilbert, Max Reitz, andrey.shinkevich

On 2/17/20 9:02 AM, Vladimir Sementsov-Ogievskiy wrote:
> Test wants force bitmap postcopy. Still, resulting postcopy period is

The test wants to force a bitmap postcopy. Still, the resulting postcopy 
period is very small.

> very small. Let's increase it by adding more bitmaps to migrate. Also,
> test disabled bitmaps migration.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   tests/qemu-iotests/199 | 58 ++++++++++++++++++++++++++++--------------
>   1 file changed, 39 insertions(+), 19 deletions(-)

Patches 12-17:
Tested-by: Eric Blake <eblake@redhat.com>

As they all work without any other patches in this series, and DO make a 
dramatic difference (cutting the test from over 70 seconds to just 7, on 
my machine), I'm inclined to stage them now, even while waiting for you 
to rebase the rest of the series.  And 18 is already in the tree.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 16/22] qemu-iotests/199: change discard patterns
  2020-02-17 15:02 ` [PATCH v2 16/22] qemu-iotests/199: change discard patterns Vladimir Sementsov-Ogievskiy
  2020-02-19 14:33   ` Andrey Shinkevich
@ 2020-07-24  0:23   ` Eric Blake
  1 sibling, 0 replies; 80+ messages in thread
From: Eric Blake @ 2020-07-24  0:23 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, qemu-block, quintela, dgilbert, Max Reitz, andrey.shinkevich

On 2/17/20 9:02 AM, Vladimir Sementsov-Ogievskiy wrote:
> iotest 40 works too long because of many discard opertion. On the same

I'm assuming you meant s/40/199/ here, as well as the typo fixes pointed 
out by Andrey.

> time, postcopy period is very short, in spite of all these efforts.
> 
> So, let's use less discards (and with more interesting patterns) to
> reduce test timing. In the next commit we'll increase postcopy period.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   tests/qemu-iotests/199 | 44 +++++++++++++++++++++++++-----------------
>   1 file changed, 26 insertions(+), 18 deletions(-)
> 


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 08/22] migration/block-dirty-bitmap: keep bitmap state for all bitmaps
  2020-07-23 21:30   ` Eric Blake
@ 2020-07-24  5:18     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  5:18 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Fam Zheng, qemu-block, quintela, dgilbert, Stefan Hajnoczi,
	andrey.shinkevich

24.07.2020 00:30, Eric Blake wrote:
> On 2/17/20 9:02 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Keep bitmap state for disabled bitmaps too. Keep the state until the
>> end of the process. It's needed for the following commit to implement
>> bitmap postcopy canceling.
>>
>> To clean up the new list, the following logic is used:
>> We need two events to consider bitmap migration finished:
>> 1. chunk with DIRTY_BITMAP_MIG_FLAG_COMPLETE flag should be received
>> 2. dirty_bitmap_mig_before_vm_start should be called
>> These two events may come in any order, so we track which one comes
>> last, and on the last of them we remove the bitmap migration state from
>> the list.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   migration/block-dirty-bitmap.c | 64 +++++++++++++++++++++++-----------
>>   1 file changed, 43 insertions(+), 21 deletions(-)
> 
>> @@ -484,45 +488,59 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
>>       bdrv_disable_dirty_bitmap(s->bitmap);
>>       if (flags & DIRTY_BITMAP_MIG_START_FLAG_ENABLED) {
>> -        LoadBitmapState *b;
>> -
>>           bdrv_dirty_bitmap_create_successor(s->bitmap, &local_err);
>>           if (local_err) {
>>               error_report_err(local_err);
>>               return -EINVAL;
>>           }
>> -
>> -        b = g_new(LoadBitmapState, 1);
>> -        b->bs = s->bs;
>> -        b->bitmap = s->bitmap;
>> -        b->migrated = false;
>> -        s->enabled_bitmaps = g_slist_prepend(s->enabled_bitmaps, b);
>>       }
>> +    b = g_new(LoadBitmapState, 1);
>> +    b->bs = s->bs;
>> +    b->bitmap = s->bitmap;
>> +    b->migrated = false;
>> +    b->enabled = flags & DIRTY_BITMAP_MIG_START_FLAG_ENABLED,
>> +
>> +    s->bitmaps = g_slist_prepend(s->bitmaps, b);
> 
> Did you really mean to use a comma operator there, or should that be ';'?
> 

Of course, it should be ';' :)

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 12/22] qemu-iotests/199: fix style
  2020-07-23 22:03   ` Eric Blake
@ 2020-07-24  6:32     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  6:32 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Kevin Wolf, qemu-block, quintela, dgilbert, Max Reitz, andrey.shinkevich

24.07.2020 01:03, Eric Blake wrote:
> On 2/17/20 9:02 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Mostly, satisfy pep8 complaints.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   tests/qemu-iotests/199 | 13 +++++++------
>>   1 file changed, 7 insertions(+), 6 deletions(-)
> 
> With none of your series applied, I get:
> 
> $ ./check -qcow2 199
> ...
> 199      not run    [16:52:34] [16:52:34]                    not suitable for this cache mode: writeback
> Not run: 199
> Passed all 0 iotests
> 199      fail       [16:53:37] [16:53:37]                    output mismatch (see 199.out.bad)
> --- /home/eblake/qemu/tests/qemu-iotests/199.out    2020-07-23 16:48:56.275529368 -0500
> +++ /home/eblake/qemu/build/tests/qemu-iotests/199.out.bad    2020-07-23 16:53:37.728416207 -0500
> @@ -1,5 +1,13 @@
> -.
> +E
> +======================================================================
> +ERROR: test_postcopy (__main__.TestDirtyBitmapPostcopyMigration)
> +----------------------------------------------------------------------
> +Traceback (most recent call last):
> +  File "199", line 41, in setUp
> +    os.mkfifo(fifo)
> +FileExistsError: [Errno 17] File exists
> +
>   ----------------------------------------------------------------------
>   Ran 1 tests
> 
> -OK
> +FAILED (errors=1)
> Failures: 199
> Failed 1 of 1 iotests
> 
> Ah, 'scratch/mig_fifo' was left over from some other aborted run of the test. I removed that file (which implies it might be nice if the test handled that automatically, instead of making me do it), and tried again; now I got the desired:
> 
> 199      pass       [17:00:34] [17:01:48]  74s
> Passed all 1 iotests
> 
> 
> After trying to rebase your series, I once again got failures, but that could mean I botched the rebase (since quite a few of the code patches earlier in the series were non-trivially changed). If you send a v3 (which would be really nice!), I'd hoist this and 13/22 first in the series, to get to a point where testing 199 works, to then make it easier to demonstrate what the rest of the 199 enhancements do in relation to the non-iotest patches.  But I like that you separated the 199 improvements from the code - testing-wise, it's easy to apply the iotests patches first, make sure it fails, then apply the code patches, and make sure it passes, to prove that the enhanced test now covers what the code fixes did.
> 

A bit off-topic:

Yes, that's our usual scheme: the test goes after the bug fix, so careful reviewers always have to apply the patches in a different order to check whether there is a real bug fix. And the only benefit of such a scheme is not breaking git bisect. Still, I think we can do better:

  - apply test first, just put it into "bug" group
  - check script should ignore the "bug" group (unless it is explicitly specified, or the test is run directly)
  - bug-fix patch removes test from "bug" group

So bisect is not broken and we achieve native ordering: problem exists (test fails) -> fixing -> no problem (test pass).
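
Roughly (a sketch, assuming the check script keeps some test -> groups
mapping; the names here are made up):

    def should_run(test, groups, requested_explicitly):
        if requested_explicitly:
            return True                  # a direct run always works
        return 'bug' not in groups.get(test, ())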

I think I'll add the "bug" group as a follow-up to my "[PATCH v4 0/9] Rework iotests/check", which I really hope will land one day.

PS: I've successfully rebased the series, and the tests pass. I'll now fix all review notes and resend soon.

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 14/22] qemu-iotests/199: better catch postcopy time
  2020-02-19 13:16   ` Andrey Shinkevich
  2020-02-19 15:44     ` Vladimir Sementsov-Ogievskiy
@ 2020-07-24  6:50     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  6:50 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-devel
  Cc: Kevin Wolf, Max Reitz, dgilbert, qemu-block, quintela

19.02.2020 16:16, Andrey Shinkevich wrote:
> On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
>> The test aims to test _postcopy_ migration, and wants to do some write
>> operations during postcopy time.
>>
>> The test considers the migrate status=complete event on the source as
>> the start of postcopy. This is completely wrong: completion is completion
>> of the whole migration process. Let's instead consider the destination
>> start as the start of postcopy, and use the RESUME event for it.
>>
>> Next, as the migration finish, let's use the migration status=complete
>> event on the target, as such a method is closer to what libvirt or
>> another user will do than tracking the number of dirty bitmaps.
>>
>> Finally, add a possibility to dump events for debugging. And if we
>> set debug to True, we see that the actual postcopy period is very small
>> relative to the whole test duration (~0.2 seconds vs. >40 seconds
>> for me). This means that the test is very inefficient in what it is
>> supposed to do. Let's improve it in the following commits.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   tests/qemu-iotests/199 | 72 +++++++++++++++++++++++++++++++++---------
>>   1 file changed, 57 insertions(+), 15 deletions(-)
>>
>> diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
>> index dda918450a..6599fc6fb4 100755
>> --- a/tests/qemu-iotests/199
>> +++ b/tests/qemu-iotests/199
>> @@ -20,17 +20,43 @@
>>   import os
>>   import iotests
>> -import time
>>   from iotests import qemu_img
>> +debug = False
>> +
>>   disk_a = os.path.join(iotests.test_dir, 'disk_a')
>>   disk_b = os.path.join(iotests.test_dir, 'disk_b')
>>   size = '256G'
>>   fifo = os.path.join(iotests.test_dir, 'mig_fifo')
>> +def event_seconds(event):
>> +    return event['timestamp']['seconds'] + \
>> +        event['timestamp']['microseconds'] / 1000000.0
>> +
>> +
>> +def event_dist(e1, e2):
>> +    return event_seconds(e2) - event_seconds(e1)
>> +
>> +
>>   class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>>       def tearDown(self):
> It's common to put the definition of setUp() ahead

Preexisting; I don't want to update it in this patch

> 
>> +        if debug:
>> +            self.vm_a_events += self.vm_a.get_qmp_events()
>> +            self.vm_b_events += self.vm_b.get_qmp_events()
>> +            for e in self.vm_a_events:
>> +                e['vm'] = 'SRC'
>> +            for e in self.vm_b_events:
>> +                e['vm'] = 'DST'
>> +            events = (self.vm_a_events + self.vm_b_events)
>> +            events = [(e['timestamp']['seconds'],
>> +                       e['timestamp']['microseconds'],
>> +                       e['vm'],
>> +                       e['event'],
>> +                       e.get('data', '')) for e in events]
>> +            for e in sorted(events):
>> +                print('{}.{:06} {} {} {}'.format(*e))
>> +
>>           self.vm_a.shutdown()
>>           self.vm_b.shutdown()
>>           os.remove(disk_a)
>> @@ -47,6 +73,10 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>>           self.vm_a.launch()
>>           self.vm_b.launch()
>> +        # collect received events for debug
>> +        self.vm_a_events = []
>> +        self.vm_b_events = []
>> +
>>       def test_postcopy(self):
>>           write_size = 0x40000000
>>           granularity = 512
>> @@ -77,15 +107,13 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>>               self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
>>               s += 0x10000
>> -        bitmaps_cap = {'capability': 'dirty-bitmaps', 'state': True}
>> -        events_cap = {'capability': 'events', 'state': True}
>> +        caps = [{'capability': 'dirty-bitmaps', 'state': True},
> The name "capabilities" would be an appropriate identifier.

This will result in the following lines growing and not fitting into one line.
I'll leave "caps". Also, they are called "caps" in iotest 169 and in migration.c.
And here in the context they are always used together with the full word
('capability': or capabilities=).

> 
>> +                {'capability': 'events', 'state': True}]
>> -        result = self.vm_a.qmp('migrate-set-capabilities',
>> -                               capabilities=[bitmaps_cap, events_cap])
>> +        result = self.vm_a.qmp('migrate-set-capabilities', capabilities=caps)
>>           self.assert_qmp(result, 'return', {})
>> -        result = self.vm_b.qmp('migrate-set-capabilities',
>> -                               capabilities=[bitmaps_cap])
>> +        result = self.vm_b.qmp('migrate-set-capabilities', capabilities=caps)
>>           self.assert_qmp(result, 'return', {})
>>           result = self.vm_a.qmp('migrate', uri='exec:cat>' + fifo)
>> @@ -94,24 +122,38 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>>           result = self.vm_a.qmp('migrate-start-postcopy')
>>           self.assert_qmp(result, 'return', {})
>> -        while True:
>> -            event = self.vm_a.event_wait('MIGRATION')
>> -            if event['data']['status'] == 'completed':
>> -                break
>> +        e_resume = self.vm_b.event_wait('RESUME')
> "event_resume" gives a faster understanding

OK, no problem

> 
>> +        self.vm_b_events.append(e_resume)
>>           s = 0x8000
>>           while s < write_size:
>>               self.vm_b.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
>>               s += 0x10000
>> +        match = {'data': {'status': 'completed'}}
>> +        e_complete = self.vm_b.event_wait('MIGRATION', match=match)
> "event_complete" also

OK

> 
>> +        self.vm_b_events.append(e_complete)
>> +
>> +        # take queued event, should already have happened
>> +        e_stop = self.vm_a.event_wait('STOP')
> "event_stop"

OK

> 
>> +        self.vm_a_events.append(e_stop)
>> +
>> +        downtime = event_dist(e_stop, e_resume)
>> +        postcopy_time = event_dist(e_resume, e_complete)
>> +
>> +        # TODO: assert downtime * 10 < postcopy_time
> 
> I got the results below in debug mode:
> 
> downtime: 6.194924831390381
> postcopy_time: 0.1592559814453125
> 1582102669.764919 SRC MIGRATION {'status': 'setup'}
> 1582102669.766179 SRC MIGRATION_PASS {'pass': 1}
> 1582102669.766234 SRC MIGRATION {'status': 'active'}
> 1582102669.768058 DST MIGRATION {'status': 'active'}
> 1582102669.801422 SRC MIGRATION {'status': 'postcopy-active'}
> 1582102669.801510 SRC STOP
> 1582102675.990041 DST MIGRATION {'status': 'postcopy-active'}
> 1582102675.996435 DST RESUME
> 1582102676.111313 SRC MIGRATION {'status': 'completed'}
> 1582102676.155691 DST MIGRATION {'status': 'completed'}
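
(For reference, these match the two deltas:

    1582102675.996435 - 1582102669.801510  # RESUME - STOP      ~= 6.194925
    1582102676.155691 - 1582102675.996435  # completed - RESUME ~= 0.159256

so here the downtime is roughly 40x the postcopy period, which is
presumably why the assert above is still a TODO.)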
> 
>> +        if debug:
> With no usage in the following patches, you could put the whole block of related code above under the "if debug:" section
> 
>> +            print('downtime:', downtime)
>> +            print('postcopy_time:', postcopy_time)
>> +
>> +        # Assert that bitmap migration is finished (check that successor bitmap
>> +        # is removed)
>>           result = self.vm_b.qmp('query-block')
>> -        while len(result['return'][0]['dirty-bitmaps']) > 1:
>> -            time.sleep(2)
>> -            result = self.vm_b.qmp('query-block')
>> +        assert len(result['return'][0]['dirty-bitmaps']) == 1
>> +        # Check content of migrated (and updated by new writes) bitmap
>>           result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
>>                                  node='drive0', name='bitmap')
>> -
>>           self.assert_qmp(result, 'return/sha256', sha256)
>>
> 
> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>

Thanks!

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 09/22] migration/block-dirty-bitmap: relax error handling in incoming part
  2020-02-19 15:34     ` Vladimir Sementsov-Ogievskiy
@ 2020-07-24  7:23       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  7:23 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-devel
  Cc: Fam Zheng, qemu-block, quintela, dgilbert, Stefan Hajnoczi, John Snow

19.02.2020 18:34, Vladimir Sementsov-Ogievskiy wrote:
> 18.02.2020 21:54, Andrey Shinkevich wrote:
>>
>>
>> On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
>>> Bitmap data is not critical, and we should not fail the migration (or
>>> use postcopy recovery) because of a dirty-bitmap migration failure.
>>> Instead we should just lose unfinished bitmaps.
>>>
>>> Still we have to report io stream violation errors, as they affect the
>>> whole migration stream.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   migration/block-dirty-bitmap.c | 148 +++++++++++++++++++++++++--------
>>>   1 file changed, 113 insertions(+), 35 deletions(-)
>>>
>>> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
>>> index 1329db8d7d..aea5326804 100644
>>> --- a/migration/block-dirty-bitmap.c
>>> +++ b/migration/block-dirty-bitmap.c
>>> @@ -145,6 +145,15 @@ typedef struct DBMLoadState {
>>>       bool before_vm_start_handled; /* set in dirty_bitmap_mig_before_vm_start */
>>> +    /*
>>> +     * cancelled
>>> +     * Incoming migration is cancelled for some reason. That means that we
>>> +     * still should read our chunks from migration stream, to not affect other
>>> +     * migration objects (like RAM), but just ignore them and do not touch any
>>> +     * bitmaps or nodes.
>>> +     */
>>> +    bool cancelled;
>>> +
>>>       GSList *bitmaps;
>>>       QemuMutex lock; /* protect bitmaps */
>>>   } DBMLoadState;
>>> @@ -545,13 +554,47 @@ void dirty_bitmap_mig_before_vm_start(void)
>>>       qemu_mutex_unlock(&s->lock);
>>>   }
>>> +static void cancel_incoming_locked(DBMLoadState *s)
>>> +{
>>> +    GSList *item;
>>> +
>>> +    if (s->cancelled) {
>>> +        return;
>>> +    }
>>> +
>>> +    s->cancelled = true;
>>> +    s->bs = NULL;
>>> +    s->bitmap = NULL;
>>> +
>>> +    /* Drop all unfinished bitmaps */
>>> +    for (item = s->bitmaps; item; item = g_slist_next(item)) {
>>> +        LoadBitmapState *b = item->data;
>>> +
>>> +        /*
>>> +         * Bitmap must be unfinished, as finished bitmaps should already be
>>> +         * removed from the list.
>>> +         */
>>> +        assert(!s->before_vm_start_handled || !b->migrated);
>>> +        if (bdrv_dirty_bitmap_has_successor(b->bitmap)) {
>>> +            bdrv_reclaim_dirty_bitmap(b->bitmap, &error_abort);
>>> +        }
>>> +        bdrv_release_dirty_bitmap(b->bitmap);
>>> +    }
>>> +
>>> +    g_slist_free_full(s->bitmaps, g_free);
>>> +    s->bitmaps = NULL;
>>> +}
>>> +
>>>   static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>>>   {
>>>       GSList *item;
>>>       trace_dirty_bitmap_load_complete();
>>> -    bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
>>> -    qemu_mutex_lock(&s->lock);
>>
>> Why is it safe to remove the critical section?
> 
> It's not removed; it becomes wider in this patch.
> 
>>
>>> +    if (s->cancelled) {
>>> +        return;
>>> +    }
>>> +
>>> +    bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
>>>       if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
>>>           bdrv_reclaim_dirty_bitmap(s->bitmap, &error_abort);
>>> @@ -569,8 +612,6 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>>>               break;
>>>           }
>>>       }
>>> -
>>> -    qemu_mutex_unlock(&s->lock);
>>>   }
>>>   static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
>>> @@ -582,15 +623,32 @@ static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
>>>       if (s->flags & DIRTY_BITMAP_MIG_FLAG_ZEROES) {
>>>           trace_dirty_bitmap_load_bits_zeroes();
>>> -        bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte, nr_bytes,
>>> -                                             false);
>>> +        if (!s->cancelled) {
>>> +            bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte,
>>> +                                                 nr_bytes, false);
>>> +        }
>>>       } else {
>>>           size_t ret;
>>>           uint8_t *buf;
>>>           uint64_t buf_size = qemu_get_be64(f);
>>> -        uint64_t needed_size =
>>> -            bdrv_dirty_bitmap_serialization_size(s->bitmap,
>>> -                                                 first_byte, nr_bytes);
>>> +        uint64_t needed_size;
>>> +
>>> +        buf = g_malloc(buf_size);
>>> +        ret = qemu_get_buffer(f, buf, buf_size);
>>> +        if (ret != buf_size) {
>>> +            error_report("Failed to read bitmap bits");
>>> +            g_free(buf);
>>> +            return -EIO;
>>> +        }
>>> +
>>> +        if (s->cancelled) {
>>> +            g_free(buf);
>>> +            return 0;
>>> +        }
>>> +
>>> +        needed_size = bdrv_dirty_bitmap_serialization_size(s->bitmap,
>>> +                                                           first_byte,
>>> +                                                           nr_bytes);
>>>           if (needed_size > buf_size ||
>>>               buf_size > QEMU_ALIGN_UP(needed_size, 4 * sizeof(long))
>>> @@ -599,15 +657,8 @@ static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
>>>               error_report("Migrated bitmap granularity doesn't "
>>>                            "match the destination bitmap '%s' granularity",
>>>                            bdrv_dirty_bitmap_name(s->bitmap));
>>> -            return -EINVAL;
>>> -        }
>>> -
>>> -        buf = g_malloc(buf_size);
>>> -        ret = qemu_get_buffer(f, buf, buf_size);
>>> -        if (ret != buf_size) {
>>> -            error_report("Failed to read bitmap bits");
>>> -            g_free(buf);
>>> -            return -EIO;
>>> +            cancel_incoming_locked(s);
>>
>>                 /* Continue the VM migration as bitmaps data are not critical */
> 
> Hmm, yes, that's what this patch does. But I don't think we should add a comment to each call of cancel_..()
> 
>>
>>> +            return 0;
>>>           }
>>>           bdrv_dirty_bitmap_deserialize_part(s->bitmap, buf, first_byte, nr_bytes,
>>> @@ -632,14 +683,16 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s)
>>>               error_report("Unable to read node name string");
>>>               return -EINVAL;
>>>           }
>>> -        s->bs = bdrv_lookup_bs(s->node_name, s->node_name, &local_err);
>>> -        if (!s->bs) {
>>> -            error_report_err(local_err);
>>> -            return -EINVAL;
>>> +        if (!s->cancelled) {
>>> +            s->bs = bdrv_lookup_bs(s->node_name, s->node_name, &local_err);
>>> +            if (!s->bs) {
>>> +                error_report_err(local_err);
>>
>> The error message may be supplemented with a report about the canceled bitmap migration. Also down there at cancel_incoming_locked(s).

If we want to log something about cancelling, I think it should be done in cancel_incoming_locked(). I'll keep this patch as is.

>>
>>> +                cancel_incoming_locked(s);
>>> +            }
>>>           }
>>> -    } else if (!s->bs && !nothing) {
>>> +    } else if (!s->bs && !nothing && !s->cancelled) {
>>>           error_report("Error: block device name is not set");
>>> -        return -EINVAL;
>>> +        cancel_incoming_locked(s);
>>>       }
>>>       if (s->flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
>>> @@ -647,24 +700,38 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s)
>>>               error_report("Unable to read bitmap name string");
>>>               return -EINVAL;
>>>           }
>>> -        s->bitmap = bdrv_find_dirty_bitmap(s->bs, s->bitmap_name);
>>> -
>>> -        /* bitmap may be NULL here, it wouldn't be an error if it is the
>>> -         * first occurrence of the bitmap */
>>> -        if (!s->bitmap && !(s->flags & DIRTY_BITMAP_MIG_FLAG_START)) {
>>> -            error_report("Error: unknown dirty bitmap "
>>> -                         "'%s' for block device '%s'",
>>> -                         s->bitmap_name, s->node_name);
>>> -            return -EINVAL;
>>> +        if (!s->cancelled) {
>>> +            s->bitmap = bdrv_find_dirty_bitmap(s->bs, s->bitmap_name);
>>> +
>>> +            /*
>>> +             * bitmap may be NULL here, it wouldn't be an error if it is the
>>> +             * first occurrence of the bitmap
>>> +             */
>>> +            if (!s->bitmap && !(s->flags & DIRTY_BITMAP_MIG_FLAG_START)) {
>>> +                error_report("Error: unknown dirty bitmap "
>>> +                             "'%s' for block device '%s'",
>>> +                             s->bitmap_name, s->node_name);
>>> +                cancel_incoming_locked(s);
>>> +            }
>>>           }
>>> -    } else if (!s->bitmap && !nothing) {
>>> +    } else if (!s->bitmap && !nothing && !s->cancelled) {
>>>           error_report("Error: block device name is not set");
>>> -        return -EINVAL;
>>> +        cancel_incoming_locked(s);
>>>       }
>>>       return 0;
>>>   }
>>> +/*
>>> + * dirty_bitmap_load
>>> + *
>>> + * Load sequence of dirty bitmap chunks. Return error only on fatal io stream
>>> + * violations. On other errors just cancel bitmaps incoming migration and return
>>> + * 0.
>>> + *
>>> + * Note, than when incoming bitmap migration is canceled, we still must read all
>> "than (that)" may be omitted
>>
>>> + * our chunks (and just ignore them), to not affect other migration objects.
>>> + */
>>>   static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>>>   {
>>>       DBMLoadState *s = &((DBMState *)opaque)->load;
>>> @@ -673,12 +740,19 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>>>       trace_dirty_bitmap_load_enter();
>>>       if (version_id != 1) {
>>> +        qemu_mutex_lock(&s->lock);
>>> +        cancel_incoming_locked(s);
>>> +        qemu_mutex_unlock(&s->lock);
>>>           return -EINVAL;
>>>       }
>>>       do {
>>> +        qemu_mutex_lock(&s->lock);
>>> +
>>>           ret = dirty_bitmap_load_header(f, s);
>>>           if (ret < 0) {
>>> +            cancel_incoming_locked(s);
>>> +            qemu_mutex_unlock(&s->lock);
>>>               return ret;
>>>           }
>>> @@ -695,8 +769,12 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>>>           }
>>>           if (ret) {
>>> +            cancel_incoming_locked(s);
>>> +            qemu_mutex_unlock(&s->lock);
>>>               return ret;
>>>           }
>>> +
>>> +        qemu_mutex_unlock(&s->lock);
>>>       } while (!(s->flags & DIRTY_BITMAP_MIG_FLAG_EOS));
>>>       trace_dirty_bitmap_load_success();
>>>
>>
>> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> 
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 22/22] qemu-iotests/199: add source-killed case to bitmaps postcopy
  2020-02-19 17:15   ` Andrey Shinkevich
@ 2020-07-24  7:50     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 80+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  7:50 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-devel
  Cc: Kevin Wolf, Max Reitz, dgilbert, qemu-block, quintela

19.02.2020 20:15, Andrey Shinkevich wrote:
> On 17/02/2020 18:02, Vladimir Sementsov-Ogievskiy wrote:
>> Previous patches fix the behavior of bitmap migration, so that errors
>> are handled by just removing unfinished bitmaps rather than failing or
>> trying to recover postcopy migration. Add a corresponding test.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   tests/qemu-iotests/199     | 15 +++++++++++++++
>>   tests/qemu-iotests/199.out |  4 ++--
>>   2 files changed, 17 insertions(+), 2 deletions(-)
>>
>> diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
>> index 0d12e6b1ae..d38913fa44 100755
>> --- a/tests/qemu-iotests/199
>> +++ b/tests/qemu-iotests/199
>> @@ -235,6 +235,21 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
>>           self.vm_a.launch()
>>           check_bitmaps(self.vm_a, 0)
>> +    def test_early_kill_source(self):
>> +        self.start_postcopy()
>> +
>> +        self.vm_a_events = self.vm_a.get_qmp_events()
>> +        self.vm_a.kill()
>> +
>> +        self.vm_a.launch()
>> +
>> +        match = {'data': {'status': 'completed'}}
>> +        e_complete = self.vm_b.event_wait('MIGRATION', match=match)
> 
> A failed migration still gets the status 'completed'. That may mislead a user, but it is out of the scope of this series, I guess.

It hasn't failed: only the bitmaps are not migrated, which is not a problem. Probably we should invent some additional status or QAPI event for this, but yes, not in this series.
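
That pair of facts, a 'completed' MIGRATION event plus an empty bitmap
list on both sides, is exactly what the test's check_bitmaps() (defined
earlier in 199, not visible in this hunk) has to express. A helper in
that spirit might look roughly like the sketch below; the device name
'drive0' is an assumption of the sketch:

    def count_bitmaps(vm, device='drive0'):
        # query-block reports dirty bitmaps per device; an empty list on
        # both VMs means the unfinished bitmaps were simply dropped while
        # migration itself still reports 'completed'.
        result = vm.qmp('query-block')
        for info in result['return']:
            if info['device'] == device:
                return len(info.get('dirty-bitmaps', []))
        return 0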

> 
>> +        self.vm_b_events.append(e_complete)
>> +
>> +        check_bitmaps(self.vm_a, 0)
>> +        check_bitmaps(self.vm_b, 0)
>> +
>>   if __name__ == '__main__':
>>       iotests.main(supported_fmts=['qcow2'])
>> diff --git a/tests/qemu-iotests/199.out b/tests/qemu-iotests/199.out
>> index fbc63e62f8..8d7e996700 100644
>> --- a/tests/qemu-iotests/199.out
>> +++ b/tests/qemu-iotests/199.out
>> @@ -1,5 +1,5 @@
>> -..
>> +...
>>   ----------------------------------------------------------------------
>> -Ran 2 tests
>> +Ran 3 tests
>>   OK
>>
> 
> The updated test passed.
> 
> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 80+ messages in thread
