All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 for-5.1?? 00/21] Fix error handling during bitmap postcopy
@ 2020-07-24  8:43 Vladimir Sementsov-Ogievskiy
  2020-07-24  8:43 ` [PATCH v3 01/21] qemu-iotests/199: fix style Vladimir Sementsov-Ogievskiy
                   ` (20 more replies)
  0 siblings, 21 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  8:43 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, fam, vsementsov, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

Hi all!

I'm resending this now, as Eric started to review v2 and it occurs
outdated.

Last time it was somehow postponed, and I thought that I'll to
rebase it onto Max's
"[PATCH v2 0/3] migration: Add block-bitmap-mapping parameter"

Of course, if this goes into 5.1, Max's series will need a rebase, sorry
for this.

So we need to decide now, do we really want this in 5.1?

pros: it fixes real problems
cons: - it affects migration in invasive way. Still, it may be not bad,
    if we double check that it don't affect migration when dirty-bitmaps
    migration capability is disabled (the default).
      - seems, there are at least one more crash which Max found. I
      don't remember now, if I reproduced it on top of my series or
      not...

If we decide, that it goes to 5.2, than again, let's decide which series
goes first. I'm OK either way, and can rebase this series, or rebase
Max's series myself, if he will not have time (I fill responsive for all
this mess, sorry).

Max, please look at this series if you have some time, you are familiar
with code now.

====

Original idea of bitmaps postcopy migration is that bitmaps are non
critical data, and their loss is not serious problem. So, using postcopy
method on any failure we should just drop unfinished bitmaps and
continue guest execution.

However, it doesn't work so. It crashes, fails, it goes to
postcopy-recovery feature. It does anything except for behavior we want.
These series fixes at least some problems with error handling during
bitmaps migration postcopy.

====

v3:
- 199-iotest improvements moved to the beginning of the series, v2:0018 already merged upstream.
- add r-b and t-b marks by Andrey and Eric

03: rename same variables [Andrey] 
05,06: update commit msg
08: some changes due to rebase, still, keep Eric's and Andrey's r-bs
11: rebased, drop r-b
14: s/,/;/
15: handle cancelled at start of dirty_bitmap_load_start()
17: Modify error message a bit, keep r-bs
18: Rebased on 03 changes
20: add comment

v2:

Most of patches are new or changed a lot.
Only patches 06,07 mostly unchanged, just rebased on refactorings.

v1 was "[PATCH 0/7] Fix crashes on early shutdown during bitmaps postcopy"

Vladimir Sementsov-Ogievskiy (21):
  qemu-iotests/199: fix style
  qemu-iotests/199: drop extra constraints
  qemu-iotests/199: better catch postcopy time
  qemu-iotests/199: improve performance: set bitmap by discard
  qemu-iotests/199: change discard patterns
  qemu-iotests/199: increase postcopy period
  migration/block-dirty-bitmap: fix dirty_bitmap_mig_before_vm_start
  migration/block-dirty-bitmap: rename state structure types
  migration/block-dirty-bitmap: rename dirty_bitmap_mig_cleanup
  migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init
  migration/block-dirty-bitmap: refactor state global variables
  migration/block-dirty-bitmap: rename finish_lock to just lock
  migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete
  migration/block-dirty-bitmap: keep bitmap state for all bitmaps
  migration/block-dirty-bitmap: relax error handling in incoming part
  migration/block-dirty-bitmap: cancel migration on shutdown
  migration/savevm: don't worry if bitmap migration postcopy failed
  qemu-iotests/199: prepare for new test-cases addition
  qemu-iotests/199: check persistent bitmaps
  qemu-iotests/199: add early shutdown case to bitmaps postcopy
  qemu-iotests/199: add source-killed case to bitmaps postcopy

 migration/migration.h          |   3 +-
 migration/block-dirty-bitmap.c | 458 +++++++++++++++++++++------------
 migration/migration.c          |  15 +-
 migration/savevm.c             |  37 ++-
 tests/qemu-iotests/199         | 250 ++++++++++++++----
 tests/qemu-iotests/199.out     |   4 +-
 6 files changed, 535 insertions(+), 232 deletions(-)

-- 
2.21.0



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v3 01/21] qemu-iotests/199: fix style
  2020-07-24  8:43 [PATCH v3 for-5.1?? 00/21] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
@ 2020-07-24  8:43 ` Vladimir Sementsov-Ogievskiy
  2020-07-24 15:03   ` Eric Blake
  2020-07-24  8:43 ` [PATCH v3 02/21] qemu-iotests/199: drop extra constraints Vladimir Sementsov-Ogievskiy
                   ` (19 subsequent siblings)
  20 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  8:43 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, fam, vsementsov, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

Mostly, satisfy pep8 complains.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Tested-by: Eric Blake <eblake@redhat.com>
---
 tests/qemu-iotests/199 | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index 40774eed74..de9ba8d94c 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -28,8 +28,8 @@ disk_b = os.path.join(iotests.test_dir, 'disk_b')
 size = '256G'
 fifo = os.path.join(iotests.test_dir, 'mig_fifo')
 
-class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
+class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
     def tearDown(self):
         self.vm_a.shutdown()
         self.vm_b.shutdown()
@@ -54,7 +54,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
         result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
                                name='bitmap', granularity=granularity)
-        self.assert_qmp(result, 'return', {});
+        self.assert_qmp(result, 'return', {})
 
         s = 0
         while s < write_size:
@@ -71,7 +71,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
         result = self.vm_a.qmp('block-dirty-bitmap-clear', node='drive0',
                                name='bitmap')
-        self.assert_qmp(result, 'return', {});
+        self.assert_qmp(result, 'return', {})
         s = 0
         while s < write_size:
             self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
@@ -104,15 +104,16 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
             self.vm_b.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
             s += 0x10000
 
-        result = self.vm_b.qmp('query-block');
+        result = self.vm_b.qmp('query-block')
         while len(result['return'][0]['dirty-bitmaps']) > 1:
             time.sleep(2)
-            result = self.vm_b.qmp('query-block');
+            result = self.vm_b.qmp('query-block')
 
         result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
                                node='drive0', name='bitmap')
 
-        self.assert_qmp(result, 'return/sha256', sha256);
+        self.assert_qmp(result, 'return/sha256', sha256)
+
 
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2'], supported_cache_modes=['none'],
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 02/21] qemu-iotests/199: drop extra constraints
  2020-07-24  8:43 [PATCH v3 for-5.1?? 00/21] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
  2020-07-24  8:43 ` [PATCH v3 01/21] qemu-iotests/199: fix style Vladimir Sementsov-Ogievskiy
@ 2020-07-24  8:43 ` Vladimir Sementsov-Ogievskiy
  2020-07-24  8:43 ` [PATCH v3 03/21] qemu-iotests/199: better catch postcopy time Vladimir Sementsov-Ogievskiy
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  8:43 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, fam, vsementsov, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

We don't need any specific format constraints here. Still keep qcow2
for two reasons:
1. No extra calls of format-unrelated test
2. Add some check around persistent bitmap in future (require qcow2)

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Tested-by: Eric Blake <eblake@redhat.com>
---
 tests/qemu-iotests/199 | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index de9ba8d94c..dda918450a 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -116,5 +116,4 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
 
 if __name__ == '__main__':
-    iotests.main(supported_fmts=['qcow2'], supported_cache_modes=['none'],
-                 supported_protocols=['file'])
+    iotests.main(supported_fmts=['qcow2'])
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 03/21] qemu-iotests/199: better catch postcopy time
  2020-07-24  8:43 [PATCH v3 for-5.1?? 00/21] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
  2020-07-24  8:43 ` [PATCH v3 01/21] qemu-iotests/199: fix style Vladimir Sementsov-Ogievskiy
  2020-07-24  8:43 ` [PATCH v3 02/21] qemu-iotests/199: drop extra constraints Vladimir Sementsov-Ogievskiy
@ 2020-07-24  8:43 ` Vladimir Sementsov-Ogievskiy
  2020-07-24  8:43 ` [PATCH v3 04/21] qemu-iotests/199: improve performance: set bitmap by discard Vladimir Sementsov-Ogievskiy
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  8:43 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, fam, vsementsov, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

The test aims to test _postcopy_ migration, and wants to do some write
operations during postcopy time.

Test considers migrate status=complete event on source as start of
postcopy. This is completely wrong, completion is completion of the
whole migration process. Let's instead consider destination start as
start of postcopy, and use RESUME event for it.

Next, as migration finish, let's use migration status=complete event on
target, as such method is closer to what libvirt or another user will
do, than tracking number of dirty-bitmaps.

Finally, add a possibility to dump events for debug. And if
set debug to True, we see, that actual postcopy period is very small
relatively to the whole test duration time (~0.2 seconds to >40 seconds
for me). This means, that test is very inefficient in what it supposed
to do. Let's improve it in following commits.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Tested-by: Eric Blake <eblake@redhat.com>
---
 tests/qemu-iotests/199 | 72 +++++++++++++++++++++++++++++++++---------
 1 file changed, 57 insertions(+), 15 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index dda918450a..dd6044768c 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -20,17 +20,43 @@
 
 import os
 import iotests
-import time
 from iotests import qemu_img
 
+debug = False
+
 disk_a = os.path.join(iotests.test_dir, 'disk_a')
 disk_b = os.path.join(iotests.test_dir, 'disk_b')
 size = '256G'
 fifo = os.path.join(iotests.test_dir, 'mig_fifo')
 
 
+def event_seconds(event):
+    return event['timestamp']['seconds'] + \
+        event['timestamp']['microseconds'] / 1000000.0
+
+
+def event_dist(e1, e2):
+    return event_seconds(e2) - event_seconds(e1)
+
+
 class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
     def tearDown(self):
+        if debug:
+            self.vm_a_events += self.vm_a.get_qmp_events()
+            self.vm_b_events += self.vm_b.get_qmp_events()
+            for e in self.vm_a_events:
+                e['vm'] = 'SRC'
+            for e in self.vm_b_events:
+                e['vm'] = 'DST'
+            events = (self.vm_a_events + self.vm_b_events)
+            events = [(e['timestamp']['seconds'],
+                       e['timestamp']['microseconds'],
+                       e['vm'],
+                       e['event'],
+                       e.get('data', '')) for e in events]
+            for e in sorted(events):
+                print('{}.{:06} {} {} {}'.format(*e))
+
         self.vm_a.shutdown()
         self.vm_b.shutdown()
         os.remove(disk_a)
@@ -47,6 +73,10 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         self.vm_a.launch()
         self.vm_b.launch()
 
+        # collect received events for debug
+        self.vm_a_events = []
+        self.vm_b_events = []
+
     def test_postcopy(self):
         write_size = 0x40000000
         granularity = 512
@@ -77,15 +107,13 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
             self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
             s += 0x10000
 
-        bitmaps_cap = {'capability': 'dirty-bitmaps', 'state': True}
-        events_cap = {'capability': 'events', 'state': True}
+        caps = [{'capability': 'dirty-bitmaps', 'state': True},
+                {'capability': 'events', 'state': True}]
 
-        result = self.vm_a.qmp('migrate-set-capabilities',
-                               capabilities=[bitmaps_cap, events_cap])
+        result = self.vm_a.qmp('migrate-set-capabilities', capabilities=caps)
         self.assert_qmp(result, 'return', {})
 
-        result = self.vm_b.qmp('migrate-set-capabilities',
-                               capabilities=[bitmaps_cap])
+        result = self.vm_b.qmp('migrate-set-capabilities', capabilities=caps)
         self.assert_qmp(result, 'return', {})
 
         result = self.vm_a.qmp('migrate', uri='exec:cat>' + fifo)
@@ -94,24 +122,38 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         result = self.vm_a.qmp('migrate-start-postcopy')
         self.assert_qmp(result, 'return', {})
 
-        while True:
-            event = self.vm_a.event_wait('MIGRATION')
-            if event['data']['status'] == 'completed':
-                break
+        event_resume = self.vm_b.event_wait('RESUME')
+        self.vm_b_events.append(event_resume)
 
         s = 0x8000
         while s < write_size:
             self.vm_b.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
             s += 0x10000
 
+        match = {'data': {'status': 'completed'}}
+        event_complete = self.vm_b.event_wait('MIGRATION', match=match)
+        self.vm_b_events.append(event_complete)
+
+        # take queued event, should already been happened
+        event_stop = self.vm_a.event_wait('STOP')
+        self.vm_a_events.append(event_stop)
+
+        downtime = event_dist(event_stop, event_resume)
+        postcopy_time = event_dist(event_resume, event_complete)
+
+        # TODO: assert downtime * 10 < postcopy_time
+        if debug:
+            print('downtime:', downtime)
+            print('postcopy_time:', postcopy_time)
+
+        # Assert that bitmap migration is finished (check that successor bitmap
+        # is removed)
         result = self.vm_b.qmp('query-block')
-        while len(result['return'][0]['dirty-bitmaps']) > 1:
-            time.sleep(2)
-            result = self.vm_b.qmp('query-block')
+        assert len(result['return'][0]['dirty-bitmaps']) == 1
 
+        # Check content of migrated (and updated by new writes) bitmap
         result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
                                node='drive0', name='bitmap')
-
         self.assert_qmp(result, 'return/sha256', sha256)
 
 
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 04/21] qemu-iotests/199: improve performance: set bitmap by discard
  2020-07-24  8:43 [PATCH v3 for-5.1?? 00/21] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2020-07-24  8:43 ` [PATCH v3 03/21] qemu-iotests/199: better catch postcopy time Vladimir Sementsov-Ogievskiy
@ 2020-07-24  8:43 ` Vladimir Sementsov-Ogievskiy
  2020-07-24  8:43 ` [PATCH v3 05/21] qemu-iotests/199: change discard patterns Vladimir Sementsov-Ogievskiy
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  8:43 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, fam, vsementsov, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

Discard dirties dirty-bitmap as well as write, but works faster. Let's
use it instead.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Tested-by: Eric Blake <eblake@redhat.com>
---
 tests/qemu-iotests/199 | 31 ++++++++++++++++++++-----------
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index dd6044768c..190e820b84 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -67,8 +67,10 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         os.mkfifo(fifo)
         qemu_img('create', '-f', iotests.imgfmt, disk_a, size)
         qemu_img('create', '-f', iotests.imgfmt, disk_b, size)
-        self.vm_a = iotests.VM(path_suffix='a').add_drive(disk_a)
-        self.vm_b = iotests.VM(path_suffix='b').add_drive(disk_b)
+        self.vm_a = iotests.VM(path_suffix='a').add_drive(disk_a,
+                                                          'discard=unmap')
+        self.vm_b = iotests.VM(path_suffix='b').add_drive(disk_b,
+                                                          'discard=unmap')
         self.vm_b.add_incoming("exec: cat '" + fifo + "'")
         self.vm_a.launch()
         self.vm_b.launch()
@@ -78,7 +80,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         self.vm_b_events = []
 
     def test_postcopy(self):
-        write_size = 0x40000000
+        discard_size = 0x40000000
         granularity = 512
         chunk = 4096
 
@@ -86,25 +88,32 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
                                name='bitmap', granularity=granularity)
         self.assert_qmp(result, 'return', {})
 
+        result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
+                               node='drive0', name='bitmap')
+        empty_sha256 = result['return']['sha256']
+
         s = 0
-        while s < write_size:
-            self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
+        while s < discard_size:
+            self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
             s += 0x10000
         s = 0x8000
-        while s < write_size:
-            self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
+        while s < discard_size:
+            self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
             s += 0x10000
 
         result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
                                node='drive0', name='bitmap')
         sha256 = result['return']['sha256']
 
+        # Check, that updating the bitmap by discards works
+        assert sha256 != empty_sha256
+
         result = self.vm_a.qmp('block-dirty-bitmap-clear', node='drive0',
                                name='bitmap')
         self.assert_qmp(result, 'return', {})
         s = 0
-        while s < write_size:
-            self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
+        while s < discard_size:
+            self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
             s += 0x10000
 
         caps = [{'capability': 'dirty-bitmaps', 'state': True},
@@ -126,8 +135,8 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         self.vm_b_events.append(event_resume)
 
         s = 0x8000
-        while s < write_size:
-            self.vm_b.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
+        while s < discard_size:
+            self.vm_b.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
             s += 0x10000
 
         match = {'data': {'status': 'completed'}}
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 05/21] qemu-iotests/199: change discard patterns
  2020-07-24  8:43 [PATCH v3 for-5.1?? 00/21] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2020-07-24  8:43 ` [PATCH v3 04/21] qemu-iotests/199: improve performance: set bitmap by discard Vladimir Sementsov-Ogievskiy
@ 2020-07-24  8:43 ` Vladimir Sementsov-Ogievskiy
  2020-07-24  8:43 ` [PATCH v3 06/21] qemu-iotests/199: increase postcopy period Vladimir Sementsov-Ogievskiy
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  8:43 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, fam, vsementsov, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

iotest 199 works too long because of many discard operations. At the
same time, postcopy period is very short, in spite of all these
efforts.

So, let's use less discards (and with more interesting patterns) to
reduce test timing. In the next commit we'll increase postcopy period.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Tested-by: Eric Blake <eblake@redhat.com>
---
 tests/qemu-iotests/199 | 44 +++++++++++++++++++++++++-----------------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index 190e820b84..da4dae01fb 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -30,6 +30,28 @@ size = '256G'
 fifo = os.path.join(iotests.test_dir, 'mig_fifo')
 
 
+GiB = 1024 * 1024 * 1024
+
+discards1 = (
+    (0, GiB),
+    (2 * GiB + 512 * 5, 512),
+    (3 * GiB + 512 * 5, 512),
+    (100 * GiB, GiB)
+)
+
+discards2 = (
+    (3 * GiB + 512 * 8, 512),
+    (4 * GiB + 512 * 8, 512),
+    (50 * GiB, GiB),
+    (100 * GiB + GiB // 2, GiB)
+)
+
+
+def apply_discards(vm, discards):
+    for d in discards:
+        vm.hmp_qemu_io('drive0', 'discard {} {}'.format(*d))
+
+
 def event_seconds(event):
     return event['timestamp']['seconds'] + \
         event['timestamp']['microseconds'] / 1000000.0
@@ -80,9 +102,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         self.vm_b_events = []
 
     def test_postcopy(self):
-        discard_size = 0x40000000
         granularity = 512
-        chunk = 4096
 
         result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
                                name='bitmap', granularity=granularity)
@@ -92,14 +112,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
                                node='drive0', name='bitmap')
         empty_sha256 = result['return']['sha256']
 
-        s = 0
-        while s < discard_size:
-            self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
-            s += 0x10000
-        s = 0x8000
-        while s < discard_size:
-            self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
-            s += 0x10000
+        apply_discards(self.vm_a, discards1 + discards2)
 
         result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
                                node='drive0', name='bitmap')
@@ -111,10 +124,8 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         result = self.vm_a.qmp('block-dirty-bitmap-clear', node='drive0',
                                name='bitmap')
         self.assert_qmp(result, 'return', {})
-        s = 0
-        while s < discard_size:
-            self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
-            s += 0x10000
+
+        apply_discards(self.vm_a, discards1)
 
         caps = [{'capability': 'dirty-bitmaps', 'state': True},
                 {'capability': 'events', 'state': True}]
@@ -134,10 +145,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         event_resume = self.vm_b.event_wait('RESUME')
         self.vm_b_events.append(event_resume)
 
-        s = 0x8000
-        while s < discard_size:
-            self.vm_b.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
-            s += 0x10000
+        apply_discards(self.vm_b, discards2)
 
         match = {'data': {'status': 'completed'}}
         event_complete = self.vm_b.event_wait('MIGRATION', match=match)
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 06/21] qemu-iotests/199: increase postcopy period
  2020-07-24  8:43 [PATCH v3 for-5.1?? 00/21] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (4 preceding siblings ...)
  2020-07-24  8:43 ` [PATCH v3 05/21] qemu-iotests/199: change discard patterns Vladimir Sementsov-Ogievskiy
@ 2020-07-24  8:43 ` Vladimir Sementsov-Ogievskiy
  2020-07-24  8:43 ` [PATCH v3 07/21] migration/block-dirty-bitmap: fix dirty_bitmap_mig_before_vm_start Vladimir Sementsov-Ogievskiy
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  8:43 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, fam, vsementsov, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

The test wants to force a bitmap postcopy. Still, the resulting
postcopy period is very small. Let's increase it by adding more
bitmaps to migrate. Also, test disabled bitmaps migration.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Tested-by: Eric Blake <eblake@redhat.com>
---
 tests/qemu-iotests/199 | 58 ++++++++++++++++++++++++++++--------------
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index da4dae01fb..d8532e49da 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -103,29 +103,45 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
     def test_postcopy(self):
         granularity = 512
+        nb_bitmaps = 15
 
-        result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
-                               name='bitmap', granularity=granularity)
-        self.assert_qmp(result, 'return', {})
+        for i in range(nb_bitmaps):
+            result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
+                                   name='bitmap{}'.format(i),
+                                   granularity=granularity)
+            self.assert_qmp(result, 'return', {})
 
         result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
-                               node='drive0', name='bitmap')
+                               node='drive0', name='bitmap0')
         empty_sha256 = result['return']['sha256']
 
-        apply_discards(self.vm_a, discards1 + discards2)
+        apply_discards(self.vm_a, discards1)
 
         result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
-                               node='drive0', name='bitmap')
-        sha256 = result['return']['sha256']
+                               node='drive0', name='bitmap0')
+        discards1_sha256 = result['return']['sha256']
 
         # Check, that updating the bitmap by discards works
-        assert sha256 != empty_sha256
+        assert discards1_sha256 != empty_sha256
 
-        result = self.vm_a.qmp('block-dirty-bitmap-clear', node='drive0',
-                               name='bitmap')
-        self.assert_qmp(result, 'return', {})
+        # We want to calculate resulting sha256. Do it in bitmap0, so, disable
+        # other bitmaps
+        for i in range(1, nb_bitmaps):
+            result = self.vm_a.qmp('block-dirty-bitmap-disable', node='drive0',
+                                   name='bitmap{}'.format(i))
+            self.assert_qmp(result, 'return', {})
 
-        apply_discards(self.vm_a, discards1)
+        apply_discards(self.vm_a, discards2)
+
+        result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
+                               node='drive0', name='bitmap0')
+        all_discards_sha256 = result['return']['sha256']
+
+        # Now, enable some bitmaps, to be updated during migration
+        for i in range(2, nb_bitmaps, 2):
+            result = self.vm_a.qmp('block-dirty-bitmap-enable', node='drive0',
+                                   name='bitmap{}'.format(i))
+            self.assert_qmp(result, 'return', {})
 
         caps = [{'capability': 'dirty-bitmaps', 'state': True},
                 {'capability': 'events', 'state': True}]
@@ -145,6 +161,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         event_resume = self.vm_b.event_wait('RESUME')
         self.vm_b_events.append(event_resume)
 
+        # enabled bitmaps should be updated
         apply_discards(self.vm_b, discards2)
 
         match = {'data': {'status': 'completed'}}
@@ -158,7 +175,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         downtime = event_dist(event_stop, event_resume)
         postcopy_time = event_dist(event_resume, event_complete)
 
-        # TODO: assert downtime * 10 < postcopy_time
+        assert downtime * 10 < postcopy_time
         if debug:
             print('downtime:', downtime)
             print('postcopy_time:', postcopy_time)
@@ -166,12 +183,15 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         # Assert that bitmap migration is finished (check that successor bitmap
         # is removed)
         result = self.vm_b.qmp('query-block')
-        assert len(result['return'][0]['dirty-bitmaps']) == 1
-
-        # Check content of migrated (and updated by new writes) bitmap
-        result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
-                               node='drive0', name='bitmap')
-        self.assert_qmp(result, 'return/sha256', sha256)
+        assert len(result['return'][0]['dirty-bitmaps']) == nb_bitmaps
+
+        # Check content of migrated bitmaps. Still, don't waste time checking
+        # every bitmap
+        for i in range(0, nb_bitmaps, 5):
+            result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
+                                   node='drive0', name='bitmap{}'.format(i))
+            sha256 = discards1_sha256 if i % 2 else all_discards_sha256
+            self.assert_qmp(result, 'return/sha256', sha256)
 
 
 if __name__ == '__main__':
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 07/21] migration/block-dirty-bitmap: fix dirty_bitmap_mig_before_vm_start
  2020-07-24  8:43 [PATCH v3 for-5.1?? 00/21] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (5 preceding siblings ...)
  2020-07-24  8:43 ` [PATCH v3 06/21] qemu-iotests/199: increase postcopy period Vladimir Sementsov-Ogievskiy
@ 2020-07-24  8:43 ` Vladimir Sementsov-Ogievskiy
  2020-07-24 15:49   ` Eric Blake
  2020-07-24  8:43 ` [PATCH v3 08/21] migration/block-dirty-bitmap: rename state structure types Vladimir Sementsov-Ogievskiy
                   ` (13 subsequent siblings)
  20 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  8:43 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, fam, vsementsov, quintela, qemu-stable, qemu-devel,
	dgilbert, stefanha, andrey.shinkevich, den, mreitz, jsnow

No reason to use _locked version of bdrv_enable_dirty_bitmap, as we
don't lock this mutex before. Moreover, the adjacent
bdrv_dirty_bitmap_enable_successor do lock the mutex.

Fixes: 58f72b965e9e1q
Cc: qemu-stable@nongnu.org # v3.0
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 migration/block-dirty-bitmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index b0dbf9eeed..0739f1259e 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -566,7 +566,7 @@ void dirty_bitmap_mig_before_vm_start(void)
         DirtyBitmapLoadBitmapState *b = item->data;
 
         if (b->migrated) {
-            bdrv_enable_dirty_bitmap_locked(b->bitmap);
+            bdrv_enable_dirty_bitmap(b->bitmap);
         } else {
             bdrv_dirty_bitmap_enable_successor(b->bitmap);
         }
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 08/21] migration/block-dirty-bitmap: rename state structure types
  2020-07-24  8:43 [PATCH v3 for-5.1?? 00/21] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (6 preceding siblings ...)
  2020-07-24  8:43 ` [PATCH v3 07/21] migration/block-dirty-bitmap: fix dirty_bitmap_mig_before_vm_start Vladimir Sementsov-Ogievskiy
@ 2020-07-24  8:43 ` Vladimir Sementsov-Ogievskiy
  2020-07-24  8:43 ` [PATCH v3 09/21] migration/block-dirty-bitmap: rename dirty_bitmap_mig_cleanup Vladimir Sementsov-Ogievskiy
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  8:43 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, fam, vsementsov, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

Rename types to be symmetrical for load/save part and shorter.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 migration/block-dirty-bitmap.c | 70 ++++++++++++++++++----------------
 1 file changed, 37 insertions(+), 33 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 0739f1259e..1d57bff4f6 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -100,23 +100,25 @@
 /* 0x04 was "AUTOLOAD" flags on elder versions, no it is ignored */
 #define DIRTY_BITMAP_MIG_START_FLAG_RESERVED_MASK    0xf8
 
-typedef struct DirtyBitmapMigBitmapState {
+/* State of one bitmap during save process */
+typedef struct SaveBitmapState {
     /* Written during setup phase. */
     BlockDriverState *bs;
     const char *node_name;
     BdrvDirtyBitmap *bitmap;
     uint64_t total_sectors;
     uint64_t sectors_per_chunk;
-    QSIMPLEQ_ENTRY(DirtyBitmapMigBitmapState) entry;
+    QSIMPLEQ_ENTRY(SaveBitmapState) entry;
     uint8_t flags;
 
     /* For bulk phase. */
     bool bulk_completed;
     uint64_t cur_sector;
-} DirtyBitmapMigBitmapState;
+} SaveBitmapState;
 
-typedef struct DirtyBitmapMigState {
-    QSIMPLEQ_HEAD(, DirtyBitmapMigBitmapState) dbms_list;
+/* State of the dirty bitmap migration (DBM) during save process */
+typedef struct DBMSaveState {
+    QSIMPLEQ_HEAD(, SaveBitmapState) dbms_list;
 
     bool bulk_completed;
     bool no_bitmaps;
@@ -124,23 +126,25 @@ typedef struct DirtyBitmapMigState {
     /* for send_bitmap_bits() */
     BlockDriverState *prev_bs;
     BdrvDirtyBitmap *prev_bitmap;
-} DirtyBitmapMigState;
+} DBMSaveState;
 
-typedef struct DirtyBitmapLoadState {
+/* State of the dirty bitmap migration (DBM) during load process */
+typedef struct DBMLoadState {
     uint32_t flags;
     char node_name[256];
     char bitmap_name[256];
     BlockDriverState *bs;
     BdrvDirtyBitmap *bitmap;
-} DirtyBitmapLoadState;
+} DBMLoadState;
 
-static DirtyBitmapMigState dirty_bitmap_mig_state;
+static DBMSaveState dirty_bitmap_mig_state;
 
-typedef struct DirtyBitmapLoadBitmapState {
+/* State of one bitmap during load process */
+typedef struct LoadBitmapState {
     BlockDriverState *bs;
     BdrvDirtyBitmap *bitmap;
     bool migrated;
-} DirtyBitmapLoadBitmapState;
+} LoadBitmapState;
 static GSList *enabled_bitmaps;
 QemuMutex finish_lock;
 
@@ -170,7 +174,7 @@ static void qemu_put_bitmap_flags(QEMUFile *f, uint32_t flags)
     qemu_put_byte(f, flags);
 }
 
-static void send_bitmap_header(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
+static void send_bitmap_header(QEMUFile *f, SaveBitmapState *dbms,
                                uint32_t additional_flags)
 {
     BlockDriverState *bs = dbms->bs;
@@ -199,19 +203,19 @@ static void send_bitmap_header(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
     }
 }
 
-static void send_bitmap_start(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
+static void send_bitmap_start(QEMUFile *f, SaveBitmapState *dbms)
 {
     send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_START);
     qemu_put_be32(f, bdrv_dirty_bitmap_granularity(dbms->bitmap));
     qemu_put_byte(f, dbms->flags);
 }
 
-static void send_bitmap_complete(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
+static void send_bitmap_complete(QEMUFile *f, SaveBitmapState *dbms)
 {
     send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_COMPLETE);
 }
 
-static void send_bitmap_bits(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
+static void send_bitmap_bits(QEMUFile *f, SaveBitmapState *dbms,
                              uint64_t start_sector, uint32_t nr_sectors)
 {
     /* align for buffer_is_zero() */
@@ -257,7 +261,7 @@ static void send_bitmap_bits(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
 /* Called with iothread lock taken.  */
 static void dirty_bitmap_mig_cleanup(void)
 {
-    DirtyBitmapMigBitmapState *dbms;
+    SaveBitmapState *dbms;
 
     while ((dbms = QSIMPLEQ_FIRST(&dirty_bitmap_mig_state.dbms_list)) != NULL) {
         QSIMPLEQ_REMOVE_HEAD(&dirty_bitmap_mig_state.dbms_list, entry);
@@ -271,7 +275,7 @@ static void dirty_bitmap_mig_cleanup(void)
 static int add_bitmaps_to_list(BlockDriverState *bs, const char *bs_name)
 {
     BdrvDirtyBitmap *bitmap;
-    DirtyBitmapMigBitmapState *dbms;
+    SaveBitmapState *dbms;
     Error *local_err = NULL;
 
     FOR_EACH_DIRTY_BITMAP(bs, bitmap) {
@@ -309,7 +313,7 @@ static int add_bitmaps_to_list(BlockDriverState *bs, const char *bs_name)
         bdrv_ref(bs);
         bdrv_dirty_bitmap_set_busy(bitmap, true);
 
-        dbms = g_new0(DirtyBitmapMigBitmapState, 1);
+        dbms = g_new0(SaveBitmapState, 1);
         dbms->bs = bs;
         dbms->node_name = bs_name;
         dbms->bitmap = bitmap;
@@ -334,7 +338,7 @@ static int add_bitmaps_to_list(BlockDriverState *bs, const char *bs_name)
 static int init_dirty_bitmap_migration(void)
 {
     BlockDriverState *bs;
-    DirtyBitmapMigBitmapState *dbms;
+    SaveBitmapState *dbms;
     GHashTable *handled_by_blk = g_hash_table_new(NULL, NULL);
     BlockBackend *blk;
 
@@ -408,7 +412,7 @@ fail:
 }
 
 /* Called with no lock taken.  */
-static void bulk_phase_send_chunk(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
+static void bulk_phase_send_chunk(QEMUFile *f, SaveBitmapState *dbms)
 {
     uint32_t nr_sectors = MIN(dbms->total_sectors - dbms->cur_sector,
                              dbms->sectors_per_chunk);
@@ -424,7 +428,7 @@ static void bulk_phase_send_chunk(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
 /* Called with no lock taken.  */
 static void bulk_phase(QEMUFile *f, bool limit)
 {
-    DirtyBitmapMigBitmapState *dbms;
+    SaveBitmapState *dbms;
 
     QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
         while (!dbms->bulk_completed) {
@@ -461,7 +465,7 @@ static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
 
 static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
 {
-    DirtyBitmapMigBitmapState *dbms;
+    SaveBitmapState *dbms;
     trace_dirty_bitmap_save_complete_enter();
 
     if (!dirty_bitmap_mig_state.bulk_completed) {
@@ -486,7 +490,7 @@ static void dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
                                       uint64_t *res_compatible,
                                       uint64_t *res_postcopy_only)
 {
-    DirtyBitmapMigBitmapState *dbms;
+    SaveBitmapState *dbms;
     uint64_t pending = 0;
 
     qemu_mutex_lock_iothread();
@@ -507,7 +511,7 @@ static void dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
 }
 
 /* First occurrence of this bitmap. It should be created if doesn't exist */
-static int dirty_bitmap_load_start(QEMUFile *f, DirtyBitmapLoadState *s)
+static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
 {
     Error *local_err = NULL;
     uint32_t granularity = qemu_get_be32(f);
@@ -538,7 +542,7 @@ static int dirty_bitmap_load_start(QEMUFile *f, DirtyBitmapLoadState *s)
 
     bdrv_disable_dirty_bitmap(s->bitmap);
     if (flags & DIRTY_BITMAP_MIG_START_FLAG_ENABLED) {
-        DirtyBitmapLoadBitmapState *b;
+        LoadBitmapState *b;
 
         bdrv_dirty_bitmap_create_successor(s->bitmap, &local_err);
         if (local_err) {
@@ -546,7 +550,7 @@ static int dirty_bitmap_load_start(QEMUFile *f, DirtyBitmapLoadState *s)
             return -EINVAL;
         }
 
-        b = g_new(DirtyBitmapLoadBitmapState, 1);
+        b = g_new(LoadBitmapState, 1);
         b->bs = s->bs;
         b->bitmap = s->bitmap;
         b->migrated = false;
@@ -563,7 +567,7 @@ void dirty_bitmap_mig_before_vm_start(void)
     qemu_mutex_lock(&finish_lock);
 
     for (item = enabled_bitmaps; item; item = g_slist_next(item)) {
-        DirtyBitmapLoadBitmapState *b = item->data;
+        LoadBitmapState *b = item->data;
 
         if (b->migrated) {
             bdrv_enable_dirty_bitmap(b->bitmap);
@@ -580,7 +584,7 @@ void dirty_bitmap_mig_before_vm_start(void)
     qemu_mutex_unlock(&finish_lock);
 }
 
-static void dirty_bitmap_load_complete(QEMUFile *f, DirtyBitmapLoadState *s)
+static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
 {
     GSList *item;
     trace_dirty_bitmap_load_complete();
@@ -589,7 +593,7 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DirtyBitmapLoadState *s)
     qemu_mutex_lock(&finish_lock);
 
     for (item = enabled_bitmaps; item; item = g_slist_next(item)) {
-        DirtyBitmapLoadBitmapState *b = item->data;
+        LoadBitmapState *b = item->data;
 
         if (b->bitmap == s->bitmap) {
             b->migrated = true;
@@ -621,7 +625,7 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DirtyBitmapLoadState *s)
     qemu_mutex_unlock(&finish_lock);
 }
 
-static int dirty_bitmap_load_bits(QEMUFile *f, DirtyBitmapLoadState *s)
+static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
 {
     uint64_t first_byte = qemu_get_be64(f) << BDRV_SECTOR_BITS;
     uint64_t nr_bytes = (uint64_t)qemu_get_be32(f) << BDRV_SECTOR_BITS;
@@ -666,7 +670,7 @@ static int dirty_bitmap_load_bits(QEMUFile *f, DirtyBitmapLoadState *s)
     return 0;
 }
 
-static int dirty_bitmap_load_header(QEMUFile *f, DirtyBitmapLoadState *s)
+static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s)
 {
     Error *local_err = NULL;
     bool nothing;
@@ -715,7 +719,7 @@ static int dirty_bitmap_load_header(QEMUFile *f, DirtyBitmapLoadState *s)
 
 static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
 {
-    static DirtyBitmapLoadState s;
+    static DBMLoadState s;
     int ret = 0;
 
     trace_dirty_bitmap_load_enter();
@@ -753,7 +757,7 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
 
 static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
 {
-    DirtyBitmapMigBitmapState *dbms = NULL;
+    SaveBitmapState *dbms = NULL;
     if (init_dirty_bitmap_migration() < 0) {
         return -1;
     }
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 09/21] migration/block-dirty-bitmap: rename dirty_bitmap_mig_cleanup
  2020-07-24  8:43 [PATCH v3 for-5.1?? 00/21] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (7 preceding siblings ...)
  2020-07-24  8:43 ` [PATCH v3 08/21] migration/block-dirty-bitmap: rename state structure types Vladimir Sementsov-Ogievskiy
@ 2020-07-24  8:43 ` Vladimir Sementsov-Ogievskiy
  2020-07-24  8:43 ` [PATCH v3 10/21] migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init Vladimir Sementsov-Ogievskiy
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  8:43 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, fam, vsementsov, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

Rename dirty_bitmap_mig_cleanup to dirty_bitmap_do_save_cleanup, to
stress that it is on save part.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 migration/block-dirty-bitmap.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 1d57bff4f6..01a536d7d3 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -259,7 +259,7 @@ static void send_bitmap_bits(QEMUFile *f, SaveBitmapState *dbms,
 }
 
 /* Called with iothread lock taken.  */
-static void dirty_bitmap_mig_cleanup(void)
+static void dirty_bitmap_do_save_cleanup(void)
 {
     SaveBitmapState *dbms;
 
@@ -406,7 +406,7 @@ static int init_dirty_bitmap_migration(void)
 
 fail:
     g_hash_table_destroy(handled_by_blk);
-    dirty_bitmap_mig_cleanup();
+    dirty_bitmap_do_save_cleanup();
 
     return -1;
 }
@@ -445,7 +445,7 @@ static void bulk_phase(QEMUFile *f, bool limit)
 /* for SaveVMHandlers */
 static void dirty_bitmap_save_cleanup(void *opaque)
 {
-    dirty_bitmap_mig_cleanup();
+    dirty_bitmap_do_save_cleanup();
 }
 
 static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
@@ -480,7 +480,7 @@ static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
 
     trace_dirty_bitmap_save_complete_finish();
 
-    dirty_bitmap_mig_cleanup();
+    dirty_bitmap_do_save_cleanup();
     return 0;
 }
 
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 10/21] migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init
  2020-07-24  8:43 [PATCH v3 for-5.1?? 00/21] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (8 preceding siblings ...)
  2020-07-24  8:43 ` [PATCH v3 09/21] migration/block-dirty-bitmap: rename dirty_bitmap_mig_cleanup Vladimir Sementsov-Ogievskiy
@ 2020-07-24  8:43 ` Vladimir Sementsov-Ogievskiy
  2020-07-27  9:51   ` Dr. David Alan Gilbert
  2020-07-24  8:43 ` [PATCH v3 11/21] migration/block-dirty-bitmap: refactor state global variables Vladimir Sementsov-Ogievskiy
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  8:43 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, fam, vsementsov, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

No reasons to keep two public init functions.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 migration/migration.h          | 1 -
 migration/block-dirty-bitmap.c | 6 +-----
 migration/migration.c          | 2 --
 3 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index f617960522..ab20c756f5 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -335,7 +335,6 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
 void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
 
 void dirty_bitmap_mig_before_vm_start(void);
-void init_dirty_bitmap_incoming_migration(void);
 void migrate_add_address(SocketAddress *address);
 
 int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 01a536d7d3..4b67e4f4fb 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -148,11 +148,6 @@ typedef struct LoadBitmapState {
 static GSList *enabled_bitmaps;
 QemuMutex finish_lock;
 
-void init_dirty_bitmap_incoming_migration(void)
-{
-    qemu_mutex_init(&finish_lock);
-}
-
 static uint32_t qemu_get_bitmap_flags(QEMUFile *f)
 {
     uint8_t flags = qemu_get_byte(f);
@@ -801,6 +796,7 @@ static SaveVMHandlers savevm_dirty_bitmap_handlers = {
 void dirty_bitmap_mig_init(void)
 {
     QSIMPLEQ_INIT(&dirty_bitmap_mig_state.dbms_list);
+    qemu_mutex_init(&finish_lock);
 
     register_savevm_live("dirty-bitmap", 0, 1,
                          &savevm_dirty_bitmap_handlers,
diff --git a/migration/migration.c b/migration/migration.c
index 2ed9923227..1c61428988 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -165,8 +165,6 @@ void migration_object_init(void)
     qemu_sem_init(&current_incoming->postcopy_pause_sem_dst, 0);
     qemu_sem_init(&current_incoming->postcopy_pause_sem_fault, 0);
 
-    init_dirty_bitmap_incoming_migration();
-
     if (!migration_object_check(current_migration, &err)) {
         error_report_err(err);
         exit(1);
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 11/21] migration/block-dirty-bitmap: refactor state global variables
  2020-07-24  8:43 [PATCH v3 for-5.1?? 00/21] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (9 preceding siblings ...)
  2020-07-24  8:43 ` [PATCH v3 10/21] migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init Vladimir Sementsov-Ogievskiy
@ 2020-07-24  8:43 ` Vladimir Sementsov-Ogievskiy
  2020-07-24  8:43 ` [PATCH v3 12/21] migration/block-dirty-bitmap: rename finish_lock to just lock Vladimir Sementsov-Ogievskiy
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  8:43 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, fam, vsementsov, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

Move all state variables into one global struct. Reduce global
variable usage, utilizing opaque pointer where possible.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 migration/block-dirty-bitmap.c | 179 ++++++++++++++++++---------------
 1 file changed, 99 insertions(+), 80 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 4b67e4f4fb..9b39e7aa2b 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -128,6 +128,12 @@ typedef struct DBMSaveState {
     BdrvDirtyBitmap *prev_bitmap;
 } DBMSaveState;
 
+typedef struct LoadBitmapState {
+    BlockDriverState *bs;
+    BdrvDirtyBitmap *bitmap;
+    bool migrated;
+} LoadBitmapState;
+
 /* State of the dirty bitmap migration (DBM) during load process */
 typedef struct DBMLoadState {
     uint32_t flags;
@@ -135,18 +141,17 @@ typedef struct DBMLoadState {
     char bitmap_name[256];
     BlockDriverState *bs;
     BdrvDirtyBitmap *bitmap;
+
+    GSList *enabled_bitmaps;
+    QemuMutex finish_lock;
 } DBMLoadState;
 
-static DBMSaveState dirty_bitmap_mig_state;
+typedef struct DBMState {
+    DBMSaveState save;
+    DBMLoadState load;
+} DBMState;
 
-/* State of one bitmap during load process */
-typedef struct LoadBitmapState {
-    BlockDriverState *bs;
-    BdrvDirtyBitmap *bitmap;
-    bool migrated;
-} LoadBitmapState;
-static GSList *enabled_bitmaps;
-QemuMutex finish_lock;
+static DBMState dbm_state;
 
 static uint32_t qemu_get_bitmap_flags(QEMUFile *f)
 {
@@ -169,21 +174,21 @@ static void qemu_put_bitmap_flags(QEMUFile *f, uint32_t flags)
     qemu_put_byte(f, flags);
 }
 
-static void send_bitmap_header(QEMUFile *f, SaveBitmapState *dbms,
-                               uint32_t additional_flags)
+static void send_bitmap_header(QEMUFile *f, DBMSaveState *s,
+                               SaveBitmapState *dbms, uint32_t additional_flags)
 {
     BlockDriverState *bs = dbms->bs;
     BdrvDirtyBitmap *bitmap = dbms->bitmap;
     uint32_t flags = additional_flags;
     trace_send_bitmap_header_enter();
 
-    if (bs != dirty_bitmap_mig_state.prev_bs) {
-        dirty_bitmap_mig_state.prev_bs = bs;
+    if (bs != s->prev_bs) {
+        s->prev_bs = bs;
         flags |= DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME;
     }
 
-    if (bitmap != dirty_bitmap_mig_state.prev_bitmap) {
-        dirty_bitmap_mig_state.prev_bitmap = bitmap;
+    if (bitmap != s->prev_bitmap) {
+        s->prev_bitmap = bitmap;
         flags |= DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME;
     }
 
@@ -198,19 +203,22 @@ static void send_bitmap_header(QEMUFile *f, SaveBitmapState *dbms,
     }
 }
 
-static void send_bitmap_start(QEMUFile *f, SaveBitmapState *dbms)
+static void send_bitmap_start(QEMUFile *f, DBMSaveState *s,
+                              SaveBitmapState *dbms)
 {
-    send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_START);
+    send_bitmap_header(f, s, dbms, DIRTY_BITMAP_MIG_FLAG_START);
     qemu_put_be32(f, bdrv_dirty_bitmap_granularity(dbms->bitmap));
     qemu_put_byte(f, dbms->flags);
 }
 
-static void send_bitmap_complete(QEMUFile *f, SaveBitmapState *dbms)
+static void send_bitmap_complete(QEMUFile *f, DBMSaveState *s,
+                                 SaveBitmapState *dbms)
 {
-    send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_COMPLETE);
+    send_bitmap_header(f, s, dbms, DIRTY_BITMAP_MIG_FLAG_COMPLETE);
 }
 
-static void send_bitmap_bits(QEMUFile *f, SaveBitmapState *dbms,
+static void send_bitmap_bits(QEMUFile *f, DBMSaveState *s,
+                             SaveBitmapState *dbms,
                              uint64_t start_sector, uint32_t nr_sectors)
 {
     /* align for buffer_is_zero() */
@@ -235,7 +243,7 @@ static void send_bitmap_bits(QEMUFile *f, SaveBitmapState *dbms,
 
     trace_send_bitmap_bits(flags, start_sector, nr_sectors, buf_size);
 
-    send_bitmap_header(f, dbms, flags);
+    send_bitmap_header(f, s, dbms, flags);
 
     qemu_put_be64(f, start_sector);
     qemu_put_be32(f, nr_sectors);
@@ -254,12 +262,12 @@ static void send_bitmap_bits(QEMUFile *f, SaveBitmapState *dbms,
 }
 
 /* Called with iothread lock taken.  */
-static void dirty_bitmap_do_save_cleanup(void)
+static void dirty_bitmap_do_save_cleanup(DBMSaveState *s)
 {
     SaveBitmapState *dbms;
 
-    while ((dbms = QSIMPLEQ_FIRST(&dirty_bitmap_mig_state.dbms_list)) != NULL) {
-        QSIMPLEQ_REMOVE_HEAD(&dirty_bitmap_mig_state.dbms_list, entry);
+    while ((dbms = QSIMPLEQ_FIRST(&s->dbms_list)) != NULL) {
+        QSIMPLEQ_REMOVE_HEAD(&s->dbms_list, entry);
         bdrv_dirty_bitmap_set_busy(dbms->bitmap, false);
         bdrv_unref(dbms->bs);
         g_free(dbms);
@@ -267,7 +275,8 @@ static void dirty_bitmap_do_save_cleanup(void)
 }
 
 /* Called with iothread lock taken. */
-static int add_bitmaps_to_list(BlockDriverState *bs, const char *bs_name)
+static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs,
+                               const char *bs_name)
 {
     BdrvDirtyBitmap *bitmap;
     SaveBitmapState *dbms;
@@ -322,25 +331,24 @@ static int add_bitmaps_to_list(BlockDriverState *bs, const char *bs_name)
             dbms->flags |= DIRTY_BITMAP_MIG_START_FLAG_PERSISTENT;
         }
 
-        QSIMPLEQ_INSERT_TAIL(&dirty_bitmap_mig_state.dbms_list,
-                             dbms, entry);
+        QSIMPLEQ_INSERT_TAIL(&s->dbms_list, dbms, entry);
     }
 
     return 0;
 }
 
 /* Called with iothread lock taken. */
-static int init_dirty_bitmap_migration(void)
+static int init_dirty_bitmap_migration(DBMSaveState *s)
 {
     BlockDriverState *bs;
     SaveBitmapState *dbms;
     GHashTable *handled_by_blk = g_hash_table_new(NULL, NULL);
     BlockBackend *blk;
 
-    dirty_bitmap_mig_state.bulk_completed = false;
-    dirty_bitmap_mig_state.prev_bs = NULL;
-    dirty_bitmap_mig_state.prev_bitmap = NULL;
-    dirty_bitmap_mig_state.no_bitmaps = false;
+    s->bulk_completed = false;
+    s->prev_bs = NULL;
+    s->prev_bitmap = NULL;
+    s->no_bitmaps = false;
 
     /*
      * Use blockdevice name for direct (or filtered) children of named block
@@ -369,7 +377,7 @@ static int init_dirty_bitmap_migration(void)
         }
 
         if (bs && bs->drv && !bs->drv->is_filter) {
-            if (add_bitmaps_to_list(bs, name)) {
+            if (add_bitmaps_to_list(s, bs, name)) {
                 goto fail;
             }
             g_hash_table_add(handled_by_blk, bs);
@@ -381,18 +389,18 @@ static int init_dirty_bitmap_migration(void)
             continue;
         }
 
-        if (add_bitmaps_to_list(bs, bdrv_get_node_name(bs))) {
+        if (add_bitmaps_to_list(s, bs, bdrv_get_node_name(bs))) {
             goto fail;
         }
     }
 
     /* unset migration flags here, to not roll back it */
-    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+    QSIMPLEQ_FOREACH(dbms, &s->dbms_list, entry) {
         bdrv_dirty_bitmap_skip_store(dbms->bitmap, true);
     }
 
-    if (QSIMPLEQ_EMPTY(&dirty_bitmap_mig_state.dbms_list)) {
-        dirty_bitmap_mig_state.no_bitmaps = true;
+    if (QSIMPLEQ_EMPTY(&s->dbms_list)) {
+        s->no_bitmaps = true;
     }
 
     g_hash_table_destroy(handled_by_blk);
@@ -401,18 +409,19 @@ static int init_dirty_bitmap_migration(void)
 
 fail:
     g_hash_table_destroy(handled_by_blk);
-    dirty_bitmap_do_save_cleanup();
+    dirty_bitmap_do_save_cleanup(s);
 
     return -1;
 }
 
 /* Called with no lock taken.  */
-static void bulk_phase_send_chunk(QEMUFile *f, SaveBitmapState *dbms)
+static void bulk_phase_send_chunk(QEMUFile *f, DBMSaveState *s,
+                                  SaveBitmapState *dbms)
 {
     uint32_t nr_sectors = MIN(dbms->total_sectors - dbms->cur_sector,
                              dbms->sectors_per_chunk);
 
-    send_bitmap_bits(f, dbms, dbms->cur_sector, nr_sectors);
+    send_bitmap_bits(f, s, dbms, dbms->cur_sector, nr_sectors);
 
     dbms->cur_sector += nr_sectors;
     if (dbms->cur_sector >= dbms->total_sectors) {
@@ -421,61 +430,66 @@ static void bulk_phase_send_chunk(QEMUFile *f, SaveBitmapState *dbms)
 }
 
 /* Called with no lock taken.  */
-static void bulk_phase(QEMUFile *f, bool limit)
+static void bulk_phase(QEMUFile *f, DBMSaveState *s, bool limit)
 {
     SaveBitmapState *dbms;
 
-    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+    QSIMPLEQ_FOREACH(dbms, &s->dbms_list, entry) {
         while (!dbms->bulk_completed) {
-            bulk_phase_send_chunk(f, dbms);
+            bulk_phase_send_chunk(f, s, dbms);
             if (limit && qemu_file_rate_limit(f)) {
                 return;
             }
         }
     }
 
-    dirty_bitmap_mig_state.bulk_completed = true;
+    s->bulk_completed = true;
 }
 
 /* for SaveVMHandlers */
 static void dirty_bitmap_save_cleanup(void *opaque)
 {
-    dirty_bitmap_do_save_cleanup();
+    DBMSaveState *s = &((DBMState *)opaque)->save;
+
+    dirty_bitmap_do_save_cleanup(s);
 }
 
 static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
 {
+    DBMSaveState *s = &((DBMState *)opaque)->save;
+
     trace_dirty_bitmap_save_iterate(migration_in_postcopy());
 
-    if (migration_in_postcopy() && !dirty_bitmap_mig_state.bulk_completed) {
-        bulk_phase(f, true);
+    if (migration_in_postcopy() && !s->bulk_completed) {
+        bulk_phase(f, s, true);
     }
 
     qemu_put_bitmap_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS);
 
-    return dirty_bitmap_mig_state.bulk_completed;
+    return s->bulk_completed;
 }
 
 /* Called with iothread lock taken.  */
 
 static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
 {
+    DBMSaveState *s = &((DBMState *)opaque)->save;
     SaveBitmapState *dbms;
     trace_dirty_bitmap_save_complete_enter();
 
-    if (!dirty_bitmap_mig_state.bulk_completed) {
-        bulk_phase(f, false);
+    if (!s->bulk_completed) {
+        bulk_phase(f, s, false);
     }
 
-    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
-        send_bitmap_complete(f, dbms);
+    QSIMPLEQ_FOREACH(dbms, &s->dbms_list, entry) {
+        send_bitmap_complete(f, s, dbms);
     }
 
     qemu_put_bitmap_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS);
 
     trace_dirty_bitmap_save_complete_finish();
 
-    dirty_bitmap_do_save_cleanup();
+    dirty_bitmap_save_cleanup(opaque);
     return 0;
 }
 
@@ -485,12 +499,13 @@ static void dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
                                       uint64_t *res_compatible,
                                       uint64_t *res_postcopy_only)
 {
+    DBMSaveState *s = &((DBMState *)opaque)->save;
     SaveBitmapState *dbms;
     uint64_t pending = 0;
 
     qemu_mutex_lock_iothread();
 
-    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+    QSIMPLEQ_FOREACH(dbms, &s->dbms_list, entry) {
         uint64_t gran = bdrv_dirty_bitmap_granularity(dbms->bitmap);
         uint64_t sectors = dbms->bulk_completed ? 0 :
                            dbms->total_sectors - dbms->cur_sector;
@@ -549,7 +564,7 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
         b->bs = s->bs;
         b->bitmap = s->bitmap;
         b->migrated = false;
-        enabled_bitmaps = g_slist_prepend(enabled_bitmaps, b);
+        s->enabled_bitmaps = g_slist_prepend(s->enabled_bitmaps, b);
     }
 
     return 0;
@@ -557,11 +572,12 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
 
 void dirty_bitmap_mig_before_vm_start(void)
 {
+    DBMLoadState *s = &dbm_state.load;
     GSList *item;
 
-    qemu_mutex_lock(&finish_lock);
+    qemu_mutex_lock(&s->finish_lock);
 
-    for (item = enabled_bitmaps; item; item = g_slist_next(item)) {
+    for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
         LoadBitmapState *b = item->data;
 
         if (b->migrated) {
@@ -573,10 +589,10 @@ void dirty_bitmap_mig_before_vm_start(void)
         g_free(b);
     }
 
-    g_slist_free(enabled_bitmaps);
-    enabled_bitmaps = NULL;
+    g_slist_free(s->enabled_bitmaps);
+    s->enabled_bitmaps = NULL;
 
-    qemu_mutex_unlock(&finish_lock);
+    qemu_mutex_unlock(&s->finish_lock);
 }
 
 static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
@@ -585,9 +601,9 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
     trace_dirty_bitmap_load_complete();
     bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
 
-    qemu_mutex_lock(&finish_lock);
+    qemu_mutex_lock(&s->finish_lock);
 
-    for (item = enabled_bitmaps; item; item = g_slist_next(item)) {
+    for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
         LoadBitmapState *b = item->data;
 
         if (b->bitmap == s->bitmap) {
@@ -598,7 +614,7 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
 
     if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
         bdrv_dirty_bitmap_lock(s->bitmap);
-        if (enabled_bitmaps == NULL) {
+        if (s->enabled_bitmaps == NULL) {
             /* in postcopy */
             bdrv_reclaim_dirty_bitmap_locked(s->bitmap, &error_abort);
             bdrv_enable_dirty_bitmap_locked(s->bitmap);
@@ -617,7 +633,7 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
         bdrv_dirty_bitmap_unlock(s->bitmap);
     }
 
-    qemu_mutex_unlock(&finish_lock);
+    qemu_mutex_unlock(&s->finish_lock);
 }
 
 static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
@@ -714,7 +730,7 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s)
 
 static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
 {
-    static DBMLoadState s;
+    DBMLoadState *s = &((DBMState *)opaque)->load;
     int ret = 0;
 
     trace_dirty_bitmap_load_enter();
@@ -724,17 +740,17 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
     }
 
     do {
-        ret = dirty_bitmap_load_header(f, &s);
+        ret = dirty_bitmap_load_header(f, s);
         if (ret < 0) {
             return ret;
         }
 
-        if (s.flags & DIRTY_BITMAP_MIG_FLAG_START) {
-            ret = dirty_bitmap_load_start(f, &s);
-        } else if (s.flags & DIRTY_BITMAP_MIG_FLAG_COMPLETE) {
-            dirty_bitmap_load_complete(f, &s);
-        } else if (s.flags & DIRTY_BITMAP_MIG_FLAG_BITS) {
-            ret = dirty_bitmap_load_bits(f, &s);
+        if (s->flags & DIRTY_BITMAP_MIG_FLAG_START) {
+            ret = dirty_bitmap_load_start(f, s);
+        } else if (s->flags & DIRTY_BITMAP_MIG_FLAG_COMPLETE) {
+            dirty_bitmap_load_complete(f, s);
+        } else if (s->flags & DIRTY_BITMAP_MIG_FLAG_BITS) {
+            ret = dirty_bitmap_load_bits(f, s);
         }
 
         if (!ret) {
@@ -744,7 +760,7 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
         if (ret) {
             return ret;
         }
-    } while (!(s.flags & DIRTY_BITMAP_MIG_FLAG_EOS));
+    } while (!(s->flags & DIRTY_BITMAP_MIG_FLAG_EOS));
 
     trace_dirty_bitmap_load_success();
     return 0;
@@ -752,13 +768,14 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
 
 static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
 {
+    DBMSaveState *s = &((DBMState *)opaque)->save;
     SaveBitmapState *dbms = NULL;
-    if (init_dirty_bitmap_migration() < 0) {
+    if (init_dirty_bitmap_migration(s) < 0) {
         return -1;
     }
 
-    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
-        send_bitmap_start(f, dbms);
+    QSIMPLEQ_FOREACH(dbms, &s->dbms_list, entry) {
+        send_bitmap_start(f, s, dbms);
     }
     qemu_put_bitmap_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS);
 
@@ -767,7 +784,9 @@ static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
 
 static bool dirty_bitmap_is_active(void *opaque)
 {
-    return migrate_dirty_bitmaps() && !dirty_bitmap_mig_state.no_bitmaps;
+    DBMSaveState *s = &((DBMState *)opaque)->save;
+
+    return migrate_dirty_bitmaps() && !s->no_bitmaps;
 }
 
 static bool dirty_bitmap_is_active_iterate(void *opaque)
@@ -795,10 +814,10 @@ static SaveVMHandlers savevm_dirty_bitmap_handlers = {
 
 void dirty_bitmap_mig_init(void)
 {
-    QSIMPLEQ_INIT(&dirty_bitmap_mig_state.dbms_list);
-    qemu_mutex_init(&finish_lock);
+    QSIMPLEQ_INIT(&dbm_state.save.dbms_list);
+    qemu_mutex_init(&dbm_state.load.finish_lock);
 
     register_savevm_live("dirty-bitmap", 0, 1,
                          &savevm_dirty_bitmap_handlers,
-                         &dirty_bitmap_mig_state);
+                         &dbm_state);
 }
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 12/21] migration/block-dirty-bitmap: rename finish_lock to just lock
  2020-07-24  8:43 [PATCH v3 for-5.1?? 00/21] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (10 preceding siblings ...)
  2020-07-24  8:43 ` [PATCH v3 11/21] migration/block-dirty-bitmap: refactor state global variables Vladimir Sementsov-Ogievskiy
@ 2020-07-24  8:43 ` Vladimir Sementsov-Ogievskiy
  2020-07-24  8:43 ` [PATCH v3 13/21] migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete Vladimir Sementsov-Ogievskiy
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  8:43 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, fam, vsementsov, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

finish_lock is bad name, as lock used not only on process end.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 migration/block-dirty-bitmap.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 9b39e7aa2b..9194807b54 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -143,7 +143,7 @@ typedef struct DBMLoadState {
     BdrvDirtyBitmap *bitmap;
 
     GSList *enabled_bitmaps;
-    QemuMutex finish_lock;
+    QemuMutex lock; /* protect enabled_bitmaps */
 } DBMLoadState;
 
 typedef struct DBMState {
@@ -575,7 +575,7 @@ void dirty_bitmap_mig_before_vm_start(void)
     DBMLoadState *s = &dbm_state.load;
     GSList *item;
 
-    qemu_mutex_lock(&s->finish_lock);
+    qemu_mutex_lock(&s->lock);
 
     for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
         LoadBitmapState *b = item->data;
@@ -592,7 +592,7 @@ void dirty_bitmap_mig_before_vm_start(void)
     g_slist_free(s->enabled_bitmaps);
     s->enabled_bitmaps = NULL;
 
-    qemu_mutex_unlock(&s->finish_lock);
+    qemu_mutex_unlock(&s->lock);
 }
 
 static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
@@ -601,7 +601,7 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
     trace_dirty_bitmap_load_complete();
     bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
 
-    qemu_mutex_lock(&s->finish_lock);
+    qemu_mutex_lock(&s->lock);
 
     for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
         LoadBitmapState *b = item->data;
@@ -633,7 +633,7 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
         bdrv_dirty_bitmap_unlock(s->bitmap);
     }
 
-    qemu_mutex_unlock(&s->finish_lock);
+    qemu_mutex_unlock(&s->lock);
 }
 
 static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
@@ -815,7 +815,7 @@ static SaveVMHandlers savevm_dirty_bitmap_handlers = {
 void dirty_bitmap_mig_init(void)
 {
     QSIMPLEQ_INIT(&dbm_state.save.dbms_list);
-    qemu_mutex_init(&dbm_state.load.finish_lock);
+    qemu_mutex_init(&dbm_state.load.lock);
 
     register_savevm_live("dirty-bitmap", 0, 1,
                          &savevm_dirty_bitmap_handlers,
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 13/21] migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete
  2020-07-24  8:43 [PATCH v3 for-5.1?? 00/21] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (11 preceding siblings ...)
  2020-07-24  8:43 ` [PATCH v3 12/21] migration/block-dirty-bitmap: rename finish_lock to just lock Vladimir Sementsov-Ogievskiy
@ 2020-07-24  8:43 ` Vladimir Sementsov-Ogievskiy
  2020-07-24 16:11   ` Eric Blake
  2020-07-24  8:43 ` [PATCH v3 14/21] migration/block-dirty-bitmap: keep bitmap state for all bitmaps Vladimir Sementsov-Ogievskiy
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  8:43 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, fam, vsementsov, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

bdrv_enable_dirty_bitmap_locked() call does nothing, as if we are in
postcopy, bitmap successor must be enabled, and reclaim operation will
enable the bitmap.

So, actually we need just call _reclaim_ in both if branches, and
making differences only to add an assertion seems not really good. The
logic becomes simple: on load complete we do reclaim and that's all.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 migration/block-dirty-bitmap.c | 25 ++++---------------------
 1 file changed, 4 insertions(+), 21 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 9194807b54..405a259296 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -603,6 +603,10 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
 
     qemu_mutex_lock(&s->lock);
 
+    if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
+        bdrv_reclaim_dirty_bitmap(s->bitmap, &error_abort);
+    }
+
     for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
         LoadBitmapState *b = item->data;
 
@@ -612,27 +616,6 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
         }
     }
 
-    if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
-        bdrv_dirty_bitmap_lock(s->bitmap);
-        if (s->enabled_bitmaps == NULL) {
-            /* in postcopy */
-            bdrv_reclaim_dirty_bitmap_locked(s->bitmap, &error_abort);
-            bdrv_enable_dirty_bitmap_locked(s->bitmap);
-        } else {
-            /* target not started, successor must be empty */
-            int64_t count = bdrv_get_dirty_count(s->bitmap);
-            BdrvDirtyBitmap *ret = bdrv_reclaim_dirty_bitmap_locked(s->bitmap,
-                                                                    NULL);
-            /* bdrv_reclaim_dirty_bitmap can fail only on no successor (it
-             * must be) or on merge fail, but merge can't fail when second
-             * bitmap is empty
-             */
-            assert(ret == s->bitmap &&
-                   count == bdrv_get_dirty_count(s->bitmap));
-        }
-        bdrv_dirty_bitmap_unlock(s->bitmap);
-    }
-
     qemu_mutex_unlock(&s->lock);
 }
 
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 14/21] migration/block-dirty-bitmap: keep bitmap state for all bitmaps
  2020-07-24  8:43 [PATCH v3 for-5.1?? 00/21] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (12 preceding siblings ...)
  2020-07-24  8:43 ` [PATCH v3 13/21] migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete Vladimir Sementsov-Ogievskiy
@ 2020-07-24  8:43 ` Vladimir Sementsov-Ogievskiy
  2020-07-24  8:43 ` [PATCH v3 15/21] migration/block-dirty-bitmap: relax error handling in incoming part Vladimir Sementsov-Ogievskiy
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  8:43 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, fam, vsementsov, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

Keep bitmap state for disabled bitmaps too. Keep the state until the
end of the process. It's needed for the following commit to implement
bitmap postcopy canceling.

To clean-up the new list the following logic is used:
We need two events to consider bitmap migration finished:
1. chunk with DIRTY_BITMAP_MIG_FLAG_COMPLETE flag should be received
2. dirty_bitmap_mig_before_vm_start should be called
These two events may come in any order, so we understand which one is
last, and on the last of them we remove bitmap migration state from the
list.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 migration/block-dirty-bitmap.c | 64 +++++++++++++++++++++++-----------
 1 file changed, 43 insertions(+), 21 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 405a259296..eb4ffeac4d 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -132,6 +132,7 @@ typedef struct LoadBitmapState {
     BlockDriverState *bs;
     BdrvDirtyBitmap *bitmap;
     bool migrated;
+    bool enabled;
 } LoadBitmapState;
 
 /* State of the dirty bitmap migration (DBM) during load process */
@@ -142,8 +143,10 @@ typedef struct DBMLoadState {
     BlockDriverState *bs;
     BdrvDirtyBitmap *bitmap;
 
-    GSList *enabled_bitmaps;
-    QemuMutex lock; /* protect enabled_bitmaps */
+    bool before_vm_start_handled; /* set in dirty_bitmap_mig_before_vm_start */
+
+    GSList *bitmaps;
+    QemuMutex lock; /* protect bitmaps */
 } DBMLoadState;
 
 typedef struct DBMState {
@@ -526,6 +529,7 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
     Error *local_err = NULL;
     uint32_t granularity = qemu_get_be32(f);
     uint8_t flags = qemu_get_byte(f);
+    LoadBitmapState *b;
 
     if (s->bitmap) {
         error_report("Bitmap with the same name ('%s') already exists on "
@@ -552,45 +556,59 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
 
     bdrv_disable_dirty_bitmap(s->bitmap);
     if (flags & DIRTY_BITMAP_MIG_START_FLAG_ENABLED) {
-        LoadBitmapState *b;
-
         bdrv_dirty_bitmap_create_successor(s->bitmap, &local_err);
         if (local_err) {
             error_report_err(local_err);
             return -EINVAL;
         }
-
-        b = g_new(LoadBitmapState, 1);
-        b->bs = s->bs;
-        b->bitmap = s->bitmap;
-        b->migrated = false;
-        s->enabled_bitmaps = g_slist_prepend(s->enabled_bitmaps, b);
     }
 
+    b = g_new(LoadBitmapState, 1);
+    b->bs = s->bs;
+    b->bitmap = s->bitmap;
+    b->migrated = false;
+    b->enabled = flags & DIRTY_BITMAP_MIG_START_FLAG_ENABLED;
+
+    s->bitmaps = g_slist_prepend(s->bitmaps, b);
+
     return 0;
 }
 
-void dirty_bitmap_mig_before_vm_start(void)
+/*
+ * before_vm_start_handle_item
+ *
+ * g_slist_foreach helper
+ *
+ * item is LoadBitmapState*
+ * opaque is DBMLoadState*
+ */
+static void before_vm_start_handle_item(void *item, void *opaque)
 {
-    DBMLoadState *s = &dbm_state.load;
-    GSList *item;
-
-    qemu_mutex_lock(&s->lock);
-
-    for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
-        LoadBitmapState *b = item->data;
+    DBMLoadState *s = opaque;
+    LoadBitmapState *b = item;
 
+    if (b->enabled) {
         if (b->migrated) {
             bdrv_enable_dirty_bitmap(b->bitmap);
         } else {
             bdrv_dirty_bitmap_enable_successor(b->bitmap);
         }
+    }
 
+    if (b->migrated) {
+        s->bitmaps = g_slist_remove(s->bitmaps, b);
         g_free(b);
     }
+}
 
-    g_slist_free(s->enabled_bitmaps);
-    s->enabled_bitmaps = NULL;
+void dirty_bitmap_mig_before_vm_start(void)
+{
+    DBMLoadState *s = &dbm_state.load;
+    qemu_mutex_lock(&s->lock);
+
+    assert(!s->before_vm_start_handled);
+    g_slist_foreach(s->bitmaps, before_vm_start_handle_item, s);
+    s->before_vm_start_handled = true;
 
     qemu_mutex_unlock(&s->lock);
 }
@@ -607,11 +625,15 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
         bdrv_reclaim_dirty_bitmap(s->bitmap, &error_abort);
     }
 
-    for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
+    for (item = s->bitmaps; item; item = g_slist_next(item)) {
         LoadBitmapState *b = item->data;
 
         if (b->bitmap == s->bitmap) {
             b->migrated = true;
+            if (s->before_vm_start_handled) {
+                s->bitmaps = g_slist_remove(s->bitmaps, b);
+                g_free(b);
+            }
             break;
         }
     }
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 15/21] migration/block-dirty-bitmap: relax error handling in incoming part
  2020-07-24  8:43 [PATCH v3 for-5.1?? 00/21] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (13 preceding siblings ...)
  2020-07-24  8:43 ` [PATCH v3 14/21] migration/block-dirty-bitmap: keep bitmap state for all bitmaps Vladimir Sementsov-Ogievskiy
@ 2020-07-24  8:43 ` Vladimir Sementsov-Ogievskiy
  2020-07-24 17:35   ` Eric Blake
  2020-07-27 11:23   ` Dr. David Alan Gilbert
  2020-07-24  8:43 ` [PATCH v3 16/21] migration/block-dirty-bitmap: cancel migration on shutdown Vladimir Sementsov-Ogievskiy
                   ` (5 subsequent siblings)
  20 siblings, 2 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  8:43 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, fam, vsementsov, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

Bitmaps data is not critical, and we should not fail the migration (or
use postcopy recovering) because of dirty-bitmaps migration failure.
Instead we should just lose unfinished bitmaps.

Still we have to report io stream violation errors, as they affect the
whole migration stream.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 migration/block-dirty-bitmap.c | 152 +++++++++++++++++++++++++--------
 1 file changed, 117 insertions(+), 35 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index eb4ffeac4d..c24d4614bf 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -145,6 +145,15 @@ typedef struct DBMLoadState {
 
     bool before_vm_start_handled; /* set in dirty_bitmap_mig_before_vm_start */
 
+    /*
+     * cancelled
+     * Incoming migration is cancelled for some reason. That means that we
+     * still should read our chunks from migration stream, to not affect other
+     * migration objects (like RAM), but just ignore them and do not touch any
+     * bitmaps or nodes.
+     */
+    bool cancelled;
+
     GSList *bitmaps;
     QemuMutex lock; /* protect bitmaps */
 } DBMLoadState;
@@ -531,6 +540,10 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
     uint8_t flags = qemu_get_byte(f);
     LoadBitmapState *b;
 
+    if (s->cancelled) {
+        return 0;
+    }
+
     if (s->bitmap) {
         error_report("Bitmap with the same name ('%s') already exists on "
                      "destination", bdrv_dirty_bitmap_name(s->bitmap));
@@ -613,13 +626,47 @@ void dirty_bitmap_mig_before_vm_start(void)
     qemu_mutex_unlock(&s->lock);
 }
 
+static void cancel_incoming_locked(DBMLoadState *s)
+{
+    GSList *item;
+
+    if (s->cancelled) {
+        return;
+    }
+
+    s->cancelled = true;
+    s->bs = NULL;
+    s->bitmap = NULL;
+
+    /* Drop all unfinished bitmaps */
+    for (item = s->bitmaps; item; item = g_slist_next(item)) {
+        LoadBitmapState *b = item->data;
+
+        /*
+         * Bitmap must be unfinished, as finished bitmaps should already be
+         * removed from the list.
+         */
+        assert(!s->before_vm_start_handled || !b->migrated);
+        if (bdrv_dirty_bitmap_has_successor(b->bitmap)) {
+            bdrv_reclaim_dirty_bitmap(b->bitmap, &error_abort);
+        }
+        bdrv_release_dirty_bitmap(b->bitmap);
+    }
+
+    g_slist_free_full(s->bitmaps, g_free);
+    s->bitmaps = NULL;
+}
+
 static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
 {
     GSList *item;
     trace_dirty_bitmap_load_complete();
-    bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
 
-    qemu_mutex_lock(&s->lock);
+    if (s->cancelled) {
+        return;
+    }
+
+    bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
 
     if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
         bdrv_reclaim_dirty_bitmap(s->bitmap, &error_abort);
@@ -637,8 +684,6 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
             break;
         }
     }
-
-    qemu_mutex_unlock(&s->lock);
 }
 
 static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
@@ -650,15 +695,32 @@ static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
 
     if (s->flags & DIRTY_BITMAP_MIG_FLAG_ZEROES) {
         trace_dirty_bitmap_load_bits_zeroes();
-        bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte, nr_bytes,
-                                             false);
+        if (!s->cancelled) {
+            bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte,
+                                                 nr_bytes, false);
+        }
     } else {
         size_t ret;
         uint8_t *buf;
         uint64_t buf_size = qemu_get_be64(f);
-        uint64_t needed_size =
-            bdrv_dirty_bitmap_serialization_size(s->bitmap,
-                                                 first_byte, nr_bytes);
+        uint64_t needed_size;
+
+        buf = g_malloc(buf_size);
+        ret = qemu_get_buffer(f, buf, buf_size);
+        if (ret != buf_size) {
+            error_report("Failed to read bitmap bits");
+            g_free(buf);
+            return -EIO;
+        }
+
+        if (s->cancelled) {
+            g_free(buf);
+            return 0;
+        }
+
+        needed_size = bdrv_dirty_bitmap_serialization_size(s->bitmap,
+                                                           first_byte,
+                                                           nr_bytes);
 
         if (needed_size > buf_size ||
             buf_size > QEMU_ALIGN_UP(needed_size, 4 * sizeof(long))
@@ -667,15 +729,8 @@ static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
             error_report("Migrated bitmap granularity doesn't "
                          "match the destination bitmap '%s' granularity",
                          bdrv_dirty_bitmap_name(s->bitmap));
-            return -EINVAL;
-        }
-
-        buf = g_malloc(buf_size);
-        ret = qemu_get_buffer(f, buf, buf_size);
-        if (ret != buf_size) {
-            error_report("Failed to read bitmap bits");
-            g_free(buf);
-            return -EIO;
+            cancel_incoming_locked(s);
+            return 0;
         }
 
         bdrv_dirty_bitmap_deserialize_part(s->bitmap, buf, first_byte, nr_bytes,
@@ -700,14 +755,16 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s)
             error_report("Unable to read node name string");
             return -EINVAL;
         }
-        s->bs = bdrv_lookup_bs(s->node_name, s->node_name, &local_err);
-        if (!s->bs) {
-            error_report_err(local_err);
-            return -EINVAL;
+        if (!s->cancelled) {
+            s->bs = bdrv_lookup_bs(s->node_name, s->node_name, &local_err);
+            if (!s->bs) {
+                error_report_err(local_err);
+                cancel_incoming_locked(s);
+            }
         }
-    } else if (!s->bs && !nothing) {
+    } else if (!s->bs && !nothing && !s->cancelled) {
         error_report("Error: block device name is not set");
-        return -EINVAL;
+        cancel_incoming_locked(s);
     }
 
     if (s->flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
@@ -715,24 +772,38 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s)
             error_report("Unable to read bitmap name string");
             return -EINVAL;
         }
-        s->bitmap = bdrv_find_dirty_bitmap(s->bs, s->bitmap_name);
-
-        /* bitmap may be NULL here, it wouldn't be an error if it is the
-         * first occurrence of the bitmap */
-        if (!s->bitmap && !(s->flags & DIRTY_BITMAP_MIG_FLAG_START)) {
-            error_report("Error: unknown dirty bitmap "
-                         "'%s' for block device '%s'",
-                         s->bitmap_name, s->node_name);
-            return -EINVAL;
+        if (!s->cancelled) {
+            s->bitmap = bdrv_find_dirty_bitmap(s->bs, s->bitmap_name);
+
+            /*
+             * bitmap may be NULL here, it wouldn't be an error if it is the
+             * first occurrence of the bitmap
+             */
+            if (!s->bitmap && !(s->flags & DIRTY_BITMAP_MIG_FLAG_START)) {
+                error_report("Error: unknown dirty bitmap "
+                             "'%s' for block device '%s'",
+                             s->bitmap_name, s->node_name);
+                cancel_incoming_locked(s);
+            }
         }
-    } else if (!s->bitmap && !nothing) {
+    } else if (!s->bitmap && !nothing && !s->cancelled) {
         error_report("Error: block device name is not set");
-        return -EINVAL;
+        cancel_incoming_locked(s);
     }
 
     return 0;
 }
 
+/*
+ * dirty_bitmap_load
+ *
+ * Load sequence of dirty bitmap chunks. Return error only on fatal io stream
+ * violations. On other errors just cancel bitmaps incoming migration and return
+ * 0.
+ *
+ * Note, than when incoming bitmap migration is canceled, we still must read all
+ * our chunks (and just ignore them), to not affect other migration objects.
+ */
 static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
 {
     DBMLoadState *s = &((DBMState *)opaque)->load;
@@ -741,12 +812,19 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
     trace_dirty_bitmap_load_enter();
 
     if (version_id != 1) {
+        qemu_mutex_lock(&s->lock);
+        cancel_incoming_locked(s);
+        qemu_mutex_unlock(&s->lock);
         return -EINVAL;
     }
 
     do {
+        qemu_mutex_lock(&s->lock);
+
         ret = dirty_bitmap_load_header(f, s);
         if (ret < 0) {
+            cancel_incoming_locked(s);
+            qemu_mutex_unlock(&s->lock);
             return ret;
         }
 
@@ -763,8 +841,12 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
         }
 
         if (ret) {
+            cancel_incoming_locked(s);
+            qemu_mutex_unlock(&s->lock);
             return ret;
         }
+
+        qemu_mutex_unlock(&s->lock);
     } while (!(s->flags & DIRTY_BITMAP_MIG_FLAG_EOS));
 
     trace_dirty_bitmap_load_success();
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 16/21] migration/block-dirty-bitmap: cancel migration on shutdown
  2020-07-24  8:43 [PATCH v3 for-5.1?? 00/21] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (14 preceding siblings ...)
  2020-07-24  8:43 ` [PATCH v3 15/21] migration/block-dirty-bitmap: relax error handling in incoming part Vladimir Sementsov-Ogievskiy
@ 2020-07-24  8:43 ` Vladimir Sementsov-Ogievskiy
  2020-07-27 13:21   ` Dr. David Alan Gilbert
  2020-07-24  8:43 ` [PATCH v3 17/21] migration/savevm: don't worry if bitmap migration postcopy failed Vladimir Sementsov-Ogievskiy
                   ` (4 subsequent siblings)
  20 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  8:43 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, fam, vsementsov, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

If target is turned off prior to postcopy finished, target crashes
because busy bitmaps are found at shutdown.
Canceling incoming migration helps, as it removes all unfinished (and
therefore busy) bitmaps.

Similarly on source we crash in bdrv_close_all which asserts that all
bdrv states are removed, because bdrv states involved into dirty bitmap
migration are referenced by it. So, we need to cancel outgoing
migration as well.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 migration/migration.h          |  2 ++
 migration/block-dirty-bitmap.c | 16 ++++++++++++++++
 migration/migration.c          | 13 +++++++++++++
 3 files changed, 31 insertions(+)

diff --git a/migration/migration.h b/migration/migration.h
index ab20c756f5..6c6a931d0d 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -335,6 +335,8 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
 void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
 
 void dirty_bitmap_mig_before_vm_start(void);
+void dirty_bitmap_mig_cancel_outgoing(void);
+void dirty_bitmap_mig_cancel_incoming(void);
 void migrate_add_address(SocketAddress *address);
 
 int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index c24d4614bf..a198ec7278 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -657,6 +657,22 @@ static void cancel_incoming_locked(DBMLoadState *s)
     s->bitmaps = NULL;
 }
 
+void dirty_bitmap_mig_cancel_outgoing(void)
+{
+    dirty_bitmap_do_save_cleanup(&dbm_state.save);
+}
+
+void dirty_bitmap_mig_cancel_incoming(void)
+{
+    DBMLoadState *s = &dbm_state.load;
+
+    qemu_mutex_lock(&s->lock);
+
+    cancel_incoming_locked(s);
+
+    qemu_mutex_unlock(&s->lock);
+}
+
 static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
 {
     GSList *item;
diff --git a/migration/migration.c b/migration/migration.c
index 1c61428988..8fe36339db 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -188,6 +188,19 @@ void migration_shutdown(void)
      */
     migrate_fd_cancel(current_migration);
     object_unref(OBJECT(current_migration));
+
+    /*
+     * Cancel outgoing migration of dirty bitmaps. It should
+     * at least unref used block nodes.
+     */
+    dirty_bitmap_mig_cancel_outgoing();
+
+    /*
+     * Cancel incoming migration of dirty bitmaps. Dirty bitmaps
+     * are non-critical data, and their loss never considered as
+     * something serious.
+     */
+    dirty_bitmap_mig_cancel_incoming();
 }
 
 /* For outgoing */
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 17/21] migration/savevm: don't worry if bitmap migration postcopy failed
  2020-07-24  8:43 [PATCH v3 for-5.1?? 00/21] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (15 preceding siblings ...)
  2020-07-24  8:43 ` [PATCH v3 16/21] migration/block-dirty-bitmap: cancel migration on shutdown Vladimir Sementsov-Ogievskiy
@ 2020-07-24  8:43 ` Vladimir Sementsov-Ogievskiy
  2020-07-24 18:08   ` Eric Blake
  2020-07-24  8:43 ` [PATCH v3 18/21] qemu-iotests/199: prepare for new test-cases addition Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  20 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  8:43 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, fam, vsementsov, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

First, if only bitmaps postcopy enabled (not ram postcopy)
postcopy_pause_incoming crashes on assertion assert(mis->to_src_file).

And anyway, bitmaps postcopy is not prepared to be somehow recovered.
The original idea instead is that if bitmaps postcopy failed, we just
loss some bitmaps, which is not critical. So, on failure we just need
to remove unfinished bitmaps and guest should continue execution on
destination.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 migration/savevm.c | 37 ++++++++++++++++++++++++++++++++-----
 1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 45c9dd9d8a..a843d202b5 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1813,6 +1813,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
     MigrationIncomingState *mis = migration_incoming_get_current();
     QEMUFile *f = mis->from_src_file;
     int load_res;
+    MigrationState *migr = migrate_get_current();
+
+    object_ref(OBJECT(migr));
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                                    MIGRATION_STATUS_POSTCOPY_ACTIVE);
@@ -1839,11 +1842,24 @@ static void *postcopy_ram_listen_thread(void *opaque)
 
     trace_postcopy_ram_listen_thread_exit();
     if (load_res < 0) {
-        error_report("%s: loadvm failed: %d", __func__, load_res);
         qemu_file_set_error(f, load_res);
-        migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
-                                       MIGRATION_STATUS_FAILED);
-    } else {
+        dirty_bitmap_mig_cancel_incoming();
+        if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
+            !migrate_postcopy_ram() && migrate_dirty_bitmaps())
+        {
+            error_report("%s: loadvm failed during postcopy: %d. All states "
+                         "are migrated except dirty bitmaps. Some dirty "
+                         "bitmaps may be lost, and present migrated dirty "
+                         "bitmaps are correctly migrated and valid.",
+                         __func__, load_res);
+            load_res = 0; /* prevent further exit() */
+        } else {
+            error_report("%s: loadvm failed: %d", __func__, load_res);
+            migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+                                           MIGRATION_STATUS_FAILED);
+        }
+    }
+    if (load_res >= 0) {
         /*
          * This looks good, but it's possible that the device loading in the
          * main thread hasn't finished yet, and so we might not be in 'RUN'
@@ -1879,6 +1895,8 @@ static void *postcopy_ram_listen_thread(void *opaque)
     mis->have_listen_thread = false;
     postcopy_state_set(POSTCOPY_INCOMING_END);
 
+    object_unref(OBJECT(migr));
+
     return NULL;
 }
 
@@ -2437,6 +2455,8 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
 {
     trace_postcopy_pause_incoming();
 
+    assert(migrate_postcopy_ram());
+
     /* Clear the triggered bit to allow one recovery */
     mis->postcopy_recover_triggered = false;
 
@@ -2521,15 +2541,22 @@ out:
     if (ret < 0) {
         qemu_file_set_error(f, ret);
 
+        /* Cancel bitmaps incoming regardless of recovery */
+        dirty_bitmap_mig_cancel_incoming();
+
         /*
          * If we are during an active postcopy, then we pause instead
          * of bail out to at least keep the VM's dirty data.  Note
          * that POSTCOPY_INCOMING_LISTENING stage is still not enough,
          * during which we're still receiving device states and we
          * still haven't yet started the VM on destination.
+         *
+         * Only RAM postcopy supports recovery. Still, if RAM postcopy is
+         * enabled, canceled bitmaps postcopy will not affect RAM postcopy
+         * recovering.
          */
         if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
-            postcopy_pause_incoming(mis)) {
+            migrate_postcopy_ram() && postcopy_pause_incoming(mis)) {
             /* Reset f to point to the newly created channel */
             f = mis->from_src_file;
             goto retry;
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 18/21] qemu-iotests/199: prepare for new test-cases addition
  2020-07-24  8:43 [PATCH v3 for-5.1?? 00/21] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (16 preceding siblings ...)
  2020-07-24  8:43 ` [PATCH v3 17/21] migration/savevm: don't worry if bitmap migration postcopy failed Vladimir Sementsov-Ogievskiy
@ 2020-07-24  8:43 ` Vladimir Sementsov-Ogievskiy
  2020-07-24  8:43 ` [PATCH v3 19/21] qemu-iotests/199: check persistent bitmaps Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  8:43 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, fam, vsementsov, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

Move future common part to start_postcopy() method. Move checking
number of bitmaps to check_bitmap().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 tests/qemu-iotests/199 | 36 +++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index d8532e49da..355c0b2885 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -29,6 +29,8 @@ disk_b = os.path.join(iotests.test_dir, 'disk_b')
 size = '256G'
 fifo = os.path.join(iotests.test_dir, 'mig_fifo')
 
+granularity = 512
+nb_bitmaps = 15
 
 GiB = 1024 * 1024 * 1024
 
@@ -61,6 +63,15 @@ def event_dist(e1, e2):
     return event_seconds(e2) - event_seconds(e1)
 
 
+def check_bitmaps(vm, count):
+    result = vm.qmp('query-block')
+
+    if count == 0:
+        assert 'dirty-bitmaps' not in result['return'][0]
+    else:
+        assert len(result['return'][0]['dirty-bitmaps']) == count
+
+
 class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
     def tearDown(self):
         if debug:
@@ -101,10 +112,8 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         self.vm_a_events = []
         self.vm_b_events = []
 
-    def test_postcopy(self):
-        granularity = 512
-        nb_bitmaps = 15
-
+    def start_postcopy(self):
+        """ Run migration until RESUME event on target. Return this event. """
         for i in range(nb_bitmaps):
             result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
                                    name='bitmap{}'.format(i),
@@ -119,10 +128,10 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
         result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
                                node='drive0', name='bitmap0')
-        discards1_sha256 = result['return']['sha256']
+        self.discards1_sha256 = result['return']['sha256']
 
         # Check, that updating the bitmap by discards works
-        assert discards1_sha256 != empty_sha256
+        assert self.discards1_sha256 != empty_sha256
 
         # We want to calculate resulting sha256. Do it in bitmap0, so, disable
         # other bitmaps
@@ -135,7 +144,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
         result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
                                node='drive0', name='bitmap0')
-        all_discards_sha256 = result['return']['sha256']
+        self.all_discards_sha256 = result['return']['sha256']
 
         # Now, enable some bitmaps, to be updated during migration
         for i in range(2, nb_bitmaps, 2):
@@ -160,6 +169,10 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
         event_resume = self.vm_b.event_wait('RESUME')
         self.vm_b_events.append(event_resume)
+        return event_resume
+
+    def test_postcopy_success(self):
+        event_resume = self.start_postcopy()
 
         # enabled bitmaps should be updated
         apply_discards(self.vm_b, discards2)
@@ -180,18 +193,15 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
             print('downtime:', downtime)
             print('postcopy_time:', postcopy_time)
 
-        # Assert that bitmap migration is finished (check that successor bitmap
-        # is removed)
-        result = self.vm_b.qmp('query-block')
-        assert len(result['return'][0]['dirty-bitmaps']) == nb_bitmaps
+        check_bitmaps(self.vm_b, nb_bitmaps)
 
         # Check content of migrated bitmaps. Still, don't waste time checking
         # every bitmap
         for i in range(0, nb_bitmaps, 5):
             result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
                                    node='drive0', name='bitmap{}'.format(i))
-            sha256 = discards1_sha256 if i % 2 else all_discards_sha256
-            self.assert_qmp(result, 'return/sha256', sha256)
+            sha = self.discards1_sha256 if i % 2 else self.all_discards_sha256
+            self.assert_qmp(result, 'return/sha256', sha)
 
 
 if __name__ == '__main__':
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 19/21] qemu-iotests/199: check persistent bitmaps
  2020-07-24  8:43 [PATCH v3 for-5.1?? 00/21] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (17 preceding siblings ...)
  2020-07-24  8:43 ` [PATCH v3 18/21] qemu-iotests/199: prepare for new test-cases addition Vladimir Sementsov-Ogievskiy
@ 2020-07-24  8:43 ` Vladimir Sementsov-Ogievskiy
  2020-07-24  8:43 ` [PATCH v3 20/21] qemu-iotests/199: add early shutdown case to bitmaps postcopy Vladimir Sementsov-Ogievskiy
  2020-07-24  8:43 ` [PATCH v3 21/21] qemu-iotests/199: add source-killed " Vladimir Sementsov-Ogievskiy
  20 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  8:43 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, fam, vsementsov, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

Check that persistent bitmaps are not stored on source and that bitmaps
are persistent on destination.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 tests/qemu-iotests/199 | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index 355c0b2885..5fd34f0fcd 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -117,7 +117,8 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         for i in range(nb_bitmaps):
             result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
                                    name='bitmap{}'.format(i),
-                                   granularity=granularity)
+                                   granularity=granularity,
+                                   persistent=True)
             self.assert_qmp(result, 'return', {})
 
         result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
@@ -193,6 +194,19 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
             print('downtime:', downtime)
             print('postcopy_time:', postcopy_time)
 
+        # check that there are no bitmaps stored on source
+        self.vm_a_events += self.vm_a.get_qmp_events()
+        self.vm_a.shutdown()
+        self.vm_a.launch()
+        check_bitmaps(self.vm_a, 0)
+
+        # check that bitmaps are migrated and persistence works
+        check_bitmaps(self.vm_b, nb_bitmaps)
+        self.vm_b.shutdown()
+        # recreate vm_b, so there is no incoming option, which prevents
+        # loading bitmaps from disk
+        self.vm_b = iotests.VM(path_suffix='b').add_drive(disk_b)
+        self.vm_b.launch()
         check_bitmaps(self.vm_b, nb_bitmaps)
 
         # Check content of migrated bitmaps. Still, don't waste time checking
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 20/21] qemu-iotests/199: add early shutdown case to bitmaps postcopy
  2020-07-24  8:43 [PATCH v3 for-5.1?? 00/21] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (18 preceding siblings ...)
  2020-07-24  8:43 ` [PATCH v3 19/21] qemu-iotests/199: check persistent bitmaps Vladimir Sementsov-Ogievskiy
@ 2020-07-24  8:43 ` Vladimir Sementsov-Ogievskiy
  2020-07-24 19:07   ` Eric Blake
  2020-07-24  8:43 ` [PATCH v3 21/21] qemu-iotests/199: add source-killed " Vladimir Sementsov-Ogievskiy
  20 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  8:43 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, fam, vsementsov, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

Previous patches fixed two crashes which may occur on shutdown prior to
bitmaps postcopy finished. Check that it works now.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 tests/qemu-iotests/199     | 24 ++++++++++++++++++++++++
 tests/qemu-iotests/199.out |  4 ++--
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index 5fd34f0fcd..140930b2b1 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -217,6 +217,30 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
             sha = self.discards1_sha256 if i % 2 else self.all_discards_sha256
             self.assert_qmp(result, 'return/sha256', sha)
 
+    def test_early_shutdown_destination(self):
+        self.start_postcopy()
+
+        self.vm_b_events += self.vm_b.get_qmp_events()
+        self.vm_b.shutdown()
+        # recreate vm_b, so there is no incoming option, which prevents
+        # loading bitmaps from disk
+        self.vm_b = iotests.VM(path_suffix='b').add_drive(disk_b)
+        self.vm_b.launch()
+        check_bitmaps(self.vm_b, 0)
+
+        # Bitmaps will be lost if we just shutdown the vm, as they are marked
+        # to skip storing to disk when prepared for migration. And that's
+        # correct, as actual data may be modified in target vm, so we play
+        # safe.
+        # Still, this mark would be taken away if we do 'cont', and bitmaps
+        # become persistent again. (see iotest 169 for such behavior case)
+        result = self.vm_a.qmp('query-status')
+        assert not result['return']['running']
+        self.vm_a_events += self.vm_a.get_qmp_events()
+        self.vm_a.shutdown()
+        self.vm_a.launch()
+        check_bitmaps(self.vm_a, 0)
+
 
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2'])
diff --git a/tests/qemu-iotests/199.out b/tests/qemu-iotests/199.out
index ae1213e6f8..fbc63e62f8 100644
--- a/tests/qemu-iotests/199.out
+++ b/tests/qemu-iotests/199.out
@@ -1,5 +1,5 @@
-.
+..
 ----------------------------------------------------------------------
-Ran 1 tests
+Ran 2 tests
 
 OK
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 21/21] qemu-iotests/199: add source-killed case to bitmaps postcopy
  2020-07-24  8:43 [PATCH v3 for-5.1?? 00/21] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
                   ` (19 preceding siblings ...)
  2020-07-24  8:43 ` [PATCH v3 20/21] qemu-iotests/199: add early shutdown case to bitmaps postcopy Vladimir Sementsov-Ogievskiy
@ 2020-07-24  8:43 ` Vladimir Sementsov-Ogievskiy
  20 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24  8:43 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, fam, vsementsov, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

Previous patches fixes behavior of bitmaps migration, so that errors
are handled by just removing unfinished bitmaps, and not fail or try to
recover postcopy migration. Add corresponding test.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 tests/qemu-iotests/199     | 15 +++++++++++++++
 tests/qemu-iotests/199.out |  4 ++--
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index 140930b2b1..58fad872a1 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -241,6 +241,21 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
         self.vm_a.launch()
         check_bitmaps(self.vm_a, 0)
 
+    def test_early_kill_source(self):
+        self.start_postcopy()
+
+        self.vm_a_events = self.vm_a.get_qmp_events()
+        self.vm_a.kill()
+
+        self.vm_a.launch()
+
+        match = {'data': {'status': 'completed'}}
+        e_complete = self.vm_b.event_wait('MIGRATION', match=match)
+        self.vm_b_events.append(e_complete)
+
+        check_bitmaps(self.vm_a, 0)
+        check_bitmaps(self.vm_b, 0)
+
 
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2'])
diff --git a/tests/qemu-iotests/199.out b/tests/qemu-iotests/199.out
index fbc63e62f8..8d7e996700 100644
--- a/tests/qemu-iotests/199.out
+++ b/tests/qemu-iotests/199.out
@@ -1,5 +1,5 @@
-..
+...
 ----------------------------------------------------------------------
-Ran 2 tests
+Ran 3 tests
 
 OK
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 01/21] qemu-iotests/199: fix style
  2020-07-24  8:43 ` [PATCH v3 01/21] qemu-iotests/199: fix style Vladimir Sementsov-Ogievskiy
@ 2020-07-24 15:03   ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2020-07-24 15:03 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, fam, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

On 7/24/20 3:43 AM, Vladimir Sementsov-Ogievskiy wrote:
> Mostly, satisfy pep8 complains.

complaints

I can touch that up while staging.

> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> Tested-by: Eric Blake <eblake@redhat.com>
> ---
>   tests/qemu-iotests/199 | 13 +++++++------
>   1 file changed, 7 insertions(+), 6 deletions(-)
> 


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 07/21] migration/block-dirty-bitmap: fix dirty_bitmap_mig_before_vm_start
  2020-07-24  8:43 ` [PATCH v3 07/21] migration/block-dirty-bitmap: fix dirty_bitmap_mig_before_vm_start Vladimir Sementsov-Ogievskiy
@ 2020-07-24 15:49   ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2020-07-24 15:49 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, fam, quintela, qemu-stable, qemu-devel, dgilbert,
	stefanha, andrey.shinkevich, den, mreitz, jsnow

On 7/24/20 3:43 AM, Vladimir Sementsov-Ogievskiy wrote:
> No reason to use _locked version of bdrv_enable_dirty_bitmap, as we
> don't lock this mutex before. Moreover, the adjacent
> bdrv_dirty_bitmap_enable_successor do lock the mutex.

Grammar suggestion:

Using the _locked version of bdrv_enable_dirty_bitmap to bypass locking 
is wrong as we do not already own the mutex.  Moreover, the adjacent 
call to bdrv_dirty_bitmap_enable_successor grabs the mutex.

> 
> Fixes: 58f72b965e9e1q
> Cc: qemu-stable@nongnu.org # v3.0
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>   migration/block-dirty-bitmap.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

I'm comfortable taking this in 5.1.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 13/21] migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete
  2020-07-24  8:43 ` [PATCH v3 13/21] migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete Vladimir Sementsov-Ogievskiy
@ 2020-07-24 16:11   ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2020-07-24 16:11 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, fam, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

On 7/24/20 3:43 AM, Vladimir Sementsov-Ogievskiy wrote:
> bdrv_enable_dirty_bitmap_locked() call does nothing, as if we are in
> postcopy, bitmap successor must be enabled, and reclaim operation will
> enable the bitmap.
> 
> So, actually we need just call _reclaim_ in both if branches, and
> making differences only to add an assertion seems not really good. The
> logic becomes simple: on load complete we do reclaim and that's all.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>   migration/block-dirty-bitmap.c | 25 ++++---------------------
>   1 file changed, 4 insertions(+), 21 deletions(-)

Looks like 8-13 are all cleanups with no semantic change.  As it makes 
the later bug fix easier, I'm fine including them in 5.1 if the bug fix 
is also 5.1 material.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 15/21] migration/block-dirty-bitmap: relax error handling in incoming part
  2020-07-24  8:43 ` [PATCH v3 15/21] migration/block-dirty-bitmap: relax error handling in incoming part Vladimir Sementsov-Ogievskiy
@ 2020-07-24 17:35   ` Eric Blake
  2020-07-24 20:30     ` Vladimir Sementsov-Ogievskiy
  2020-07-27 11:23   ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 38+ messages in thread
From: Eric Blake @ 2020-07-24 17:35 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, fam, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

On 7/24/20 3:43 AM, Vladimir Sementsov-Ogievskiy wrote:
> Bitmaps data is not critical, and we should not fail the migration (or
> use postcopy recovering) because of dirty-bitmaps migration failure.
> Instead we should just lose unfinished bitmaps.
> 
> Still we have to report io stream violation errors, as they affect the
> whole migration stream.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   migration/block-dirty-bitmap.c | 152 +++++++++++++++++++++++++--------
>   1 file changed, 117 insertions(+), 35 deletions(-)
> 

> @@ -650,15 +695,32 @@ static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
>   
>       if (s->flags & DIRTY_BITMAP_MIG_FLAG_ZEROES) {
>           trace_dirty_bitmap_load_bits_zeroes();
> -        bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte, nr_bytes,
> -                                             false);
> +        if (!s->cancelled) {
> +            bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte,
> +                                                 nr_bytes, false);
> +        }
>       } else {
>           size_t ret;
>           uint8_t *buf;
>           uint64_t buf_size = qemu_get_be64(f);

Pre-existing, but if I understand, we are reading a value from the 
migration stream...

> -        uint64_t needed_size =
> -            bdrv_dirty_bitmap_serialization_size(s->bitmap,
> -                                                 first_byte, nr_bytes);
> +        uint64_t needed_size;
> +
> +        buf = g_malloc(buf_size);
> +        ret = qemu_get_buffer(f, buf, buf_size);

...and using it to malloc memory.  Is that a potential risk of a 
malicious stream causing us to allocate too much memory in relation to 
the guest's normal size?  If so, fixing that should be done separately.

I'm not a migration expert, but the patch looks reasonable to me.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 17/21] migration/savevm: don't worry if bitmap migration postcopy failed
  2020-07-24  8:43 ` [PATCH v3 17/21] migration/savevm: don't worry if bitmap migration postcopy failed Vladimir Sementsov-Ogievskiy
@ 2020-07-24 18:08   ` Eric Blake
  2020-07-27 13:29     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2020-07-24 18:08 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, fam, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

On 7/24/20 3:43 AM, Vladimir Sementsov-Ogievskiy wrote:
> First, if only bitmaps postcopy enabled (not ram postcopy)

is enabled (and not ram postcopy),

> postcopy_pause_incoming crashes on assertion assert(mis->to_src_file).

on an

> 
> And anyway, bitmaps postcopy is not prepared to be somehow recovered.
> The original idea instead is that if bitmaps postcopy failed, we just
> loss some bitmaps, which is not critical. So, on failure we just need

lose

> to remove unfinished bitmaps and guest should continue execution on
> destination.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>   migration/savevm.c | 37 ++++++++++++++++++++++++++++++++-----
>   1 file changed, 32 insertions(+), 5 deletions(-)
> 

Definitely a bug fix, but I'd like David's opinion on whether this is 
still 5.1 material (because it is limited to just bitmaps migration, 
which is opt-in) or too risky (because we've already had several 
releases where it was broken, what's one more?).

I'm less familiar with the code, so this is weak, but I did read through 
it and nothing jumped out at me, so:

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 20/21] qemu-iotests/199: add early shutdown case to bitmaps postcopy
  2020-07-24  8:43 ` [PATCH v3 20/21] qemu-iotests/199: add early shutdown case to bitmaps postcopy Vladimir Sementsov-Ogievskiy
@ 2020-07-24 19:07   ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2020-07-24 19:07 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, fam, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

On 7/24/20 3:43 AM, Vladimir Sementsov-Ogievskiy wrote:
> Previous patches fixed two crashes which may occur on shutdown prior to
> bitmaps postcopy finished. Check that it works now.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>   tests/qemu-iotests/199     | 24 ++++++++++++++++++++++++
>   tests/qemu-iotests/199.out |  4 ++--
>   2 files changed, 26 insertions(+), 2 deletions(-)

I've now confirmed that 18,19 don't expose any failures (but I'll leave 
them where they are in the series), and 20 does expose a testsuite 
failure if it is applied early; 20 and 21 are not fixed until 17 is 
applied (although I did not try to further bisect if 20 in isolation 
gets fixed sooner in the series).

So I'm adding to 20 and 21:

Tested-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 15/21] migration/block-dirty-bitmap: relax error handling in incoming part
  2020-07-24 17:35   ` Eric Blake
@ 2020-07-24 20:30     ` Vladimir Sementsov-Ogievskiy
  2020-07-27 11:16       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-24 20:30 UTC (permalink / raw)
  To: Eric Blake, qemu-block
  Cc: kwolf, fam, quintela, qemu-devel, dgilbert, stefanha,
	andrey.shinkevich, den, mreitz, jsnow

24.07.2020 20:35, Eric Blake wrote:
> On 7/24/20 3:43 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Bitmaps data is not critical, and we should not fail the migration (or
>> use postcopy recovering) because of dirty-bitmaps migration failure.
>> Instead we should just lose unfinished bitmaps.
>>
>> Still we have to report io stream violation errors, as they affect the
>> whole migration stream.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   migration/block-dirty-bitmap.c | 152 +++++++++++++++++++++++++--------
>>   1 file changed, 117 insertions(+), 35 deletions(-)
>>
> 
>> @@ -650,15 +695,32 @@ static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
>>       if (s->flags & DIRTY_BITMAP_MIG_FLAG_ZEROES) {
>>           trace_dirty_bitmap_load_bits_zeroes();
>> -        bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte, nr_bytes,
>> -                                             false);
>> +        if (!s->cancelled) {
>> +            bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte,
>> +                                                 nr_bytes, false);
>> +        }
>>       } else {
>>           size_t ret;
>>           uint8_t *buf;
>>           uint64_t buf_size = qemu_get_be64(f);
> 
> Pre-existing, but if I understand, we are reading a value from the migration stream...

Hmm, actually, this becomes worse after patch, as before patch we had the check, that the size at least corresponds to the bitmap.. But we want to relax things in cancelled mode (and we may not have any bitmap). Most correct thing is to use read in a loop to just skip the data from stream if we are in cancelled mode (something like nbd_drop()).

I can fix this with a follow-up patch.

> 
>> -        uint64_t needed_size =
>> -            bdrv_dirty_bitmap_serialization_size(s->bitmap,
>> -                                                 first_byte, nr_bytes);
>> +        uint64_t needed_size;
>> +
>> +        buf = g_malloc(buf_size);
>> +        ret = qemu_get_buffer(f, buf, buf_size);
> 
> ...and using it to malloc memory.  Is that a potential risk of a malicious stream causing us to allocate too much memory in relation to the guest's normal size?  If so, fixing that should be done separately.
> 
> I'm not a migration expert, but the patch looks reasonable to me.
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 10/21] migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init
  2020-07-24  8:43 ` [PATCH v3 10/21] migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init Vladimir Sementsov-Ogievskiy
@ 2020-07-27  9:51   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 38+ messages in thread
From: Dr. David Alan Gilbert @ 2020-07-27  9:51 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: kwolf, fam, qemu-block, quintela, qemu-devel, mreitz, stefanha,
	andrey.shinkevich, den, jsnow

* Vladimir Sementsov-Ogievskiy (vsementsov@virtuozzo.com) wrote:
> No reasons to keep two public init functions.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>

Yes so I think that means the initialisation of that lock is a little
later in startup, but OK

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.h          | 1 -
>  migration/block-dirty-bitmap.c | 6 +-----
>  migration/migration.c          | 2 --
>  3 files changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/migration/migration.h b/migration/migration.h
> index f617960522..ab20c756f5 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -335,7 +335,6 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
>  void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
>  
>  void dirty_bitmap_mig_before_vm_start(void);
> -void init_dirty_bitmap_incoming_migration(void);
>  void migrate_add_address(SocketAddress *address);
>  
>  int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index 01a536d7d3..4b67e4f4fb 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -148,11 +148,6 @@ typedef struct LoadBitmapState {
>  static GSList *enabled_bitmaps;
>  QemuMutex finish_lock;
>  
> -void init_dirty_bitmap_incoming_migration(void)
> -{
> -    qemu_mutex_init(&finish_lock);
> -}
> -
>  static uint32_t qemu_get_bitmap_flags(QEMUFile *f)
>  {
>      uint8_t flags = qemu_get_byte(f);
> @@ -801,6 +796,7 @@ static SaveVMHandlers savevm_dirty_bitmap_handlers = {
>  void dirty_bitmap_mig_init(void)
>  {
>      QSIMPLEQ_INIT(&dirty_bitmap_mig_state.dbms_list);
> +    qemu_mutex_init(&finish_lock);
>  
>      register_savevm_live("dirty-bitmap", 0, 1,
>                           &savevm_dirty_bitmap_handlers,
> diff --git a/migration/migration.c b/migration/migration.c
> index 2ed9923227..1c61428988 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -165,8 +165,6 @@ void migration_object_init(void)
>      qemu_sem_init(&current_incoming->postcopy_pause_sem_dst, 0);
>      qemu_sem_init(&current_incoming->postcopy_pause_sem_fault, 0);
>  
> -    init_dirty_bitmap_incoming_migration();
> -
>      if (!migration_object_check(current_migration, &err)) {
>          error_report_err(err);
>          exit(1);
> -- 
> 2.21.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 15/21] migration/block-dirty-bitmap: relax error handling in incoming part
  2020-07-24 20:30     ` Vladimir Sementsov-Ogievskiy
@ 2020-07-27 11:16       ` Dr. David Alan Gilbert
  2020-07-27 11:26         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 38+ messages in thread
From: Dr. David Alan Gilbert @ 2020-07-27 11:16 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: kwolf, fam, qemu-block, quintela, jsnow, qemu-devel, mreitz,
	stefanha, andrey.shinkevich, den

* Vladimir Sementsov-Ogievskiy (vsementsov@virtuozzo.com) wrote:
> 24.07.2020 20:35, Eric Blake wrote:
> > On 7/24/20 3:43 AM, Vladimir Sementsov-Ogievskiy wrote:
> > > Bitmaps data is not critical, and we should not fail the migration (or
> > > use postcopy recovering) because of dirty-bitmaps migration failure.
> > > Instead we should just lose unfinished bitmaps.
> > > 
> > > Still we have to report io stream violation errors, as they affect the
> > > whole migration stream.
> > > 
> > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > > ---
> > >   migration/block-dirty-bitmap.c | 152 +++++++++++++++++++++++++--------
> > >   1 file changed, 117 insertions(+), 35 deletions(-)
> > > 
> > 
> > > @@ -650,15 +695,32 @@ static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
> > >       if (s->flags & DIRTY_BITMAP_MIG_FLAG_ZEROES) {
> > >           trace_dirty_bitmap_load_bits_zeroes();
> > > -        bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte, nr_bytes,
> > > -                                             false);
> > > +        if (!s->cancelled) {
> > > +            bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte,
> > > +                                                 nr_bytes, false);
> > > +        }
> > >       } else {
> > >           size_t ret;
> > >           uint8_t *buf;
> > >           uint64_t buf_size = qemu_get_be64(f);
> > 
> > Pre-existing, but if I understand, we are reading a value from the migration stream...
> 
> Hmm, actually, this becomes worse after patch, as before patch we had the check, that the size at least corresponds to the bitmap.. But we want to relax things in cancelled mode (and we may not have any bitmap). Most correct thing is to use read in a loop to just skip the data from stream if we are in cancelled mode (something like nbd_drop()).
> 
> I can fix this with a follow-up patch.

If the size is bogus, it's probably not worth trying to skip anything
because it could be just a broken/misaligned stream.

Dave

> > 
> > > -        uint64_t needed_size =
> > > -            bdrv_dirty_bitmap_serialization_size(s->bitmap,
> > > -                                                 first_byte, nr_bytes);
> > > +        uint64_t needed_size;
> > > +
> > > +        buf = g_malloc(buf_size);
> > > +        ret = qemu_get_buffer(f, buf, buf_size);
> > 
> > ...and using it to malloc memory.  Is that a potential risk of a malicious stream causing us to allocate too much memory in relation to the guest's normal size?  If so, fixing that should be done separately.
> > 
> > I'm not a migration expert, but the patch looks reasonable to me.
> > 
> > Reviewed-by: Eric Blake <eblake@redhat.com>
> > 
> 
> 
> -- 
> Best regards,
> Vladimir
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 15/21] migration/block-dirty-bitmap: relax error handling in incoming part
  2020-07-24  8:43 ` [PATCH v3 15/21] migration/block-dirty-bitmap: relax error handling in incoming part Vladimir Sementsov-Ogievskiy
  2020-07-24 17:35   ` Eric Blake
@ 2020-07-27 11:23   ` Dr. David Alan Gilbert
  2020-07-27 11:28     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 38+ messages in thread
From: Dr. David Alan Gilbert @ 2020-07-27 11:23 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: kwolf, fam, qemu-block, quintela, qemu-devel, mreitz, stefanha,
	andrey.shinkevich, den, jsnow

* Vladimir Sementsov-Ogievskiy (vsementsov@virtuozzo.com) wrote:
> Bitmaps data is not critical, and we should not fail the migration (or
> use postcopy recovering) because of dirty-bitmaps migration failure.
> Instead we should just lose unfinished bitmaps.
> 
> Still we have to report io stream violation errors, as they affect the
> whole migration stream.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  migration/block-dirty-bitmap.c | 152 +++++++++++++++++++++++++--------
>  1 file changed, 117 insertions(+), 35 deletions(-)
> 
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index eb4ffeac4d..c24d4614bf 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -145,6 +145,15 @@ typedef struct DBMLoadState {
>  
>      bool before_vm_start_handled; /* set in dirty_bitmap_mig_before_vm_start */
>  
> +    /*
> +     * cancelled
> +     * Incoming migration is cancelled for some reason. That means that we
> +     * still should read our chunks from migration stream, to not affect other
> +     * migration objects (like RAM), but just ignore them and do not touch any
> +     * bitmaps or nodes.
> +     */
> +    bool cancelled;
> +
>      GSList *bitmaps;
>      QemuMutex lock; /* protect bitmaps */
>  } DBMLoadState;
> @@ -531,6 +540,10 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
>      uint8_t flags = qemu_get_byte(f);
>      LoadBitmapState *b;
>  
> +    if (s->cancelled) {
> +        return 0;
> +    }
> +
>      if (s->bitmap) {
>          error_report("Bitmap with the same name ('%s') already exists on "
>                       "destination", bdrv_dirty_bitmap_name(s->bitmap));
> @@ -613,13 +626,47 @@ void dirty_bitmap_mig_before_vm_start(void)
>      qemu_mutex_unlock(&s->lock);
>  }
>  
> +static void cancel_incoming_locked(DBMLoadState *s)
> +{
> +    GSList *item;
> +
> +    if (s->cancelled) {
> +        return;
> +    }
> +
> +    s->cancelled = true;
> +    s->bs = NULL;
> +    s->bitmap = NULL;
> +
> +    /* Drop all unfinished bitmaps */
> +    for (item = s->bitmaps; item; item = g_slist_next(item)) {
> +        LoadBitmapState *b = item->data;
> +
> +        /*
> +         * Bitmap must be unfinished, as finished bitmaps should already be
> +         * removed from the list.
> +         */
> +        assert(!s->before_vm_start_handled || !b->migrated);
> +        if (bdrv_dirty_bitmap_has_successor(b->bitmap)) {
> +            bdrv_reclaim_dirty_bitmap(b->bitmap, &error_abort);
> +        }
> +        bdrv_release_dirty_bitmap(b->bitmap);
> +    }
> +
> +    g_slist_free_full(s->bitmaps, g_free);
> +    s->bitmaps = NULL;
> +}
> +
>  static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>  {
>      GSList *item;
>      trace_dirty_bitmap_load_complete();
> -    bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
>  
> -    qemu_mutex_lock(&s->lock);
> +    if (s->cancelled) {
> +        return;
> +    }
> +
> +    bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
>  
>      if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
>          bdrv_reclaim_dirty_bitmap(s->bitmap, &error_abort);
> @@ -637,8 +684,6 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>              break;
>          }
>      }
> -
> -    qemu_mutex_unlock(&s->lock);
>  }
>  
>  static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
> @@ -650,15 +695,32 @@ static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
>  
>      if (s->flags & DIRTY_BITMAP_MIG_FLAG_ZEROES) {
>          trace_dirty_bitmap_load_bits_zeroes();
> -        bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte, nr_bytes,
> -                                             false);
> +        if (!s->cancelled) {
> +            bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte,
> +                                                 nr_bytes, false);
> +        }
>      } else {
>          size_t ret;
>          uint8_t *buf;
>          uint64_t buf_size = qemu_get_be64(f);
> -        uint64_t needed_size =
> -            bdrv_dirty_bitmap_serialization_size(s->bitmap,
> -                                                 first_byte, nr_bytes);
> +        uint64_t needed_size;
> +
> +        buf = g_malloc(buf_size);
> +        ret = qemu_get_buffer(f, buf, buf_size);
> +        if (ret != buf_size) {
> +            error_report("Failed to read bitmap bits");
> +            g_free(buf);
> +            return -EIO;
> +        }
> +
> +        if (s->cancelled) {
> +            g_free(buf);
> +            return 0;
> +        }
> +
> +        needed_size = bdrv_dirty_bitmap_serialization_size(s->bitmap,
> +                                                           first_byte,
> +                                                           nr_bytes);
>  
>          if (needed_size > buf_size ||
>              buf_size > QEMU_ALIGN_UP(needed_size, 4 * sizeof(long))
> @@ -667,15 +729,8 @@ static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
>              error_report("Migrated bitmap granularity doesn't "
>                           "match the destination bitmap '%s' granularity",
>                           bdrv_dirty_bitmap_name(s->bitmap));
> -            return -EINVAL;
> -        }
> -
> -        buf = g_malloc(buf_size);
> -        ret = qemu_get_buffer(f, buf, buf_size);
> -        if (ret != buf_size) {
> -            error_report("Failed to read bitmap bits");
> -            g_free(buf);
> -            return -EIO;
> +            cancel_incoming_locked(s);
> +            return 0;
>          }
>  
>          bdrv_dirty_bitmap_deserialize_part(s->bitmap, buf, first_byte, nr_bytes,
> @@ -700,14 +755,16 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s)
>              error_report("Unable to read node name string");
>              return -EINVAL;
>          }
> -        s->bs = bdrv_lookup_bs(s->node_name, s->node_name, &local_err);
> -        if (!s->bs) {
> -            error_report_err(local_err);
> -            return -EINVAL;
> +        if (!s->cancelled) {
> +            s->bs = bdrv_lookup_bs(s->node_name, s->node_name, &local_err);
> +            if (!s->bs) {
> +                error_report_err(local_err);
> +                cancel_incoming_locked(s);
> +            }
>          }
> -    } else if (!s->bs && !nothing) {
> +    } else if (!s->bs && !nothing && !s->cancelled) {
>          error_report("Error: block device name is not set");
> -        return -EINVAL;
> +        cancel_incoming_locked(s);
>      }
>  
>      if (s->flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
> @@ -715,24 +772,38 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s)
>              error_report("Unable to read bitmap name string");
>              return -EINVAL;
>          }
> -        s->bitmap = bdrv_find_dirty_bitmap(s->bs, s->bitmap_name);
> -
> -        /* bitmap may be NULL here, it wouldn't be an error if it is the
> -         * first occurrence of the bitmap */
> -        if (!s->bitmap && !(s->flags & DIRTY_BITMAP_MIG_FLAG_START)) {
> -            error_report("Error: unknown dirty bitmap "
> -                         "'%s' for block device '%s'",
> -                         s->bitmap_name, s->node_name);
> -            return -EINVAL;
> +        if (!s->cancelled) {
> +            s->bitmap = bdrv_find_dirty_bitmap(s->bs, s->bitmap_name);
> +
> +            /*
> +             * bitmap may be NULL here, it wouldn't be an error if it is the
> +             * first occurrence of the bitmap
> +             */
> +            if (!s->bitmap && !(s->flags & DIRTY_BITMAP_MIG_FLAG_START)) {
> +                error_report("Error: unknown dirty bitmap "
> +                             "'%s' for block device '%s'",
> +                             s->bitmap_name, s->node_name);
> +                cancel_incoming_locked(s);
> +            }
>          }
> -    } else if (!s->bitmap && !nothing) {
> +    } else if (!s->bitmap && !nothing && !s->cancelled) {
>          error_report("Error: block device name is not set");
> -        return -EINVAL;
> +        cancel_incoming_locked(s);
>      }
>  
>      return 0;
>  }
>  
> +/*
> + * dirty_bitmap_load
> + *
> + * Load sequence of dirty bitmap chunks. Return error only on fatal io stream
> + * violations. On other errors just cancel bitmaps incoming migration and return
> + * 0.
> + *
> + * Note, than when incoming bitmap migration is canceled, we still must read all
> + * our chunks (and just ignore them), to not affect other migration objects.
> + */
>  static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>  {
>      DBMLoadState *s = &((DBMState *)opaque)->load;
> @@ -741,12 +812,19 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>      trace_dirty_bitmap_load_enter();
>  
>      if (version_id != 1) {
> +        qemu_mutex_lock(&s->lock);
> +        cancel_incoming_locked(s);
> +        qemu_mutex_unlock(&s->lock);
>          return -EINVAL;
>      }
>  
>      do {
> +        qemu_mutex_lock(&s->lock);

Would QEMU_LOCK_GUARD(&s->lock)  work there?
It avoids the need to catch the unlock on each of the failure cases.

Dave

>          ret = dirty_bitmap_load_header(f, s);
>          if (ret < 0) {
> +            cancel_incoming_locked(s);
> +            qemu_mutex_unlock(&s->lock);
>              return ret;
>          }
>  
> @@ -763,8 +841,12 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>          }
>  
>          if (ret) {
> +            cancel_incoming_locked(s);
> +            qemu_mutex_unlock(&s->lock);
>              return ret;
>          }
> +
> +        qemu_mutex_unlock(&s->lock);
>      } while (!(s->flags & DIRTY_BITMAP_MIG_FLAG_EOS));
>  
>      trace_dirty_bitmap_load_success();
> -- 
> 2.21.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 15/21] migration/block-dirty-bitmap: relax error handling in incoming part
  2020-07-27 11:16       ` Dr. David Alan Gilbert
@ 2020-07-27 11:26         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-27 11:26 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: kwolf, fam, qemu-block, quintela, jsnow, qemu-devel, mreitz,
	stefanha, andrey.shinkevich, den

27.07.2020 14:16, Dr. David Alan Gilbert wrote:
> * Vladimir Sementsov-Ogievskiy (vsementsov@virtuozzo.com) wrote:
>> 24.07.2020 20:35, Eric Blake wrote:
>>> On 7/24/20 3:43 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>> Bitmaps data is not critical, and we should not fail the migration (or
>>>> use postcopy recovering) because of dirty-bitmaps migration failure.
>>>> Instead we should just lose unfinished bitmaps.
>>>>
>>>> Still we have to report io stream violation errors, as they affect the
>>>> whole migration stream.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> ---
>>>>   migration/block-dirty-bitmap.c | 152 +++++++++++++++++++++++++--------
>>>>   1 file changed, 117 insertions(+), 35 deletions(-)
>>>>
>>>
>>>> @@ -650,15 +695,32 @@ static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
>>>>       if (s->flags & DIRTY_BITMAP_MIG_FLAG_ZEROES) {
>>>>           trace_dirty_bitmap_load_bits_zeroes();
>>>> -        bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte, nr_bytes,
>>>> -                                             false);
>>>> +        if (!s->cancelled) {
>>>> +            bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte,
>>>> +                                                 nr_bytes, false);
>>>> +        }
>>>>       } else {
>>>>           size_t ret;
>>>>           uint8_t *buf;
>>>>           uint64_t buf_size = qemu_get_be64(f);
>>>
>>> Pre-existing, but if I understand, we are reading a value from the migration stream...
>>
>> Hmm, actually, this becomes worse after patch, as before patch we had the check, that the size at least corresponds to the bitmap.. But we want to relax things in cancelled mode (and we may not have any bitmap). Most correct thing is to use read in a loop to just skip the data from stream if we are in cancelled mode (something like nbd_drop()).
>>
>> I can fix this with a follow-up patch.
> 
> If the size is bogus, it's probably not worth trying to skip anything
> because it could be just a broken/misaligned stream.
> 

The problem is that, when we are already in "skipping" mode, we don't have actual bitmap to understand, is size look reasonable or not. We can probably just invent some heuristic constant (100M for example?), so that any size less will be silently skipped, and any size above will be considered as stream violation and cancel postcopy process.


> 
>>>
>>>> -        uint64_t needed_size =
>>>> -            bdrv_dirty_bitmap_serialization_size(s->bitmap,
>>>> -                                                 first_byte, nr_bytes);
>>>> +        uint64_t needed_size;
>>>> +
>>>> +        buf = g_malloc(buf_size);
>>>> +        ret = qemu_get_buffer(f, buf, buf_size);
>>>
>>> ...and using it to malloc memory.  Is that a potential risk of a malicious stream causing us to allocate too much memory in relation to the guest's normal size?  If so, fixing that should be done separately.
>>>
>>> I'm not a migration expert, but the patch looks reasonable to me.
>>>
>>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>>
>>
>>
>> -- 
>> Best regards,
>> Vladimir
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 15/21] migration/block-dirty-bitmap: relax error handling in incoming part
  2020-07-27 11:23   ` Dr. David Alan Gilbert
@ 2020-07-27 11:28     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-27 11:28 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: kwolf, fam, qemu-block, quintela, qemu-devel, mreitz, stefanha,
	andrey.shinkevich, den, jsnow

27.07.2020 14:23, Dr. David Alan Gilbert wrote:
> * Vladimir Sementsov-Ogievskiy (vsementsov@virtuozzo.com) wrote:
>> Bitmaps data is not critical, and we should not fail the migration (or
>> use postcopy recovering) because of dirty-bitmaps migration failure.
>> Instead we should just lose unfinished bitmaps.
>>
>> Still we have to report io stream violation errors, as they affect the
>> whole migration stream.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   migration/block-dirty-bitmap.c | 152 +++++++++++++++++++++++++--------
>>   1 file changed, 117 insertions(+), 35 deletions(-)
>>
>> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
>> index eb4ffeac4d..c24d4614bf 100644
>> --- a/migration/block-dirty-bitmap.c
>> +++ b/migration/block-dirty-bitmap.c
>> @@ -145,6 +145,15 @@ typedef struct DBMLoadState {
>>   
>>       bool before_vm_start_handled; /* set in dirty_bitmap_mig_before_vm_start */
>>   
>> +    /*
>> +     * cancelled
>> +     * Incoming migration is cancelled for some reason. That means that we
>> +     * still should read our chunks from migration stream, to not affect other
>> +     * migration objects (like RAM), but just ignore them and do not touch any
>> +     * bitmaps or nodes.
>> +     */
>> +    bool cancelled;
>> +
>>       GSList *bitmaps;
>>       QemuMutex lock; /* protect bitmaps */
>>   } DBMLoadState;
>> @@ -531,6 +540,10 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
>>       uint8_t flags = qemu_get_byte(f);
>>       LoadBitmapState *b;
>>   
>> +    if (s->cancelled) {
>> +        return 0;
>> +    }
>> +
>>       if (s->bitmap) {
>>           error_report("Bitmap with the same name ('%s') already exists on "
>>                        "destination", bdrv_dirty_bitmap_name(s->bitmap));
>> @@ -613,13 +626,47 @@ void dirty_bitmap_mig_before_vm_start(void)
>>       qemu_mutex_unlock(&s->lock);
>>   }
>>   
>> +static void cancel_incoming_locked(DBMLoadState *s)
>> +{
>> +    GSList *item;
>> +
>> +    if (s->cancelled) {
>> +        return;
>> +    }
>> +
>> +    s->cancelled = true;
>> +    s->bs = NULL;
>> +    s->bitmap = NULL;
>> +
>> +    /* Drop all unfinished bitmaps */
>> +    for (item = s->bitmaps; item; item = g_slist_next(item)) {
>> +        LoadBitmapState *b = item->data;
>> +
>> +        /*
>> +         * Bitmap must be unfinished, as finished bitmaps should already be
>> +         * removed from the list.
>> +         */
>> +        assert(!s->before_vm_start_handled || !b->migrated);
>> +        if (bdrv_dirty_bitmap_has_successor(b->bitmap)) {
>> +            bdrv_reclaim_dirty_bitmap(b->bitmap, &error_abort);
>> +        }
>> +        bdrv_release_dirty_bitmap(b->bitmap);
>> +    }
>> +
>> +    g_slist_free_full(s->bitmaps, g_free);
>> +    s->bitmaps = NULL;
>> +}
>> +
>>   static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>>   {
>>       GSList *item;
>>       trace_dirty_bitmap_load_complete();
>> -    bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
>>   
>> -    qemu_mutex_lock(&s->lock);
>> +    if (s->cancelled) {
>> +        return;
>> +    }
>> +
>> +    bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
>>   
>>       if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
>>           bdrv_reclaim_dirty_bitmap(s->bitmap, &error_abort);
>> @@ -637,8 +684,6 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>>               break;
>>           }
>>       }
>> -
>> -    qemu_mutex_unlock(&s->lock);
>>   }
>>   
>>   static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
>> @@ -650,15 +695,32 @@ static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
>>   
>>       if (s->flags & DIRTY_BITMAP_MIG_FLAG_ZEROES) {
>>           trace_dirty_bitmap_load_bits_zeroes();
>> -        bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte, nr_bytes,
>> -                                             false);
>> +        if (!s->cancelled) {
>> +            bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte,
>> +                                                 nr_bytes, false);
>> +        }
>>       } else {
>>           size_t ret;
>>           uint8_t *buf;
>>           uint64_t buf_size = qemu_get_be64(f);
>> -        uint64_t needed_size =
>> -            bdrv_dirty_bitmap_serialization_size(s->bitmap,
>> -                                                 first_byte, nr_bytes);
>> +        uint64_t needed_size;
>> +
>> +        buf = g_malloc(buf_size);
>> +        ret = qemu_get_buffer(f, buf, buf_size);
>> +        if (ret != buf_size) {
>> +            error_report("Failed to read bitmap bits");
>> +            g_free(buf);
>> +            return -EIO;
>> +        }
>> +
>> +        if (s->cancelled) {
>> +            g_free(buf);
>> +            return 0;
>> +        }
>> +
>> +        needed_size = bdrv_dirty_bitmap_serialization_size(s->bitmap,
>> +                                                           first_byte,
>> +                                                           nr_bytes);
>>   
>>           if (needed_size > buf_size ||
>>               buf_size > QEMU_ALIGN_UP(needed_size, 4 * sizeof(long))
>> @@ -667,15 +729,8 @@ static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
>>               error_report("Migrated bitmap granularity doesn't "
>>                            "match the destination bitmap '%s' granularity",
>>                            bdrv_dirty_bitmap_name(s->bitmap));
>> -            return -EINVAL;
>> -        }
>> -
>> -        buf = g_malloc(buf_size);
>> -        ret = qemu_get_buffer(f, buf, buf_size);
>> -        if (ret != buf_size) {
>> -            error_report("Failed to read bitmap bits");
>> -            g_free(buf);
>> -            return -EIO;
>> +            cancel_incoming_locked(s);
>> +            return 0;
>>           }
>>   
>>           bdrv_dirty_bitmap_deserialize_part(s->bitmap, buf, first_byte, nr_bytes,
>> @@ -700,14 +755,16 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s)
>>               error_report("Unable to read node name string");
>>               return -EINVAL;
>>           }
>> -        s->bs = bdrv_lookup_bs(s->node_name, s->node_name, &local_err);
>> -        if (!s->bs) {
>> -            error_report_err(local_err);
>> -            return -EINVAL;
>> +        if (!s->cancelled) {
>> +            s->bs = bdrv_lookup_bs(s->node_name, s->node_name, &local_err);
>> +            if (!s->bs) {
>> +                error_report_err(local_err);
>> +                cancel_incoming_locked(s);
>> +            }
>>           }
>> -    } else if (!s->bs && !nothing) {
>> +    } else if (!s->bs && !nothing && !s->cancelled) {
>>           error_report("Error: block device name is not set");
>> -        return -EINVAL;
>> +        cancel_incoming_locked(s);
>>       }
>>   
>>       if (s->flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
>> @@ -715,24 +772,38 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s)
>>               error_report("Unable to read bitmap name string");
>>               return -EINVAL;
>>           }
>> -        s->bitmap = bdrv_find_dirty_bitmap(s->bs, s->bitmap_name);
>> -
>> -        /* bitmap may be NULL here, it wouldn't be an error if it is the
>> -         * first occurrence of the bitmap */
>> -        if (!s->bitmap && !(s->flags & DIRTY_BITMAP_MIG_FLAG_START)) {
>> -            error_report("Error: unknown dirty bitmap "
>> -                         "'%s' for block device '%s'",
>> -                         s->bitmap_name, s->node_name);
>> -            return -EINVAL;
>> +        if (!s->cancelled) {
>> +            s->bitmap = bdrv_find_dirty_bitmap(s->bs, s->bitmap_name);
>> +
>> +            /*
>> +             * bitmap may be NULL here, it wouldn't be an error if it is the
>> +             * first occurrence of the bitmap
>> +             */
>> +            if (!s->bitmap && !(s->flags & DIRTY_BITMAP_MIG_FLAG_START)) {
>> +                error_report("Error: unknown dirty bitmap "
>> +                             "'%s' for block device '%s'",
>> +                             s->bitmap_name, s->node_name);
>> +                cancel_incoming_locked(s);
>> +            }
>>           }
>> -    } else if (!s->bitmap && !nothing) {
>> +    } else if (!s->bitmap && !nothing && !s->cancelled) {
>>           error_report("Error: block device name is not set");
>> -        return -EINVAL;
>> +        cancel_incoming_locked(s);
>>       }
>>   
>>       return 0;
>>   }
>>   
>> +/*
>> + * dirty_bitmap_load
>> + *
>> + * Load sequence of dirty bitmap chunks. Return error only on fatal io stream
>> + * violations. On other errors just cancel bitmaps incoming migration and return
>> + * 0.
>> + *
>> + * Note, than when incoming bitmap migration is canceled, we still must read all
>> + * our chunks (and just ignore them), to not affect other migration objects.
>> + */
>>   static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>>   {
>>       DBMLoadState *s = &((DBMState *)opaque)->load;
>> @@ -741,12 +812,19 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>>       trace_dirty_bitmap_load_enter();
>>   
>>       if (version_id != 1) {
>> +        qemu_mutex_lock(&s->lock);
>> +        cancel_incoming_locked(s);
>> +        qemu_mutex_unlock(&s->lock);
>>           return -EINVAL;
>>       }
>>   
>>       do {
>> +        qemu_mutex_lock(&s->lock);
> 
> Would QEMU_LOCK_GUARD(&s->lock)  work there?
> It avoids the need to catch the unlock on each of the failure cases.
> 

Yes it should work, thanks. I just didn't know about this thing.


> 
>>           ret = dirty_bitmap_load_header(f, s);
>>           if (ret < 0) {
>> +            cancel_incoming_locked(s);
>> +            qemu_mutex_unlock(&s->lock);
>>               return ret;
>>           }
>>   
>> @@ -763,8 +841,12 @@ static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>>           }
>>   
>>           if (ret) {
>> +            cancel_incoming_locked(s);
>> +            qemu_mutex_unlock(&s->lock);
>>               return ret;
>>           }
>> +
>> +        qemu_mutex_unlock(&s->lock);
>>       } while (!(s->flags & DIRTY_BITMAP_MIG_FLAG_EOS));
>>   
>>       trace_dirty_bitmap_load_success();
>> -- 
>> 2.21.0
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 16/21] migration/block-dirty-bitmap: cancel migration on shutdown
  2020-07-24  8:43 ` [PATCH v3 16/21] migration/block-dirty-bitmap: cancel migration on shutdown Vladimir Sementsov-Ogievskiy
@ 2020-07-27 13:21   ` Dr. David Alan Gilbert
  2020-07-27 17:06     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 38+ messages in thread
From: Dr. David Alan Gilbert @ 2020-07-27 13:21 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: kwolf, fam, qemu-block, quintela, qemu-devel, mreitz, stefanha,
	andrey.shinkevich, den, jsnow

* Vladimir Sementsov-Ogievskiy (vsementsov@virtuozzo.com) wrote:
> If target is turned off prior to postcopy finished, target crashes
> because busy bitmaps are found at shutdown.
> Canceling incoming migration helps, as it removes all unfinished (and
> therefore busy) bitmaps.
> 
> Similarly on source we crash in bdrv_close_all which asserts that all
> bdrv states are removed, because bdrv states involved into dirty bitmap
> migration are referenced by it. So, we need to cancel outgoing
> migration as well.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>  migration/migration.h          |  2 ++
>  migration/block-dirty-bitmap.c | 16 ++++++++++++++++
>  migration/migration.c          | 13 +++++++++++++
>  3 files changed, 31 insertions(+)
> 
> diff --git a/migration/migration.h b/migration/migration.h
> index ab20c756f5..6c6a931d0d 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -335,6 +335,8 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
>  void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
>  
>  void dirty_bitmap_mig_before_vm_start(void);
> +void dirty_bitmap_mig_cancel_outgoing(void);
> +void dirty_bitmap_mig_cancel_incoming(void);
>  void migrate_add_address(SocketAddress *address);
>  
>  int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index c24d4614bf..a198ec7278 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -657,6 +657,22 @@ static void cancel_incoming_locked(DBMLoadState *s)
>      s->bitmaps = NULL;
>  }
>  
> +void dirty_bitmap_mig_cancel_outgoing(void)
> +{
> +    dirty_bitmap_do_save_cleanup(&dbm_state.save);
> +}
> +
> +void dirty_bitmap_mig_cancel_incoming(void)
> +{
> +    DBMLoadState *s = &dbm_state.load;
> +
> +    qemu_mutex_lock(&s->lock);
> +
> +    cancel_incoming_locked(s);
> +
> +    qemu_mutex_unlock(&s->lock);
> +}
> +
>  static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>  {
>      GSList *item;
> diff --git a/migration/migration.c b/migration/migration.c
> index 1c61428988..8fe36339db 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -188,6 +188,19 @@ void migration_shutdown(void)
>       */
>      migrate_fd_cancel(current_migration);
>      object_unref(OBJECT(current_migration));
> +
> +    /*
> +     * Cancel outgoing migration of dirty bitmaps. It should
> +     * at least unref used block nodes.
> +     */
> +    dirty_bitmap_mig_cancel_outgoing();
> +
> +    /*
> +     * Cancel incoming migration of dirty bitmaps. Dirty bitmaps
> +     * are non-critical data, and their loss never considered as
> +     * something serious.
> +     */
> +    dirty_bitmap_mig_cancel_incoming();

Are you sure this is the right place to put them - I'm thinking that
perhaps the object_unref of current_migration should still be after
them?

Dave

>  }
>  
>  /* For outgoing */
> -- 
> 2.21.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 17/21] migration/savevm: don't worry if bitmap migration postcopy failed
  2020-07-24 18:08   ` Eric Blake
@ 2020-07-27 13:29     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 38+ messages in thread
From: Dr. David Alan Gilbert @ 2020-07-27 13:29 UTC (permalink / raw)
  To: Eric Blake
  Cc: kwolf, fam, Vladimir Sementsov-Ogievskiy, qemu-block, quintela,
	qemu-devel, mreitz, stefanha, andrey.shinkevich, den, jsnow

* Eric Blake (eblake@redhat.com) wrote:
> On 7/24/20 3:43 AM, Vladimir Sementsov-Ogievskiy wrote:
> > First, if only bitmaps postcopy enabled (not ram postcopy)
> 
> is enabled (and not ram postcopy),
> 
> > postcopy_pause_incoming crashes on assertion assert(mis->to_src_file).
> 
> on an
> 
> > 
> > And anyway, bitmaps postcopy is not prepared to be somehow recovered.
> > The original idea instead is that if bitmaps postcopy failed, we just
> > loss some bitmaps, which is not critical. So, on failure we just need
> 
> lose
> 
> > to remove unfinished bitmaps and guest should continue execution on
> > destination.
> > 
> > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> > ---
> >   migration/savevm.c | 37 ++++++++++++++++++++++++++++++++-----
> >   1 file changed, 32 insertions(+), 5 deletions(-)
> > 
> 
> Definitely a bug fix, but I'd like David's opinion on whether this is still
> 5.1 material (because it is limited to just bitmaps migration, which is
> opt-in) or too risky (because we've already had several releases where it
> was broken, what's one more?).

I think it's OK for 5.1

Dave

> I'm less familiar with the code, so this is weak, but I did read through it
> and nothing jumped out at me, so:
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>
> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 16/21] migration/block-dirty-bitmap: cancel migration on shutdown
  2020-07-27 13:21   ` Dr. David Alan Gilbert
@ 2020-07-27 17:06     ` Vladimir Sementsov-Ogievskiy
  2020-07-27 19:27       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-27 17:06 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: kwolf, fam, qemu-block, quintela, qemu-devel, mreitz, stefanha,
	andrey.shinkevich, den, jsnow

27.07.2020 16:21, Dr. David Alan Gilbert wrote:
> * Vladimir Sementsov-Ogievskiy (vsementsov@virtuozzo.com) wrote:
>> If target is turned off prior to postcopy finished, target crashes
>> because busy bitmaps are found at shutdown.
>> Canceling incoming migration helps, as it removes all unfinished (and
>> therefore busy) bitmaps.
>>
>> Similarly on source we crash in bdrv_close_all which asserts that all
>> bdrv states are removed, because bdrv states involved into dirty bitmap
>> migration are referenced by it. So, we need to cancel outgoing
>> migration as well.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>> ---
>>   migration/migration.h          |  2 ++
>>   migration/block-dirty-bitmap.c | 16 ++++++++++++++++
>>   migration/migration.c          | 13 +++++++++++++
>>   3 files changed, 31 insertions(+)
>>
>> diff --git a/migration/migration.h b/migration/migration.h
>> index ab20c756f5..6c6a931d0d 100644
>> --- a/migration/migration.h
>> +++ b/migration/migration.h
>> @@ -335,6 +335,8 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
>>   void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
>>   
>>   void dirty_bitmap_mig_before_vm_start(void);
>> +void dirty_bitmap_mig_cancel_outgoing(void);
>> +void dirty_bitmap_mig_cancel_incoming(void);
>>   void migrate_add_address(SocketAddress *address);
>>   
>>   int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
>> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
>> index c24d4614bf..a198ec7278 100644
>> --- a/migration/block-dirty-bitmap.c
>> +++ b/migration/block-dirty-bitmap.c
>> @@ -657,6 +657,22 @@ static void cancel_incoming_locked(DBMLoadState *s)
>>       s->bitmaps = NULL;
>>   }
>>   
>> +void dirty_bitmap_mig_cancel_outgoing(void)
>> +{
>> +    dirty_bitmap_do_save_cleanup(&dbm_state.save);
>> +}
>> +
>> +void dirty_bitmap_mig_cancel_incoming(void)
>> +{
>> +    DBMLoadState *s = &dbm_state.load;
>> +
>> +    qemu_mutex_lock(&s->lock);
>> +
>> +    cancel_incoming_locked(s);
>> +
>> +    qemu_mutex_unlock(&s->lock);
>> +}
>> +
>>   static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>>   {
>>       GSList *item;
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 1c61428988..8fe36339db 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -188,6 +188,19 @@ void migration_shutdown(void)
>>        */
>>       migrate_fd_cancel(current_migration);
>>       object_unref(OBJECT(current_migration));
>> +
>> +    /*
>> +     * Cancel outgoing migration of dirty bitmaps. It should
>> +     * at least unref used block nodes.
>> +     */
>> +    dirty_bitmap_mig_cancel_outgoing();
>> +
>> +    /*
>> +     * Cancel incoming migration of dirty bitmaps. Dirty bitmaps
>> +     * are non-critical data, and their loss never considered as
>> +     * something serious.
>> +     */
>> +    dirty_bitmap_mig_cancel_incoming();
> 
> Are you sure this is the right place to put them - I'm thinking that
> perhaps the object_unref of current_migration should still be after
> them?

Hmm, looks strange, I will check.

> 
>>   }
>>   
>>   /* For outgoing */
>> -- 
>> 2.21.0
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 16/21] migration/block-dirty-bitmap: cancel migration on shutdown
  2020-07-27 17:06     ` Vladimir Sementsov-Ogievskiy
@ 2020-07-27 19:27       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-27 19:27 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: kwolf, fam, qemu-block, quintela, qemu-devel, mreitz, stefanha,
	andrey.shinkevich, den, jsnow

27.07.2020 20:06, Vladimir Sementsov-Ogievskiy wrote:
> 27.07.2020 16:21, Dr. David Alan Gilbert wrote:
>> * Vladimir Sementsov-Ogievskiy (vsementsov@virtuozzo.com) wrote:
>>> If target is turned off prior to postcopy finished, target crashes
>>> because busy bitmaps are found at shutdown.
>>> Canceling incoming migration helps, as it removes all unfinished (and
>>> therefore busy) bitmaps.
>>>
>>> Similarly on source we crash in bdrv_close_all which asserts that all
>>> bdrv states are removed, because bdrv states involved into dirty bitmap
>>> migration are referenced by it. So, we need to cancel outgoing
>>> migration as well.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>> ---
>>>   migration/migration.h          |  2 ++
>>>   migration/block-dirty-bitmap.c | 16 ++++++++++++++++
>>>   migration/migration.c          | 13 +++++++++++++
>>>   3 files changed, 31 insertions(+)
>>>
>>> diff --git a/migration/migration.h b/migration/migration.h
>>> index ab20c756f5..6c6a931d0d 100644
>>> --- a/migration/migration.h
>>> +++ b/migration/migration.h
>>> @@ -335,6 +335,8 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
>>>   void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
>>>   void dirty_bitmap_mig_before_vm_start(void);
>>> +void dirty_bitmap_mig_cancel_outgoing(void);
>>> +void dirty_bitmap_mig_cancel_incoming(void);
>>>   void migrate_add_address(SocketAddress *address);
>>>   int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
>>> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
>>> index c24d4614bf..a198ec7278 100644
>>> --- a/migration/block-dirty-bitmap.c
>>> +++ b/migration/block-dirty-bitmap.c
>>> @@ -657,6 +657,22 @@ static void cancel_incoming_locked(DBMLoadState *s)
>>>       s->bitmaps = NULL;
>>>   }
>>> +void dirty_bitmap_mig_cancel_outgoing(void)
>>> +{
>>> +    dirty_bitmap_do_save_cleanup(&dbm_state.save);
>>> +}
>>> +
>>> +void dirty_bitmap_mig_cancel_incoming(void)
>>> +{
>>> +    DBMLoadState *s = &dbm_state.load;
>>> +
>>> +    qemu_mutex_lock(&s->lock);
>>> +
>>> +    cancel_incoming_locked(s);
>>> +
>>> +    qemu_mutex_unlock(&s->lock);
>>> +}
>>> +
>>>   static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>>>   {
>>>       GSList *item;
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index 1c61428988..8fe36339db 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -188,6 +188,19 @@ void migration_shutdown(void)
>>>        */
>>>       migrate_fd_cancel(current_migration);
>>>       object_unref(OBJECT(current_migration));
>>> +
>>> +    /*
>>> +     * Cancel outgoing migration of dirty bitmaps. It should
>>> +     * at least unref used block nodes.
>>> +     */
>>> +    dirty_bitmap_mig_cancel_outgoing();
>>> +
>>> +    /*
>>> +     * Cancel incoming migration of dirty bitmaps. Dirty bitmaps
>>> +     * are non-critical data, and their loss never considered as
>>> +     * something serious.
>>> +     */
>>> +    dirty_bitmap_mig_cancel_incoming();
>>
>> Are you sure this is the right place to put them - I'm thinking that
>> perhaps the object_unref of current_migration should still be after
>> them?
> 
> Hmm, looks strange, I will check.

It's OK. These functions are operate on global bitmap migration state
which is separate from  current_migration, and do post-processing of
dirty bitmaps, so it's seems OK to do it at last.

> 
>>
>>>   }
>>>   /* For outgoing */
>>> -- 
>>> 2.21.0
>>>
>> -- 
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>
> 
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2020-07-27 19:28 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-24  8:43 [PATCH v3 for-5.1?? 00/21] Fix error handling during bitmap postcopy Vladimir Sementsov-Ogievskiy
2020-07-24  8:43 ` [PATCH v3 01/21] qemu-iotests/199: fix style Vladimir Sementsov-Ogievskiy
2020-07-24 15:03   ` Eric Blake
2020-07-24  8:43 ` [PATCH v3 02/21] qemu-iotests/199: drop extra constraints Vladimir Sementsov-Ogievskiy
2020-07-24  8:43 ` [PATCH v3 03/21] qemu-iotests/199: better catch postcopy time Vladimir Sementsov-Ogievskiy
2020-07-24  8:43 ` [PATCH v3 04/21] qemu-iotests/199: improve performance: set bitmap by discard Vladimir Sementsov-Ogievskiy
2020-07-24  8:43 ` [PATCH v3 05/21] qemu-iotests/199: change discard patterns Vladimir Sementsov-Ogievskiy
2020-07-24  8:43 ` [PATCH v3 06/21] qemu-iotests/199: increase postcopy period Vladimir Sementsov-Ogievskiy
2020-07-24  8:43 ` [PATCH v3 07/21] migration/block-dirty-bitmap: fix dirty_bitmap_mig_before_vm_start Vladimir Sementsov-Ogievskiy
2020-07-24 15:49   ` Eric Blake
2020-07-24  8:43 ` [PATCH v3 08/21] migration/block-dirty-bitmap: rename state structure types Vladimir Sementsov-Ogievskiy
2020-07-24  8:43 ` [PATCH v3 09/21] migration/block-dirty-bitmap: rename dirty_bitmap_mig_cleanup Vladimir Sementsov-Ogievskiy
2020-07-24  8:43 ` [PATCH v3 10/21] migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init Vladimir Sementsov-Ogievskiy
2020-07-27  9:51   ` Dr. David Alan Gilbert
2020-07-24  8:43 ` [PATCH v3 11/21] migration/block-dirty-bitmap: refactor state global variables Vladimir Sementsov-Ogievskiy
2020-07-24  8:43 ` [PATCH v3 12/21] migration/block-dirty-bitmap: rename finish_lock to just lock Vladimir Sementsov-Ogievskiy
2020-07-24  8:43 ` [PATCH v3 13/21] migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete Vladimir Sementsov-Ogievskiy
2020-07-24 16:11   ` Eric Blake
2020-07-24  8:43 ` [PATCH v3 14/21] migration/block-dirty-bitmap: keep bitmap state for all bitmaps Vladimir Sementsov-Ogievskiy
2020-07-24  8:43 ` [PATCH v3 15/21] migration/block-dirty-bitmap: relax error handling in incoming part Vladimir Sementsov-Ogievskiy
2020-07-24 17:35   ` Eric Blake
2020-07-24 20:30     ` Vladimir Sementsov-Ogievskiy
2020-07-27 11:16       ` Dr. David Alan Gilbert
2020-07-27 11:26         ` Vladimir Sementsov-Ogievskiy
2020-07-27 11:23   ` Dr. David Alan Gilbert
2020-07-27 11:28     ` Vladimir Sementsov-Ogievskiy
2020-07-24  8:43 ` [PATCH v3 16/21] migration/block-dirty-bitmap: cancel migration on shutdown Vladimir Sementsov-Ogievskiy
2020-07-27 13:21   ` Dr. David Alan Gilbert
2020-07-27 17:06     ` Vladimir Sementsov-Ogievskiy
2020-07-27 19:27       ` Vladimir Sementsov-Ogievskiy
2020-07-24  8:43 ` [PATCH v3 17/21] migration/savevm: don't worry if bitmap migration postcopy failed Vladimir Sementsov-Ogievskiy
2020-07-24 18:08   ` Eric Blake
2020-07-27 13:29     ` Dr. David Alan Gilbert
2020-07-24  8:43 ` [PATCH v3 18/21] qemu-iotests/199: prepare for new test-cases addition Vladimir Sementsov-Ogievskiy
2020-07-24  8:43 ` [PATCH v3 19/21] qemu-iotests/199: check persistent bitmaps Vladimir Sementsov-Ogievskiy
2020-07-24  8:43 ` [PATCH v3 20/21] qemu-iotests/199: add early shutdown case to bitmaps postcopy Vladimir Sementsov-Ogievskiy
2020-07-24 19:07   ` Eric Blake
2020-07-24  8:43 ` [PATCH v3 21/21] qemu-iotests/199: add source-killed " Vladimir Sementsov-Ogievskiy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.