QEMU-Devel Archive on lore.kernel.org
* [Qemu-devel] [PULL 00/16] Block layer patches
@ 2019-08-16  9:34 Kevin Wolf
  2019-08-16  9:34 ` [Qemu-devel] [PULL 01/16] iotests/118: Test media change for scsi-cd Kevin Wolf
                   ` (17 more replies)
  0 siblings, 18 replies; 19+ messages in thread
From: Kevin Wolf @ 2019-08-16  9:34 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

The following changes since commit 9e06029aea3b2eca1d5261352e695edc1e7d7b8b:

  Update version for v4.1.0 release (2019-08-15 13:03:37 +0100)

are available in the Git repository at:

  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to a6b257a08e3d72219f03e461a52152672fec0612:

  file-posix: Handle undetectable alignment (2019-08-16 11:29:11 +0200)

----------------------------------------------------------------
Block layer patches:

- file-posix: Fix O_DIRECT alignment detection
- Fixes for concurrent block jobs
- block-backend: Queue requests while drained (fix IDE vs. job crashes)
- qemu-img convert: Deprecate using -n and -o together
- iotests: Migration tests with filter nodes
- iotests: More media change tests

----------------------------------------------------------------
Kevin Wolf (10):
      iotests/118: Test media change for scsi-cd
      iotests/118: Create test classes dynamically
      iotests/118: Add -blockdev based tests
      iotests: Move migration helpers to iotests.py
      iotests: Test migration with all kinds of filter nodes
      block: Simplify bdrv_filter_default_perms()
      block: Remove blk_pread_unthrottled()
      mirror: Keep mirror_top_bs drained after dropping permissions
      block-backend: Queue requests while drained
      qemu-img convert: Deprecate using -n and -o together

Max Reitz (5):
      block: Keep subtree drained in drop_intermediate
      block: Reduce (un)drains when replacing a child
      tests: Test polling in bdrv_drop_intermediate()
      tests: Test mid-drain bdrv_replace_child_noperm()
      iotests: Add test for concurrent stream/commit

Nir Soffer (1):
      file-posix: Handle undetectable alignment

 include/sysemu/block-backend.h |   3 +-
 block.c                        |  63 +++---
 block/backup.c                 |   1 +
 block/block-backend.c          |  69 ++++--
 block/commit.c                 |   2 +
 block/file-posix.c             |  36 +++-
 block/mirror.c                 |   7 +-
 blockjob.c                     |   3 +
 hw/block/hd-geometry.c         |   7 +-
 qemu-img.c                     |   5 +
 tests/test-bdrv-drain.c        | 476 +++++++++++++++++++++++++++++++++++++++++
 qemu-deprecated.texi           |   7 +
 tests/qemu-iotests/118         |  84 ++++----
 tests/qemu-iotests/118.out     |   4 +-
 tests/qemu-iotests/234         |  30 +--
 tests/qemu-iotests/258         | 163 ++++++++++++++
 tests/qemu-iotests/258.out     |  33 +++
 tests/qemu-iotests/262         |  82 +++++++
 tests/qemu-iotests/262.out     |  17 ++
 tests/qemu-iotests/group       |   2 +
 tests/qemu-iotests/iotests.py  |  16 ++
 21 files changed, 983 insertions(+), 127 deletions(-)
 create mode 100755 tests/qemu-iotests/258
 create mode 100644 tests/qemu-iotests/258.out
 create mode 100755 tests/qemu-iotests/262
 create mode 100644 tests/qemu-iotests/262.out



* [Qemu-devel] [PULL 01/16] iotests/118: Test media change for scsi-cd
  2019-08-16  9:34 [Qemu-devel] [PULL 00/16] Block layer patches Kevin Wolf
@ 2019-08-16  9:34 ` Kevin Wolf
  2019-08-16  9:34 ` [Qemu-devel] [PULL 02/16] iotests/118: Create test classes dynamically Kevin Wolf
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Kevin Wolf @ 2019-08-16  9:34 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

The test covered only floppy and ide-cd. Add scsi-cd as well.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/118     | 20 ++++++++++++++++++++
 tests/qemu-iotests/118.out |  4 ++--
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/118 b/tests/qemu-iotests/118
index 499c5f0901..3c20d2d61f 100755
--- a/tests/qemu-iotests/118
+++ b/tests/qemu-iotests/118
@@ -33,6 +33,8 @@ def interface_to_device_name(interface):
         return 'ide-cd'
     elif interface == 'floppy':
         return 'floppy'
+    elif interface == 'scsi':
+        return 'scsi-cd'
     else:
         return None
 
@@ -297,6 +299,8 @@ class TestInitiallyFilled(GeneralChangeTestsBaseClass):
         qemu_img('create', '-f', iotests.imgfmt, new_img, '1440k')
         self.vm = iotests.VM()
         self.vm.add_drive(old_img, 'media=%s' % media, 'none')
+        if interface == 'scsi':
+            self.vm.add_device('virtio-scsi-pci')
         self.vm.add_device('%s,drive=drive0,id=%s' %
                            (interface_to_device_name(interface),
                             self.device_name))
@@ -330,6 +334,8 @@ class TestInitiallyEmpty(GeneralChangeTestsBaseClass):
     def setUp(self, media, interface):
         qemu_img('create', '-f', iotests.imgfmt, new_img, '1440k')
         self.vm = iotests.VM().add_drive(None, 'media=%s' % media, 'none')
+        if interface == 'scsi':
+            self.vm.add_device('virtio-scsi-pci')
         self.vm.add_device('%s,drive=drive0,id=%s' %
                            (interface_to_device_name(interface),
                             self.device_name))
@@ -363,6 +369,20 @@ class TestCDInitiallyEmpty(TestInitiallyEmpty):
     def setUp(self):
         self.TestInitiallyEmpty.setUp(self, 'cdrom', 'ide')
 
+class TestSCSICDInitiallyFilled(TestInitiallyFilled):
+    TestInitiallyFilled = TestInitiallyFilled
+    has_real_tray = True
+
+    def setUp(self):
+        self.TestInitiallyFilled.setUp(self, 'cdrom', 'scsi')
+
+class TestSCSICDInitiallyEmpty(TestInitiallyEmpty):
+    TestInitiallyEmpty = TestInitiallyEmpty
+    has_real_tray = True
+
+    def setUp(self):
+        self.TestInitiallyEmpty.setUp(self, 'cdrom', 'scsi')
+
 class TestFloppyInitiallyFilled(TestInitiallyFilled):
     TestInitiallyFilled = TestInitiallyFilled
     has_real_tray = False
diff --git a/tests/qemu-iotests/118.out b/tests/qemu-iotests/118.out
index 4823c113d5..b4ff997a8c 100644
--- a/tests/qemu-iotests/118.out
+++ b/tests/qemu-iotests/118.out
@@ -1,5 +1,5 @@
-...............................................................
+.........................................................................................
 ----------------------------------------------------------------------
-Ran 63 tests
+Ran 89 tests
 
 OK
-- 
2.20.1




* [Qemu-devel] [PULL 02/16] iotests/118: Create test classes dynamically
  2019-08-16  9:34 [Qemu-devel] [PULL 00/16] Block layer patches Kevin Wolf
  2019-08-16  9:34 ` [Qemu-devel] [PULL 01/16] iotests/118: Test media change for scsi-cd Kevin Wolf
@ 2019-08-16  9:34 ` Kevin Wolf
  2019-08-16  9:34 ` [Qemu-devel] [PULL 03/16] iotests/118: Add -blockdev based tests Kevin Wolf
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Kevin Wolf @ 2019-08-16  9:34 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

We're getting a ridiculous number of child classes of
TestInitiallyFilled and TestInitiallyEmpty that differ only in a few
attributes that we want to test in all combinations.

Instead of explicitly writing down every combination, let's use a loop
and create those classes dynamically.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/118 | 69 +++++++++++++-----------------------------
 1 file changed, 21 insertions(+), 48 deletions(-)
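
The loop described above can be sketched in isolation. This is a minimal illustration, not the test's actual code: the two base classes are empty stand-ins, and the generated classes go into a caller-supplied dict (a hypothetical `namespace` parameter) rather than into `globals()` as the patch does.

```python
class TestInitiallyFilled:
    """Stand-in for the real base class; only the attributes matter here."""
    was_empty = False


class TestInitiallyEmpty:
    """Stand-in for the real base class; only the attributes matter here."""
    was_empty = True


def create_basic_test_classes(namespace):
    """Create one subclass per (base class, media, interface) combination."""
    combos = [('cdrom', 'ide', True),
              ('cdrom', 'scsi', True),
              ('disk', 'floppy', False)]
    for media, interface, has_real_tray in combos:
        for case in (TestInitiallyFilled, TestInitiallyEmpty):
            attr = {'media': media,
                    'interface': interface,
                    'has_real_tray': has_real_tray}
            name = '%s_%s_%s' % (case.__name__, media, interface)
            # type(name, bases, dict) builds a class the same way a
            # class statement would, with attr as its class attributes.
            namespace[name] = type(name, (case,), attr)
    return namespace


classes = create_basic_test_classes({})
print(sorted(classes))
```

Doing the work inside a function keeps the loop variables local, which is the point made in the comment the patch adds above create_basic_test_classes().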

diff --git a/tests/qemu-iotests/118 b/tests/qemu-iotests/118
index 3c20d2d61f..c281259215 100755
--- a/tests/qemu-iotests/118
+++ b/tests/qemu-iotests/118
@@ -294,15 +294,15 @@ class GeneralChangeTestsBaseClass(ChangeBaseClass):
 class TestInitiallyFilled(GeneralChangeTestsBaseClass):
     was_empty = False
 
-    def setUp(self, media, interface):
+    def setUp(self):
         qemu_img('create', '-f', iotests.imgfmt, old_img, '1440k')
         qemu_img('create', '-f', iotests.imgfmt, new_img, '1440k')
         self.vm = iotests.VM()
-        self.vm.add_drive(old_img, 'media=%s' % media, 'none')
-        if interface == 'scsi':
+        self.vm.add_drive(old_img, 'media=%s' % self.media, 'none')
+        if self.interface == 'scsi':
             self.vm.add_device('virtio-scsi-pci')
         self.vm.add_device('%s,drive=drive0,id=%s' %
-                           (interface_to_device_name(interface),
+                           (interface_to_device_name(self.interface),
                             self.device_name))
         self.vm.launch()
 
@@ -331,13 +331,13 @@ class TestInitiallyFilled(GeneralChangeTestsBaseClass):
 class TestInitiallyEmpty(GeneralChangeTestsBaseClass):
     was_empty = True
 
-    def setUp(self, media, interface):
+    def setUp(self):
         qemu_img('create', '-f', iotests.imgfmt, new_img, '1440k')
-        self.vm = iotests.VM().add_drive(None, 'media=%s' % media, 'none')
-        if interface == 'scsi':
+        self.vm = iotests.VM().add_drive(None, 'media=%s' % self.media, 'none')
+        if self.interface == 'scsi':
             self.vm.add_device('virtio-scsi-pci')
         self.vm.add_device('%s,drive=drive0,id=%s' %
-                           (interface_to_device_name(interface),
+                           (interface_to_device_name(self.interface),
                             self.device_name))
         self.vm.launch()
 
@@ -355,50 +355,23 @@ class TestInitiallyEmpty(GeneralChangeTestsBaseClass):
         # Should be a no-op
         self.assert_qmp(result, 'return', {})
 
-class TestCDInitiallyFilled(TestInitiallyFilled):
-    TestInitiallyFilled = TestInitiallyFilled
-    has_real_tray = True
-
-    def setUp(self):
-        self.TestInitiallyFilled.setUp(self, 'cdrom', 'ide')
-
-class TestCDInitiallyEmpty(TestInitiallyEmpty):
-    TestInitiallyEmpty = TestInitiallyEmpty
-    has_real_tray = True
-
-    def setUp(self):
-        self.TestInitiallyEmpty.setUp(self, 'cdrom', 'ide')
+# Do this in a function to avoid leaking variables like case into the global
+# name space (otherwise tests would be run for the abstract base classes)
+def create_basic_test_classes():
+    for (media, interface, has_real_tray) in [ ('cdrom', 'ide', True),
+                                               ('cdrom', 'scsi', True),
+                                               ('disk', 'floppy', False) ]:
 
-class TestSCSICDInitiallyFilled(TestInitiallyFilled):
-    TestInitiallyFilled = TestInitiallyFilled
-    has_real_tray = True
+        for case in [ TestInitiallyFilled, TestInitiallyEmpty ]:
 
-    def setUp(self):
-        self.TestInitiallyFilled.setUp(self, 'cdrom', 'scsi')
+            attr = { 'media': media,
+                     'interface': interface,
+                     'has_real_tray': has_real_tray }
 
-class TestSCSICDInitiallyEmpty(TestInitiallyEmpty):
-    TestInitiallyEmpty = TestInitiallyEmpty
-    has_real_tray = True
+            name = '%s_%s_%s' % (case.__name__, media, interface)
+            globals()[name] = type(name, (case, ), attr)
 
-    def setUp(self):
-        self.TestInitiallyEmpty.setUp(self, 'cdrom', 'scsi')
-
-class TestFloppyInitiallyFilled(TestInitiallyFilled):
-    TestInitiallyFilled = TestInitiallyFilled
-    has_real_tray = False
-
-    def setUp(self):
-        self.TestInitiallyFilled.setUp(self, 'disk', 'floppy')
-
-class TestFloppyInitiallyEmpty(TestInitiallyEmpty):
-    TestInitiallyEmpty = TestInitiallyEmpty
-    has_real_tray = False
-
-    def setUp(self):
-        self.TestInitiallyEmpty.setUp(self, 'disk', 'floppy')
-        # FDDs not having a real tray and there not being a medium inside the
-        # tray at startup means the tray will be considered open
-        self.has_opened = True
+create_basic_test_classes()
 
 class TestChangeReadOnly(ChangeBaseClass):
     device_name = 'qdev0'
-- 
2.20.1




* [Qemu-devel] [PULL 03/16] iotests/118: Add -blockdev based tests
  2019-08-16  9:34 [Qemu-devel] [PULL 00/16] Block layer patches Kevin Wolf
  2019-08-16  9:34 ` [Qemu-devel] [PULL 01/16] iotests/118: Test media change for scsi-cd Kevin Wolf
  2019-08-16  9:34 ` [Qemu-devel] [PULL 02/16] iotests/118: Create test classes dynamically Kevin Wolf
@ 2019-08-16  9:34 ` Kevin Wolf
  2019-08-16  9:34 ` [Qemu-devel] [PULL 04/16] iotests: Move migration helpers to iotests.py Kevin Wolf
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Kevin Wolf @ 2019-08-16  9:34 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

The code path for -device drive=<node-name> (or, for empty drives, for
-device without a drive=... option), which is supposed to be used with
-blockdev, differs enough from the -drive based path with a user-owned
BlockBackend that we want to test both paths, at least for the basic
tests implemented by TestInitiallyFilled and TestInitiallyEmpty.

This would have caught the bug recently fixed for inserting read-only
nodes into a scsi-cd created without a drive=... option.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/118     | 43 ++++++++++++++++++++++++++------------
 tests/qemu-iotests/118.out |  4 ++--
 2 files changed, 32 insertions(+), 15 deletions(-)
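
As a rough illustration of the two paths for an initially empty drive, the -device argument differs only in whether drive=drive0 is present. The helper name `device_arg` below is hypothetical and exists only for this sketch; the format string mirrors the one used in TestInitiallyEmpty.setUp().

```python
def device_arg(device, device_name, use_drive):
    """Build the -device argument for an initially empty drive.

    With -drive, the device refers to the user-owned BlockBackend
    'drive0'. With -blockdev, an empty drive gets no drive=... option.
    """
    drive_opt = 'drive=drive0,' if use_drive else ''
    return '%s,%sid=%s' % (device, drive_opt, device_name)


print(device_arg('scsi-cd', 'qdev0', True))
print(device_arg('scsi-cd', 'qdev0', False))
```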

diff --git a/tests/qemu-iotests/118 b/tests/qemu-iotests/118
index c281259215..6f45779ee9 100755
--- a/tests/qemu-iotests/118
+++ b/tests/qemu-iotests/118
@@ -42,10 +42,14 @@ class ChangeBaseClass(iotests.QMPTestCase):
     has_opened = False
     has_closed = False
 
+    device_name = 'qdev0'
+    use_drive = False
+
     def process_events(self):
         for event in self.vm.get_qmp_events(wait=False):
             if (event['event'] == 'DEVICE_TRAY_MOVED' and
-                event['data']['device'] == 'drive0'):
+                (event['data']['device'] == 'drive0' or
+                 event['data']['id'] == self.device_name)):
                 if event['data']['tray-open'] == False:
                     self.has_closed = True
                 else:
@@ -69,9 +73,11 @@ class ChangeBaseClass(iotests.QMPTestCase):
 
 class GeneralChangeTestsBaseClass(ChangeBaseClass):
 
-    device_name = 'qdev0'
-
     def test_change(self):
+        # 'change' requires a drive name, so skip the test for blockdev
+        if not self.use_drive:
+            return
+
         result = self.vm.qmp('change', device='drive0', target=new_img,
                                        arg=iotests.imgfmt)
         self.assert_qmp(result, 'return', {})
@@ -298,7 +304,13 @@ class TestInitiallyFilled(GeneralChangeTestsBaseClass):
         qemu_img('create', '-f', iotests.imgfmt, old_img, '1440k')
         qemu_img('create', '-f', iotests.imgfmt, new_img, '1440k')
         self.vm = iotests.VM()
-        self.vm.add_drive(old_img, 'media=%s' % self.media, 'none')
+        if self.use_drive:
+            self.vm.add_drive(old_img, 'media=%s' % self.media, 'none')
+        else:
+            self.vm.add_blockdev([ 'node-name=drive0',
+                                   'driver=%s' % iotests.imgfmt,
+                                   'file.driver=file',
+                                   'file.filename=%s' % old_img ])
         if self.interface == 'scsi':
             self.vm.add_device('virtio-scsi-pci')
         self.vm.add_device('%s,drive=drive0,id=%s' %
@@ -333,11 +345,14 @@ class TestInitiallyEmpty(GeneralChangeTestsBaseClass):
 
     def setUp(self):
         qemu_img('create', '-f', iotests.imgfmt, new_img, '1440k')
-        self.vm = iotests.VM().add_drive(None, 'media=%s' % self.media, 'none')
+        self.vm = iotests.VM()
+        if self.use_drive:
+            self.vm.add_drive(None, 'media=%s' % self.media, 'none')
         if self.interface == 'scsi':
             self.vm.add_device('virtio-scsi-pci')
-        self.vm.add_device('%s,drive=drive0,id=%s' %
+        self.vm.add_device('%s,%sid=%s' %
                            (interface_to_device_name(self.interface),
+                            'drive=drive0,' if self.use_drive else '',
                             self.device_name))
         self.vm.launch()
 
@@ -363,13 +378,15 @@ def create_basic_test_classes():
                                                ('disk', 'floppy', False) ]:
 
         for case in [ TestInitiallyFilled, TestInitiallyEmpty ]:
-
-            attr = { 'media': media,
-                     'interface': interface,
-                     'has_real_tray': has_real_tray }
-
-            name = '%s_%s_%s' % (case.__name__, media, interface)
-            globals()[name] = type(name, (case, ), attr)
+            for use_drive in [ True, False ]:
+                attr = { 'media': media,
+                         'interface': interface,
+                         'has_real_tray': has_real_tray,
+                         'use_drive': use_drive }
+
+                name = '%s_%s_%s_%s' % (case.__name__, media, interface,
+                                        'drive' if use_drive else 'blockdev')
+                globals()[name] = type(name, (case, ), attr)
 
 create_basic_test_classes()
 
diff --git a/tests/qemu-iotests/118.out b/tests/qemu-iotests/118.out
index b4ff997a8c..bf5bfd5aca 100644
--- a/tests/qemu-iotests/118.out
+++ b/tests/qemu-iotests/118.out
@@ -1,5 +1,5 @@
-.........................................................................................
+.......................................................................................................................................................................
 ----------------------------------------------------------------------
-Ran 89 tests
+Ran 167 tests
 
 OK
-- 
2.20.1




* [Qemu-devel] [PULL 04/16] iotests: Move migration helpers to iotests.py
  2019-08-16  9:34 [Qemu-devel] [PULL 00/16] Block layer patches Kevin Wolf
                   ` (2 preceding siblings ...)
  2019-08-16  9:34 ` [Qemu-devel] [PULL 03/16] iotests/118: Add -blockdev based tests Kevin Wolf
@ 2019-08-16  9:34 ` Kevin Wolf
  2019-08-16  9:34 ` [Qemu-devel] [PULL 05/16] iotests: Test migration with all kinds of filter nodes Kevin Wolf
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Kevin Wolf @ 2019-08-16  9:34 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

234 implements functions that are useful for doing migration between two
VMs. Move them to iotests.py so that other test cases can use them, too.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/234        | 30 +++++++-----------------------
 tests/qemu-iotests/iotests.py | 16 ++++++++++++++++
 2 files changed, 23 insertions(+), 23 deletions(-)
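
To show the shape of the helper once it is a method, here is a self-contained sketch. `FakeVM` is a hypothetical stand-in that replays canned events instead of talking QMP; only the loop in wait_migration() follows the moved code, and logging is replaced by a plain list.

```python
class FakeVM:
    """Hypothetical stand-in for iotests.VM that replays canned events."""

    def __init__(self, statuses):
        self._events = [{'event': 'MIGRATION', 'data': {'status': s}}
                        for s in statuses]
        self.seen = []

    def event_wait(self, name):
        # The real method waits for a QMP event; this one just returns
        # the next canned event.
        event = self._events.pop(0)
        assert event['event'] == name
        return event

    def wait_migration(self):
        # Consume MIGRATION events until one reports 'completed'.
        while True:
            event = self.event_wait('MIGRATION')
            self.seen.append(event['data']['status'])
            if event['data']['status'] == 'completed':
                break


vm = FakeVM(['setup', 'active', 'completed', 'ignored'])
vm.wait_migration()
print(vm.seen)
```

Callers then write vm.wait_migration() instead of wait_migration(vm), which is the only change test 234 needs.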

diff --git a/tests/qemu-iotests/234 b/tests/qemu-iotests/234
index c4c26bc21e..34c818c485 100755
--- a/tests/qemu-iotests/234
+++ b/tests/qemu-iotests/234
@@ -26,22 +26,6 @@ import os
 iotests.verify_image_format(supported_fmts=['qcow2'])
 iotests.verify_platform(['linux'])
 
-def enable_migration_events(vm, name):
-    iotests.log('Enabling migration QMP events on %s...' % name)
-    iotests.log(vm.qmp('migrate-set-capabilities', capabilities=[
-        {
-            'capability': 'events',
-            'state': True
-        }
-    ]))
-
-def wait_migration(vm):
-    while True:
-        event = vm.event_wait('MIGRATION')
-        iotests.log(event, filters=[iotests.filter_qmp_event])
-        if event['data']['status'] == 'completed':
-            break
-
 with iotests.FilePath('img') as img_path, \
      iotests.FilePath('backing') as backing_path, \
      iotests.FilePath('mig_fifo_a') as fifo_a, \
@@ -62,7 +46,7 @@ with iotests.FilePath('img') as img_path, \
          .add_blockdev('%s,file=drive0-backing-file,node-name=drive0-backing' % (iotests.imgfmt))
          .launch())
 
-    enable_migration_events(vm_a, 'A')
+    vm_a.enable_migration_events('A')
 
     iotests.log('Launching destination VM...')
     (vm_b.add_blockdev('file,filename=%s,node-name=drive0-file' % (img_path))
@@ -72,7 +56,7 @@ with iotests.FilePath('img') as img_path, \
          .add_incoming("exec: cat '%s'" % (fifo_a))
          .launch())
 
-    enable_migration_events(vm_b, 'B')
+    vm_b.enable_migration_events('B')
 
     # Add a child node that was created after the parent node. The reverse case
     # is covered by the -blockdev options above.
@@ -85,9 +69,9 @@ with iotests.FilePath('img') as img_path, \
     iotests.log(vm_a.qmp('migrate', uri='exec:cat >%s' % (fifo_a)))
     with iotests.Timeout(3, 'Migration does not complete'):
         # Wait for the source first (which includes setup=setup)
-        wait_migration(vm_a)
+        vm_a.wait_migration()
         # Wait for the destination second (which does not)
-        wait_migration(vm_b)
+        vm_b.wait_migration()
 
     iotests.log(vm_a.qmp('query-migrate')['return']['status'])
     iotests.log(vm_b.qmp('query-migrate')['return']['status'])
@@ -105,7 +89,7 @@ with iotests.FilePath('img') as img_path, \
          .add_incoming("exec: cat '%s'" % (fifo_b))
          .launch())
 
-    enable_migration_events(vm_a, 'A')
+    vm_a.enable_migration_events('A')
 
     iotests.log(vm_a.qmp('blockdev-snapshot', node='drive0-backing',
                          overlay='drive0'))
@@ -114,9 +98,9 @@ with iotests.FilePath('img') as img_path, \
     iotests.log(vm_b.qmp('migrate', uri='exec:cat >%s' % (fifo_b)))
     with iotests.Timeout(3, 'Migration does not complete'):
         # Wait for the source first (which includes setup=setup)
-        wait_migration(vm_b)
+        vm_b.wait_migration()
         # Wait for the destination second (which does not)
-        wait_migration(vm_a)
+        vm_a.wait_migration()
 
     iotests.log(vm_a.qmp('query-migrate')['return']['status'])
     iotests.log(vm_b.qmp('query-migrate')['return']['status'])
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index ce74177ab1..91172c39a5 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -583,6 +583,22 @@ class VM(qtest.QEMUQtestMachine):
             elif status == 'null':
                 return error
 
+    def enable_migration_events(self, name):
+        log('Enabling migration QMP events on %s...' % name)
+        log(self.qmp('migrate-set-capabilities', capabilities=[
+            {
+                'capability': 'events',
+                'state': True
+            }
+        ]))
+
+    def wait_migration(self):
+        while True:
+            event = self.event_wait('MIGRATION')
+            log(event, filters=[filter_qmp_event])
+            if event['data']['status'] == 'completed':
+                break
+
     def node_info(self, node_name):
         nodes = self.qmp('query-named-block-nodes')
         for x in nodes['return']:
-- 
2.20.1




* [Qemu-devel] [PULL 05/16] iotests: Test migration with all kinds of filter nodes
  2019-08-16  9:34 [Qemu-devel] [PULL 00/16] Block layer patches Kevin Wolf
                   ` (3 preceding siblings ...)
  2019-08-16  9:34 ` [Qemu-devel] [PULL 04/16] iotests: Move migration helpers to iotests.py Kevin Wolf
@ 2019-08-16  9:34 ` Kevin Wolf
  2019-08-16  9:34 ` [Qemu-devel] [PULL 06/16] block: Simplify bdrv_filter_default_perms() Kevin Wolf
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Kevin Wolf @ 2019-08-16  9:34 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

This test case is motivated by commit 2b23f28639 ('block/copy-on-read:
Fix permissions for inactive node'). Instead of just testing
copy-on-read on migration, let's stack all sorts of filter nodes on top
of each other and check whether the resulting VM can still migrate
successfully. For good measure, put everything into an iothread, because
why not?

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/262     | 82 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/262.out | 17 ++++++++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 100 insertions(+)
 create mode 100755 tests/qemu-iotests/262
 create mode 100644 tests/qemu-iotests/262.out

diff --git a/tests/qemu-iotests/262 b/tests/qemu-iotests/262
new file mode 100755
index 0000000000..398f63587e
--- /dev/null
+++ b/tests/qemu-iotests/262
@@ -0,0 +1,82 @@
+#!/usr/bin/env python
+#
+# Copyright (C) 2019 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+# Creator/Owner: Kevin Wolf <kwolf@redhat.com>
+#
+# Test migration with filter drivers present. Keep everything in an
+# iothread just for fun.
+
+import iotests
+import os
+
+iotests.verify_image_format(supported_fmts=['qcow2'])
+iotests.verify_platform(['linux'])
+
+with iotests.FilePath('img') as img_path, \
+     iotests.FilePath('mig_fifo') as fifo, \
+     iotests.VM(path_suffix='a') as vm_a, \
+     iotests.VM(path_suffix='b') as vm_b:
+
+    def add_opts(vm):
+        vm.add_object('iothread,id=iothread0')
+        vm.add_object('throttle-group,id=tg0,x-bps-total=65536')
+        vm.add_blockdev('file,filename=%s,node-name=drive0-file' % (img_path))
+        vm.add_blockdev('%s,file=drive0-file,node-name=drive0-fmt' % (iotests.imgfmt))
+        vm.add_blockdev('copy-on-read,file=drive0-fmt,node-name=drive0-cor')
+        vm.add_blockdev('throttle,file=drive0-cor,node-name=drive0-throttle,throttle-group=tg0')
+        vm.add_blockdev('blkdebug,image=drive0-throttle,node-name=drive0-dbg')
+        vm.add_blockdev('null-co,node-name=null,read-zeroes=on')
+        vm.add_blockdev('blkverify,test=drive0-dbg,raw=null,node-name=drive0-verify')
+
+        if iotests.supports_quorum():
+            vm.add_blockdev('quorum,children.0=drive0-verify,vote-threshold=1,node-name=drive0-quorum')
+            root = "drive0-quorum"
+        else:
+            root = "drive0-verify"
+
+        vm.add_device('virtio-blk,drive=%s,iothread=iothread0' % root)
+
+    iotests.qemu_img_pipe('create', '-f', iotests.imgfmt, img_path, '64M')
+
+    os.mkfifo(fifo)
+
+    iotests.log('Launching source VM...')
+    add_opts(vm_a)
+    vm_a.launch()
+
+    vm_a.enable_migration_events('A')
+
+    iotests.log('Launching destination VM...')
+    add_opts(vm_b)
+    vm_b.add_incoming("exec: cat '%s'" % (fifo))
+    vm_b.launch()
+
+    vm_b.enable_migration_events('B')
+
+    iotests.log('Starting migration to B...')
+    iotests.log(vm_a.qmp('migrate', uri='exec:cat >%s' % (fifo)))
+    with iotests.Timeout(3, 'Migration does not complete'):
+        # Wait for the source first (which includes setup=setup)
+        vm_a.wait_migration()
+        # Wait for the destination second (which does not)
+        vm_b.wait_migration()
+
+    iotests.log(vm_a.qmp('query-migrate')['return']['status'])
+    iotests.log(vm_b.qmp('query-migrate')['return']['status'])
+
+    iotests.log(vm_a.qmp('query-status'))
+    iotests.log(vm_b.qmp('query-status'))
diff --git a/tests/qemu-iotests/262.out b/tests/qemu-iotests/262.out
new file mode 100644
index 0000000000..5a58e5e9f8
--- /dev/null
+++ b/tests/qemu-iotests/262.out
@@ -0,0 +1,17 @@
+Launching source VM...
+Enabling migration QMP events on A...
+{"return": {}}
+Launching destination VM...
+Enabling migration QMP events on B...
+{"return": {}}
+Starting migration to B...
+{"return": {}}
+{"data": {"status": "setup"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"status": "active"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"status": "completed"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"status": "active"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"status": "completed"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+completed
+completed
+{"return": {"running": false, "singlestep": false, "status": "postmigrate"}}
+{"return": {"running": true, "singlestep": false, "status": "running"}}
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index f13e5f2e23..71ba3c05dc 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -271,3 +271,4 @@
 254 rw backing quick
 255 rw quick
 256 rw quick
+262 rw quick migration
-- 
2.20.1




* [Qemu-devel] [PULL 06/16] block: Simplify bdrv_filter_default_perms()
  2019-08-16  9:34 [Qemu-devel] [PULL 00/16] Block layer patches Kevin Wolf
                   ` (4 preceding siblings ...)
  2019-08-16  9:34 ` [Qemu-devel] [PULL 05/16] iotests: Test migration with all kinds of filter nodes Kevin Wolf
@ 2019-08-16  9:34 ` Kevin Wolf
  2019-08-16  9:34 ` [Qemu-devel] [PULL 07/16] block: Keep subtree drained in drop_intermediate Kevin Wolf
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Kevin Wolf @ 2019-08-16  9:34 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

The same change as commit 2b23f28639 ('block/copy-on-read: Fix
permissions for inactive node') made for the copy-on-read driver can be
made for bdrv_filter_default_perms(): Retaining the old permissions from
the BdrvChild if it is given complicates things unnecessarily when in
the end this only means that the options set in the c == NULL case (i.e.
during child creation) are retained.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)
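
The simplified computation can be modelled with plain bit masks. The numeric BLK_PERM_* values and the two DEFAULT_PERM_* definitions below are assumptions made for this sketch (they are not part of this patch); only the two assignments in filter_default_perms() are taken from the diff.

```python
# Assumed permission bits, for illustration only.
BLK_PERM_CONSISTENT_READ = 0x01
BLK_PERM_WRITE = 0x02
BLK_PERM_WRITE_UNCHANGED = 0x04
BLK_PERM_RESIZE = 0x08
BLK_PERM_GRAPH_MOD = 0x10
BLK_PERM_ALL = 0x1f

# Assumed definitions: permissions a filter passes through to its child,
# and the remaining ones it does not care about.
DEFAULT_PERM_PASSTHROUGH = (BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
                            BLK_PERM_WRITE_UNCHANGED | BLK_PERM_RESIZE)
DEFAULT_PERM_UNCHANGED = BLK_PERM_ALL & ~DEFAULT_PERM_PASSTHROUGH


def filter_default_perms(perm, shared):
    """Return (nperm, nshared) without consulting any existing child."""
    nperm = perm & DEFAULT_PERM_PASSTHROUGH
    nshared = (shared & DEFAULT_PERM_PASSTHROUGH) | DEFAULT_PERM_UNCHANGED
    return nperm, nshared


nperm, nshared = filter_default_perms(BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD,
                                      BLK_PERM_CONSISTENT_READ)
print(hex(nperm), hex(nshared))
```

In this model the result depends only on perm and shared, so the presence or absence of an existing child no longer changes it.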

diff --git a/block.c b/block.c
index cbd8da5f3b..6db8ecd62b 100644
--- a/block.c
+++ b/block.c
@@ -2168,16 +2168,8 @@ void bdrv_filter_default_perms(BlockDriverState *bs, BdrvChild *c,
                                uint64_t perm, uint64_t shared,
                                uint64_t *nperm, uint64_t *nshared)
 {
-    if (c == NULL) {
-        *nperm = perm & DEFAULT_PERM_PASSTHROUGH;
-        *nshared = (shared & DEFAULT_PERM_PASSTHROUGH) | DEFAULT_PERM_UNCHANGED;
-        return;
-    }
-
-    *nperm = (perm & DEFAULT_PERM_PASSTHROUGH) |
-             (c->perm & DEFAULT_PERM_UNCHANGED);
-    *nshared = (shared & DEFAULT_PERM_PASSTHROUGH) |
-               (c->shared_perm & DEFAULT_PERM_UNCHANGED);
+    *nperm = perm & DEFAULT_PERM_PASSTHROUGH;
+    *nshared = (shared & DEFAULT_PERM_PASSTHROUGH) | DEFAULT_PERM_UNCHANGED;
 }
 
 void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
-- 
2.20.1




* [Qemu-devel] [PULL 07/16] block: Keep subtree drained in drop_intermediate
  2019-08-16  9:34 [Qemu-devel] [PULL 00/16] Block layer patches Kevin Wolf
                   ` (5 preceding siblings ...)
  2019-08-16  9:34 ` [Qemu-devel] [PULL 06/16] block: Simplify bdrv_filter_default_perms() Kevin Wolf
@ 2019-08-16  9:34 ` Kevin Wolf
  2019-08-16  9:34 ` [Qemu-devel] [PULL 08/16] block: Reduce (un)drains when replacing a child Kevin Wolf
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Kevin Wolf @ 2019-08-16  9:34 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Max Reitz <mreitz@redhat.com>

bdrv_drop_intermediate() calls BdrvChildRole.update_filename().  That
may poll, thus changing the graph, which potentially breaks the
QLIST_FOREACH_SAFE() loop.

Just keep the whole subtree drained.  This is probably the right thing
to do anyway (dropping nodes while the subtree is not drained seems
wrong).
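
The hazard can be sketched with a toy list walk. This is a simplified
model under stated assumptions, not the QEMU code: ToyNode, ToyGraph and
the toy_* helpers are hypothetical, a removed node is only flagged rather
than freed so that the stale access stays observable, and toy_poll()
stands in for a callback that polls.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyNode {
    struct ToyNode *next;
    int removed;            /* set instead of freeing the node */
} ToyNode;

typedef struct {
    ToyNode *head;
    int drain_count;        /* > 0: deferred events must not run */
    ToyNode *pending;       /* node that a deferred event wants to remove */
} ToyGraph;

static void toy_unlink(ToyGraph *g, ToyNode *victim)
{
    ToyNode **p = &g->head;

    while (*p && *p != victim) {
        p = &(*p)->next;
    }
    if (*p) {
        *p = victim->next;
        victim->removed = 1;
    }
}

/* Stand-in for polling: the pending event only runs while undrained. */
static void toy_poll(ToyGraph *g)
{
    if (g->drain_count == 0 && g->pending) {
        toy_unlink(g, g->pending);
        g->pending = NULL;
    }
}

/* Walks a -> b -> c; returns how many already-removed nodes were visited. */
static int toy_walk(int keep_drained)
{
    ToyNode c = { NULL, 0 }, b = { &c, 0 }, a = { &b, 0 };
    ToyGraph g = { &a, 0, &b };
    ToyNode *n, *next;
    int stale = 0;

    if (keep_drained) {
        g.drain_count++;
    }

    for (n = g.head; n; n = next) {
        next = n->next;     /* the pointer a "safe" loop caches up front */
        if (n->removed) {
            stale++;
        }
        toy_poll(&g);       /* may remove the cached next node */
    }

    if (keep_drained) {
        g.drain_count--;
        toy_poll(&g);       /* the deferred event runs after the loop */
    }
    return stale;
}
```

Without the drained section the walk visits a node that was removed
behind its back; with it, the removal is deferred until the loop is done.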

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block.c b/block.c
index 6db8ecd62b..df3407934b 100644
--- a/block.c
+++ b/block.c
@@ -4491,6 +4491,7 @@ int bdrv_drop_intermediate(BlockDriverState *top, BlockDriverState *base,
     int ret = -EIO;
 
     bdrv_ref(top);
+    bdrv_subtree_drained_begin(top);
 
     if (!top->drv || !base->drv) {
         goto exit;
@@ -4562,6 +4563,7 @@ int bdrv_drop_intermediate(BlockDriverState *top, BlockDriverState *base,
 
     ret = 0;
 exit:
+    bdrv_subtree_drained_end(top);
     bdrv_unref(top);
     return ret;
 }
-- 
2.20.1




* [Qemu-devel] [PULL 08/16] block: Reduce (un)drains when replacing a child
  2019-08-16  9:34 [Qemu-devel] [PULL 00/16] Block layer patches Kevin Wolf
                   ` (6 preceding siblings ...)
  2019-08-16  9:34 ` [Qemu-devel] [PULL 07/16] block: Keep subtree drained in drop_intermediate Kevin Wolf
@ 2019-08-16  9:34 ` Kevin Wolf
  2019-08-16  9:34 ` [Qemu-devel] [PULL 09/16] tests: Test polling in bdrv_drop_intermediate() Kevin Wolf
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Kevin Wolf @ 2019-08-16  9:34 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Max Reitz <mreitz@redhat.com>

Currently, bdrv_replace_child_noperm() undrains the parent until it is
completely undrained, then re-drains it after attaching the new child
node.

This is a problem with bdrv_drop_intermediate(): We want to keep the
whole subtree drained, including parents, while the operation is
under way.  bdrv_replace_child_noperm() breaks this by allowing every
parent to become unquiesced briefly, and then redraining it.

In fact, there is no reason why the parent should become unquiesced and
be allowed to submit requests to the new child node if that new node is
supposed to be kept drained.  So if anything, we have to drain the
parent before detaching the old child node.  Conversely, we have to
undrain it only after attaching the new child node.

Thus, change the whole drain algorithm here: Calculate the number of
times we have to drain/undrain the parent before replacing the child
node, then drain it (if necessary), replace the child node, and then
undrain it.
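
The ordering described above can be sketched as a standalone model.
ToyChild and the toy_* functions are hypothetical; the model ignores the
case where detaching the old node lowers the new node's quiesce counter,
and it does not check whether the child role has drain callbacks.

```c
#include <assert.h>

/* Toy model of a parent edge; the field names are illustrative. */
typedef struct {
    int parent_quiesce_counter; /* how often the parent is drained via this edge */
    int begin_calls;            /* number of drained_begin invocations */
    int end_calls;              /* number of drained_end invocations */
} ToyChild;

static void toy_drained_begin(ToyChild *c)
{
    c->parent_quiesce_counter++;
    c->begin_calls++;
}

static void toy_drained_end(ToyChild *c)
{
    assert(c->parent_quiesce_counter > 0);
    c->parent_quiesce_counter--;
    c->end_calls++;
}

static void toy_replace_child(ToyChild *c, int new_bs_quiesce_counter)
{
    /* Positive: drain the parent more often; negative: undrain it */
    int drain_saldo = new_bs_quiesce_counter - c->parent_quiesce_counter;

    /* Drain before the old node is detached */
    while (drain_saldo > 0) {
        toy_drained_begin(c);
        drain_saldo--;
    }

    /* ... the old node would be detached and the new one attached here ... */

    /* Undrain only after the new node has been attached */
    while (drain_saldo < 0) {
        toy_drained_end(c);
        drain_saldo++;
    }
}
```

If both nodes are drained equally often, the saldo is zero and the parent
is neither drained nor undrained during the replacement.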

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 49 +++++++++++++++++++++++++++++++++----------------
 1 file changed, 33 insertions(+), 16 deletions(-)

diff --git a/block.c b/block.c
index df3407934b..66e8602e68 100644
--- a/block.c
+++ b/block.c
@@ -2230,13 +2230,27 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
                                       BlockDriverState *new_bs)
 {
     BlockDriverState *old_bs = child->bs;
-    int i;
+    int new_bs_quiesce_counter;
+    int drain_saldo;
 
     assert(!child->frozen);
 
     if (old_bs && new_bs) {
         assert(bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs));
     }
+
+    new_bs_quiesce_counter = (new_bs ? new_bs->quiesce_counter : 0);
+    drain_saldo = new_bs_quiesce_counter - child->parent_quiesce_counter;
+
+    /*
+     * If the new child node is drained but the old one was not, flush
+     * all outstanding requests to the old child node.
+     */
+    while (drain_saldo > 0 && child->role->drained_begin) {
+        bdrv_parent_drained_begin_single(child, true);
+        drain_saldo--;
+    }
+
     if (old_bs) {
         /* Detach first so that the recursive drain sections coming from @child
          * are already gone and we only end the drain sections that came from
@@ -2244,28 +2258,22 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
         if (child->role->detach) {
             child->role->detach(child);
         }
-        while (child->parent_quiesce_counter) {
-            bdrv_parent_drained_end_single(child);
-        }
         QLIST_REMOVE(child, next_parent);
-    } else {
-        assert(child->parent_quiesce_counter == 0);
     }
 
     child->bs = new_bs;
 
     if (new_bs) {
         QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
-        if (new_bs->quiesce_counter) {
-            int num = new_bs->quiesce_counter;
-            if (child->role->parent_is_bds) {
-                num -= bdrv_drain_all_count;
-            }
-            assert(num >= 0);
-            for (i = 0; i < num; i++) {
-                bdrv_parent_drained_begin_single(child, true);
-            }
-        }
+
+        /*
+         * Detaching the old node may have led to the new node's
+         * quiesce_counter having been decreased.  Not a problem, we
+         * just need to recognize this here and then invoke
+         * drained_end appropriately more often.
+         */
+        assert(new_bs->quiesce_counter <= new_bs_quiesce_counter);
+        drain_saldo += new_bs->quiesce_counter - new_bs_quiesce_counter;
 
         /* Attach only after starting new drained sections, so that recursive
          * drain sections coming from @child don't get an extra .drained_begin
@@ -2274,6 +2282,15 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
             child->role->attach(child);
         }
     }
+
+    /*
+     * If the old child node was drained but the new one is not, allow
+     * requests to come in only after the new node has been attached.
+     */
+    while (drain_saldo < 0 && child->role->drained_end) {
+        bdrv_parent_drained_end_single(child);
+        drain_saldo++;
+    }
 }
 
 /*
-- 
2.20.1




* [Qemu-devel] [PULL 09/16] tests: Test polling in bdrv_drop_intermediate()
  2019-08-16  9:34 [Qemu-devel] [PULL 00/16] Block layer patches Kevin Wolf
                   ` (7 preceding siblings ...)
  2019-08-16  9:34 ` [Qemu-devel] [PULL 08/16] block: Reduce (un)drains when replacing a child Kevin Wolf
@ 2019-08-16  9:34 ` Kevin Wolf
  2019-08-16  9:34 ` [Qemu-devel] [PULL 10/16] tests: Test mid-drain bdrv_replace_child_noperm() Kevin Wolf
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Kevin Wolf @ 2019-08-16  9:34 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Max Reitz <mreitz@redhat.com>

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/test-bdrv-drain.c | 167 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 167 insertions(+)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 03fa1142a1..1600d41e9a 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -100,6 +100,13 @@ static void bdrv_test_child_perm(BlockDriverState *bs, BdrvChild *c,
                               nperm, nshared);
 }
 
+static int bdrv_test_change_backing_file(BlockDriverState *bs,
+                                         const char *backing_file,
+                                         const char *backing_fmt)
+{
+    return 0;
+}
+
 static BlockDriver bdrv_test = {
     .format_name            = "test",
     .instance_size          = sizeof(BDRVTestState),
@@ -111,6 +118,8 @@ static BlockDriver bdrv_test = {
     .bdrv_co_drain_end      = bdrv_test_co_drain_end,
 
     .bdrv_child_perm        = bdrv_test_child_perm,
+
+    .bdrv_change_backing_file = bdrv_test_change_backing_file,
 };
 
 static void aio_ret_cb(void *opaque, int ret)
@@ -1671,6 +1680,161 @@ static void test_blockjob_commit_by_drained_end(void)
     bdrv_unref(bs_child);
 }
 
+
+typedef struct TestSimpleBlockJob {
+    BlockJob common;
+    bool should_complete;
+    bool *did_complete;
+} TestSimpleBlockJob;
+
+static int coroutine_fn test_simple_job_run(Job *job, Error **errp)
+{
+    TestSimpleBlockJob *s = container_of(job, TestSimpleBlockJob, common.job);
+
+    while (!s->should_complete) {
+        job_sleep_ns(job, 0);
+    }
+
+    return 0;
+}
+
+static void test_simple_job_clean(Job *job)
+{
+    TestSimpleBlockJob *s = container_of(job, TestSimpleBlockJob, common.job);
+    *s->did_complete = true;
+}
+
+static const BlockJobDriver test_simple_job_driver = {
+    .job_driver = {
+        .instance_size  = sizeof(TestSimpleBlockJob),
+        .free           = block_job_free,
+        .user_resume    = block_job_user_resume,
+        .drain          = block_job_drain,
+        .run            = test_simple_job_run,
+        .clean          = test_simple_job_clean,
+    },
+};
+
+static int drop_intermediate_poll_update_filename(BdrvChild *child,
+                                                  BlockDriverState *new_base,
+                                                  const char *filename,
+                                                  Error **errp)
+{
+    /*
+     * We are free to poll here, which may change the block graph, if
+     * it is not drained.
+     */
+
+    /* If the job is not drained: Complete it, schedule job_exit() */
+    aio_poll(qemu_get_current_aio_context(), false);
+    /* If the job is not drained: Run job_exit(), finish the job */
+    aio_poll(qemu_get_current_aio_context(), false);
+
+    return 0;
+}
+
+/**
+ * Test a poll in the midst of bdrv_drop_intermediate().
+ *
+ * bdrv_drop_intermediate() calls BdrvChildRole.update_filename(),
+ * which can yield or poll.  This may lead to graph changes, unless
+ * the whole subtree in question is drained.
+ *
+ * We test this on the following graph:
+ *
+ *                    Job
+ *
+ *                     |
+ *                  job-node
+ *                     |
+ *                     v
+ *
+ *                  job-node
+ *
+ *                     |
+ *                  backing
+ *                     |
+ *                     v
+ *
+ * node-2 --chain--> node-1 --chain--> node-0
+ *
+ * We drop node-1 with bdrv_drop_intermediate(top=node-1, base=node-0).
+ *
+ * This first updates node-2's backing filename by invoking
+ * drop_intermediate_poll_update_filename(), which polls twice.  This
+ * causes the job to finish, which in turn causes the job-node to be
+ * deleted.
+ *
+ * bdrv_drop_intermediate() uses a QLIST_FOREACH_SAFE() loop, so it
+ * already has a pointer to the BdrvChild edge between job-node and
+ * node-1.  When it tries to handle that edge, we probably get a
+ * segmentation fault because the object no longer exists.
+ *
+ *
+ * The solution is for bdrv_drop_intermediate() to drain top's
+ * subtree.  This prevents graph changes from happening just because
+ * BdrvChildRole.update_filename() yields or polls.  Thus, the block
+ * job is paused during that drained section and must finish before or
+ * after.
+ *
+ * (In addition, bdrv_replace_child() must keep the job paused.)
+ */
+static void test_drop_intermediate_poll(void)
+{
+    static BdrvChildRole chain_child_role;
+    BlockDriverState *chain[3];
+    TestSimpleBlockJob *job;
+    BlockDriverState *job_node;
+    bool job_has_completed = false;
+    int i;
+    int ret;
+
+    chain_child_role = child_backing;
+    chain_child_role.update_filename = drop_intermediate_poll_update_filename;
+
+    for (i = 0; i < 3; i++) {
+        char name[32];
+        snprintf(name, 32, "node-%i", i);
+
+        chain[i] = bdrv_new_open_driver(&bdrv_test, name, 0, &error_abort);
+    }
+
+    job_node = bdrv_new_open_driver(&bdrv_test, "job-node", BDRV_O_RDWR,
+                                    &error_abort);
+    bdrv_set_backing_hd(job_node, chain[1], &error_abort);
+
+    /*
+     * Establish the chain last, so the chain links are the first
+     * elements in the BDS.parents lists
+     */
+    for (i = 0; i < 3; i++) {
+        if (i) {
+            /* Takes the reference to chain[i - 1] */
+            chain[i]->backing = bdrv_attach_child(chain[i], chain[i - 1],
+                                                  "chain", &chain_child_role,
+                                                  &error_abort);
+        }
+    }
+
+    job = block_job_create("job", &test_simple_job_driver, NULL, job_node,
+                           0, BLK_PERM_ALL, 0, 0, NULL, NULL, &error_abort);
+
+    /* The job has a reference now */
+    bdrv_unref(job_node);
+
+    job->did_complete = &job_has_completed;
+
+    job_start(&job->common.job);
+    job->should_complete = true;
+
+    g_assert(!job_has_completed);
+    ret = bdrv_drop_intermediate(chain[1], chain[0], NULL);
+    g_assert(ret == 0);
+    g_assert(job_has_completed);
+
+    bdrv_unref(chain[2]);
+}
+
 int main(int argc, char **argv)
 {
     int ret;
@@ -1757,6 +1921,9 @@ int main(int argc, char **argv)
     g_test_add_func("/bdrv-drain/blockjob/commit_by_drained_end",
                     test_blockjob_commit_by_drained_end);
 
+    g_test_add_func("/bdrv-drain/bdrv_drop_intermediate/poll",
+                    test_drop_intermediate_poll);
+
     ret = g_test_run();
     qemu_event_destroy(&done_event);
     return ret;
-- 
2.20.1




* [Qemu-devel] [PULL 10/16] tests: Test mid-drain bdrv_replace_child_noperm()
  2019-08-16  9:34 [Qemu-devel] [PULL 00/16] Block layer patches Kevin Wolf
                   ` (8 preceding siblings ...)
  2019-08-16  9:34 ` [Qemu-devel] [PULL 09/16] tests: Test polling in bdrv_drop_intermediate() Kevin Wolf
@ 2019-08-16  9:34 ` Kevin Wolf
  2019-08-16  9:34 ` [Qemu-devel] [PULL 11/16] iotests: Add test for concurrent stream/commit Kevin Wolf
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Kevin Wolf @ 2019-08-16  9:34 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Max Reitz <mreitz@redhat.com>

Add a test for what happens when you call bdrv_replace_child_noperm()
for various drain situations ({old,new} child {drained,not drained}).

Most importantly, if both the old and the new child are drained, the
parent must not be undrained at any point.
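
The four situations and what the parent is expected to observe in each
can be summarized in a small model. ToyParentEvents and
toy_expected_events() are hypothetical names; the model only mirrors the
expectations stated in the test's comments.

```c
#include <assert.h>
#include <stdbool.h>

/* What the parent's drain callbacks observe during the replacement. */
typedef struct {
    bool was_drained;   /* the parent went from undrained to drained */
    bool was_undrained; /* the parent's drain count dropped back to 0 */
} ToyParentEvents;

static ToyParentEvents toy_expected_events(int old_drain_count,
                                           int new_drain_count)
{
    ToyParentEvents e;

    /*
     * A parent that is already drained through the old child cannot be
     * drained "again" from zero; an undrained one is drained at least
     * by the drained section inside the replace operation.
     */
    e.was_drained = (old_drain_count == 0);

    /* The parent may only end up undrained if the new child is undrained */
    e.was_undrained = (new_drain_count == 0);

    return e;
}
```

The drained-to-drained case is the one that matters most here: neither
flag may be set.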

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/test-bdrv-drain.c | 308 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 308 insertions(+)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 1600d41e9a..9dffd86c47 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -1835,6 +1835,311 @@ static void test_drop_intermediate_poll(void)
     bdrv_unref(chain[2]);
 }
 
+
+typedef struct BDRVReplaceTestState {
+    bool was_drained;
+    bool was_undrained;
+    bool has_read;
+
+    int drain_count;
+
+    bool yield_before_read;
+    Coroutine *io_co;
+    Coroutine *drain_co;
+} BDRVReplaceTestState;
+
+static void bdrv_replace_test_close(BlockDriverState *bs)
+{
+}
+
+/**
+ * If @bs has a backing file:
+ *   Yield if .yield_before_read is true (and wait for drain_begin to
+ *   wake us up).
+ *   Forward the read to bs->backing.  Set .has_read to true.
+ *   If drain_begin has woken us, wake it in turn.
+ *
+ * Otherwise:
+ *   Set .has_read to true and return success.
+ */
+static int coroutine_fn bdrv_replace_test_co_preadv(BlockDriverState *bs,
+                                                    uint64_t offset,
+                                                    uint64_t bytes,
+                                                    QEMUIOVector *qiov,
+                                                    int flags)
+{
+    BDRVReplaceTestState *s = bs->opaque;
+
+    if (bs->backing) {
+        int ret;
+
+        g_assert(!s->drain_count);
+
+        s->io_co = qemu_coroutine_self();
+        if (s->yield_before_read) {
+            s->yield_before_read = false;
+            qemu_coroutine_yield();
+        }
+        s->io_co = NULL;
+
+        ret = bdrv_preadv(bs->backing, offset, qiov);
+        s->has_read = true;
+
+        /* Wake up drain_co if it runs */
+        if (s->drain_co) {
+            aio_co_wake(s->drain_co);
+        }
+
+        return ret;
+    }
+
+    s->has_read = true;
+    return 0;
+}
+
+/**
+ * If .drain_count is 0, wake up .io_co if there is one; and set
+ * .was_drained.
+ * Increment .drain_count.
+ */
+static void coroutine_fn bdrv_replace_test_co_drain_begin(BlockDriverState *bs)
+{
+    BDRVReplaceTestState *s = bs->opaque;
+
+    if (!s->drain_count) {
+        /* Keep waking io_co up until it is done */
+        s->drain_co = qemu_coroutine_self();
+        while (s->io_co) {
+            aio_co_wake(s->io_co);
+            s->io_co = NULL;
+            qemu_coroutine_yield();
+        }
+        s->drain_co = NULL;
+
+        s->was_drained = true;
+    }
+    s->drain_count++;
+}
+
+/**
+ * Reduce .drain_count, set .was_undrained once it reaches 0.
+ * If .drain_count reaches 0 and the node has a backing file, issue a
+ * read request.
+ */
+static void coroutine_fn bdrv_replace_test_co_drain_end(BlockDriverState *bs)
+{
+    BDRVReplaceTestState *s = bs->opaque;
+
+    g_assert(s->drain_count > 0);
+    if (!--s->drain_count) {
+        int ret;
+
+        s->was_undrained = true;
+
+        if (bs->backing) {
+            char data;
+            QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, &data, 1);
+
+            /* Queue a read request post-drain */
+            ret = bdrv_replace_test_co_preadv(bs, 0, 1, &qiov, 0);
+            g_assert(ret >= 0);
+        }
+    }
+}
+
+static BlockDriver bdrv_replace_test = {
+    .format_name            = "replace_test",
+    .instance_size          = sizeof(BDRVReplaceTestState),
+
+    .bdrv_close             = bdrv_replace_test_close,
+    .bdrv_co_preadv         = bdrv_replace_test_co_preadv,
+
+    .bdrv_co_drain_begin    = bdrv_replace_test_co_drain_begin,
+    .bdrv_co_drain_end      = bdrv_replace_test_co_drain_end,
+
+    .bdrv_child_perm        = bdrv_format_default_perms,
+};
+
+static void coroutine_fn test_replace_child_mid_drain_read_co(void *opaque)
+{
+    int ret;
+    char data;
+
+    ret = blk_co_pread(opaque, 0, 1, &data, 0);
+    g_assert(ret >= 0);
+}
+
+/**
+ * We test two things:
+ * (1) bdrv_replace_child_noperm() must not undrain the parent if both
+ *     children are drained.
+ * (2) bdrv_replace_child_noperm() must never flush I/O requests to a
+ *     drained child.  If the old child is drained, it must flush I/O
+ *     requests after the new one has been attached.  If the new child
+ *     is drained, it must flush I/O requests before the old one is
+ *     detached.
+ *
+ * To do so, we create one parent node and two child nodes; then
+ * attach one of the children (old_child_bs) to the parent, then
+ * drain both old_child_bs and new_child_bs according to
+ * old_drain_count and new_drain_count, respectively, and finally
+ * we invoke bdrv_replace_node() to replace old_child_bs by
+ * new_child_bs.
+ *
+ * The test block driver we use here (bdrv_replace_test) has a read
+ * function that:
+ * - For the parent node, can optionally yield, and then forwards the
+ *   read to bdrv_preadv(),
+ * - For the child node, just returns immediately.
+ *
+ * If the read yields, the drain_begin function will wake it up.
+ *
+ * The drain_end function issues a read on the parent once it is fully
+ * undrained (which simulates requests starting to come in again).
+ */
+static void do_test_replace_child_mid_drain(int old_drain_count,
+                                            int new_drain_count)
+{
+    BlockBackend *parent_blk;
+    BlockDriverState *parent_bs;
+    BlockDriverState *old_child_bs, *new_child_bs;
+    BDRVReplaceTestState *parent_s;
+    BDRVReplaceTestState *old_child_s, *new_child_s;
+    Coroutine *io_co;
+    int i;
+
+    parent_bs = bdrv_new_open_driver(&bdrv_replace_test, "parent", 0,
+                                     &error_abort);
+    parent_s = parent_bs->opaque;
+
+    parent_blk = blk_new(qemu_get_aio_context(),
+                         BLK_PERM_CONSISTENT_READ, BLK_PERM_ALL);
+    blk_insert_bs(parent_blk, parent_bs, &error_abort);
+
+    old_child_bs = bdrv_new_open_driver(&bdrv_replace_test, "old-child", 0,
+                                        &error_abort);
+    new_child_bs = bdrv_new_open_driver(&bdrv_replace_test, "new-child", 0,
+                                        &error_abort);
+    old_child_s = old_child_bs->opaque;
+    new_child_s = new_child_bs->opaque;
+
+    /* So that we can read something */
+    parent_bs->total_sectors = 1;
+    old_child_bs->total_sectors = 1;
+    new_child_bs->total_sectors = 1;
+
+    bdrv_ref(old_child_bs);
+    parent_bs->backing = bdrv_attach_child(parent_bs, old_child_bs, "child",
+                                           &child_backing, &error_abort);
+
+    for (i = 0; i < old_drain_count; i++) {
+        bdrv_drained_begin(old_child_bs);
+    }
+    for (i = 0; i < new_drain_count; i++) {
+        bdrv_drained_begin(new_child_bs);
+    }
+
+    if (!old_drain_count) {
+        /*
+         * Start a read operation that will yield, so it will not
+         * complete before the node is drained.
+         */
+        parent_s->yield_before_read = true;
+        io_co = qemu_coroutine_create(test_replace_child_mid_drain_read_co,
+                                      parent_blk);
+        qemu_coroutine_enter(io_co);
+    }
+
+    /* If we have started a read operation, it should have yielded */
+    g_assert(!parent_s->has_read);
+
+    /* Reset drained status so we can see what bdrv_replace_node() does */
+    parent_s->was_drained = false;
+    parent_s->was_undrained = false;
+
+    g_assert(parent_bs->quiesce_counter == old_drain_count);
+    bdrv_replace_node(old_child_bs, new_child_bs, &error_abort);
+    g_assert(parent_bs->quiesce_counter == new_drain_count);
+
+    if (!old_drain_count && !new_drain_count) {
+        /*
+         * From undrained to undrained drains and undrains the parent,
+         * because bdrv_replace_node() contains a drained section for
+         * @old_child_bs.
+         */
+        g_assert(parent_s->was_drained && parent_s->was_undrained);
+    } else if (!old_drain_count && new_drain_count) {
+        /*
+         * From undrained to drained should drain the parent and keep
+         * it that way.
+         */
+        g_assert(parent_s->was_drained && !parent_s->was_undrained);
+    } else if (old_drain_count && !new_drain_count) {
+        /*
+         * From drained to undrained should undrain the parent and
+         * keep it that way.
+         */
+        g_assert(!parent_s->was_drained && parent_s->was_undrained);
+    } else /* if (old_drain_count && new_drain_count) */ {
+        /*
+         * From drained to drained must not undrain the parent at any
+         * point
+         */
+        g_assert(!parent_s->was_drained && !parent_s->was_undrained);
+    }
+
+    if (!old_drain_count || !new_drain_count) {
+        /*
+         * If !old_drain_count, we have started a read request before
+         * bdrv_replace_node().  If !new_drain_count, the parent must
+         * have been undrained at some point, and
+         * bdrv_replace_test_co_drain_end() starts a read request
+         * then.
+         */
+        g_assert(parent_s->has_read);
+    } else {
+        /*
+         * If the parent was never undrained, there is no way to start
+         * a read request.
+         */
+        g_assert(!parent_s->has_read);
+    }
+
+    /* A drained child must not have received any request */
+    g_assert(!(old_drain_count && old_child_s->has_read));
+    g_assert(!(new_drain_count && new_child_s->has_read));
+
+    for (i = 0; i < new_drain_count; i++) {
+        bdrv_drained_end(new_child_bs);
+    }
+    for (i = 0; i < old_drain_count; i++) {
+        bdrv_drained_end(old_child_bs);
+    }
+
+    /*
+     * By now, bdrv_replace_test_co_drain_end() must have been called
+     * at some point while the new child was attached to the parent.
+     */
+    g_assert(parent_s->has_read);
+    g_assert(new_child_s->has_read);
+
+    blk_unref(parent_blk);
+    bdrv_unref(parent_bs);
+    bdrv_unref(old_child_bs);
+    bdrv_unref(new_child_bs);
+}
+
+static void test_replace_child_mid_drain(void)
+{
+    int old_drain_count, new_drain_count;
+
+    for (old_drain_count = 0; old_drain_count < 2; old_drain_count++) {
+        for (new_drain_count = 0; new_drain_count < 2; new_drain_count++) {
+            do_test_replace_child_mid_drain(old_drain_count, new_drain_count);
+        }
+    }
+}
+
 int main(int argc, char **argv)
 {
     int ret;
@@ -1924,6 +2229,9 @@ int main(int argc, char **argv)
     g_test_add_func("/bdrv-drain/bdrv_drop_intermediate/poll",
                     test_drop_intermediate_poll);
 
+    g_test_add_func("/bdrv-drain/replace_child/mid-drain",
+                    test_replace_child_mid_drain);
+
     ret = g_test_run();
     qemu_event_destroy(&done_event);
     return ret;
-- 
2.20.1




* [Qemu-devel] [PULL 11/16] iotests: Add test for concurrent stream/commit
  2019-08-16  9:34 [Qemu-devel] [PULL 00/16] Block layer patches Kevin Wolf
                   ` (9 preceding siblings ...)
  2019-08-16  9:34 ` [Qemu-devel] [PULL 10/16] tests: Test mid-drain bdrv_replace_child_noperm() Kevin Wolf
@ 2019-08-16  9:34 ` Kevin Wolf
  2019-08-16  9:34 ` [Qemu-devel] [PULL 12/16] block: Remove blk_pread_unthrottled() Kevin Wolf
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Kevin Wolf @ 2019-08-16  9:34 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Max Reitz <mreitz@redhat.com>

We already have 030 for that in general, but this tests very specific
cases of both jobs finishing concurrently.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/qemu-iotests/258     | 163 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/258.out |  33 ++++++++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 197 insertions(+)
 create mode 100755 tests/qemu-iotests/258
 create mode 100644 tests/qemu-iotests/258.out

diff --git a/tests/qemu-iotests/258 b/tests/qemu-iotests/258
new file mode 100755
index 0000000000..b84cf02254
--- /dev/null
+++ b/tests/qemu-iotests/258
@@ -0,0 +1,163 @@
+#!/usr/bin/env python
+#
+# Very specific tests for adjacent commit/stream block jobs
+#
+# Copyright (C) 2019 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+# Creator/Owner: Max Reitz <mreitz@redhat.com>
+
+import iotests
+from iotests import log, qemu_img, qemu_io_silent, \
+        filter_qmp_testfiles, filter_qmp_imgfmt
+
+# Need backing file and change-backing-file support
+iotests.verify_image_format(supported_fmts=['qcow2', 'qed'])
+iotests.verify_platform(['linux'])
+
+
+# Returns a node for blockdev-add
+def node(node_name, path, backing=None, fmt=None, throttle=None):
+    if fmt is None:
+        fmt = iotests.imgfmt
+
+    res = {
+        'node-name': node_name,
+        'driver': fmt,
+        'file': {
+            'driver': 'file',
+            'filename': path
+        }
+    }
+
+    if backing is not None:
+        res['backing'] = backing
+
+    if throttle:
+        res['file'] = {
+            'driver': 'throttle',
+            'throttle-group': throttle,
+            'file': res['file']
+        }
+
+    return res
+
+# Finds a node in the debug block graph
+def find_graph_node(graph, node_id):
+    return next(node for node in graph['nodes'] if node['id'] == node_id)
+
+
+def test_concurrent_finish(write_to_stream_node):
+    log('')
+    log('=== Commit and stream finish concurrently (letting %s write) ===' % \
+        ('stream' if write_to_stream_node else 'commit'))
+    log('')
+
+    # All chosen in such a way that when the commit job wants to
+    # finish, it polls and thus makes stream finish concurrently --
+    # and the other way around, depending on whether the commit job
+    # is finalized before stream completes or not.
+
+    with iotests.FilePath('node4.img') as node4_path, \
+         iotests.FilePath('node3.img') as node3_path, \
+         iotests.FilePath('node2.img') as node2_path, \
+         iotests.FilePath('node1.img') as node1_path, \
+         iotests.FilePath('node0.img') as node0_path, \
+         iotests.VM() as vm:
+
+        # It is important to use raw for the base layer (so that
+        # permissions are just handed through to the protocol layer)
+        assert qemu_img('create', '-f', 'raw', node0_path, '64M') == 0
+
+        stream_throttle=None
+        commit_throttle=None
+
+        for path in [node1_path, node2_path, node3_path, node4_path]:
+            assert qemu_img('create', '-f', iotests.imgfmt, path, '64M') == 0
+
+        if write_to_stream_node:
+            # This is what (most of the time) makes commit finish
+            # earlier and then pull in stream
+            assert qemu_io_silent(node2_path,
+                                  '-c', 'write %iK 64K' % (65536 - 192),
+                                  '-c', 'write %iK 64K' % (65536 -  64)) == 0
+
+            stream_throttle='tg'
+        else:
+            # And this makes stream finish earlier
+            assert qemu_io_silent(node1_path,
+                                  '-c', 'write %iK 64K' % (65536 - 64)) == 0
+
+            commit_throttle='tg'
+
+        vm.launch()
+
+        vm.qmp_log('object-add',
+                   qom_type='throttle-group',
+                   id='tg',
+                   props={
+                       'x-iops-write': 1,
+                       'x-iops-write-max': 1
+                   })
+
+        vm.qmp_log('blockdev-add',
+                   filters=[filter_qmp_testfiles, filter_qmp_imgfmt],
+                   **node('node4', node4_path, throttle=stream_throttle,
+                     backing=node('node3', node3_path,
+                     backing=node('node2', node2_path,
+                     backing=node('node1', node1_path,
+                     backing=node('node0', node0_path, throttle=commit_throttle,
+                                  fmt='raw'))))))
+
+        vm.qmp_log('block-commit',
+                   job_id='commit',
+                   device='node4',
+                   filter_node_name='commit-filter',
+                   top_node='node1',
+                   base_node='node0',
+                   auto_finalize=False)
+
+        vm.qmp_log('block-stream',
+                   job_id='stream',
+                   device='node3',
+                   base_node='commit-filter')
+
+        if write_to_stream_node:
+            vm.run_job('commit', auto_finalize=False, auto_dismiss=True)
+            vm.run_job('stream', auto_finalize=True, auto_dismiss=True)
+        else:
+            # No, the jobs do not really finish concurrently here,
+            # the stream job does complete strictly before commit.
+            # But still, this is close enough for what we want to
+            # test.
+            vm.run_job('stream', auto_finalize=True, auto_dismiss=True)
+            vm.run_job('commit', auto_finalize=False, auto_dismiss=True)
+
+        # Assert that the backing node of node3 is node0 now
+        graph = vm.qmp('x-debug-query-block-graph')['return']
+        for edge in graph['edges']:
+            if edge['name'] == 'backing' and \
+               find_graph_node(graph, edge['parent'])['name'] == 'node3':
+                assert find_graph_node(graph, edge['child'])['name'] == 'node0'
+                break
+
+
+def main():
+    log('Running tests:')
+    test_concurrent_finish(True)
+    test_concurrent_finish(False)
+
+if __name__ == '__main__':
+    main()
diff --git a/tests/qemu-iotests/258.out b/tests/qemu-iotests/258.out
new file mode 100644
index 0000000000..ce6e9ba3e5
--- /dev/null
+++ b/tests/qemu-iotests/258.out
@@ -0,0 +1,33 @@
+Running tests:
+
+=== Commit and stream finish concurrently (letting stream write) ===
+
+{"execute": "object-add", "arguments": {"id": "tg", "props": {"x-iops-write": 1, "x-iops-write-max": 1}, "qom-type": "throttle-group"}}
+{"return": {}}
+{"execute": "blockdev-add", "arguments": {"backing": {"backing": {"backing": {"backing": {"driver": "raw", "file": {"driver": "file", "filename": "TEST_DIR/PID-node0.img"}, "node-name": "node0"}, "driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/PID-node1.img"}, "node-name": "node1"}, "driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/PID-node2.img"}, "node-name": "node2"}, "driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/PID-node3.img"}, "node-name": "node3"}, "driver": "IMGFMT", "file": {"driver": "throttle", "file": {"driver": "file", "filename": "TEST_DIR/PID-node4.img"}, "throttle-group": "tg"}, "node-name": "node4"}}
+{"return": {}}
+{"execute": "block-commit", "arguments": {"auto-finalize": false, "base-node": "node0", "device": "node4", "filter-node-name": "commit-filter", "job-id": "commit", "top-node": "node1"}}
+{"return": {}}
+{"execute": "block-stream", "arguments": {"base-node": "commit-filter", "device": "node3", "job-id": "stream"}}
+{"return": {}}
+{"execute": "job-finalize", "arguments": {"id": "commit"}}
+{"return": {}}
+{"data": {"id": "commit", "type": "commit"}, "event": "BLOCK_JOB_PENDING", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"device": "commit", "len": 67108864, "offset": 67108864, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"device": "stream", "len": 67108864, "offset": 67108864, "speed": 0, "type": "stream"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+
+=== Commit and stream finish concurrently (letting commit write) ===
+
+{"execute": "object-add", "arguments": {"id": "tg", "props": {"x-iops-write": 1, "x-iops-write-max": 1}, "qom-type": "throttle-group"}}
+{"return": {}}
+{"execute": "blockdev-add", "arguments": {"backing": {"backing": {"backing": {"backing": {"driver": "raw", "file": {"driver": "throttle", "file": {"driver": "file", "filename": "TEST_DIR/PID-node0.img"}, "throttle-group": "tg"}, "node-name": "node0"}, "driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/PID-node1.img"}, "node-name": "node1"}, "driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/PID-node2.img"}, "node-name": "node2"}, "driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/PID-node3.img"}, "node-name": "node3"}, "driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/PID-node4.img"}, "node-name": "node4"}}
+{"return": {}}
+{"execute": "block-commit", "arguments": {"auto-finalize": false, "base-node": "node0", "device": "node4", "filter-node-name": "commit-filter", "job-id": "commit", "top-node": "node1"}}
+{"return": {}}
+{"execute": "block-stream", "arguments": {"base-node": "commit-filter", "device": "node3", "job-id": "stream"}}
+{"return": {}}
+{"data": {"device": "stream", "len": 67108864, "offset": 67108864, "speed": 0, "type": "stream"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"execute": "job-finalize", "arguments": {"id": "commit"}}
+{"return": {}}
+{"data": {"id": "commit", "type": "commit"}, "event": "BLOCK_JOB_PENDING", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"device": "commit", "len": 67108864, "offset": 67108864, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 71ba3c05dc..5a37839e35 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -271,4 +271,5 @@
 254 rw backing quick
 255 rw quick
 256 rw quick
+258 rw quick
 262 rw quick migration
-- 
2.20.1




* [Qemu-devel] [PULL 12/16] block: Remove blk_pread_unthrottled()
  2019-08-16  9:34 [Qemu-devel] [PULL 00/16] Block layer patches Kevin Wolf
                   ` (10 preceding siblings ...)
  2019-08-16  9:34 ` [Qemu-devel] [PULL 11/16] iotests: Add test for concurrent stream/commit Kevin Wolf
@ 2019-08-16  9:34 ` Kevin Wolf
  2019-08-16  9:34 ` [Qemu-devel] [PULL 13/16] mirror: Keep mirror_top_bs drained after dropping permissions Kevin Wolf
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Kevin Wolf @ 2019-08-16  9:34 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

The functionality offered by blk_pread_unthrottled() goes back to commit
498e386c584. Then, we couldn't perform I/O throttling with synchronous
requests because timers wouldn't be executed in polling loops. So the
commit automatically disabled I/O throttling as soon as a synchronous
request was issued.

However, for geometry detection during disk initialisation, we always
used (and still use) synchronous requests even if guest requests use AIO
later. We did not want geometry detection to disable I/O throttling, so
bdrv_pread_unthrottled() was introduced, which disabled throttling only
temporarily.

All of this isn't necessary any more because we do run timers in polling
loops and even synchronous requests are now using coroutine
infrastructure internally. For this reason, commit 90c78624f already
removed the automatic disabling of I/O throttling.

It's time to get rid of the workaround for the removed code, and its
abuse of blk_root_drained_begin()/end(), as well.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 include/sysemu/block-backend.h |  2 --
 block/block-backend.c          | 16 ----------------
 hw/block/hd-geometry.c         |  7 +------
 3 files changed, 1 insertion(+), 24 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 733c4957eb..7320b58467 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -117,8 +117,6 @@ char *blk_get_attached_dev_id(BlockBackend *blk);
 BlockBackend *blk_by_dev(void *dev);
 BlockBackend *blk_by_qdev_id(const char *id, Error **errp);
 void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops, void *opaque);
-int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
-                          int bytes);
 int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
                                unsigned int bytes, QEMUIOVector *qiov,
                                BdrvRequestFlags flags);
diff --git a/block/block-backend.c b/block/block-backend.c
index 0056b526b8..fdd6b01ecf 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1237,22 +1237,6 @@ static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf,
     return rwco.ret;
 }
 
-int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
-                          int count)
-{
-    int ret;
-
-    ret = blk_check_byte_request(blk, offset, count);
-    if (ret < 0) {
-        return ret;
-    }
-
-    blk_root_drained_begin(blk->root);
-    ret = blk_pread(blk, offset, buf, count);
-    blk_root_drained_end(blk->root, NULL);
-    return ret;
-}
-
 int blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
                       int bytes, BdrvRequestFlags flags)
 {
diff --git a/hw/block/hd-geometry.c b/hw/block/hd-geometry.c
index 79384a2b0a..dcbccee294 100644
--- a/hw/block/hd-geometry.c
+++ b/hw/block/hd-geometry.c
@@ -63,12 +63,7 @@ static int guess_disk_lchs(BlockBackend *blk,
 
     blk_get_geometry(blk, &nb_sectors);
 
-    /**
-     * The function will be invoked during startup not only in sync I/O mode,
-     * but also in async I/O mode. So the I/O throttling function has to
-     * be disabled temporarily here, not permanently.
-     */
-    if (blk_pread_unthrottled(blk, 0, buf, BDRV_SECTOR_SIZE) < 0) {
+    if (blk_pread(blk, 0, buf, BDRV_SECTOR_SIZE) < 0) {
         return -1;
     }
     /* test msdos magic */
-- 
2.20.1




* [Qemu-devel] [PULL 13/16] mirror: Keep mirror_top_bs drained after dropping permissions
  2019-08-16  9:34 [Qemu-devel] [PULL 00/16] Block layer patches Kevin Wolf
                   ` (11 preceding siblings ...)
  2019-08-16  9:34 ` [Qemu-devel] [PULL 12/16] block: Remove blk_pread_unthrottled() Kevin Wolf
@ 2019-08-16  9:34 ` Kevin Wolf
  2019-08-16  9:34 ` [Qemu-devel] [PULL 14/16] block-backend: Queue requests while drained Kevin Wolf
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Kevin Wolf @ 2019-08-16  9:34 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

mirror_top_bs is currently implicitly drained through its connection to
the source or the target node. However, the drain section for target_bs
ends early after moving mirror_top_bs from src to target_bs, so that
requests can already be restarted while mirror_top_bs is still present
in the chain but has dropped all permissions. Such a request runs into an
assertion failure like this:

    qemu-system-x86_64: block/io.c:1634: bdrv_co_write_req_prepare:
    Assertion `child->perm & BLK_PERM_WRITE' failed.

Keep mirror_top_bs drained until all graph changes have completed.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/mirror.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index 9f5c59ece1..642d6570cc 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -656,7 +656,10 @@ static int mirror_exit_common(Job *job)
     s->target = NULL;
 
     /* We don't access the source any more. Dropping any WRITE/RESIZE is
-     * required before it could become a backing file of target_bs. */
+     * required before it could become a backing file of target_bs. Not having
+     * these permissions any more means that we can't allow any new requests on
+     * mirror_top_bs from now on, so keep it drained. */
+    bdrv_drained_begin(mirror_top_bs);
     bs_opaque->stop = true;
     bdrv_child_refresh_perms(mirror_top_bs, mirror_top_bs->backing,
                              &error_abort);
@@ -724,6 +727,7 @@ static int mirror_exit_common(Job *job)
     bs_opaque->job = NULL;
 
     bdrv_drained_end(src);
+    bdrv_drained_end(mirror_top_bs);
     s->in_drain = false;
     bdrv_unref(mirror_top_bs);
     bdrv_unref(src);
-- 
2.20.1




* [Qemu-devel] [PULL 14/16] block-backend: Queue requests while drained
  2019-08-16  9:34 [Qemu-devel] [PULL 00/16] Block layer patches Kevin Wolf
                   ` (12 preceding siblings ...)
  2019-08-16  9:34 ` [Qemu-devel] [PULL 13/16] mirror: Keep mirror_top_bs drained after dropping permissions Kevin Wolf
@ 2019-08-16  9:34 ` Kevin Wolf
  2019-08-16  9:34 ` [Qemu-devel] [PULL 15/16] qemu-img convert: Deprecate using -n and -o together Kevin Wolf
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Kevin Wolf @ 2019-08-16  9:34 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

This fixes devices like IDE that can still start new requests from I/O
handlers in the CPU thread while the block backend is drained.

The basic assumption is that in a drain section, no new requests should
be allowed through a BlockBackend (blk_drained_begin/end don't exist,
we get drain sections only on the node level). However, there are two
special cases where requests should not be queued:

1. Block jobs: We already make sure that block jobs are paused in a
   drain section, so they won't start new requests. However, if the
   drain_begin is called on the job's BlockBackend first, it can happen
   that we deadlock because the job stays busy until it reaches a pause
   point - which it can't if its requests aren't processed any more.

   The proper solution here would be to make all requests through the
   job's filter node instead of using a BlockBackend. For now, just
   disabling request queuing on the job BlockBackend is simpler.

2. Test cases: Making requests through bdrv_* would be cumbersome there
   because we'd need a BdrvChild. As we already have the functionality
   to disable request queuing from 1., use it in tests, too, for
   convenience.
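
The queuing rule described above can be illustrated with a small Python
model. This is only a sketch, not the C implementation: ToyBackend and its
methods are hypothetical names, requests are plain strings instead of
coroutines, and it assumes that queued requests are resumed only when the
outermost drain section ends.

```python
class ToyBackend:
    """Toy model of a BlockBackend that queues requests while drained."""

    def __init__(self, disable_request_queuing=False):
        self.quiesce_counter = 0          # number of active drain sections
        self.disable_request_queuing = disable_request_queuing
        self.queued = []                  # requests waiting for drained_end
        self.completed = []               # requests that were let through

    def drained_begin(self):
        self.quiesce_counter += 1

    def drained_end(self):
        assert self.quiesce_counter > 0
        self.quiesce_counter -= 1
        if self.quiesce_counter == 0:
            # Resume all queued requests in submission order
            while self.queued:
                self.completed.append(self.queued.pop(0))

    def submit(self, request):
        if self.quiesce_counter and not self.disable_request_queuing:
            # Drained: hold the request back instead of processing it
            self.queued.append(request)
        else:
            self.completed.append(request)


# A device backend (e.g. IDE) has its request held until the drain ends
device = ToyBackend()
device.drained_begin()
device.submit("guest-write")
held_while_drained = list(device.queued)
device.drained_end()

# A job backend opts out, so the job can still reach its pause point
job = ToyBackend(disable_request_queuing=True)
job.drained_begin()
job.submit("job-read")
```

In this model the device request only completes after drained_end(), while
the job request passes through immediately, which is the behaviour special
case 1 relies on to avoid the deadlock.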

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 include/sysemu/block-backend.h |  1 +
 block/backup.c                 |  1 +
 block/block-backend.c          | 53 ++++++++++++++++++++++++++++++++--
 block/commit.c                 |  2 ++
 block/mirror.c                 |  1 +
 blockjob.c                     |  3 ++
 tests/test-bdrv-drain.c        |  1 +
 7 files changed, 59 insertions(+), 3 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 7320b58467..368d53af77 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -104,6 +104,7 @@ void blk_get_perm(BlockBackend *blk, uint64_t *perm, uint64_t *shared_perm);
 
 void blk_set_allow_write_beyond_eof(BlockBackend *blk, bool allow);
 void blk_set_allow_aio_context_change(BlockBackend *blk, bool allow);
+void blk_set_disable_request_queuing(BlockBackend *blk, bool disable);
 void blk_iostatus_enable(BlockBackend *blk);
 bool blk_iostatus_is_enabled(const BlockBackend *blk);
 BlockDeviceIoStatus blk_iostatus(const BlockBackend *blk);
diff --git a/block/backup.c b/block/backup.c
index b26c22c4b8..4743c8f0bc 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -644,6 +644,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     if (ret < 0) {
         goto error;
     }
+    blk_set_disable_request_queuing(job->target, true);
 
     job->on_source_error = on_source_error;
     job->on_target_error = on_target_error;
diff --git a/block/block-backend.c b/block/block-backend.c
index fdd6b01ecf..c13c5c83b0 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -79,6 +79,9 @@ struct BlockBackend {
     QLIST_HEAD(, BlockBackendAioNotifier) aio_notifiers;
 
     int quiesce_counter;
+    CoQueue queued_requests;
+    bool disable_request_queuing;
+
     VMChangeStateEntry *vmsh;
     bool force_allow_inactivate;
 
@@ -339,6 +342,7 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm)
 
     block_acct_init(&blk->stats);
 
+    qemu_co_queue_init(&blk->queued_requests);
     notifier_list_init(&blk->remove_bs_notifiers);
     notifier_list_init(&blk->insert_bs_notifiers);
     QLIST_INIT(&blk->aio_notifiers);
@@ -1096,6 +1100,11 @@ void blk_set_allow_aio_context_change(BlockBackend *blk, bool allow)
     blk->allow_aio_context_change = allow;
 }
 
+void blk_set_disable_request_queuing(BlockBackend *blk, bool disable)
+{
+    blk->disable_request_queuing = disable;
+}
+
 static int blk_check_byte_request(BlockBackend *blk, int64_t offset,
                                   size_t size)
 {
@@ -1127,13 +1136,24 @@ static int blk_check_byte_request(BlockBackend *blk, int64_t offset,
     return 0;
 }
 
+static void coroutine_fn blk_wait_while_drained(BlockBackend *blk)
+{
+    if (blk->quiesce_counter && !blk->disable_request_queuing) {
+        qemu_co_queue_wait(&blk->queued_requests, NULL);
+    }
+}
+
 int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
                                unsigned int bytes, QEMUIOVector *qiov,
                                BdrvRequestFlags flags)
 {
     int ret;
-    BlockDriverState *bs = blk_bs(blk);
+    BlockDriverState *bs;
 
+    blk_wait_while_drained(blk);
+
+    /* Call blk_bs() only after waiting, the graph may have changed */
+    bs = blk_bs(blk);
     trace_blk_co_preadv(blk, bs, offset, bytes, flags);
 
     ret = blk_check_byte_request(blk, offset, bytes);
@@ -1159,8 +1179,12 @@ int coroutine_fn blk_co_pwritev(BlockBackend *blk, int64_t offset,
                                 BdrvRequestFlags flags)
 {
     int ret;
-    BlockDriverState *bs = blk_bs(blk);
+    BlockDriverState *bs;
 
+    blk_wait_while_drained(blk);
+
+    /* Call blk_bs() only after waiting, the graph may have changed */
+    bs = blk_bs(blk);
     trace_blk_co_pwritev(blk, bs, offset, bytes, flags);
 
     ret = blk_check_byte_request(blk, offset, bytes);
@@ -1349,6 +1373,12 @@ static void blk_aio_read_entry(void *opaque)
     BlkRwCo *rwco = &acb->rwco;
     QEMUIOVector *qiov = rwco->iobuf;
 
+    if (rwco->blk->quiesce_counter) {
+        blk_dec_in_flight(rwco->blk);
+        blk_wait_while_drained(rwco->blk);
+        blk_inc_in_flight(rwco->blk);
+    }
+
     assert(qiov->size == acb->bytes);
     rwco->ret = blk_co_preadv(rwco->blk, rwco->offset, acb->bytes,
                               qiov, rwco->flags);
@@ -1361,6 +1391,12 @@ static void blk_aio_write_entry(void *opaque)
     BlkRwCo *rwco = &acb->rwco;
     QEMUIOVector *qiov = rwco->iobuf;
 
+    if (rwco->blk->quiesce_counter) {
+        blk_dec_in_flight(rwco->blk);
+        blk_wait_while_drained(rwco->blk);
+        blk_inc_in_flight(rwco->blk);
+    }
+
     assert(!qiov || qiov->size == acb->bytes);
     rwco->ret = blk_co_pwritev(rwco->blk, rwco->offset, acb->bytes,
                                qiov, rwco->flags);
@@ -1482,6 +1518,8 @@ void blk_aio_cancel_async(BlockAIOCB *acb)
 
 int blk_co_ioctl(BlockBackend *blk, unsigned long int req, void *buf)
 {
+    blk_wait_while_drained(blk);
+
     if (!blk_is_available(blk)) {
         return -ENOMEDIUM;
     }
@@ -1522,7 +1560,11 @@ BlockAIOCB *blk_aio_ioctl(BlockBackend *blk, unsigned long int req, void *buf,
 
 int blk_co_pdiscard(BlockBackend *blk, int64_t offset, int bytes)
 {
-    int ret = blk_check_byte_request(blk, offset, bytes);
+    int ret;
+
+    blk_wait_while_drained(blk);
+
+    ret = blk_check_byte_request(blk, offset, bytes);
     if (ret < 0) {
         return ret;
     }
@@ -1532,6 +1574,8 @@ int blk_co_pdiscard(BlockBackend *blk, int64_t offset, int bytes)
 
 int blk_co_flush(BlockBackend *blk)
 {
+    blk_wait_while_drained(blk);
+
     if (!blk_is_available(blk)) {
         return -ENOMEDIUM;
     }
@@ -2232,6 +2276,9 @@ static void blk_root_drained_end(BdrvChild *child, int *drained_end_counter)
         if (blk->dev_ops && blk->dev_ops->drained_end) {
             blk->dev_ops->drained_end(blk->dev_opaque);
         }
+        while (qemu_co_enter_next(&blk->queued_requests, NULL)) {
+            /* Resume all queued requests */
+        }
     }
 }
 
diff --git a/block/commit.c b/block/commit.c
index 2c5a6d4ebc..408ae15389 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -350,6 +350,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
     if (ret < 0) {
         goto fail;
     }
+    blk_set_disable_request_queuing(s->base, true);
     s->base_bs = base;
 
     /* Required permissions are already taken with block_job_add_bdrv() */
@@ -358,6 +359,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
     if (ret < 0) {
         goto fail;
     }
+    blk_set_disable_request_queuing(s->top, true);
 
     s->backing_file_str = g_strdup(backing_file_str);
     s->on_error = on_error;
diff --git a/block/mirror.c b/block/mirror.c
index 642d6570cc..9b36391bb9 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1636,6 +1636,7 @@ static BlockJob *mirror_start_job(
         blk_set_force_allow_inactivate(s->target);
     }
     blk_set_allow_aio_context_change(s->target, true);
+    blk_set_disable_request_queuing(s->target, true);
 
     s->replaces = g_strdup(replaces);
     s->on_source_error = on_source_error;
diff --git a/blockjob.c b/blockjob.c
index 20b7f557da..73d9f1ba2b 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -445,6 +445,9 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
 
     bdrv_op_unblock(bs, BLOCK_OP_TYPE_DATAPLANE, job->blocker);
 
+    /* Disable request queuing in the BlockBackend to avoid deadlocks on drain:
+     * The job reports that it's busy until it reaches a pause point. */
+    blk_set_disable_request_queuing(blk, true);
     blk_set_allow_aio_context_change(blk, true);
 
     /* Only set speed when necessary to avoid NotSupported error */
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 9dffd86c47..468bbcc9a1 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -686,6 +686,7 @@ static void test_iothread_common(enum drain_type drain_type, int drain_thread)
                               &error_abort);
     s = bs->opaque;
     blk_insert_bs(blk, bs, &error_abort);
+    blk_set_disable_request_queuing(blk, true);
 
     blk_set_aio_context(blk, ctx_a, &error_abort);
     aio_context_acquire(ctx_a);
-- 
2.20.1




* [Qemu-devel] [PULL 15/16] qemu-img convert: Deprecate using -n and -o together
  2019-08-16  9:34 [Qemu-devel] [PULL 00/16] Block layer patches Kevin Wolf
                   ` (13 preceding siblings ...)
  2019-08-16  9:34 ` [Qemu-devel] [PULL 14/16] block-backend: Queue requests while drained Kevin Wolf
@ 2019-08-16  9:34 ` Kevin Wolf
  2019-08-16  9:34 ` [Qemu-devel] [PULL 16/16] file-posix: Handle undetectable alignment Kevin Wolf
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Kevin Wolf @ 2019-08-16  9:34 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

bdrv_create options specified with -o have no effect when skipping image
creation with -n, so this doesn't make sense. Warn against the misuse
and deprecate the combination so we can make it a hard error later.
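
As a rough illustration, the new check behaves like the following Python
model. It is a sketch only: convert_warnings is a hypothetical name, and its
two arguments stand in for the parsed -n flag and the -o option string.

```python
def convert_warnings(skip_create, options):
    """Return the warnings printed for a given -n / -o combination."""
    if skip_create and options:
        return [
            "-o has no effect when skipping image creation",
            "This will become an error in future QEMU versions.",
        ]
    return []


# -n together with -o is deprecated; either one alone is fine
both = convert_warnings(True, "cluster_size=64k")
only_n = convert_warnings(True, None)
only_o = convert_warnings(False, "cluster_size=64k")
```

The conversion itself still runs in the deprecated case; only the warnings
are added.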

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 qemu-img.c           | 5 +++++
 qemu-deprecated.texi | 7 +++++++
 2 files changed, 12 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index 79983772de..d9321f6418 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2231,6 +2231,11 @@ static int img_convert(int argc, char **argv)
         goto fail_getopt;
     }
 
+    if (skip_create && options) {
+        warn_report("-o has no effect when skipping image creation");
+        warn_report("This will become an error in future QEMU versions.");
+    }
+
     s.src_num = argc - optind - 1;
     out_filename = s.src_num >= 1 ? argv[argc - 1] : NULL;
 
diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi
index fff07bb2a3..f7680c08e1 100644
--- a/qemu-deprecated.texi
+++ b/qemu-deprecated.texi
@@ -305,6 +305,13 @@ to just export the entire image and then mount only /dev/nbd0p1 than
 it is to reinvoke @command{qemu-nbd -c /dev/nbd0} limited to just a
 subset of the image.
 
+@subsection qemu-img convert -n -o (since 4.2.0)
+
+All options specified in @option{-o} are image creation options, so
+they have no effect when used with @option{-n} to skip image creation.
+Silently ignored options can be confusing, so this combination of
+options will be made an error in future versions.
+
 @section Build system
 
 @subsection Python 2 support (since 4.1.0)
-- 
2.20.1




* [Qemu-devel] [PULL 16/16] file-posix: Handle undetectable alignment
  2019-08-16  9:34 [Qemu-devel] [PULL 00/16] Block layer patches Kevin Wolf
                   ` (14 preceding siblings ...)
  2019-08-16  9:34 ` [Qemu-devel] [PULL 15/16] qemu-img convert: Deprecate using -n and -o together Kevin Wolf
@ 2019-08-16  9:34 ` Kevin Wolf
  2019-08-16 10:14 ` [Qemu-devel] [PULL 00/16] Block layer patches no-reply
  2019-08-16 16:21 ` Peter Maydell
  17 siblings, 0 replies; 19+ messages in thread
From: Kevin Wolf @ 2019-08-16  9:34 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Nir Soffer <nirsof@gmail.com>

In some cases buf_align or request_alignment cannot be detected:

1. With Gluster, buf_align cannot be detected since the actual I/O is
   done on the Gluster server, and qemu buffer alignment does not matter.
   Since we don't have an alignment requirement, buf_align=1 is the best
   value.

2. With a local XFS filesystem, buf_align cannot be detected when reading
   from an unallocated area. In this case we must align the buffer, but
   we don't know the correct size. Using the wrong alignment results in
   an I/O error.

3. With Gluster backed by XFS, request_alignment cannot be detected when
   reading from an unallocated area. In this case we need to use the
   correct alignment, and failing to do so results in I/O errors.

4. With NFS, the server does not use direct I/O, so neither buf_align nor
   request_alignment can be detected. In this case we don't need any
   alignment, so we can use buf_align=1 and request_alignment=1.

These cases seem to work when the storage sector size is 512 bytes,
because the current code starts checking with align=512. If the check
succeeds because alignment cannot be detected, we use 512. But this does
not work for storage with a 4k sector size.

To determine if we can detect the alignment, we probe first with
align=1. If probing succeeds, either there is no alignment requirement
(cases 1, 4) or we are probing an unallocated area (cases 2, 3). Since we
don't have any way to tell, we treat this as undetectable alignment. If
probing with align=1 fails with EINVAL, but probing with one of the
expected alignments succeeds, we know that we found a working alignment.

Practically the alignment requirements are the same for buffer
alignment, buffer length, and offset in file. So in case we cannot
detect buf_align, we can use request alignment. If we cannot detect
request alignment, we can fall back to a safe value. To use this logic,
we probe request alignment first, before buf_align.
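
The probing order can be modelled in a few lines of Python. This is a sketch
of the decision logic only, not the code in block/file-posix.c: request_ok
and buf_ok are hypothetical callables standing in for the O_DIRECT test
reads, and max_align is assumed to be 4096 here.

```python
ALIGNMENTS = (1, 512, 1024, 2048, 4096)


def probe_alignments(request_ok, buf_ok, max_align=4096):
    """Return (request_alignment, buf_align); None means nothing worked."""
    # Probe request alignment first. Success with align=1 means the
    # alignment is undetectable, so fall back to the safe value max_align.
    request_alignment = None
    for align in ALIGNMENTS:
        if request_ok(align):
            request_alignment = align if align != 1 else max_align
            break

    # Then probe buf_align. If it is undetectable, reuse the request
    # alignment determined above.
    buf_align = None
    for align in ALIGNMENTS:
        if buf_ok(align):
            buf_align = align if align != 1 else request_alignment
            break

    return request_alignment, buf_align


# Case 2 with 4k sectors: the request alignment is detectable, but reads
# from an unallocated area succeed with any buffer alignment.
xfs_4k = probe_alignments(lambda a: a % 4096 == 0, lambda a: True)

# Case 4 (NFS): every probe succeeds, so both values are undetectable.
nfs = probe_alignments(lambda a: True, lambda a: True)
```

With these inputs, both xfs_4k and nfs come out as (4096, 4096). In the
model, a result of None corresponds to the case where no probe succeeded at
all.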

Here is a table showing the behaviour with current code (the value in
parentheses is the optimal value).

Case    Sector    buf_align (opt)   request_alignment (opt)     result
======================================================================
1       512       512   (1)          512   (512)                 OK
1       4096      512   (1)          4096  (4096)                FAIL
----------------------------------------------------------------------
2       512       512   (512)        512   (512)                 OK
2       4096      512   (4096)       4096  (4096)                FAIL
----------------------------------------------------------------------
3       512       512   (1)          512   (512)                 OK
3       4096      512   (1)          512   (4096)                FAIL
----------------------------------------------------------------------
4       512       512   (1)          512   (1)                   OK
4       4096      512   (1)          512   (1)                   OK

Same cases with this change:

Case    Sector    buf_align (opt)   request_alignment (opt)     result
======================================================================
1       512       512   (1)          512   (512)                 OK
1       4096      4096  (1)          4096  (4096)                OK
----------------------------------------------------------------------
2       512       512   (512)        512   (512)                 OK
2       4096      4096  (4096)       4096  (4096)                OK
----------------------------------------------------------------------
3       512       4096  (1)          4096  (512)                 OK
3       4096      4096  (1)          4096  (4096)                OK
----------------------------------------------------------------------
4       512       4096  (1)          4096  (1)                   OK
4       4096      4096  (1)          4096  (1)                   OK

I tested that provisioning VMs and copying disks on local XFS and
Gluster with a 4k sector size works now, resolving bugs [1],[2].
I also tested on XFS, NFS, and Gluster with a 512-byte sector size.

[1] https://bugzilla.redhat.com/1737256
[2] https://bugzilla.redhat.com/1738657

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/file-posix.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 4479cc7ab4..b8b4dad553 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -323,6 +323,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
     BDRVRawState *s = bs->opaque;
     char *buf;
     size_t max_align = MAX(MAX_BLOCKSIZE, getpagesize());
+    size_t alignments[] = {1, 512, 1024, 2048, 4096};
 
     /* For SCSI generic devices the alignment is not really used.
        With buffered I/O, we don't have any restrictions. */
@@ -349,25 +350,38 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
     }
 #endif
 
-    /* If we could not get the sizes so far, we can only guess them */
-    if (!s->buf_align) {
+    /*
+     * If we could not get the sizes so far, we can only guess them. First try
+     * to detect request alignment, since it is more likely to succeed. Then
+     * try to detect buf_align, which cannot be detected in some cases (e.g.
+     * Gluster). If buf_align cannot be detected, we fallback to the value of
+     * request_alignment.
+     */
+
+    if (!bs->bl.request_alignment) {
+        int i;
         size_t align;
-        buf = qemu_memalign(max_align, 2 * max_align);
-        for (align = 512; align <= max_align; align <<= 1) {
-            if (raw_is_io_aligned(fd, buf + align, max_align)) {
-                s->buf_align = align;
+        buf = qemu_memalign(max_align, max_align);
+        for (i = 0; i < ARRAY_SIZE(alignments); i++) {
+            align = alignments[i];
+            if (raw_is_io_aligned(fd, buf, align)) {
+                /* Fallback to safe value. */
+                bs->bl.request_alignment = (align != 1) ? align : max_align;
                 break;
             }
         }
         qemu_vfree(buf);
     }
 
-    if (!bs->bl.request_alignment) {
+    if (!s->buf_align) {
+        int i;
         size_t align;
-        buf = qemu_memalign(s->buf_align, max_align);
-        for (align = 512; align <= max_align; align <<= 1) {
-            if (raw_is_io_aligned(fd, buf, align)) {
-                bs->bl.request_alignment = align;
+        buf = qemu_memalign(max_align, 2 * max_align);
+        for (i = 0; i < ARRAY_SIZE(alignments); i++) {
+            align = alignments[i];
+            if (raw_is_io_aligned(fd, buf + align, max_align)) {
+                /* Fallback to request_alignment. */
+                s->buf_align = (align != 1) ? align : bs->bl.request_alignment;
                 break;
             }
         }
-- 
2.20.1




* Re: [Qemu-devel] [PULL 00/16] Block layer patches
  2019-08-16  9:34 [Qemu-devel] [PULL 00/16] Block layer patches Kevin Wolf
                   ` (15 preceding siblings ...)
  2019-08-16  9:34 ` [Qemu-devel] [PULL 16/16] file-posix: Handle undetectable alignment Kevin Wolf
@ 2019-08-16 10:14 ` no-reply
  2019-08-16 16:21 ` Peter Maydell
  17 siblings, 0 replies; 19+ messages in thread
From: no-reply @ 2019-08-16 10:14 UTC (permalink / raw)
  To: kwolf; +Cc: kwolf, qemu-devel, qemu-block

Patchew URL: https://patchew.org/QEMU/20190816093439.14262-1-kwolf@redhat.com/



Hi,

This series failed build test on s390x host. Please find the details below.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
# Testing script will be invoked under the git checkout with
# HEAD pointing to a commit that has the patches applied on top of "base"
# branch
set -e

echo
echo "=== ENV ==="
env

echo
echo "=== PACKAGES ==="
rpm -qa

echo
echo "=== UNAME ==="
uname -a

CC=$HOME/bin/cc
INSTALL=$PWD/install
BUILD=$PWD/build
mkdir -p $BUILD $INSTALL
SRC=$PWD
cd $BUILD
$SRC/configure --cc=$CC --prefix=$INSTALL
make -j4
# XXX: we need reliable clean up
# make check -j4 V=1
make install
=== TEST SCRIPT END ===

  CC      aarch64-softmmu/target/arm/sve_helper.o
  CC      lm32-softmmu/hw/input/milkymist-softusb.o
  CC      lm32-softmmu/hw/misc/milkymist-hpdmc.o
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:209: qemu-system-arm] Error 1
make: *** [Makefile:472: arm-softmmu/all] Error 2
make: *** Waiting for unfinished jobs....


The full log is available at
http://patchew.org/logs/20190816093439.14262-1-kwolf@redhat.com/testing.s390x/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [Qemu-devel] [PULL 00/16] Block layer patches
  2019-08-16  9:34 [Qemu-devel] [PULL 00/16] Block layer patches Kevin Wolf
                   ` (16 preceding siblings ...)
  2019-08-16 10:14 ` [Qemu-devel] [PULL 00/16] Block layer patches no-reply
@ 2019-08-16 16:21 ` Peter Maydell
  17 siblings, 0 replies; 19+ messages in thread
From: Peter Maydell @ 2019-08-16 16:21 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: QEMU Developers, Qemu-block

On Fri, 16 Aug 2019 at 10:36, Kevin Wolf <kwolf@redhat.com> wrote:
>
> The following changes since commit 9e06029aea3b2eca1d5261352e695edc1e7d7b8b:
>
>   Update version for v4.1.0 release (2019-08-15 13:03:37 +0100)
>
> are available in the Git repository at:
>
>   git://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to a6b257a08e3d72219f03e461a52152672fec0612:
>
>   file-posix: Handle undetectable alignment (2019-08-16 11:29:11 +0200)
>
> ----------------------------------------------------------------
> Block layer patches:
>
> - file-posix: Fix O_DIRECT alignment detection
> - Fixes for concurrent block jobs
> - block-backend: Queue requests while drained (fix IDE vs. job crashes)
> - qemu-img convert: Deprecate using -n and -o together
> - iotests: Migration tests with filter nodes
> - iotests: More media change tests
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/4.2
for any user-visible changes.

-- PMM



Thread overview: 19+ messages
2019-08-16  9:34 [Qemu-devel] [PULL 00/16] Block layer patches Kevin Wolf
2019-08-16  9:34 ` [Qemu-devel] [PULL 01/16] iotests/118: Test media change for scsi-cd Kevin Wolf
2019-08-16  9:34 ` [Qemu-devel] [PULL 02/16] iotests/118: Create test classes dynamically Kevin Wolf
2019-08-16  9:34 ` [Qemu-devel] [PULL 03/16] iotests/118: Add -blockdev based tests Kevin Wolf
2019-08-16  9:34 ` [Qemu-devel] [PULL 04/16] iotests: Move migration helpers to iotests.py Kevin Wolf
2019-08-16  9:34 ` [Qemu-devel] [PULL 05/16] iotests: Test migration with all kinds of filter nodes Kevin Wolf
2019-08-16  9:34 ` [Qemu-devel] [PULL 06/16] block: Simplify bdrv_filter_default_perms() Kevin Wolf
2019-08-16  9:34 ` [Qemu-devel] [PULL 07/16] block: Keep subtree drained in drop_intermediate Kevin Wolf
2019-08-16  9:34 ` [Qemu-devel] [PULL 08/16] block: Reduce (un)drains when replacing a child Kevin Wolf
2019-08-16  9:34 ` [Qemu-devel] [PULL 09/16] tests: Test polling in bdrv_drop_intermediate() Kevin Wolf
2019-08-16  9:34 ` [Qemu-devel] [PULL 10/16] tests: Test mid-drain bdrv_replace_child_noperm() Kevin Wolf
2019-08-16  9:34 ` [Qemu-devel] [PULL 11/16] iotests: Add test for concurrent stream/commit Kevin Wolf
2019-08-16  9:34 ` [Qemu-devel] [PULL 12/16] block: Remove blk_pread_unthrottled() Kevin Wolf
2019-08-16  9:34 ` [Qemu-devel] [PULL 13/16] mirror: Keep mirror_top_bs drained after dropping permissions Kevin Wolf
2019-08-16  9:34 ` [Qemu-devel] [PULL 14/16] block-backend: Queue requests while drained Kevin Wolf
2019-08-16  9:34 ` [Qemu-devel] [PULL 15/16] qemu-img convert: Deprecate using -n and -o together Kevin Wolf
2019-08-16  9:34 ` [Qemu-devel] [PULL 16/16] file-posix: Handle undetectable alignment Kevin Wolf
2019-08-16 10:14 ` [Qemu-devel] [PULL 00/16] Block layer patches no-reply
2019-08-16 16:21 ` Peter Maydell
