All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/7] Switch iotests to using Async QMP
@ 2021-10-12 22:34 John Snow
  2021-10-12 22:34 ` [PATCH v3 1/7] python/machine: remove has_quit argument John Snow
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: John Snow @ 2021-10-12 22:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Eduardo Habkost,
	qemu-block, Hanna Reitz, Cleber Rosa, John Snow

Based-on: <20211012214152.802483-1-jsnow@redhat.com>
          [PULL 00/10] Python patches
GitLab: https://gitlab.com/jsnow/qemu/-/commits/python-aqmp-iotest-wrapper
CI: https://gitlab.com/jsnow/qemu/-/pipelines/387210591

Hiya,

This series continues where the last two AQMP series left off and adds a
synchronous 'legacy' wrapper around the new AQMP interface, then drops
it straight into iotests to prove that AQMP is functional and totally
cool and fine. The disruption and churn to iotests is pretty minimal.

In the event that a regression happens and I am not physically proximate
to inflict damage upon, one may set the QEMU_PYTHON_LEGACY_QMP variable
to any non-empty string as it pleases you to engage the QMP machinery
you are used to.

I'd like to try and get this committed early in the 6.2 development
cycle to give ample time to smooth over any possible regressions. I've
tested it locally and via gitlab CI, across Python versions 3.6 through
3.10, and "worksforme". If something bad happens, we can revert the
actual switch-flip very trivially.

V3:

001/7:[----] [--] 'python/machine: remove has_quit argument'
002/7:[0002] [FC] 'python/machine: Handle QMP errors on close more meticulously'
003/7:[----] [--] 'python/aqmp: Remove scary message'
004/7:[0006] [FC] 'iotests: Accommodate async QMP Exception classes'
005/7:[0003] [FC] 'iotests: Conditionally silence certain AQMP errors'
006/7:[0009] [FC] 'python/aqmp: Create sync QMP wrapper for iotests'
007/7:[----] [--] 'python, iotests: replace qmp with aqmp'

002: Account for force-kill cases, too.
003: Shuffled earlier into the series to prevent a mid-series regression.
004: Rewrite the imports to be less "heterogeneous" ;)
005: Add in a TODO for me to trip over in the future.
006: Fix a bug surfaced by a new iotest where waiting with pull_event for a
     timeout of 0.0 will cause a timeout exception to be raised even if there
     was an event ready to be read.

V2:

001/17:[----] [--] 'python/aqmp: add greeting property to QMPClient'
002/17:[----] [--] 'python/aqmp: add .empty() method to EventListener'
003/17:[----] [--] 'python/aqmp: Return cleared events from EventListener.clear()'
004/17:[0007] [FC] 'python/aqmp: add send_fd_scm'
005/17:[down] 'python/aqmp: Add dict conversion method to Greeting object'
006/17:[down] 'python/aqmp: Reduce severity of EOFError-caused loop terminations'
007/17:[down] 'python/aqmp: Disable logging messages by default'

008/17:[0002] [FC] 'python/qmp: clear events on get_events() call'
009/17:[----] [--] 'python/qmp: add send_fd_scm directly to QEMUMonitorProtocol'
010/17:[----] [--] 'python, iotests: remove socket_scm_helper'
011/17:[0013] [FC] 'python/machine: remove has_quit argument'
012/17:[down] 'python/machine: Handle QMP errors on close more meticulously'

013/17:[0009] [FC] 'iotests: Accommodate async QMP Exception classes'
014/17:[down] 'iotests: Conditionally silence certain AQMP errors'

015/17:[0016] [FC] 'python/aqmp: Create sync QMP wrapper for iotests'
016/17:[0002] [FC] 'python/aqmp: Remove scary message'
017/17:[----] [--] 'python, iotests: replace qmp with aqmp'

- Rebased on jsnow/python, which was recently rebased on origin/master.
- Make aqmp's send_fd_scm method bark if the socket isn't AF_UNIX (Hanna)
- Uh... modify send_fd_scm so it doesn't break when Python 3.11 comes out ...
  See the commit message for more detail.
- Drop the "python/aqmp: Create MessageModel and StandaloneModel class"
  patch and replace with a far simpler method that just adds an
  _asdict() method.
- Add patches 06 and 07 to change how the AQMP library handles logging.
- Adjust docstring in patch 08 (Hanna)
- Rename "_has_quit" attribute to "_quid_issued" (Hanna)
- Renamed patch 12, simplified the logic in _soft_shutdown a tiny bit.
- Fixed bad exception handling logic in 13 (Hanna)
- Introduce a helper in patch 14 to silence log output when it's unwanted.
- Small addition of _get_greeting() helper in patch 15, coinciding with the
  new patch 05 here.
- Contextual changes in 16.

John Snow (7):
  python/machine: remove has_quit argument
  python/machine: Handle QMP errors on close more meticulously
  python/aqmp: Remove scary message
  iotests: Accommodate async QMP Exception classes
  iotests: Conditionally silence certain AQMP errors
  python/aqmp: Create sync QMP wrapper for iotests
  python, iotests: replace qmp with aqmp

 python/qemu/aqmp/__init__.py              |  12 --
 python/qemu/aqmp/legacy.py                | 138 ++++++++++++++++++++++
 python/qemu/machine/machine.py            |  85 +++++++++----
 scripts/simplebench/bench_block_job.py    |   3 +-
 tests/qemu-iotests/040                    |   7 +-
 tests/qemu-iotests/218                    |   2 +-
 tests/qemu-iotests/255                    |   2 +-
 tests/qemu-iotests/iotests.py             |  20 +++-
 tests/qemu-iotests/tests/mirror-top-perms |  17 ++-
 9 files changed, 238 insertions(+), 48 deletions(-)
 create mode 100644 python/qemu/aqmp/legacy.py

-- 
2.31.1




^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v3 1/7] python/machine: remove has_quit argument
  2021-10-12 22:34 [PATCH v3 0/7] Switch iotests to using Async QMP John Snow
@ 2021-10-12 22:34 ` John Snow
  2021-10-12 22:34 ` [PATCH v3 2/7] python/machine: Handle QMP errors on close more meticulously John Snow
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: John Snow @ 2021-10-12 22:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Eduardo Habkost,
	qemu-block, Hanna Reitz, Cleber Rosa, John Snow

If we spy on the QMP commands instead, we don't need callers to remember
to pass it. Seems like a fair trade-off.

The one slightly weird bit is overloading this instance variable for
wait(), where we use it to mean "don't issue the qmp 'quit'
command". This means that wait() will "fail" if the QEMU process does
not terminate of its own accord.

In most cases, we probably did already actually issue quit -- some
iotests do this -- but in some others, we may be waiting for QEMU to
terminate for some other reason, such as a test wherein we tell the
guest (directly) to shut down.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
---
 python/qemu/machine/machine.py | 34 +++++++++++++++++++---------------
 tests/qemu-iotests/040         |  7 +------
 tests/qemu-iotests/218         |  2 +-
 tests/qemu-iotests/255         |  2 +-
 4 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index 056d340e355..0bd40bc2f76 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -170,6 +170,7 @@ def __init__(self,
         self._console_socket: Optional[socket.socket] = None
         self._remove_files: List[str] = []
         self._user_killed = False
+        self._quit_issued = False
 
     def __enter__(self: _T) -> _T:
         return self
@@ -368,6 +369,7 @@ def _post_shutdown(self) -> None:
                 command = ''
             LOG.warning(msg, -int(exitcode), command)
 
+        self._quit_issued = False
         self._user_killed = False
         self._launched = False
 
@@ -443,15 +445,13 @@ def _hard_shutdown(self) -> None:
         self._subp.kill()
         self._subp.wait(timeout=60)
 
-    def _soft_shutdown(self, timeout: Optional[int],
-                       has_quit: bool = False) -> None:
+    def _soft_shutdown(self, timeout: Optional[int]) -> None:
         """
         Perform early cleanup, attempt to gracefully shut down the VM, and wait
         for it to terminate.
 
         :param timeout: Timeout in seconds for graceful shutdown.
                         A value of None is an infinite wait.
-        :param has_quit: When True, don't attempt to issue 'quit' QMP command
 
         :raise ConnectionReset: On QMP communication errors
         :raise subprocess.TimeoutExpired: When timeout is exceeded waiting for
@@ -460,21 +460,19 @@ def _soft_shutdown(self, timeout: Optional[int],
         self._early_cleanup()
 
         if self._qmp_connection:
-            if not has_quit:
+            if not self._quit_issued:
                 # Might raise ConnectionReset
-                self._qmp.cmd('quit')
+                self.qmp('quit')
 
         # May raise subprocess.TimeoutExpired
         self._subp.wait(timeout=timeout)
 
-    def _do_shutdown(self, timeout: Optional[int],
-                     has_quit: bool = False) -> None:
+    def _do_shutdown(self, timeout: Optional[int]) -> None:
         """
         Attempt to shutdown the VM gracefully; fallback to a hard shutdown.
 
         :param timeout: Timeout in seconds for graceful shutdown.
                         A value of None is an infinite wait.
-        :param has_quit: When True, don't attempt to issue 'quit' QMP command
 
         :raise AbnormalShutdown: When the VM could not be shut down gracefully.
             The inner exception will likely be ConnectionReset or
@@ -482,13 +480,13 @@ def _do_shutdown(self, timeout: Optional[int],
             may result in its own exceptions, likely subprocess.TimeoutExpired.
         """
         try:
-            self._soft_shutdown(timeout, has_quit)
+            self._soft_shutdown(timeout)
         except Exception as exc:
             self._hard_shutdown()
             raise AbnormalShutdown("Could not perform graceful shutdown") \
                 from exc
 
-    def shutdown(self, has_quit: bool = False,
+    def shutdown(self,
                  hard: bool = False,
                  timeout: Optional[int] = 30) -> None:
         """
@@ -498,7 +496,6 @@ def shutdown(self, has_quit: bool = False,
         If the VM has not yet been launched, or shutdown(), wait(), or kill()
         have already been called, this method does nothing.
 
-        :param has_quit: When true, do not attempt to issue 'quit' QMP command.
         :param hard: When true, do not attempt graceful shutdown, and
                      suppress the SIGKILL warning log message.
         :param timeout: Optional timeout in seconds for graceful shutdown.
@@ -512,7 +509,7 @@ def shutdown(self, has_quit: bool = False,
                 self._user_killed = True
                 self._hard_shutdown()
             else:
-                self._do_shutdown(timeout, has_quit)
+                self._do_shutdown(timeout)
         finally:
             self._post_shutdown()
 
@@ -529,7 +526,8 @@ def wait(self, timeout: Optional[int] = 30) -> None:
         :param timeout: Optional timeout in seconds. Default 30 seconds.
                         A value of `None` is an infinite wait.
         """
-        self.shutdown(has_quit=True, timeout=timeout)
+        self._quit_issued = True
+        self.shutdown(timeout=timeout)
 
     def set_qmp_monitor(self, enabled: bool = True) -> None:
         """
@@ -574,7 +572,10 @@ def qmp(self, cmd: str,
             conv_keys = True
 
         qmp_args = self._qmp_args(conv_keys, args)
-        return self._qmp.cmd(cmd, args=qmp_args)
+        ret = self._qmp.cmd(cmd, args=qmp_args)
+        if cmd == 'quit' and 'error' not in ret and 'return' in ret:
+            self._quit_issued = True
+        return ret
 
     def command(self, cmd: str,
                 conv_keys: bool = True,
@@ -585,7 +586,10 @@ def command(self, cmd: str,
         On failure raise an exception.
         """
         qmp_args = self._qmp_args(conv_keys, args)
-        return self._qmp.command(cmd, **qmp_args)
+        ret = self._qmp.command(cmd, **qmp_args)
+        if cmd == 'quit':
+            self._quit_issued = True
+        return ret
 
     def get_qmp_event(self, wait: bool = False) -> Optional[QMPMessage]:
         """
diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index f3677de9dfd..6af5ab9e764 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -92,10 +92,9 @@ class TestSingleDrive(ImageCommitTestCase):
         self.vm.add_device('virtio-scsi')
         self.vm.add_device("scsi-hd,id=scsi0,drive=drive0")
         self.vm.launch()
-        self.has_quit = False
 
     def tearDown(self):
-        self.vm.shutdown(has_quit=self.has_quit)
+        self.vm.shutdown()
         os.remove(test_img)
         os.remove(mid_img)
         os.remove(backing_img)
@@ -127,8 +126,6 @@ class TestSingleDrive(ImageCommitTestCase):
         result = self.vm.qmp('quit')
         self.assert_qmp(result, 'return', {})
 
-        self.has_quit = True
-
     # Same as above, but this time we add the filter after starting the job
     @iotests.skip_if_unsupported(['throttle'])
     def test_commit_plus_filter_and_quit(self):
@@ -147,8 +144,6 @@ class TestSingleDrive(ImageCommitTestCase):
         result = self.vm.qmp('quit')
         self.assert_qmp(result, 'return', {})
 
-        self.has_quit = True
-
     def test_device_not_found(self):
         result = self.vm.qmp('block-commit', device='nonexistent', top='%s' % mid_img)
         self.assert_qmp(result, 'error/class', 'DeviceNotFound')
diff --git a/tests/qemu-iotests/218 b/tests/qemu-iotests/218
index 325d8244fb9..4922b4d3b6f 100755
--- a/tests/qemu-iotests/218
+++ b/tests/qemu-iotests/218
@@ -187,4 +187,4 @@ with iotests.VM() as vm, \
     log(vm.qmp('quit'))
 
     with iotests.Timeout(5, 'Timeout waiting for VM to quit'):
-        vm.shutdown(has_quit=True)
+        vm.shutdown()
diff --git a/tests/qemu-iotests/255 b/tests/qemu-iotests/255
index c43aa9c67ac..3d6d0e80cb5 100755
--- a/tests/qemu-iotests/255
+++ b/tests/qemu-iotests/255
@@ -123,4 +123,4 @@ with iotests.FilePath('src.qcow2') as src_path, \
     vm.qmp_log('block-job-cancel', device='job0')
     vm.qmp_log('quit')
 
-    vm.shutdown(has_quit=True)
+    vm.shutdown()
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 2/7] python/machine: Handle QMP errors on close more meticulously
  2021-10-12 22:34 [PATCH v3 0/7] Switch iotests to using Async QMP John Snow
  2021-10-12 22:34 ` [PATCH v3 1/7] python/machine: remove has_quit argument John Snow
@ 2021-10-12 22:34 ` John Snow
  2021-10-13  7:33   ` Hanna Reitz
  2021-10-12 22:34 ` [PATCH v3 3/7] python/aqmp: Remove scary message John Snow
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: John Snow @ 2021-10-12 22:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Eduardo Habkost,
	qemu-block, Hanna Reitz, Cleber Rosa, John Snow

To use the AQMP backend, Machine just needs to be a little more diligent
about what happens when closing a QMP connection. The operation is no
longer a freebie in the async world; it may return errors encountered in
the async bottom half on incoming message receipt, etc.

(AQMP's disconnect, ultimately, serves as the quiescence point where all
async contexts are gathered together, and any final errors reported at
that point.)

Because async QMP continues to check for messages asynchronously, it's
almost certainly likely that the loop will have exited due to EOF after
issuing the last 'quit' command. That error will ultimately be bubbled
up when attempting to close the QMP connection. The manager class here
then is free to discard it -- if it was expected.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 python/qemu/machine/machine.py | 48 +++++++++++++++++++++++++++++-----
 1 file changed, 42 insertions(+), 6 deletions(-)

diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index 0bd40bc2f76..a0cf69786b4 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -342,9 +342,15 @@ def _post_shutdown(self) -> None:
         # Comprehensive reset for the failed launch case:
         self._early_cleanup()
 
-        if self._qmp_connection:
-            self._qmp.close()
-            self._qmp_connection = None
+        try:
+            self._close_qmp_connection()
+        except Exception as err:  # pylint: disable=broad-except
+            LOG.warning(
+                "Exception closing QMP connection: %s",
+                str(err) if str(err) else type(err).__name__
+            )
+        finally:
+            assert self._qmp_connection is None
 
         self._close_qemu_log_file()
 
@@ -420,6 +426,31 @@ def _launch(self) -> None:
                                        close_fds=False)
         self._post_launch()
 
+    def _close_qmp_connection(self) -> None:
+        """
+        Close the underlying QMP connection, if any.
+
+        Dutifully report errors that occurred while closing, but assume
+        that any error encountered indicates an abnormal termination
+        process and not a failure to close.
+        """
+        if self._qmp_connection is None:
+            return
+
+        try:
+            self._qmp.close()
+        except EOFError:
+            # EOF can occur as an Exception here when using the Async
+            # QMP backend. It indicates that the server closed the
+            # stream. If we successfully issued 'quit' at any point,
+            # then this was expected. If the remote went away without
+            # our permission, it's worth reporting that as an abnormal
+            # shutdown case.
+            if not (self._user_killed or self._quit_issued):
+                raise
+        finally:
+            self._qmp_connection = None
+
     def _early_cleanup(self) -> None:
         """
         Perform any cleanup that needs to happen before the VM exits.
@@ -460,9 +491,14 @@ def _soft_shutdown(self, timeout: Optional[int]) -> None:
         self._early_cleanup()
 
         if self._qmp_connection:
-            if not self._quit_issued:
-                # Might raise ConnectionReset
-                self.qmp('quit')
+            try:
+                if not self._quit_issued:
+                    # May raise ExecInterruptedError or StateError if the
+                    # connection dies or has *already* died.
+                    self.qmp('quit')
+            finally:
+                # Regardless, we want to quiesce the connection.
+                self._close_qmp_connection()
 
         # May raise subprocess.TimeoutExpired
         self._subp.wait(timeout=timeout)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 3/7] python/aqmp: Remove scary message
  2021-10-12 22:34 [PATCH v3 0/7] Switch iotests to using Async QMP John Snow
  2021-10-12 22:34 ` [PATCH v3 1/7] python/machine: remove has_quit argument John Snow
  2021-10-12 22:34 ` [PATCH v3 2/7] python/machine: Handle QMP errors on close more meticulously John Snow
@ 2021-10-12 22:34 ` John Snow
  2021-10-13  7:35   ` Hanna Reitz
  2021-10-12 22:34 ` [PATCH v3 4/7] iotests: Accommodate async QMP Exception classes John Snow
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: John Snow @ 2021-10-12 22:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Eduardo Habkost,
	qemu-block, Hanna Reitz, Cleber Rosa, John Snow

The scary message interferes with the iotests output. Coincidentally, if
iotests works by removing this, then it's good evidence that we don't
really need to scare people away from using it.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 python/qemu/aqmp/__init__.py | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/python/qemu/aqmp/__init__.py b/python/qemu/aqmp/__init__.py
index d1b0e4dc3d3..880d5b6fa7f 100644
--- a/python/qemu/aqmp/__init__.py
+++ b/python/qemu/aqmp/__init__.py
@@ -22,7 +22,6 @@
 # the COPYING file in the top-level directory.
 
 import logging
-import warnings
 
 from .error import AQMPError
 from .events import EventListener
@@ -31,17 +30,6 @@
 from .qmp_client import ExecInterruptedError, ExecuteError, QMPClient
 
 
-_WMSG = """
-
-The Asynchronous QMP library is currently in development and its API
-should be considered highly fluid and subject to change. It should
-not be used by any other scripts checked into the QEMU tree.
-
-Proceed with caution!
-"""
-
-warnings.warn(_WMSG, FutureWarning)
-
 # Suppress logging unless an application engages it.
 logging.getLogger('qemu.aqmp').addHandler(logging.NullHandler())
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 4/7] iotests: Accommodate async QMP Exception classes
  2021-10-12 22:34 [PATCH v3 0/7] Switch iotests to using Async QMP John Snow
                   ` (2 preceding siblings ...)
  2021-10-12 22:34 ` [PATCH v3 3/7] python/aqmp: Remove scary message John Snow
@ 2021-10-12 22:34 ` John Snow
  2021-10-13  7:38   ` Hanna Reitz
  2021-10-12 22:34 ` [PATCH v3 5/7] iotests: Conditionally silence certain AQMP errors John Snow
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: John Snow @ 2021-10-12 22:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Eduardo Habkost,
	qemu-block, Hanna Reitz, Cleber Rosa, John Snow

(But continue to support the old ones for now, too.)

There are very few cases of any user of QEMUMachine or a subclass
thereof relying on a QMP Exception type. If you'd like to check for
yourself, you want to grep for all of the derivatives of QMPError,
excluding 'AQMPError' and its derivatives. That'd be these:

- QMPError
- QMPConnectError
- QMPCapabilitiesError
- QMPTimeoutError
- QMPProtocolError
- QMPResponseError
- QMPBadPortError


Signed-off-by: John Snow <jsnow@redhat.com>
---
 scripts/simplebench/bench_block_job.py    | 3 ++-
 tests/qemu-iotests/tests/mirror-top-perms | 5 +++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/scripts/simplebench/bench_block_job.py b/scripts/simplebench/bench_block_job.py
index 4f03c121697..a403c35b08f 100755
--- a/scripts/simplebench/bench_block_job.py
+++ b/scripts/simplebench/bench_block_job.py
@@ -28,6 +28,7 @@
 sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'python'))
 from qemu.machine import QEMUMachine
 from qemu.qmp import QMPConnectError
+from qemu.aqmp import ConnectError
 
 
 def bench_block_job(cmd, cmd_args, qemu_args):
@@ -49,7 +50,7 @@ def bench_block_job(cmd, cmd_args, qemu_args):
         vm.launch()
     except OSError as e:
         return {'error': 'popen failed: ' + str(e)}
-    except (QMPConnectError, socket.timeout):
+    except (QMPConnectError, ConnectError, socket.timeout):
         return {'error': 'qemu failed: ' + str(vm.get_log())}
 
     try:
diff --git a/tests/qemu-iotests/tests/mirror-top-perms b/tests/qemu-iotests/tests/mirror-top-perms
index 3d475aa3a54..a2d5c269d7a 100755
--- a/tests/qemu-iotests/tests/mirror-top-perms
+++ b/tests/qemu-iotests/tests/mirror-top-perms
@@ -21,8 +21,9 @@
 
 import os
 
-from qemu import qmp
+from qemu.aqmp import ConnectError
 from qemu.machine import machine
+from qemu.qmp import QMPConnectError
 
 import iotests
 from iotests import qemu_img
@@ -102,7 +103,7 @@ class TestMirrorTopPerms(iotests.QMPTestCase):
             self.vm_b.launch()
             print('ERROR: VM B launched successfully, this should not have '
                   'happened')
-        except qmp.QMPConnectError:
+        except (QMPConnectError, ConnectError):
             assert 'Is another process using the image' in self.vm_b.get_log()
 
         result = self.vm.qmp('block-job-cancel',
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 5/7] iotests: Conditionally silence certain AQMP errors
  2021-10-12 22:34 [PATCH v3 0/7] Switch iotests to using Async QMP John Snow
                   ` (3 preceding siblings ...)
  2021-10-12 22:34 ` [PATCH v3 4/7] iotests: Accommodate async QMP Exception classes John Snow
@ 2021-10-12 22:34 ` John Snow
  2021-10-13  8:20   ` Hanna Reitz
  2021-10-12 22:34 ` [PATCH v3 6/7] python/aqmp: Create sync QMP wrapper for iotests John Snow
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: John Snow @ 2021-10-12 22:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Eduardo Habkost,
	qemu-block, Hanna Reitz, Cleber Rosa, John Snow

AQMP likes to be very chatty about errors it encounters. In general,
this is good because it allows us to get good diagnostic information for
otherwise complex async failures.

For example, during a failed QMP connection attempt, we might see:

+ERROR:qemu.aqmp.qmp_client.qemub-2536319:Negotiation failed: EOFError
+ERROR:qemu.aqmp.qmp_client.qemub-2536319:Failed to establish session: EOFError

This might be nice in iotests output, because failure scenarios
involving the new QMP library will be spelled out plainly in the output
diffs.

For tests that are intentionally causing this scenario though, filtering
that log output could be a hassle. For now, add a context manager that
simply lets us toggle this output off during a critical region.

(Additionally, a forthcoming patch allows the use of either legacy or
async QMP to be toggled with an environment variable. In this
circumstance, we can't amend the iotest output to just always expect the
error message, either. Just suppress it for now. More rigorous log
filtering can be investigated later if/when it is deemed safe to
permanently replace the legacy QMP library.)

Signed-off-by: John Snow <jsnow@redhat.com>
---
 tests/qemu-iotests/iotests.py             | 20 +++++++++++++++++++-
 tests/qemu-iotests/tests/mirror-top-perms | 12 ++++++++----
 2 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index e5fff6ddcfc..e2f9d873ada 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -30,7 +30,7 @@
 import subprocess
 import sys
 import time
-from typing import (Any, Callable, Dict, Iterable,
+from typing import (Any, Callable, Dict, Iterable, Iterator,
                     List, Optional, Sequence, TextIO, Tuple, Type, TypeVar)
 import unittest
 
@@ -114,6 +114,24 @@
 sample_img_dir = os.environ['SAMPLE_IMG_DIR']
 
 
+@contextmanager
+def change_log_level(
+        logger_name: str, level: int = logging.CRITICAL) -> Iterator[None]:
+    """
+    Utility function for temporarily changing the log level of a logger.
+
+    This can be used to silence errors that are expected or uninteresting.
+    """
+    _logger = logging.getLogger(logger_name)
+    current_level = _logger.level
+    _logger.setLevel(level)
+
+    try:
+        yield
+    finally:
+        _logger.setLevel(current_level)
+
+
 def unarchive_sample_image(sample, fname):
     sample_fname = os.path.join(sample_img_dir, sample + '.bz2')
     with bz2.open(sample_fname) as f_in, open(fname, 'wb') as f_out:
diff --git a/tests/qemu-iotests/tests/mirror-top-perms b/tests/qemu-iotests/tests/mirror-top-perms
index a2d5c269d7a..0a51a613f39 100755
--- a/tests/qemu-iotests/tests/mirror-top-perms
+++ b/tests/qemu-iotests/tests/mirror-top-perms
@@ -26,7 +26,7 @@ from qemu.machine import machine
 from qemu.qmp import QMPConnectError
 
 import iotests
-from iotests import qemu_img
+from iotests import change_log_level, qemu_img
 
 
 image_size = 1 * 1024 * 1024
@@ -100,9 +100,13 @@ class TestMirrorTopPerms(iotests.QMPTestCase):
         self.vm_b.add_blockdev(f'file,node-name=drive0,filename={source}')
         self.vm_b.add_device('virtio-blk,drive=drive0,share-rw=on')
         try:
-            self.vm_b.launch()
-            print('ERROR: VM B launched successfully, this should not have '
-                  'happened')
+            # Silence AQMP errors temporarily.
+            # TODO: Remove this and just allow the errors to be logged when
+            # AQMP fully replaces QMP.
+            with change_log_level('qemu.aqmp'):
+                self.vm_b.launch()
+                print('ERROR: VM B launched successfully, '
+                      'this should not have happened')
         except (QMPConnectError, ConnectError):
             assert 'Is another process using the image' in self.vm_b.get_log()
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 6/7] python/aqmp: Create sync QMP wrapper for iotests
  2021-10-12 22:34 [PATCH v3 0/7] Switch iotests to using Async QMP John Snow
                   ` (4 preceding siblings ...)
  2021-10-12 22:34 ` [PATCH v3 5/7] iotests: Conditionally silence certain AQMP errors John Snow
@ 2021-10-12 22:34 ` John Snow
  2021-10-13  8:24   ` Hanna Reitz
  2021-10-12 22:34 ` [PATCH v3 7/7] python, iotests: replace qmp with aqmp John Snow
  2021-10-13  8:45 ` [PATCH v3 0/7] Switch iotests to using Async QMP Hanna Reitz
  7 siblings, 1 reply; 19+ messages in thread
From: John Snow @ 2021-10-12 22:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Eduardo Habkost,
	qemu-block, Hanna Reitz, Cleber Rosa, John Snow

This is a wrapper around the async QMPClient that mimics the old,
synchronous QEMUMonitorProtocol class. It is designed to be
interchangeable with the old implementation.

It does not, however, attempt to mimic Exception compatibility.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 python/qemu/aqmp/legacy.py | 138 +++++++++++++++++++++++++++++++++++++
 1 file changed, 138 insertions(+)
 create mode 100644 python/qemu/aqmp/legacy.py

diff --git a/python/qemu/aqmp/legacy.py b/python/qemu/aqmp/legacy.py
new file mode 100644
index 00000000000..9e7b9fb80b9
--- /dev/null
+++ b/python/qemu/aqmp/legacy.py
@@ -0,0 +1,138 @@
+"""
+Sync QMP Wrapper
+
+This class pretends to be qemu.qmp.QEMUMonitorProtocol.
+"""
+
+import asyncio
+from typing import (
+    Awaitable,
+    List,
+    Optional,
+    TypeVar,
+    Union,
+)
+
+import qemu.qmp
+from qemu.qmp import QMPMessage, QMPReturnValue, SocketAddrT
+
+from .qmp_client import QMPClient
+
+
+# pylint: disable=missing-docstring
+
+
+class QEMUMonitorProtocol(qemu.qmp.QEMUMonitorProtocol):
+    def __init__(self, address: SocketAddrT,
+                 server: bool = False,
+                 nickname: Optional[str] = None):
+
+        # pylint: disable=super-init-not-called
+        self._aqmp = QMPClient(nickname)
+        self._aloop = asyncio.get_event_loop()
+        self._address = address
+        self._timeout: Optional[float] = None
+
+    _T = TypeVar('_T')
+
+    def _sync(
+            self, future: Awaitable[_T], timeout: Optional[float] = None
+    ) -> _T:
+        return self._aloop.run_until_complete(
+            asyncio.wait_for(future, timeout=timeout)
+        )
+
+    def _get_greeting(self) -> Optional[QMPMessage]:
+        if self._aqmp.greeting is not None:
+            # pylint: disable=protected-access
+            return self._aqmp.greeting._asdict()
+        return None
+
+    # __enter__ and __exit__ need no changes
+    # parse_address needs no changes
+
+    def connect(self, negotiate: bool = True) -> Optional[QMPMessage]:
+        self._aqmp.await_greeting = negotiate
+        self._aqmp.negotiate = negotiate
+
+        self._sync(
+            self._aqmp.connect(self._address)
+        )
+        return self._get_greeting()
+
+    def accept(self, timeout: Optional[float] = 15.0) -> QMPMessage:
+        self._aqmp.await_greeting = True
+        self._aqmp.negotiate = True
+
+        self._sync(
+            self._aqmp.accept(self._address),
+            timeout
+        )
+
+        ret = self._get_greeting()
+        assert ret is not None
+        return ret
+
+    def cmd_obj(self, qmp_cmd: QMPMessage) -> QMPMessage:
+        return dict(
+            self._sync(
+                # pylint: disable=protected-access
+
+                # _raw() isn't a public API, because turning off
+                # automatic ID assignment is discouraged. For
+                # compatibility with iotests *only*, do it anyway.
+                self._aqmp._raw(qmp_cmd, assign_id=False),
+                self._timeout
+            )
+        )
+
+    # Default impl of cmd() delegates to cmd_obj
+
+    def command(self, cmd: str, **kwds: object) -> QMPReturnValue:
+        return self._sync(
+            self._aqmp.execute(cmd, kwds),
+            self._timeout
+        )
+
+    def pull_event(self,
+                   wait: Union[bool, float] = False) -> Optional[QMPMessage]:
+        if not wait:
+            # wait is False/0: "do not wait, do not except."
+            if self._aqmp.events.empty():
+                return None
+
+        # If wait is 'True', wait forever. If wait is False/0, the events
+        # queue must not be empty; but it still needs some real amount
+        # of time to complete.
+        timeout = None
+        if wait and isinstance(wait, float):
+            timeout = wait
+
+        return dict(
+            self._sync(
+                self._aqmp.events.get(),
+                timeout
+            )
+        )
+
+    def get_events(self, wait: Union[bool, float] = False) -> List[QMPMessage]:
+        events = [dict(x) for x in self._aqmp.events.clear()]
+        if events:
+            return events
+
+        event = self.pull_event(wait)
+        return [event] if event is not None else []
+
+    def clear_events(self) -> None:
+        self._aqmp.events.clear()
+
+    def close(self) -> None:
+        self._sync(
+            self._aqmp.disconnect()
+        )
+
+    def settimeout(self, timeout: Optional[float]) -> None:
+        self._timeout = timeout
+
+    def send_fd_scm(self, fd: int) -> None:
+        self._aqmp.send_fd_scm(fd)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 7/7] python, iotests: replace qmp with aqmp
  2021-10-12 22:34 [PATCH v3 0/7] Switch iotests to using Async QMP John Snow
                   ` (5 preceding siblings ...)
  2021-10-12 22:34 ` [PATCH v3 6/7] python/aqmp: Create sync QMP wrapper for iotests John Snow
@ 2021-10-12 22:34 ` John Snow
  2021-10-13 14:21   ` Eric Blake
  2021-10-13  8:45 ` [PATCH v3 0/7] Switch iotests to using Async QMP Hanna Reitz
  7 siblings, 1 reply; 19+ messages in thread
From: John Snow @ 2021-10-12 22:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Eduardo Habkost,
	qemu-block, Hanna Reitz, Cleber Rosa, John Snow

Swap out the synchronous QEMUMonitorProtocol from qemu.qmp with the sync
wrapper from qemu.aqmp instead.

Add an escape hatch in the form of the environment variable
QEMU_PYTHON_LEGACY_QMP which allows you to cajole QEMUMachine into using
the old implementatin, proving that both implementations work
concurrently.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Tested-by: Hanna Reitz <hreitz@redhat.com>
---
 python/qemu/machine/machine.py | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index a0cf69786b4..a487c397459 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -41,7 +41,6 @@
 )
 
 from qemu.qmp import (  # pylint: disable=import-error
-    QEMUMonitorProtocol,
     QMPMessage,
     QMPReturnValue,
     SocketAddrT,
@@ -50,6 +49,12 @@
 from . import console_socket
 
 
+if os.environ.get('QEMU_PYTHON_LEGACY_QMP'):
+    from qemu.qmp import QEMUMonitorProtocol
+else:
+    from qemu.aqmp.legacy import QEMUMonitorProtocol
+
+
 LOG = logging.getLogger(__name__)
 
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/7] python/machine: Handle QMP errors on close more meticulously
  2021-10-12 22:34 ` [PATCH v3 2/7] python/machine: Handle QMP errors on close more meticulously John Snow
@ 2021-10-13  7:33   ` Hanna Reitz
  0 siblings, 0 replies; 19+ messages in thread
From: Hanna Reitz @ 2021-10-13  7:33 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Eduardo Habkost,
	qemu-block, Cleber Rosa

On 13.10.21 00:34, John Snow wrote:
> To use the AQMP backend, Machine just needs to be a little more diligent
> about what happens when closing a QMP connection. The operation is no
> longer a freebie in the async world; it may return errors encountered in
> the async bottom half on incoming message receipt, etc.
>
> (AQMP's disconnect, ultimately, serves as the quiescence point where all
> async contexts are gathered together, and any final errors reported at
> that point.)
>
> Because async QMP continues to check for messages asynchronously, it's
> almost certainly likely that the loop will have exited due to EOF after
> issuing the last 'quit' command. That error will ultimately be bubbled
> up when attempting to close the QMP connection. The manager class here
> then is free to discard it -- if it was expected.
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>   python/qemu/machine/machine.py | 48 +++++++++++++++++++++++++++++-----
>   1 file changed, 42 insertions(+), 6 deletions(-)

Reviewed-by: Hanna Reitz <hreitz@redhat.com>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 3/7] python/aqmp: Remove scary message
  2021-10-12 22:34 ` [PATCH v3 3/7] python/aqmp: Remove scary message John Snow
@ 2021-10-13  7:35   ` Hanna Reitz
  0 siblings, 0 replies; 19+ messages in thread
From: Hanna Reitz @ 2021-10-13  7:35 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Eduardo Habkost,
	qemu-block, Cleber Rosa

On 13.10.21 00:34, John Snow wrote:
> The scary message interferes with the iotests output. Coincidentally, if
> iotests works by removing this, then it's good evidence that we don't
> really need to scare people away from using it.
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>   python/qemu/aqmp/__init__.py | 12 ------------
>   1 file changed, 12 deletions(-)

Acked-by: Hanna Reitz <hreitz@redhat.com>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/7] iotests: Accommodate async QMP Exception classes
  2021-10-12 22:34 ` [PATCH v3 4/7] iotests: Accommodate async QMP Exception classes John Snow
@ 2021-10-13  7:38   ` Hanna Reitz
  0 siblings, 0 replies; 19+ messages in thread
From: Hanna Reitz @ 2021-10-13  7:38 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Eduardo Habkost,
	qemu-block, Cleber Rosa

On 13.10.21 00:34, John Snow wrote:
> (But continue to support the old ones for now, too.)
>
> There are very few cases of any user of QEMUMachine or a subclass
> thereof relying on a QMP Exception type. If you'd like to check for
> yourself, you want to grep for all of the derivatives of QMPError,
> excluding 'AQMPError' and its derivatives. That'd be these:
>
> - QMPError
> - QMPConnectError
> - QMPCapabilitiesError
> - QMPTimeoutError
> - QMPProtocolError
> - QMPResponseError
> - QMPBadPortError
>
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>   scripts/simplebench/bench_block_job.py    | 3 ++-
>   tests/qemu-iotests/tests/mirror-top-perms | 5 +++--
>   2 files changed, 5 insertions(+), 3 deletions(-)

Reviewed-by: Hanna Reitz <hreitz@redhat.com>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 5/7] iotests: Conditionally silence certain AQMP errors
  2021-10-12 22:34 ` [PATCH v3 5/7] iotests: Conditionally silence certain AQMP errors John Snow
@ 2021-10-13  8:20   ` Hanna Reitz
  0 siblings, 0 replies; 19+ messages in thread
From: Hanna Reitz @ 2021-10-13  8:20 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Eduardo Habkost,
	qemu-block, Cleber Rosa

On 13.10.21 00:34, John Snow wrote:
> AQMP likes to be very chatty about errors it encounters. In general,
> this is good because it allows us to get good diagnostic information for
> otherwise complex async failures.
>
> For example, during a failed QMP connection attempt, we might see:
>
> +ERROR:qemu.aqmp.qmp_client.qemub-2536319:Negotiation failed: EOFError
> +ERROR:qemu.aqmp.qmp_client.qemub-2536319:Failed to establish session: EOFError
>
> This might be nice in iotests output, because failure scenarios
> involving the new QMP library will be spelled out plainly in the output
> diffs.
>
> For tests that are intentionally causing this scenario though, filtering
> that log output could be a hassle. For now, add a context manager that
> simply lets us toggle this output off during a critical region.
>
> (Additionally, a forthcoming patch allows the use of either legacy or
> async QMP to be toggled with an environment variable. In this
> circumstance, we can't amend the iotest output to just always expect the
> error message, either. Just suppress it for now. More rigorous log
> filtering can be investigated later if/when it is deemed safe to
> permanently replace the legacy QMP library.)
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>   tests/qemu-iotests/iotests.py             | 20 +++++++++++++++++++-
>   tests/qemu-iotests/tests/mirror-top-perms | 12 ++++++++----
>   2 files changed, 27 insertions(+), 5 deletions(-)

Reviewed-by: Hanna Reitz <hreitz@redhat.com>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 6/7] python/aqmp: Create sync QMP wrapper for iotests
  2021-10-12 22:34 ` [PATCH v3 6/7] python/aqmp: Create sync QMP wrapper for iotests John Snow
@ 2021-10-13  8:24   ` Hanna Reitz
  0 siblings, 0 replies; 19+ messages in thread
From: Hanna Reitz @ 2021-10-13  8:24 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Eduardo Habkost,
	qemu-block, Cleber Rosa

On 13.10.21 00:34, John Snow wrote:
> This is a wrapper around the async QMPClient that mimics the old,
> synchronous QEMUMonitorProtocol class. It is designed to be
> interchangeable with the old implementation.
>
> It does not, however, attempt to mimic Exception compatibility.
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>   python/qemu/aqmp/legacy.py | 138 +++++++++++++++++++++++++++++++++++++
>   1 file changed, 138 insertions(+)
>   create mode 100644 python/qemu/aqmp/legacy.py

Acked-by: Hanna Reitz <hreitz@redhat.com>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 0/7] Switch iotests to using Async QMP
  2021-10-12 22:34 [PATCH v3 0/7] Switch iotests to using Async QMP John Snow
                   ` (6 preceding siblings ...)
  2021-10-12 22:34 ` [PATCH v3 7/7] python, iotests: replace qmp with aqmp John Snow
@ 2021-10-13  8:45 ` Hanna Reitz
  2021-10-13 12:51   ` John Snow
  7 siblings, 1 reply; 19+ messages in thread
From: Hanna Reitz @ 2021-10-13  8:45 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Eduardo Habkost,
	qemu-block, Cleber Rosa

On 13.10.21 00:34, John Snow wrote:
> Based-on: <20211012214152.802483-1-jsnow@redhat.com>
>            [PULL 00/10] Python patches
> GitLab: https://gitlab.com/jsnow/qemu/-/commits/python-aqmp-iotest-wrapper
> CI: https://gitlab.com/jsnow/qemu/-/pipelines/387210591
>
> Hiya,
>
> This series continues where the last two AQMP series left off and adds a
> synchronous 'legacy' wrapper around the new AQMP interface, then drops
> it straight into iotests to prove that AQMP is functional and totally
> cool and fine. The disruption and churn to iotests is pretty minimal.
>
> In the event that a regression happens and I am not physically proximate
> to inflict damage upon, one may set the QEMU_PYTHON_LEGACY_QMP variable
> to any non-empty string as it pleases you to engage the QMP machinery
> you are used to.
>
> I'd like to try and get this committed early in the 6.2 development
> cycle to give ample time to smooth over any possible regressions. I've
> tested it locally and via gitlab CI, across Python versions 3.6 through
> 3.10, and "worksforme". If something bad happens, we can revert the
> actual switch-flip very trivially.

So running iotests locally, I got one failure:

$ TEST_DIR=/tmp/vdi-tests ./check -c writethrough -vdi 300
[...]
300                             fail       [10:28:06] [10:28:11] 
5.1s                 output mismatch (see 300.out.bad)
--- /home/maxx/projects/qemu/tests/qemu-iotests/300.out
+++ 300.out.bad
@@ -1,4 +1,5 @@
-.......................................
+..............ERROR:qemu.aqmp.qmp_client.qemu-b-222963:Task.Reader: 
ConnectionResetError: [Errno 104] Connection reset by peer
+.........................
  ----------------------------------------------------------------------
  Ran 39 tests
[...]


I’m afraid I can’t really give a reproducer or anything.  It feels like 
just some random spurious timing-related error.  Although then again, 
300 does have an `except machine.AbnormalShutdown` clause at one 
point...  So perhaps that’s the culprit, and we need to disable logging 
there.

Hanna



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 0/7] Switch iotests to using Async QMP
  2021-10-13  8:45 ` [PATCH v3 0/7] Switch iotests to using Async QMP Hanna Reitz
@ 2021-10-13 12:51   ` John Snow
  2021-10-13 14:00     ` John Snow
  0 siblings, 1 reply; 19+ messages in thread
From: John Snow @ 2021-10-13 12:51 UTC (permalink / raw)
  To: Hanna Reitz
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Eduardo Habkost,
	qemu-block, qemu-devel, Cleber Rosa

[-- Attachment #1: Type: text/plain, Size: 2345 bytes --]

On Wed, Oct 13, 2021 at 4:45 AM Hanna Reitz <hreitz@redhat.com> wrote:

> On 13.10.21 00:34, John Snow wrote:
> > Based-on: <20211012214152.802483-1-jsnow@redhat.com>
> >            [PULL 00/10] Python patches
> > GitLab:
> https://gitlab.com/jsnow/qemu/-/commits/python-aqmp-iotest-wrapper
> > CI: https://gitlab.com/jsnow/qemu/-/pipelines/387210591
> >
> > Hiya,
> >
> > This series continues where the last two AQMP series left off and adds a
> > synchronous 'legacy' wrapper around the new AQMP interface, then drops
> > it straight into iotests to prove that AQMP is functional and totally
> > cool and fine. The disruption and churn to iotests is pretty minimal.
> >
> > In the event that a regression happens and I am not physically proximate
> > to inflict damage upon, one may set the QEMU_PYTHON_LEGACY_QMP variable
> > to any non-empty string as it pleases you to engage the QMP machinery
> > you are used to.
> >
> > I'd like to try and get this committed early in the 6.2 development
> > cycle to give ample time to smooth over any possible regressions. I've
> > tested it locally and via gitlab CI, across Python versions 3.6 through
> > 3.10, and "worksforme". If something bad happens, we can revert the
> > actual switch-flip very trivially.
>
> So running iotests locally, I got one failure:
>
> $ TEST_DIR=/tmp/vdi-tests ./check -c writethrough -vdi 300
> [...]
> 300                             fail       [10:28:06] [10:28:11]
> 5.1s                 output mismatch (see 300.out.bad)
> --- /home/maxx/projects/qemu/tests/qemu-iotests/300.out
> +++ 300.out.bad
> @@ -1,4 +1,5 @@
> -.......................................
> +..............ERROR:qemu.aqmp.qmp_client.qemu-b-222963:Task.Reader:
> ConnectionResetError: [Errno 104] Connection reset by peer
> +.........................
>   ----------------------------------------------------------------------
>   Ran 39 tests
> [...]
>
>
Oh, unfortunate.


>
> I’m afraid I can’t really give a reproducer or anything.  It feels like
>

Thank you for the report!


> just some random spurious timing-related error.  Although then again,
> 300 does have an `except machine.AbnormalShutdown` clause at one
> point...  So perhaps that’s the culprit, and we need to disable logging
> there.
>
>
I'll investigate!

[-- Attachment #2: Type: text/html, Size: 3553 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 0/7] Switch iotests to using Async QMP
  2021-10-13 12:51   ` John Snow
@ 2021-10-13 14:00     ` John Snow
  2021-10-13 14:49       ` Hanna Reitz
  0 siblings, 1 reply; 19+ messages in thread
From: John Snow @ 2021-10-13 14:00 UTC (permalink / raw)
  To: Hanna Reitz
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Eduardo Habkost,
	qemu-block, qemu-devel, Cleber Rosa

[-- Attachment #1: Type: text/plain, Size: 3252 bytes --]

On Wed, Oct 13, 2021 at 8:51 AM John Snow <jsnow@redhat.com> wrote:

>
>
> On Wed, Oct 13, 2021 at 4:45 AM Hanna Reitz <hreitz@redhat.com> wrote:
>
>> On 13.10.21 00:34, John Snow wrote:
>> > Based-on: <20211012214152.802483-1-jsnow@redhat.com>
>> >            [PULL 00/10] Python patches
>> > GitLab:
>> https://gitlab.com/jsnow/qemu/-/commits/python-aqmp-iotest-wrapper
>> > CI: https://gitlab.com/jsnow/qemu/-/pipelines/387210591
>> >
>> > Hiya,
>> >
>> > This series continues where the last two AQMP series left off and adds a
>> > synchronous 'legacy' wrapper around the new AQMP interface, then drops
>> > it straight into iotests to prove that AQMP is functional and totally
>> > cool and fine. The disruption and churn to iotests is pretty minimal.
>> >
>> > In the event that a regression happens and I am not physically proximate
>> > to inflict damage upon, one may set the QEMU_PYTHON_LEGACY_QMP variable
>> > to any non-empty string as it pleases you to engage the QMP machinery
>> > you are used to.
>> >
>> > I'd like to try and get this committed early in the 6.2 development
>> > cycle to give ample time to smooth over any possible regressions. I've
>> > tested it locally and via gitlab CI, across Python versions 3.6 through
>> > 3.10, and "worksforme". If something bad happens, we can revert the
>> > actual switch-flip very trivially.
>>
>> So running iotests locally, I got one failure:
>>
>> $ TEST_DIR=/tmp/vdi-tests ./check -c writethrough -vdi 300
>> [...]
>> 300                             fail       [10:28:06] [10:28:11]
>> 5.1s                 output mismatch (see 300.out.bad)
>> --- /home/maxx/projects/qemu/tests/qemu-iotests/300.out
>> +++ 300.out.bad
>> @@ -1,4 +1,5 @@
>> -.......................................
>> +..............ERROR:qemu.aqmp.qmp_client.qemu-b-222963:Task.Reader:
>> ConnectionResetError: [Errno 104] Connection reset by peer
>> +.........................
>>   ----------------------------------------------------------------------
>>   Ran 39 tests
>> [...]
>>
>>
> Oh, unfortunate.
>
>
>>
>> I’m afraid I can’t really give a reproducer or anything.  It feels like
>>
>
> Thank you for the report!
>
>
>> just some random spurious timing-related error.  Although then again,
>> 300 does have an `except machine.AbnormalShutdown` clause at one
>> point...  So perhaps that’s the culprit, and we need to disable logging
>> there.
>>
>>
> I'll investigate!
>

Unfortunately, even in a loop some 150 times I couldn't reproduce this one.
As you point out, it appears to be just a failure caused by logging. The
test logic itself completes as expected.

Still, I would expect, on a "clean" shutdown of the destination host (where
the destination process fails to load the migration stream and voluntarily
exits with an error code) to end with a FIN/ACK for TCP or ... uh, whatever
happens for a UNIX socket. Where's the Connection Reset coming from? Did
the destination VM process *crash*?

I'm not so sure that I *should* silence this error, but I also can't
reproduce it at all to answer these questions, so uh. uhhh. I guess I will
just hammer it on a loop a few hundred times more and see if I get lucky.

[-- Attachment #2: Type: text/html, Size: 4816 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 7/7] python, iotests: replace qmp with aqmp
  2021-10-12 22:34 ` [PATCH v3 7/7] python, iotests: replace qmp with aqmp John Snow
@ 2021-10-13 14:21   ` Eric Blake
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Blake @ 2021-10-13 14:21 UTC (permalink / raw)
  To: John Snow
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Eduardo Habkost,
	qemu-block, qemu-devel, Hanna Reitz, Cleber Rosa

On Tue, Oct 12, 2021 at 06:34:45PM -0400, John Snow wrote:
> Swap out the synchronous QEMUMonitorProtocol from qemu.qmp with the sync
> wrapper from qemu.aqmp instead.
> 
> Add an escape hatch in the form of the environment variable
> QEMU_PYTHON_LEGACY_QMP which allows you to cajole QEMUMachine into using
> the old implementatin, proving that both implementations work

implementation

> concurrently.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> Reviewed-by: Hanna Reitz <hreitz@redhat.com>
> Tested-by: Hanna Reitz <hreitz@redhat.com>
> ---
>  python/qemu/machine/machine.py | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 0/7] Switch iotests to using Async QMP
  2021-10-13 14:00     ` John Snow
@ 2021-10-13 14:49       ` Hanna Reitz
  2021-10-13 16:35         ` John Snow
  0 siblings, 1 reply; 19+ messages in thread
From: Hanna Reitz @ 2021-10-13 14:49 UTC (permalink / raw)
  To: John Snow
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Eduardo Habkost,
	qemu-block, qemu-devel, Cleber Rosa

On 13.10.21 16:00, John Snow wrote:
>
>
> On Wed, Oct 13, 2021 at 8:51 AM John Snow <jsnow@redhat.com> wrote:
>
>
>
>     On Wed, Oct 13, 2021 at 4:45 AM Hanna Reitz <hreitz@redhat.com> wrote:
>
>         On 13.10.21 00:34, John Snow wrote:
>         > Based-on: <20211012214152.802483-1-jsnow@redhat.com>
>         >            [PULL 00/10] Python patches
>         > GitLab:
>         https://gitlab.com/jsnow/qemu/-/commits/python-aqmp-iotest-wrapper
>         > CI: https://gitlab.com/jsnow/qemu/-/pipelines/387210591
>         >
>         > Hiya,
>         >
>         > This series continues where the last two AQMP series left
>         off and adds a
>         > synchronous 'legacy' wrapper around the new AQMP interface,
>         then drops
>         > it straight into iotests to prove that AQMP is functional
>         and totally
>         > cool and fine. The disruption and churn to iotests is pretty
>         minimal.
>         >
>         > In the event that a regression happens and I am not
>         physically proximate
>         > to inflict damage upon, one may set the
>         QEMU_PYTHON_LEGACY_QMP variable
>         > to any non-empty string as it pleases you to engage the QMP
>         machinery
>         > you are used to.
>         >
>         > I'd like to try and get this committed early in the 6.2
>         development
>         > cycle to give ample time to smooth over any possible
>         regressions. I've
>         > tested it locally and via gitlab CI, across Python versions
>         3.6 through
>         > 3.10, and "worksforme". If something bad happens, we can
>         revert the
>         > actual switch-flip very trivially.
>
>         So running iotests locally, I got one failure:
>
>         $ TEST_DIR=/tmp/vdi-tests ./check -c writethrough -vdi 300
>         [...]
>         300                             fail       [10:28:06] [10:28:11]
>         5.1s                 output mismatch (see 300.out.bad)
>         --- /home/maxx/projects/qemu/tests/qemu-iotests/300.out
>         +++ 300.out.bad
>         @@ -1,4 +1,5 @@
>         -.......................................
>         +..............ERROR:qemu.aqmp.qmp_client.qemu-b-222963:Task.Reader:
>
>         ConnectionResetError: [Errno 104] Connection reset by peer
>         +.........................
>           ----------------------------------------------------------------------
>           Ran 39 tests
>         [...]
>
>
>     Oh, unfortunate.
>
>
>         I’m afraid I can’t really give a reproducer or anything.  It
>         feels like
>
>
>     Thank you for the report!
>
>         just some random spurious timing-related error. Although then
>         again,
>         300 does have an `except machine.AbnormalShutdown` clause at one
>         point...  So perhaps that’s the culprit, and we need to
>         disable logging
>         there.
>
>
>     I'll investigate!
>
>
> Unfortunately, even in a loop some 150 times I couldn't reproduce this 
> one. As you point out, it appears to be just a failure caused by 
> logging. The test logic itself completes as expected.
>
> Still, I would expect, on a "clean" shutdown of the destination host 
> (where the destination process fails to load the migration stream and 
> voluntarily exits with an error code) to end with a FIN/ACK for TCP or 
> ... uh, whatever happens for a UNIX socket. Where's the Connection 
> Reset coming from? Did the destination VM process *crash*?
>
> I'm not so sure that I *should* silence this error, but I also can't 
> reproduce it at all to answer these questions, so uh. uhhh. I guess I 
> will just hammer it on a loop a few hundred times more and see if I 
> get lucky.

I could reproduce it, by running 20 instances concurrently.  (Needs a 
change to testrunner.py, so that the reference outputs don’t collide:

diff --git a/tests/qemu-iotests/testrunner.py 
b/tests/qemu-iotests/testrunner.py
index a56b6da396..fd0a3a1eeb 100644
--- a/tests/qemu-iotests/testrunner.py
+++ b/tests/qemu-iotests/testrunner.py
@@ -221,7 +221,7 @@ def find_reference(self, test: str) -> str:

      def do_run_test(self, test: str) -> TestResult:
          f_test = Path(test)
-        f_bad = Path(f_test.name + '.out.bad')
+        f_bad = Path(f'{os.getpid()}-{f_test.name}.out.bad')
          f_notrun = Path(f_test.name + '.notrun')
          f_casenotrun = Path(f_test.name + '.casenotrun')
          f_reference = Path(self.find_reference(test))

)

And then:

$ while TEST_DIR=/tmp/vdi-$$ ./check -vdi 300; do; done

Which pretty quickly shows the error in at least one of those loops 
(under a minute).

As far as I can tell, changing the log level in 300 does indeed fix it:

diff --git a/tests/qemu-iotests/300 b/tests/qemu-iotests/300
index 10f9f2a8da..096f5dabf0 100755
--- a/tests/qemu-iotests/300
+++ b/tests/qemu-iotests/300
@@ -27,6 +27,7 @@ from typing import Dict, List, Optional
  from qemu.machine import machine

  import iotests
+from iotests import change_log_level


  BlockBitmapMapping = List[Dict[str, object]]
@@ -464,7 +465,8 @@ class 
TestBlockBitmapMappingErrors(TestDirtyBitmapMigration):
          # Expect abnormal shutdown of the destination VM because of
          # the failed migration
          try:
-            self.vm_b.shutdown()
+            with change_log_level('qemu.aqmp'):
+                self.vm_b.shutdown()
          except machine.AbnormalShutdown:
              pass



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 0/7] Switch iotests to using Async QMP
  2021-10-13 14:49       ` Hanna Reitz
@ 2021-10-13 16:35         ` John Snow
  0 siblings, 0 replies; 19+ messages in thread
From: John Snow @ 2021-10-13 16:35 UTC (permalink / raw)
  To: Hanna Reitz
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Eduardo Habkost,
	qemu-block, qemu-devel, Cleber Rosa

[-- Attachment #1: Type: text/plain, Size: 6930 bytes --]

On Wed, Oct 13, 2021 at 10:49 AM Hanna Reitz <hreitz@redhat.com> wrote:

> On 13.10.21 16:00, John Snow wrote:
> >
> >
> > On Wed, Oct 13, 2021 at 8:51 AM John Snow <jsnow@redhat.com> wrote:
> >
> >
> >
> >     On Wed, Oct 13, 2021 at 4:45 AM Hanna Reitz <hreitz@redhat.com>
> wrote:
> >
> >         On 13.10.21 00:34, John Snow wrote:
> >         > Based-on: <20211012214152.802483-1-jsnow@redhat.com>
> >         >            [PULL 00/10] Python patches
> >         > GitLab:
> >
> https://gitlab.com/jsnow/qemu/-/commits/python-aqmp-iotest-wrapper
> >         > CI: https://gitlab.com/jsnow/qemu/-/pipelines/387210591
> >         >
> >         > Hiya,
> >         >
> >         > This series continues where the last two AQMP series left
> >         off and adds a
> >         > synchronous 'legacy' wrapper around the new AQMP interface,
> >         then drops
> >         > it straight into iotests to prove that AQMP is functional
> >         and totally
> >         > cool and fine. The disruption and churn to iotests is pretty
> >         minimal.
> >         >
> >         > In the event that a regression happens and I am not
> >         physically proximate
> >         > to inflict damage upon, one may set the
> >         QEMU_PYTHON_LEGACY_QMP variable
> >         > to any non-empty string as it pleases you to engage the QMP
> >         machinery
> >         > you are used to.
> >         >
> >         > I'd like to try and get this committed early in the 6.2
> >         development
> >         > cycle to give ample time to smooth over any possible
> >         regressions. I've
> >         > tested it locally and via gitlab CI, across Python versions
> >         3.6 through
> >         > 3.10, and "worksforme". If something bad happens, we can
> >         revert the
> >         > actual switch-flip very trivially.
> >
> >         So running iotests locally, I got one failure:
> >
> >         $ TEST_DIR=/tmp/vdi-tests ./check -c writethrough -vdi 300
> >         [...]
> >         300                             fail       [10:28:06] [10:28:11]
> >         5.1s                 output mismatch (see 300.out.bad)
> >         --- /home/maxx/projects/qemu/tests/qemu-iotests/300.out
> >         +++ 300.out.bad
> >         @@ -1,4 +1,5 @@
> >         -.......................................
> >
>  +..............ERROR:qemu.aqmp.qmp_client.qemu-b-222963:Task.Reader:
> >
> >         ConnectionResetError: [Errno 104] Connection reset by peer
> >         +.........................
> >
>    ----------------------------------------------------------------------
> >           Ran 39 tests
> >         [...]
> >
> >
> >     Oh, unfortunate.
> >
> >
> >         I’m afraid I can’t really give a reproducer or anything.  It
> >         feels like
> >
> >
> >     Thank you for the report!
> >
> >         just some random spurious timing-related error. Although then
> >         again,
> >         300 does have an `except machine.AbnormalShutdown` clause at one
> >         point...  So perhaps that’s the culprit, and we need to
> >         disable logging
> >         there.
> >
> >
> >     I'll investigate!
> >
> >
> > Unfortunately, even in a loop some 150 times I couldn't reproduce this
> > one. As you point out, it appears to be just a failure caused by
> > logging. The test logic itself completes as expected.
> >
> > Still, I would expect, on a "clean" shutdown of the destination host
> > (where the destination process fails to load the migration stream and
> > voluntarily exits with an error code) to end with a FIN/ACK for TCP or
> > ... uh, whatever happens for a UNIX socket. Where's the Connection
> > Reset coming from? Did the destination VM process *crash*?
> >
> > I'm not so sure that I *should* silence this error, but I also can't
> > reproduce it at all to answer these questions, so uh. uhhh. I guess I
> > will just hammer it on a loop a few hundred times more and see if I
> > get lucky.
>
> I could reproduce it, by running 20 instances concurrently.  (Needs a
> change to testrunner.py, so that the reference outputs don’t collide:
>
> diff --git a/tests/qemu-iotests/testrunner.py
> b/tests/qemu-iotests/testrunner.py
> index a56b6da396..fd0a3a1eeb 100644
> --- a/tests/qemu-iotests/testrunner.py
> +++ b/tests/qemu-iotests/testrunner.py
> @@ -221,7 +221,7 @@ def find_reference(self, test: str) -> str:
>
>       def do_run_test(self, test: str) -> TestResult:
>           f_test = Path(test)
> -        f_bad = Path(f_test.name + '.out.bad')
> +        f_bad = Path(f'{os.getpid()}-{f_test.name}.out.bad')
>           f_notrun = Path(f_test.name + '.notrun')
>           f_casenotrun = Path(f_test.name + '.casenotrun')
>           f_reference = Path(self.find_reference(test))
>
> )
>
> And then:
>
> $ while TEST_DIR=/tmp/vdi-$$ ./check -vdi 300; do; done
>
> Which pretty quickly shows the error in at least one of those loops
> (under a minute).
>
> As far as I can tell, changing the log level in 300 does indeed fix it:
>
> diff --git a/tests/qemu-iotests/300 b/tests/qemu-iotests/300
> index 10f9f2a8da..096f5dabf0 100755
> --- a/tests/qemu-iotests/300
> +++ b/tests/qemu-iotests/300
> @@ -27,6 +27,7 @@ from typing import Dict, List, Optional
>   from qemu.machine import machine
>
>   import iotests
> +from iotests import change_log_level
>
>
>   BlockBitmapMapping = List[Dict[str, object]]
> @@ -464,7 +465,8 @@ class
> TestBlockBitmapMappingErrors(TestDirtyBitmapMigration):
>           # Expect abnormal shutdown of the destination VM because of
>           # the failed migration
>           try:
> -            self.vm_b.shutdown()
> +            with change_log_level('qemu.aqmp'):
> +                self.vm_b.shutdown()
>           except machine.AbnormalShutdown:
>               pass
>
>
Thanks for this! That helped a lot.

Long story short, what is happening here is that in some cases, we manage
to issue a 'quit' command before we acknowledge the EOF from the target VM,
so we get an 'ExecInterruptedError' from the top half and ECONNRESET from
the bottom half.
This gives me some faith that there's no more sinister problem going on
here.

There's really no way to "universally" solve this, though: It's always
going to be a problem that *maybe* sometimes when we go to send a command
that, no matter how recently we checked, the peer appeared to be online and
then upon sending a message we found out that isn't true.

However, there's a cuter way to solve this particular instance: just change
test 300 to wait() for the VM to self-exit instead of trying to shut it
down ourselves. That avoids the race condition in exactly what type of
error message we find when shutting down a dead-or-dying VM.

Thanks for the assistance on debugging this one.

--js

[-- Attachment #2: Type: text/html, Size: 9401 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-10-13 16:36 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-12 22:34 [PATCH v3 0/7] Switch iotests to using Async QMP John Snow
2021-10-12 22:34 ` [PATCH v3 1/7] python/machine: remove has_quit argument John Snow
2021-10-12 22:34 ` [PATCH v3 2/7] python/machine: Handle QMP errors on close more meticulously John Snow
2021-10-13  7:33   ` Hanna Reitz
2021-10-12 22:34 ` [PATCH v3 3/7] python/aqmp: Remove scary message John Snow
2021-10-13  7:35   ` Hanna Reitz
2021-10-12 22:34 ` [PATCH v3 4/7] iotests: Accommodate async QMP Exception classes John Snow
2021-10-13  7:38   ` Hanna Reitz
2021-10-12 22:34 ` [PATCH v3 5/7] iotests: Conditionally silence certain AQMP errors John Snow
2021-10-13  8:20   ` Hanna Reitz
2021-10-12 22:34 ` [PATCH v3 6/7] python/aqmp: Create sync QMP wrapper for iotests John Snow
2021-10-13  8:24   ` Hanna Reitz
2021-10-12 22:34 ` [PATCH v3 7/7] python, iotests: replace qmp with aqmp John Snow
2021-10-13 14:21   ` Eric Blake
2021-10-13  8:45 ` [PATCH v3 0/7] Switch iotests to using Async QMP Hanna Reitz
2021-10-13 12:51   ` John Snow
2021-10-13 14:00     ` John Snow
2021-10-13 14:49       ` Hanna Reitz
2021-10-13 16:35         ` John Snow

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.