* [PATCH 0/7] block/nbd: Move s->ioc on AioContext change
@ 2022-02-03 16:30 Hanna Reitz
  2022-02-03 16:30 ` [PATCH 1/7] block/nbd: Delete reconnect delay timer when done Hanna Reitz
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Hanna Reitz @ 2022-02-03 16:30 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Vladimir Sementsov-Ogievskiy,
	Eric Blake, qemu-devel

Hi,

I’ve sent an RFC for this before, which can be found here:

https://lists.nongnu.org/archive/html/qemu-block/2022-01/msg00765.html

...and is basically patch 6 in this series.

That was an RFC for two reasons:
(1) I didn’t know what to do with the two timers that the NBD BDS has
    (the open timer and the reconnect delay timer), and
(2) it didn’t include a regression test.

This v1 addresses (2) in the obvious manner (by adding a test), and (1)
as Vladimir has proposed, namely by asserting that there are no timers on
AioContext change, because there shouldn’t be any.

The problem is that this assertion is wrong for both timers.  As far as
I can tell, both of them are created so they will cancel the respective
(re-)connection attempt after a user-configurable interval.  However,
they are not deleted when that attempt succeeds (or otherwise returns
before the interval expires).  So if the attempt does succeed, both of
them will persist for however long they are configured, and they are
never disarmed/deleted anywhere, not even when the BDS is freed.

That’s a problem beyond “what do I do with them on AioContext change”,
because it means that if you delete the BDS while one of those timers is
active, the timer will still fire afterwards and access (and
dereference!) freed data.

The solution should be clear, though, because as Vladimir has said, they
simply shouldn’t persist.  So this series makes sure they are deleted
once the respective (re-)connection attempt returns (patches 1 and 2).

Patch 3 adds an assertion that the timers are gone when the BDS is
closed, so that we definitely won’t run into a situation where they
access freed data.
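The intended lifecycle can be sketched as follows (a minimal Python
model, purely illustrative; the ConnectAttempt class and all of its
names are invented here and are not QEMU API):

```python
import threading

class ConnectAttempt:
    """Toy model of the (re-)connect timeout pattern fixed here:
    arm a timer that would cancel the attempt after `timeout`
    seconds, and disarm it as soon as the attempt returns, so it
    can never fire later against already-freed state."""

    def __init__(self, timeout):
        self.timed_out = False
        self._timer = threading.Timer(timeout, self._on_timeout)

    def _on_timeout(self):
        self.timed_out = True  # would abort the in-flight attempt

    def run(self, attempt):
        self._timer.start()
        try:
            return attempt()
        finally:
            # The attempt is done (successfully or not), so delete the
            # timer; this mirrors reconnect_delay_timer_del() and
            # open_timer_del() in patches 1 and 2.
            self._timer.cancel()
            self._timer = None

    def close(self):
        # Mirrors the assertion added in patch 3: no timer may be
        # left behind when the state is torn down.
        assert self._timer is None
```

The point of the sketch is only the ordering: disarm in a `finally`-style
path on every return from the attempt, then assert at teardown.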


Hanna Reitz (7):
  block/nbd: Delete reconnect delay timer when done
  block/nbd: Delete open timer when done
  block/nbd: Assert there are no timers when closed
  iotests.py: Add QemuStorageDaemon class
  iotests/281: Test lingering timers
  block/nbd: Move s->ioc on AioContext change
  iotests/281: Let NBD connection yield in iothread

 block/nbd.c                   |  64 +++++++++++++++++++++
 tests/qemu-iotests/281        | 101 +++++++++++++++++++++++++++++++++-
 tests/qemu-iotests/281.out    |   4 +-
 tests/qemu-iotests/iotests.py |  42 ++++++++++++++
 4 files changed, 207 insertions(+), 4 deletions(-)

-- 
2.34.1




* [PATCH 1/7] block/nbd: Delete reconnect delay timer when done
  2022-02-03 16:30 [PATCH 0/7] block/nbd: Move s->ioc on AioContext change Hanna Reitz
@ 2022-02-03 16:30 ` Hanna Reitz
  2022-02-04  8:50   ` Vladimir Sementsov-Ogievskiy
  2022-02-03 16:30 ` [PATCH 2/7] block/nbd: Delete open " Hanna Reitz
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Hanna Reitz @ 2022-02-03 16:30 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Vladimir Sementsov-Ogievskiy,
	Eric Blake, qemu-devel

We start the reconnect delay timer to cancel the reconnection attempt
after a while.  Once nbd_co_do_establish_connection() has returned, this
attempt is over, and we no longer need the timer.

Delete it before returning from nbd_reconnect_attempt(), so that it does
not persist beyond the I/O request that was paused for reconnecting; we
do not want it to fire in a drained section, because all sorts of things
can happen in such a section (e.g. the AioContext might be changed, and
we do not want the timer to fire in the wrong context; or the BDS might
even be deleted, and so the timer CB would access already-freed data).

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 block/nbd.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 63dbfa807d..16cd7fef77 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -381,6 +381,13 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
     }
 
     nbd_co_do_establish_connection(s->bs, NULL);
+
+    /*
+     * The reconnect attempt is done (maybe successfully, maybe not), so
+     * we no longer need this timer.  Delete it so it will not outlive
+     * this I/O request (so draining removes all timers).
+     */
+    reconnect_delay_timer_del(s);
 }
 
 static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t handle)
-- 
2.34.1




* [PATCH 2/7] block/nbd: Delete open timer when done
  2022-02-03 16:30 [PATCH 0/7] block/nbd: Move s->ioc on AioContext change Hanna Reitz
  2022-02-03 16:30 ` [PATCH 1/7] block/nbd: Delete reconnect delay timer when done Hanna Reitz
@ 2022-02-03 16:30 ` Hanna Reitz
  2022-02-04  8:52   ` Vladimir Sementsov-Ogievskiy
  2022-02-03 16:30 ` [PATCH 3/7] block/nbd: Assert there are no timers when closed Hanna Reitz
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Hanna Reitz @ 2022-02-03 16:30 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Vladimir Sementsov-Ogievskiy,
	Eric Blake, qemu-devel

We start the open timer to cancel the connection attempt after a while.
Once nbd_do_establish_connection() has returned, the attempt is over,
and we no longer need the timer.

Delete it before returning from nbd_open(), so that it does not persist
for longer.  It has no use after nbd_open(), and just like the reconnect
delay timer, it might well be dangerous if it were to fire afterwards.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 block/nbd.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 16cd7fef77..5ff8a57314 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1885,11 +1885,19 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }
 
+    /*
+     * The connect attempt is done, so we no longer need this timer.
+     * Delete it, because we do not want it to be around when this node
+     * is drained or closed.
+     */
+    open_timer_del(s);
+
     nbd_client_connection_enable_retry(s->conn);
 
     return 0;
 
 fail:
+    open_timer_del(s);
     nbd_clear_bdrvstate(bs);
     return ret;
 }
-- 
2.34.1




* [PATCH 3/7] block/nbd: Assert there are no timers when closed
  2022-02-03 16:30 [PATCH 0/7] block/nbd: Move s->ioc on AioContext change Hanna Reitz
  2022-02-03 16:30 ` [PATCH 1/7] block/nbd: Delete reconnect delay timer when done Hanna Reitz
  2022-02-03 16:30 ` [PATCH 2/7] block/nbd: Delete open " Hanna Reitz
@ 2022-02-03 16:30 ` Hanna Reitz
  2022-02-04  8:54   ` Vladimir Sementsov-Ogievskiy
  2022-02-03 16:30 ` [PATCH 4/7] iotests.py: Add QemuStorageDaemon class Hanna Reitz
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Hanna Reitz @ 2022-02-03 16:30 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Vladimir Sementsov-Ogievskiy,
	Eric Blake, qemu-devel

Our two timers must not remain armed beyond nbd_clear_bdrvstate(), or
they will access freed data when they fire.

This patch is separate from the patches that actually fix the issue
(HEAD^^ and HEAD^) so that you can run the associated regression iotest
(281) on a configuration that reproducibly exposes the bug.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 block/nbd.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 5ff8a57314..dc6c3f3bbc 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -110,6 +110,10 @@ static void nbd_clear_bdrvstate(BlockDriverState *bs)
 
     yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name));
 
+    /* Must not leave timers behind that would access freed data */
+    assert(!s->reconnect_delay_timer);
+    assert(!s->open_timer);
+
     object_unref(OBJECT(s->tlscreds));
     qapi_free_SocketAddress(s->saddr);
     s->saddr = NULL;
-- 
2.34.1




* [PATCH 4/7] iotests.py: Add QemuStorageDaemon class
  2022-02-03 16:30 [PATCH 0/7] block/nbd: Move s->ioc on AioContext change Hanna Reitz
                   ` (2 preceding siblings ...)
  2022-02-03 16:30 ` [PATCH 3/7] block/nbd: Assert there are no timers when closed Hanna Reitz
@ 2022-02-03 16:30 ` Hanna Reitz
  2022-02-04  9:04   ` Vladimir Sementsov-Ogievskiy
  2022-02-03 16:30 ` [PATCH 5/7] iotests/281: Test lingering timers Hanna Reitz
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Hanna Reitz @ 2022-02-03 16:30 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Vladimir Sementsov-Ogievskiy,
	Eric Blake, qemu-devel

This is a rather simple class that allows creating a QSD instance
running in the background and stopping it when no longer needed.

The __del__ handler is a safety net for when something goes so wrong in
a test that the tearDown() method is not called (e.g. setUp()
launches the QSD, but then launching a VM fails).  We do not want the
QSD to continue running after the test has failed, so __del__() will
take care to kill it.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 tests/qemu-iotests/iotests.py | 42 +++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 8cdb381f2a..c75e402b87 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -73,6 +73,8 @@
 qemu_prog = os.environ.get('QEMU_PROG', 'qemu')
 qemu_opts = os.environ.get('QEMU_OPTIONS', '').strip().split(' ')
 
+qsd_prog = os.environ.get('QSD_PROG', 'qemu-storage-daemon')
+
 gdb_qemu_env = os.environ.get('GDB_OPTIONS')
 qemu_gdb = []
 if gdb_qemu_env:
@@ -345,6 +347,46 @@ def cmd(self, cmd):
         return self._read_output()
 
 
+class QemuStorageDaemon:
+    def __init__(self, *args: str, instance_id: Optional[str] = None):
+        if not instance_id:
+            instance_id = 'a'
+
+        self.pidfile = os.path.join(test_dir, f'qsd-{instance_id}-pid')
+        all_args = [qsd_prog] + list(args) + ['--pidfile', self.pidfile]
+
+        # Cannot use with here, we want the subprocess to stay around
+        # pylint: disable=consider-using-with
+        self._p = subprocess.Popen(all_args)
+        while not os.path.exists(self.pidfile):
+            if self._p.poll() is not None:
+                cmd = ' '.join(all_args)
+                raise RuntimeError(
+                    'qemu-storage-daemon terminated with exit code ' +
+                    f'{self._p.returncode}: {cmd}')
+
+            time.sleep(0.01)
+
+        with open(self.pidfile, encoding='utf-8') as f:
+            self._pid = int(f.read().strip())
+
+        assert self._pid == self._p.pid
+
+    def stop(self, kill_signal=15):
+        self._p.send_signal(kill_signal)
+        self._p.wait()
+        self._p = None
+
+        try:
+            os.remove(self.pidfile)
+        except OSError:
+            pass
+
+    def __del__(self):
+        if self._p is not None:
+            self.stop(kill_signal=9)
+
+
 def qemu_nbd(*args):
     '''Run qemu-nbd in daemon mode and return the parent's exit code'''
     return subprocess.call(qemu_nbd_args + ['--fork'] + list(args))
-- 
2.34.1




* [PATCH 5/7] iotests/281: Test lingering timers
  2022-02-03 16:30 [PATCH 0/7] block/nbd: Move s->ioc on AioContext change Hanna Reitz
                   ` (3 preceding siblings ...)
  2022-02-03 16:30 ` [PATCH 4/7] iotests.py: Add QemuStorageDaemon class Hanna Reitz
@ 2022-02-03 16:30 ` Hanna Reitz
  2022-02-04  9:17   ` Vladimir Sementsov-Ogievskiy
  2022-02-03 16:30 ` [PATCH 6/7] block/nbd: Move s->ioc on AioContext change Hanna Reitz
  2022-02-03 16:30 ` [PATCH 7/7] iotests/281: Let NBD connection yield in iothread Hanna Reitz
  6 siblings, 1 reply; 16+ messages in thread
From: Hanna Reitz @ 2022-02-03 16:30 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Vladimir Sementsov-Ogievskiy,
	Eric Blake, qemu-devel

Prior to "block/nbd: Delete reconnect delay timer when done" and
"block/nbd: Delete open timer when done", both of those timers would
remain scheduled even after successfully (re-)connecting to the server,
and they would not even be deleted when the BDS is deleted.

This test constructs exactly this situation:
(1) Configure an @open-timeout, so the open timer is armed, and
(2) Configure a @reconnect-delay and trigger a reconnect situation
    (which succeeds immediately), so the reconnect delay timer is armed.
Then we immediately delete the BDS, and sleep for longer than the
@open-timeout and @reconnect-delay.  Prior to said patches, this caused
one (or both) of the timer CBs to access already-freed data.

Accessing freed data may or may not crash, so this test can produce
false successes, but I do not know how to show the problem in a better
or more reliable way.  If you run this test with "block/nbd: Assert
there are no timers when closed" applied but without the fix patches
mentioned above, you should reliably see an assertion failure.
(But all other tests that use the reconnect delay timer (264 and 277)
will fail in that configuration, too; as will nbd-reconnect-on-open,
which uses the open timer.)

Remove this test from the quick group because of the two-second sleep
this patch introduces.

(I decided to put this test case into 281, because the main bug this
series addresses is in the interaction of the NBD block driver and I/O
threads, which is precisely the scope of 281.  The test case for that
other bug will also be put into the test class added here.

Also, excuse the test class's name, I couldn't come up with anything
better.  The "yield" part will make sense two patches from now.)

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 tests/qemu-iotests/281     | 79 +++++++++++++++++++++++++++++++++++++-
 tests/qemu-iotests/281.out |  4 +-
 2 files changed, 79 insertions(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/281 b/tests/qemu-iotests/281
index 318e333939..4fb3cd30dd 100755
--- a/tests/qemu-iotests/281
+++ b/tests/qemu-iotests/281
@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-# group: rw quick
+# group: rw
 #
 # Test cases for blockdev + IOThread interactions
 #
@@ -20,8 +20,9 @@
 #
 
 import os
+import time
 import iotests
-from iotests import qemu_img
+from iotests import qemu_img, QemuStorageDaemon
 
 image_len = 64 * 1024 * 1024
 
@@ -243,6 +244,80 @@ class TestBlockdevBackupAbort(iotests.QMPTestCase):
         # Hangs on failure, we expect this error.
         self.assert_qmp(result, 'error/class', 'GenericError')
 
+# Test for RHBZ#2033626
+class TestYieldingAndTimers(iotests.QMPTestCase):
+    sock = os.path.join(iotests.sock_dir, 'nbd.sock')
+    qsd = None
+
+    def setUp(self):
+        self.create_nbd_export()
+
+        # Simple VM with an NBD block device connected to the NBD export
+        # provided by the QSD
+        self.vm = iotests.VM()
+        self.vm.add_blockdev('nbd,node-name=nbd,server.type=unix,' +
+                             f'server.path={self.sock},export=exp,' +
+                             'reconnect-delay=1,open-timeout=1')
+
+        self.vm.launch()
+
+    def tearDown(self):
+        self.stop_nbd_export()
+        self.vm.shutdown()
+
+    def test_timers_with_blockdev_del(self):
+        # The NBD BDS will have had an active open timer, because setUp() gave
+        # a positive value for @open-timeout.  It should be gone once the BDS
+        # has been opened.
+        # (But there used to be a bug where it remained active, which will
+        # become important below.)
+
+        # Stop and restart the NBD server, and do some I/O on the client to
+        # trigger a reconnect and start the reconnect delay timer
+        self.stop_nbd_export()
+        self.create_nbd_export()
+
+        result = self.vm.qmp('human-monitor-command',
+                             command_line='qemu-io nbd "write 0 512"')
+        self.assert_qmp(result, 'return', '')
+
+        # Reconnect is done, so the reconnect delay timer should be gone.
+        # (This is similar to how the open timer should be gone after open,
+        # and similarly there used to be a bug where it was not gone.)
+
+        # Delete the BDS to see whether both timers are gone.  If they are not,
+        # they will remain active, fire later, and then access freed data.
+        # (Or, with "block/nbd: Assert there are no timers when closed"
+        # applied, the assertions added in that patch will fail.)
+        result = self.vm.qmp('blockdev-del', node_name='nbd')
+        self.assert_qmp(result, 'return', {})
+
+        # Give the timers some time to fire (both have a timeout of 1 s).
+        # (Sleeping in an iotest may ring some alarm bells, but note that if
+        # the timing is off here, the test will just always pass.  If we kill
+        # the VM too early, then we just kill the timers before they can fire,
+        # thus not see the error, and so the test will pass.)
+        time.sleep(2)
+
+    def create_nbd_export(self):
+        assert self.qsd is None
+
+        # Simple NBD export of a null-co BDS
+        self.qsd = QemuStorageDaemon(
+            '--blockdev',
+            'null-co,node-name=null,read-zeroes=true',
+
+            '--nbd-server',
+            f'addr.type=unix,addr.path={self.sock}',
+
+            '--export',
+            'nbd,id=exp,node-name=null,name=exp,writable=true'
+        )
+
+    def stop_nbd_export(self):
+        self.qsd.stop()
+        self.qsd = None
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2'],
                  supported_protocols=['file'],
diff --git a/tests/qemu-iotests/281.out b/tests/qemu-iotests/281.out
index 89968f35d7..914e3737bd 100644
--- a/tests/qemu-iotests/281.out
+++ b/tests/qemu-iotests/281.out
@@ -1,5 +1,5 @@
-....
+.....
 ----------------------------------------------------------------------
-Ran 4 tests
+Ran 5 tests
 
 OK
-- 
2.34.1




* [PATCH 6/7] block/nbd: Move s->ioc on AioContext change
  2022-02-03 16:30 [PATCH 0/7] block/nbd: Move s->ioc on AioContext change Hanna Reitz
                   ` (4 preceding siblings ...)
  2022-02-03 16:30 ` [PATCH 5/7] iotests/281: Test lingering timers Hanna Reitz
@ 2022-02-03 16:30 ` Hanna Reitz
  2022-02-04  9:20   ` Vladimir Sementsov-Ogievskiy
  2022-02-03 16:30 ` [PATCH 7/7] iotests/281: Let NBD connection yield in iothread Hanna Reitz
  6 siblings, 1 reply; 16+ messages in thread
From: Hanna Reitz @ 2022-02-03 16:30 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Vladimir Sementsov-Ogievskiy,
	Eric Blake, qemu-devel

s->ioc must always be attached to the NBD node's AioContext.  If that
context changes, s->ioc must be attached to the new context.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2033626
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 block/nbd.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index dc6c3f3bbc..5853d85d60 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -2055,6 +2055,42 @@ static void nbd_cancel_in_flight(BlockDriverState *bs)
     nbd_co_establish_connection_cancel(s->conn);
 }
 
+static void nbd_attach_aio_context(BlockDriverState *bs,
+                                   AioContext *new_context)
+{
+    BDRVNBDState *s = bs->opaque;
+
+    /* The open_timer is used only during nbd_open() */
+    assert(!s->open_timer);
+
+    /*
+     * The reconnect_delay_timer is scheduled in I/O paths when the
+     * connection is lost, to cancel the reconnection attempt after a
+     * given time.  Once this attempt is done (successfully or not),
+     * nbd_reconnect_attempt() ensures the timer is deleted before the
+     * respective I/O request is resumed.
+     * Since the AioContext can only be changed when a node is drained,
+     * the reconnect_delay_timer cannot be active here.
+     */
+    assert(!s->reconnect_delay_timer);
+
+    if (s->ioc) {
+        qio_channel_attach_aio_context(s->ioc, new_context);
+    }
+}
+
+static void nbd_detach_aio_context(BlockDriverState *bs)
+{
+    BDRVNBDState *s = bs->opaque;
+
+    assert(!s->open_timer);
+    assert(!s->reconnect_delay_timer);
+
+    if (s->ioc) {
+        qio_channel_detach_aio_context(s->ioc);
+    }
+}
+
 static BlockDriver bdrv_nbd = {
     .format_name                = "nbd",
     .protocol_name              = "nbd",
@@ -2078,6 +2114,9 @@ static BlockDriver bdrv_nbd = {
     .bdrv_dirname               = nbd_dirname,
     .strong_runtime_opts        = nbd_strong_runtime_opts,
     .bdrv_cancel_in_flight      = nbd_cancel_in_flight,
+
+    .bdrv_attach_aio_context    = nbd_attach_aio_context,
+    .bdrv_detach_aio_context    = nbd_detach_aio_context,
 };
 
 static BlockDriver bdrv_nbd_tcp = {
@@ -2103,6 +2142,9 @@ static BlockDriver bdrv_nbd_tcp = {
     .bdrv_dirname               = nbd_dirname,
     .strong_runtime_opts        = nbd_strong_runtime_opts,
     .bdrv_cancel_in_flight      = nbd_cancel_in_flight,
+
+    .bdrv_attach_aio_context    = nbd_attach_aio_context,
+    .bdrv_detach_aio_context    = nbd_detach_aio_context,
 };
 
 static BlockDriver bdrv_nbd_unix = {
@@ -2128,6 +2170,9 @@ static BlockDriver bdrv_nbd_unix = {
     .bdrv_dirname               = nbd_dirname,
     .strong_runtime_opts        = nbd_strong_runtime_opts,
     .bdrv_cancel_in_flight      = nbd_cancel_in_flight,
+
+    .bdrv_attach_aio_context    = nbd_attach_aio_context,
+    .bdrv_detach_aio_context    = nbd_detach_aio_context,
 };
 
 static void bdrv_nbd_init(void)
-- 
2.34.1




* [PATCH 7/7] iotests/281: Let NBD connection yield in iothread
  2022-02-03 16:30 [PATCH 0/7] block/nbd: Move s->ioc on AioContext change Hanna Reitz
                   ` (5 preceding siblings ...)
  2022-02-03 16:30 ` [PATCH 6/7] block/nbd: Move s->ioc on AioContext change Hanna Reitz
@ 2022-02-03 16:30 ` Hanna Reitz
  2022-02-04  9:31   ` Vladimir Sementsov-Ogievskiy
  6 siblings, 1 reply; 16+ messages in thread
From: Hanna Reitz @ 2022-02-03 16:30 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Vladimir Sementsov-Ogievskiy,
	Eric Blake, qemu-devel

Put an NBD block device into an I/O thread, and then read data from it,
hoping that the NBD connection will yield during that read.  When it
does, the coroutine must be reentered in the block device's I/O thread,
which will only happen if the NBD block driver attaches the connection's
QIOChannel to the new AioContext.  It did not do that after 4ddb5d2fde
("block/nbd: drop connection_co") and prior to "block/nbd: Move s->ioc
on AioContext change", which would cause an assertion failure.

To improve our chances of yielding, the NBD server is throttled to
reading 64 kB/s, and the NBD client reads 128 kB, so it should yield at
some point.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 tests/qemu-iotests/281     | 28 +++++++++++++++++++++++++---
 tests/qemu-iotests/281.out |  4 ++--
 2 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/281 b/tests/qemu-iotests/281
index 4fb3cd30dd..5e1339bd75 100755
--- a/tests/qemu-iotests/281
+++ b/tests/qemu-iotests/281
@@ -253,8 +253,9 @@ class TestYieldingAndTimers(iotests.QMPTestCase):
         self.create_nbd_export()
 
         # Simple VM with an NBD block device connected to the NBD export
-        # provided by the QSD
+        # provided by the QSD, and an (initially unused) iothread
         self.vm = iotests.VM()
+        self.vm.add_object('iothread,id=iothr')
         self.vm.add_blockdev('nbd,node-name=nbd,server.type=unix,' +
                              f'server.path={self.sock},export=exp,' +
                              'reconnect-delay=1,open-timeout=1')
@@ -299,19 +300,40 @@ class TestYieldingAndTimers(iotests.QMPTestCase):
         # thus not see the error, and so the test will pass.)
         time.sleep(2)
 
+    def test_yield_in_iothread(self):
+        # Move the NBD node to the I/O thread; the NBD block driver should
+        # attach the connection's QIOChannel to that thread's AioContext, too
+        result = self.vm.qmp('x-blockdev-set-iothread',
+                             node_name='nbd', iothread='iothr')
+        self.assert_qmp(result, 'return', {})
+
+        # Do some I/O that will be throttled by the QSD, so that the network
+        # connection hopefully will yield here.  When it is resumed, it must
+        # then be resumed in the I/O thread's AioContext.
+        result = self.vm.qmp('human-monitor-command',
+                             command_line='qemu-io nbd "read 0 128K"')
+        self.assert_qmp(result, 'return', '')
+
     def create_nbd_export(self):
         assert self.qsd is None
 
-        # Simple NBD export of a null-co BDS
+        # Export a throttled null-co BDS: Reads are throttled (max 64 kB/s),
+        # writes are not.
         self.qsd = QemuStorageDaemon(
+            '--object',
+            'throttle-group,id=thrgr,x-bps-read=65536,x-bps-read-max=65536',
+
             '--blockdev',
             'null-co,node-name=null,read-zeroes=true',
 
+            '--blockdev',
+            'throttle,node-name=thr,file=null,throttle-group=thrgr',
+
             '--nbd-server',
             f'addr.type=unix,addr.path={self.sock}',
 
             '--export',
-            'nbd,id=exp,node-name=null,name=exp,writable=true'
+            'nbd,id=exp,node-name=thr,name=exp,writable=true'
         )
 
     def stop_nbd_export(self):
diff --git a/tests/qemu-iotests/281.out b/tests/qemu-iotests/281.out
index 914e3737bd..3f8a935a08 100644
--- a/tests/qemu-iotests/281.out
+++ b/tests/qemu-iotests/281.out
@@ -1,5 +1,5 @@
-.....
+......
 ----------------------------------------------------------------------
-Ran 5 tests
+Ran 6 tests
 
 OK
-- 
2.34.1




* Re: [PATCH 1/7] block/nbd: Delete reconnect delay timer when done
  2022-02-03 16:30 ` [PATCH 1/7] block/nbd: Delete reconnect delay timer when done Hanna Reitz
@ 2022-02-04  8:50   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-04  8:50 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block; +Cc: qemu-devel, Eric Blake, Kevin Wolf

03.02.2022 19:30, Hanna Reitz wrote:
> We start the reconnect delay timer to cancel the reconnection attempt
> after a while.  Once nbd_co_do_establish_connection() has returned, this
> attempt is over, and we no longer need the timer.
> 
> Delete it before returning from nbd_reconnect_attempt(), so that it does
> not persist beyond the I/O request that was paused for reconnecting; we
> do not want it to fire in a drained section, because all sorts of things
> can happen in such a section (e.g. the AioContext might be changed, and
> we do not want the timer to fire in the wrong context; or the BDS might
> even be deleted, and so the timer CB would access already-freed data).
> 
> Signed-off-by: Hanna Reitz<hreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir



* Re: [PATCH 2/7] block/nbd: Delete open timer when done
  2022-02-03 16:30 ` [PATCH 2/7] block/nbd: Delete open " Hanna Reitz
@ 2022-02-04  8:52   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-04  8:52 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block; +Cc: qemu-devel, Eric Blake, Kevin Wolf

03.02.2022 19:30, Hanna Reitz wrote:
> We start the open timer to cancel the connection attempt after a while.
> Once nbd_do_establish_connection() has returned, the attempt is over,
> and we no longer need the timer.
> 
> Delete it before returning from nbd_open(), so that it does not persist
> for longer.  It has no use after nbd_open(), and just like the reconnect
> delay timer, it might well be dangerous if it were to fire afterwards.
> 
> Signed-off-by: Hanna Reitz<hreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir



* Re: [PATCH 3/7] block/nbd: Assert there are no timers when closed
  2022-02-03 16:30 ` [PATCH 3/7] block/nbd: Assert there are no timers when closed Hanna Reitz
@ 2022-02-04  8:54   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-04  8:54 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block; +Cc: qemu-devel, Eric Blake, Kevin Wolf

03.02.2022 19:30, Hanna Reitz wrote:
> Our two timers must not remain armed beyond nbd_clear_bdrvstate(), or
> they will access freed data when they fire.
> 
> This patch is separate from the patches that actually fix the issue
> (HEAD^^ and HEAD^) so that you can run the associated regression iotest
> (281) on a configuration that reproducibly exposes the bug.
> 
> Signed-off-by: Hanna Reitz<hreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir



* Re: [PATCH 4/7] iotests.py: Add QemuStorageDaemon class
  2022-02-03 16:30 ` [PATCH 4/7] iotests.py: Add QemuStorageDaemon class Hanna Reitz
@ 2022-02-04  9:04   ` Vladimir Sementsov-Ogievskiy
  2022-02-04  9:58     ` Hanna Reitz
  0 siblings, 1 reply; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-04  9:04 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block; +Cc: qemu-devel, Eric Blake, Kevin Wolf

03.02.2022 19:30, Hanna Reitz wrote:
> This is a rather simple class that allows creating a QSD instance
> running in the background and stopping it when no longer needed.
> 
> The __del__ handler is a safety net for when something goes so wrong in
> a test that the tearDown() method is not called (e.g. setUp()
> launches the QSD, but then launching a VM fails).  We do not want the
> QSD to continue running after the test has failed, so __del__() will
> take care to kill it.
> 
> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> ---
>   tests/qemu-iotests/iotests.py | 42 +++++++++++++++++++++++++++++++++++
>   1 file changed, 42 insertions(+)
> 
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index 8cdb381f2a..c75e402b87 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -73,6 +73,8 @@
>   qemu_prog = os.environ.get('QEMU_PROG', 'qemu')
>   qemu_opts = os.environ.get('QEMU_OPTIONS', '').strip().split(' ')
>   
> +qsd_prog = os.environ.get('QSD_PROG', 'qemu-storage-daemon')
> +
>   gdb_qemu_env = os.environ.get('GDB_OPTIONS')
>   qemu_gdb = []
>   if gdb_qemu_env:
> @@ -345,6 +347,46 @@ def cmd(self, cmd):
>           return self._read_output()
>   
>   
> +class QemuStorageDaemon:
> +    def __init__(self, *args: str, instance_id: Optional[str] = None):
> +        if not instance_id:
> +            instance_id = 'a'

this is equivalent to simply
  
   instance_id: str = 'a'

> +

I'd add

    assert '--pidfile' not in args

to prove the following logic

> +        self.pidfile = os.path.join(test_dir, f'qsd-{instance_id}-pid')
> +        all_args = [qsd_prog] + list(args) + ['--pidfile', self.pidfile]
> +
> +        # Cannot use with here, we want the subprocess to stay around
> +        # pylint: disable=consider-using-with
> +        self._p = subprocess.Popen(all_args)
> +        while not os.path.exists(self.pidfile):
> +            if self._p.poll() is not None:
> +                cmd = ' '.join(all_args)
> +                raise RuntimeError(
> +                    'qemu-storage-daemon terminated with exit code ' +
> +                    f'{self._p.returncode}: {cmd}')
> +
> +            time.sleep(0.01)
> +
> +        with open(self.pidfile, encoding='utf-8') as f:
> +            self._pid = int(f.read().strip())
> +
> +        assert self._pid == self._p.pid
> +
> +    def stop(self, kill_signal=15):
> +        self._p.send_signal(kill_signal)
> +        self._p.wait()
> +        self._p = None
> +
> +        try:
> +            os.remove(self.pidfile)
> +        except OSError:
> +            pass
> +
> +    def __del__(self):
> +        if self._p is not None:
> +            self.stop(kill_signal=9)
> +
> +
>   def qemu_nbd(*args):
>       '''Run qemu-nbd in daemon mode and return the parent's exit code'''
>       return subprocess.call(qemu_nbd_args + ['--fork'] + list(args))
> 

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 16+ messages in thread
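[Editor's aside: Vladimir's suggested simplification can be checked with a quick standalone sketch (not iotests code; the helper names below are hypothetical). The two forms agree for the `None` default and for any non-empty string; they differ only if a caller passed an empty string, which no caller in the series does:

```python
from typing import Optional


def make_id_original(instance_id: Optional[str] = None) -> str:
    # Pattern from the patch: fall back to 'a' for None (and also for '')
    if not instance_id:
        instance_id = 'a'
    return instance_id


def make_id_suggested(instance_id: str = 'a') -> str:
    # Vladimir's simplification: a plain default argument
    return instance_id


print(make_id_original(), make_id_suggested())        # both 'a'
print(make_id_original('b'), make_id_suggested('b'))  # both 'b'
# Only divergence: make_id_original('') == 'a', make_id_suggested('') == ''
```
]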

* Re: [PATCH 5/7] iotests/281: Test lingering timers
  2022-02-03 16:30 ` [PATCH 5/7] iotests/281: Test lingering timers Hanna Reitz
@ 2022-02-04  9:17   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-04  9:17 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block; +Cc: qemu-devel, Eric Blake, Kevin Wolf

03.02.2022 19:30, Hanna Reitz wrote:
> Prior to "block/nbd: Delete reconnect delay timer when done" and
> "block/nbd: Delete open timer when done", both of those timers would
> remain scheduled even after successfully (re-)connecting to the server,
> and they would not even be deleted when the BDS is deleted.
> 
> This test constructs exactly this situation:
> (1) Configure an @open-timeout, so the open timer is armed, and
> (2) Configure a @reconnect-delay and trigger a reconnect situation
>      (which succeeds immediately), so the reconnect delay timer is armed.
> Then we immediately delete the BDS, and sleep for longer than the
> @open-timeout and @reconnect-delay.  Prior to said patches, this caused
> one (or both) of the timer CBs to access already-freed data.
> 
> Accessing freed data may or may not crash, so this test can produce
> false successes, but I do not know how to show the problem in a better
> or more reliable way.  If you run this test on "block/nbd: Assert there
> are no timers when closed" and without the fix patches mentioned above,
> you should reliably see an assertion failure.
> (But all other tests that use the reconnect delay timer (264 and 277)
> will fail in that configuration, too; as will nbd-reconnect-on-open,
> which uses the open timer.)
> 
> Remove this test from the quick group because of the two second sleep
> this patch introduces.
> 
> (I decided to put this test case into 281, because the main bug this
> series addresses is in the interaction of the NBD block driver and I/O
> threads, which is precisely the scope of 281.  The test case for that
> other bug will also be put into the test class added here.
> 
> Also, excuse the test class's name, I couldn't come up with anything
> better.  The "yield" part will make sense two patches from now.)
> 
> Signed-off-by: Hanna Reitz <hreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>


-- 
Best regards,
Vladimir


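[Editor's aside: the bug class that patches 1 and 2 fix can be illustrated outside QEMU with a generic sketch (plain Python threading, not QEMU code): a timer armed to abort a hanging connection attempt must be disarmed as soon as the attempt returns, or it fires later against state the caller may already consider gone:

```python
import threading
import time


class ConnectAttempt:
    """Generic sketch: a connection attempt guarded by a timeout timer."""

    def __init__(self, timeout: float, cancel_timer_when_done: bool):
        self.timed_out = False
        self.cancel_timer_when_done = cancel_timer_when_done
        # Arm the timer that would abort a hanging attempt
        self.timer = threading.Timer(timeout, self._on_timeout)
        self.timer.start()

    def _on_timeout(self):
        # Pre-fix behaviour: this can run long after connect() succeeded
        self.timed_out = True

    def connect(self):
        # The attempt succeeds immediately, as in the test scenario above
        if self.cancel_timer_when_done:
            # The fix: disarm the timer once the attempt returns
            self.timer.cancel()


buggy = ConnectAttempt(timeout=0.1, cancel_timer_when_done=False)
buggy.connect()
fixed = ConnectAttempt(timeout=0.1, cancel_timer_when_done=True)
fixed.connect()
time.sleep(0.3)  # outlive both timeouts, as the iotest's sleep does
print(buggy.timed_out, fixed.timed_out)  # True False
```

In QEMU the consequence is worse than a stray flag: the timer callback dereferences freed BDS data, which is what the test above tries to provoke.]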

* Re: [PATCH 6/7] block/nbd: Move s->ioc on AioContext change
  2022-02-03 16:30 ` [PATCH 6/7] block/nbd: Move s->ioc on AioContext change Hanna Reitz
@ 2022-02-04  9:20   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-04  9:20 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block; +Cc: qemu-devel, Eric Blake, Kevin Wolf

03.02.2022 19:30, Hanna Reitz wrote:
> s->ioc must always be attached to the NBD node's AioContext.  If that
> context changes, s->ioc must be attached to the new context.
> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2033626
> Signed-off-by: Hanna Reitz <hreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir



* Re: [PATCH 7/7] iotests/281: Let NBD connection yield in iothread
  2022-02-03 16:30 ` [PATCH 7/7] iotests/281: Let NBD connection yield in iothread Hanna Reitz
@ 2022-02-04  9:31   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-04  9:31 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block; +Cc: qemu-devel, Eric Blake, Kevin Wolf

03.02.2022 19:30, Hanna Reitz wrote:
> Put an NBD block device into an I/O thread, and then read data from it,
> hoping that the NBD connection will yield during that read.  When it
> does, the coroutine must be reentered in the block device's I/O thread,
> which will only happen if the NBD block driver attaches the connection's
> QIOChannel to the new AioContext.  It did not do that after 4ddb5d2fde
> ("block/nbd: drop connection_co") and prior to "block/nbd: Move s->ioc
> on AioContext change", which would cause an assertion failure.
> 
> To improve our chances of yielding, the NBD server is throttled to
> reading 64 kB/s, and the NBD client reads 128 kB, so it should yield at
> some point.
> 
> Signed-off-by: Hanna Reitz <hreitz@redhat.com>


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir



* Re: [PATCH 4/7] iotests.py: Add QemuStorageDaemon class
  2022-02-04  9:04   ` Vladimir Sementsov-Ogievskiy
@ 2022-02-04  9:58     ` Hanna Reitz
  0 siblings, 0 replies; 16+ messages in thread
From: Hanna Reitz @ 2022-02-04  9:58 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Eric Blake, qemu-devel

On 04.02.22 10:04, Vladimir Sementsov-Ogievskiy wrote:
> 03.02.2022 19:30, Hanna Reitz wrote:
>> This is a rather simple class that allows creating a QSD instance
>> running in the background and stopping it when no longer needed.
>>
>> The __del__ handler is a safety net for when something goes so wrong in
>> a test that e.g. the tearDown() method is not called (e.g. setUp()
>> launches the QSD, but then launching a VM fails).  We do not want the
>> QSD to continue running after the test has failed, so __del__() will
>> take care to kill it.
>>
>> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
>> ---
>>   tests/qemu-iotests/iotests.py | 42 +++++++++++++++++++++++++++++++++++
>>   1 file changed, 42 insertions(+)
>>
>> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
>> index 8cdb381f2a..c75e402b87 100644
>> --- a/tests/qemu-iotests/iotests.py
>> +++ b/tests/qemu-iotests/iotests.py
>> @@ -73,6 +73,8 @@
>>   qemu_prog = os.environ.get('QEMU_PROG', 'qemu')
>>   qemu_opts = os.environ.get('QEMU_OPTIONS', '').strip().split(' ')
>>
>> +qsd_prog = os.environ.get('QSD_PROG', 'qemu-storage-daemon')
>> +
>>   gdb_qemu_env = os.environ.get('GDB_OPTIONS')
>>   qemu_gdb = []
>>   if gdb_qemu_env:
>> @@ -345,6 +347,46 @@ def cmd(self, cmd):
>>           return self._read_output()
>>
>> +class QemuStorageDaemon:
>> +    def __init__(self, *args: str, instance_id: Optional[str] = None):
>> +        if not instance_id:
>> +            instance_id = 'a'
>
> this is equivalent to simply
>
>   instance_id: str = 'a'

Oh.  Right. :)

>> +
>
> I'd add
>
>    assert '--pidfile' not in args
>
> to prove following logic

Sounds good!

>> +        self.pidfile = os.path.join(test_dir, f'qsd-{instance_id}-pid')
>> +        all_args = [qsd_prog] + list(args) + ['--pidfile', self.pidfile]
>> +
>> +        # Cannot use with here, we want the subprocess to stay around
>> +        # pylint: disable=consider-using-with
>> +        self._p = subprocess.Popen(all_args)
>> +        while not os.path.exists(self.pidfile):
>> +            if self._p.poll() is not None:
>> +                cmd = ' '.join(all_args)
>> +                raise RuntimeError(
>> +                    'qemu-storage-daemon terminated with exit code ' +
>> +                    f'{self._p.returncode}: {cmd}')
>> +
>> +            time.sleep(0.01)
>> +
>> +        with open(self.pidfile, encoding='utf-8') as f:
>> +            self._pid = int(f.read().strip())
>> +
>> +        assert self._pid == self._p.pid
>> +
>> +    def stop(self, kill_signal=15):
>> +        self._p.send_signal(kill_signal)
>> +        self._p.wait()
>> +        self._p = None
>> +
>> +        try:
>> +            os.remove(self.pidfile)
>> +        except OSError:
>> +            pass
>> +
>> +    def __del__(self):
>> +        if self._p is not None:
>> +            self.stop(kill_signal=9)
>> +
>> +
>>   def qemu_nbd(*args):
>>       '''Run qemu-nbd in daemon mode and return the parent's exit code'''
>>       return subprocess.call(qemu_nbd_args + ['--fork'] + list(args))
>>
>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Thanks a lot for reviewing!



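[Editor's aside: folding both review comments into the posted code, the revised constructor might look like the following sketch (untested against a real qemu-storage-daemon binary; `qsd_prog` and `test_dir` stand in for the iotests.py globals):

```python
import os
import subprocess
import time

# Stand-ins for the iotests.py module globals
qsd_prog = os.environ.get('QSD_PROG', 'qemu-storage-daemon')
test_dir = os.environ.get('TEST_DIR', '/tmp')


class QemuStorageDaemon:
    def __init__(self, *args: str, instance_id: str = 'a'):
        # --pidfile is ours to append below; catch callers that would
        # silently override it (Vladimir's suggested assertion)
        assert '--pidfile' not in args

        self.pidfile = os.path.join(test_dir, f'qsd-{instance_id}-pid')
        all_args = [qsd_prog] + list(args) + ['--pidfile', self.pidfile]

        # Cannot use `with` here, we want the subprocess to stay around
        # pylint: disable=consider-using-with
        self._p = subprocess.Popen(all_args)
        while not os.path.exists(self.pidfile):
            if self._p.poll() is not None:
                cmd = ' '.join(all_args)
                raise RuntimeError(
                    'qemu-storage-daemon terminated with exit code ' +
                    f'{self._p.returncode}: {cmd}')
            time.sleep(0.01)

        with open(self.pidfile, encoding='utf-8') as f:
            self._pid = int(f.read().strip())

        assert self._pid == self._p.pid
```

Note that the `--pidfile` assertion fires before the daemon is even spawned, so a misuse is reported immediately rather than as a confusing startup hang.]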

end of thread, other threads:[~2022-02-04 10:07 UTC | newest]

Thread overview: 16+ messages
2022-02-03 16:30 [PATCH 0/7] block/nbd: Move s->ioc on AioContext change Hanna Reitz
2022-02-03 16:30 ` [PATCH 1/7] block/nbd: Delete reconnect delay timer when done Hanna Reitz
2022-02-04  8:50   ` Vladimir Sementsov-Ogievskiy
2022-02-03 16:30 ` [PATCH 2/7] block/nbd: Delete open " Hanna Reitz
2022-02-04  8:52   ` Vladimir Sementsov-Ogievskiy
2022-02-03 16:30 ` [PATCH 3/7] block/nbd: Assert there are no timers when closed Hanna Reitz
2022-02-04  8:54   ` Vladimir Sementsov-Ogievskiy
2022-02-03 16:30 ` [PATCH 4/7] iotests.py: Add QemuStorageDaemon class Hanna Reitz
2022-02-04  9:04   ` Vladimir Sementsov-Ogievskiy
2022-02-04  9:58     ` Hanna Reitz
2022-02-03 16:30 ` [PATCH 5/7] iotests/281: Test lingering timers Hanna Reitz
2022-02-04  9:17   ` Vladimir Sementsov-Ogievskiy
2022-02-03 16:30 ` [PATCH 6/7] block/nbd: Move s->ioc on AioContext change Hanna Reitz
2022-02-04  9:20   ` Vladimir Sementsov-Ogievskiy
2022-02-03 16:30 ` [PATCH 7/7] iotests/281: Let NBD connection yield in iothread Hanna Reitz
2022-02-04  9:31   ` Vladimir Sementsov-Ogievskiy
