* [PATCH 0/6] qemu-iotests fixes for Kevin's block tree
@ 2023-12-21  1:48 Stefan Hajnoczi
  2023-12-21  1:48 ` [PATCH 1/6] fixup block-coroutine-wrapper: use qemu_get_current_aio_context() Stefan Hajnoczi
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-12-21  1:48 UTC (permalink / raw)
  To: Kevin Wolf, qemu-devel
  Cc: Stefan Hajnoczi, Leonardo Bras, qemu-block, Fam Zheng,
	Paolo Bonzini, Vladimir Sementsov-Ogievskiy, Fabiano Rosas,
	Eric Blake, Hanna Reitz, Juan Quintela, Peter Xu

Kevin merged several of my outstanding multi-queue block layer patch series and
found that qemu-iotests -qcow2 was broken. This patch series fixes the block branch.

Most of the fixes are easy but the NBD server required deeper debugging and
thread-safety fixes. The NBD server patches can be inserted before "aio: make
aio_context_acquire()/aio_context_release() a no-op" to preserve bisectability.
The other patches are fixups that can be squashed into the original patches.

Stefan Hajnoczi (6):
  fixup block-coroutine-wrapper: use qemu_get_current_aio_context()
  fixup block: remove AioContext locking
  fixup scsi: only access SCSIDevice->requests from one thread
  nbd/server: avoid per-NBDRequest nbd_client_get/put()
  nbd/server: only traverse NBDExport->clients from main loop thread
  nbd/server: introduce NBDClient->lock to protect fields

 hw/scsi/scsi-bus.c            |   3 +-
 migration/block.c             |   7 ++
 nbd/server.c                  | 152 +++++++++++++++++++++++++---------
 tests/qemu-iotests/051.pc.out |   4 +-
 4 files changed, 124 insertions(+), 42 deletions(-)

-- 
2.43.0




* [PATCH 1/6] fixup block-coroutine-wrapper: use qemu_get_current_aio_context()
  2023-12-21  1:48 [PATCH 0/6] qemu-iotests fixes for Kevin's block tree Stefan Hajnoczi
@ 2023-12-21  1:48 ` Stefan Hajnoczi
  2023-12-21  1:48 ` [PATCH 2/6] fixup block: remove AioContext locking Stefan Hajnoczi
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-12-21  1:48 UTC (permalink / raw)
  To: Kevin Wolf, qemu-devel
  Cc: Stefan Hajnoczi, Leonardo Bras, qemu-block, Fam Zheng,
	Paolo Bonzini, Vladimir Sementsov-Ogievskiy, Fabiano Rosas,
	Eric Blake, Hanna Reitz, Juan Quintela, Peter Xu

qemu-iotests 051 fails on my machine, so the change to 051.pc.out made by
the above commit appears to be incorrect, at least against the current
QEMU source tree. Revert it so that 051 passes again.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/qemu-iotests/051.pc.out | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/051.pc.out b/tests/qemu-iotests/051.pc.out
index 86a3e113c4..7e10c5fa1b 100644
--- a/tests/qemu-iotests/051.pc.out
+++ b/tests/qemu-iotests/051.pc.out
@@ -177,11 +177,11 @@ QEMU X.Y.Z monitor - type 'help' for more information
 
 Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device lsi53c895a,id=lsi0 -device scsi-hd,bus=lsi0.0,drive=disk,share-rw=on
 QEMU X.Y.Z monitor - type 'help' for more information
-QEMU_PROG: -device scsi-hd,bus=lsi0.0,drive=disk,share-rw=on: HBA does not support iothreads
+(qemu) QEMU_PROG: -device scsi-hd,bus=lsi0.0,drive=disk,share-rw=on: HBA does not support iothreads
 
 Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-scsi,id=virtio-scsi1 -device scsi-hd,bus=virtio-scsi1.0,drive=disk,share-rw=on
 QEMU X.Y.Z monitor - type 'help' for more information
-QEMU_PROG: -device scsi-hd,bus=virtio-scsi1.0,drive=disk,share-rw=on: Cannot change iothread of active block backend
+(qemu) QEMU_PROG: -device scsi-hd,bus=virtio-scsi1.0,drive=disk,share-rw=on: Cannot change iothread of active block backend
 
 Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on
 QEMU X.Y.Z monitor - type 'help' for more information
-- 
2.43.0




* [PATCH 2/6] fixup block: remove AioContext locking
  2023-12-21  1:48 [PATCH 0/6] qemu-iotests fixes for Kevin's block tree Stefan Hajnoczi
  2023-12-21  1:48 ` [PATCH 1/6] fixup block-coroutine-wrapper: use qemu_get_current_aio_context() Stefan Hajnoczi
@ 2023-12-21  1:48 ` Stefan Hajnoczi
  2023-12-21  1:49 ` [PATCH 3/6] fixup scsi: only access SCSIDevice->requests from one thread Stefan Hajnoczi
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-12-21  1:48 UTC (permalink / raw)
  To: Kevin Wolf, qemu-devel
  Cc: Stefan Hajnoczi, Leonardo Bras, qemu-block, Fam Zheng,
	Paolo Bonzini, Vladimir Sementsov-Ogievskiy, Fabiano Rosas,
	Eric Blake, Hanna Reitz, Juan Quintela, Peter Xu

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 migration/block.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/migration/block.c b/migration/block.c
index 2bcfcbfdf6..6ec6a1d6e6 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -311,10 +311,17 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
     block_mig_state.submitted++;
     blk_mig_unlock();
 
+    /*
+     * The migration thread does not have an AioContext. Lock the BQL so that
+     * I/O runs in the main loop AioContext (see
+     * qemu_get_current_aio_context()).
+     */
+    qemu_mutex_lock_iothread();
     bdrv_reset_dirty_bitmap(bmds->dirty_bitmap, cur_sector * BDRV_SECTOR_SIZE,
                             nr_sectors * BDRV_SECTOR_SIZE);
     blk->aiocb = blk_aio_preadv(bb, cur_sector * BDRV_SECTOR_SIZE, &blk->qiov,
                                 0, blk_mig_read_cb, blk);
+    qemu_mutex_unlock_iothread();
 
     bmds->cur_sector = cur_sector + nr_sectors;
     return (bmds->cur_sector >= total_sectors);
-- 
2.43.0




* [PATCH 3/6] fixup scsi: only access SCSIDevice->requests from one thread
  2023-12-21  1:48 [PATCH 0/6] qemu-iotests fixes for Kevin's block tree Stefan Hajnoczi
  2023-12-21  1:48 ` [PATCH 1/6] fixup block-coroutine-wrapper: use qemu_get_current_aio_context() Stefan Hajnoczi
  2023-12-21  1:48 ` [PATCH 2/6] fixup block: remove AioContext locking Stefan Hajnoczi
@ 2023-12-21  1:49 ` Stefan Hajnoczi
  2023-12-21  7:29   ` Paolo Bonzini
  2023-12-21  1:49 ` [PATCH 4/6] nbd/server: avoid per-NBDRequest nbd_client_get/put() Stefan Hajnoczi
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-12-21  1:49 UTC (permalink / raw)
  To: Kevin Wolf, qemu-devel
  Cc: Stefan Hajnoczi, Leonardo Bras, qemu-block, Fam Zheng,
	Paolo Bonzini, Vladimir Sementsov-Ogievskiy, Fabiano Rosas,
	Eric Blake, Hanna Reitz, Juan Quintela, Peter Xu

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/scsi/scsi-bus.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index df68a44b6a..5b08cbf60a 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -127,7 +127,8 @@ static void scsi_device_for_each_req_async_bh(void *opaque)
      */
     ctx = blk_get_aio_context(s->conf.blk);
     if (ctx != qemu_get_current_aio_context()) {
-        aio_bh_schedule_oneshot(ctx, scsi_device_for_each_req_async_bh, data);
+        aio_bh_schedule_oneshot(ctx, scsi_device_for_each_req_async_bh,
+                                g_steal_pointer(&data));
         return;
     }
 
-- 
2.43.0




* [PATCH 4/6] nbd/server: avoid per-NBDRequest nbd_client_get/put()
  2023-12-21  1:48 [PATCH 0/6] qemu-iotests fixes for Kevin's block tree Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2023-12-21  1:49 ` [PATCH 3/6] fixup scsi: only access SCSIDevice->requests from one thread Stefan Hajnoczi
@ 2023-12-21  1:49 ` Stefan Hajnoczi
  2023-12-21  7:27   ` Paolo Bonzini
  2023-12-21  1:49 ` [PATCH 5/6] nbd/server: only traverse NBDExport->clients from main loop thread Stefan Hajnoczi
  2023-12-21  1:49 ` [PATCH 6/6] nbd/server: introduce NBDClient->lock to protect fields Stefan Hajnoczi
  5 siblings, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-12-21  1:49 UTC (permalink / raw)
  To: Kevin Wolf, qemu-devel
  Cc: Stefan Hajnoczi, Leonardo Bras, qemu-block, Fam Zheng,
	Paolo Bonzini, Vladimir Sementsov-Ogievskiy, Fabiano Rosas,
	Eric Blake, Hanna Reitz, Juan Quintela, Peter Xu

nbd_trip() processes a single NBD request from start to finish and holds
an NBDClient reference throughout. NBDRequest does not outlive the scope
of nbd_trip(). Therefore it is unnecessary to ref/unref NBDClient for
each NBDRequest.

Removing these nbd_client_get()/nbd_client_put() calls will make
thread-safety easier in the commits that follow.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 nbd/server.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 895cf0a752..0b09ccc8dc 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1557,7 +1557,6 @@ static NBDRequestData *nbd_request_get(NBDClient *client)
     client->nb_requests++;
 
     req = g_new0(NBDRequestData, 1);
-    nbd_client_get(client);
     req->client = client;
     return req;
 }
@@ -1578,8 +1577,6 @@ static void nbd_request_put(NBDRequestData *req)
     }
 
     nbd_client_receive_next_request(client);
-
-    nbd_client_put(client);
 }
 
 static void blk_aio_attached(AioContext *ctx, void *opaque)
-- 
2.43.0




* [PATCH 5/6] nbd/server: only traverse NBDExport->clients from main loop thread
  2023-12-21  1:48 [PATCH 0/6] qemu-iotests fixes for Kevin's block tree Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2023-12-21  1:49 ` [PATCH 4/6] nbd/server: avoid per-NBDRequest nbd_client_get/put() Stefan Hajnoczi
@ 2023-12-21  1:49 ` Stefan Hajnoczi
  2023-12-21  7:23   ` Paolo Bonzini
  2024-01-02 15:32   ` Eric Blake
  2023-12-21  1:49 ` [PATCH 6/6] nbd/server: introduce NBDClient->lock to protect fields Stefan Hajnoczi
  5 siblings, 2 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-12-21  1:49 UTC (permalink / raw)
  To: Kevin Wolf, qemu-devel
  Cc: Stefan Hajnoczi, Leonardo Bras, qemu-block, Fam Zheng,
	Paolo Bonzini, Vladimir Sementsov-Ogievskiy, Fabiano Rosas,
	Eric Blake, Hanna Reitz, Juan Quintela, Peter Xu

The NBD clients list is currently accessed from both the export
AioContext and the main loop thread. When the AioContext lock is removed
there will be nothing protecting the clients list.

Adding a lock around the clients list is tricky because NBDClient
structs are refcounted and may be freed from the export AioContext or
the main loop thread. nbd_export_request_shutdown() -> client_close() ->
nbd_client_put() is also tricky because the list lock would be held
while indirectly dropping references to NBDClients.

A simpler approach is to only allow nbd_client_put() and client_close()
calls from the main loop thread. Then the NBD clients list is only
accessed from the main loop thread and no fancy locking is needed.

nbd_trip() just needs to reschedule itself in the main loop AioContext
before calling nbd_client_put() and client_close(). This costs more CPU
cycles per NBD request but is needed for thread-safety when the
AioContext lock is removed.

Note that nbd_client_get() can still be called from either thread, so
make NBDClient->refcount atomic.
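
For illustration, the hand-off at the end of nbd_trip() boils down to the
following pattern (a sketch only; the helper name nbd_trip_finish() is
hypothetical and the actual change open-codes these two calls at the
existing return paths):

    /*
     * Sketch: runs in the export AioContext once request processing is done.
     * Move this coroutine to the main loop AioContext so that
     * nbd_client_put() (and client_close() on the disconnect path) only ever
     * run in the main loop thread.
     */
    static coroutine_fn void nbd_trip_finish(NBDClient *client)
    {
        aio_co_reschedule_self(qemu_get_aio_context());
        nbd_client_put(client);
    }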

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 nbd/server.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 0b09ccc8dc..527fbdab4a 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -122,7 +122,7 @@ struct NBDMetaContexts {
 };
 
 struct NBDClient {
-    int refcount;
+    int refcount; /* atomic */
     void (*close_fn)(NBDClient *client, bool negotiated);
 
     NBDExport *exp;
@@ -1501,14 +1501,17 @@ static int coroutine_fn nbd_receive_request(NBDClient *client, NBDRequest *reque
 
 #define MAX_NBD_REQUESTS 16
 
+/* Runs in export AioContext and main loop thread */
 void nbd_client_get(NBDClient *client)
 {
-    client->refcount++;
+    qatomic_inc(&client->refcount);
 }
 
 void nbd_client_put(NBDClient *client)
 {
-    if (--client->refcount == 0) {
+    assert(qemu_in_main_thread());
+
+    if (qatomic_fetch_dec(&client->refcount) == 1) {
         /* The last reference should be dropped by client->close,
          * which is called by client_close.
          */
@@ -1531,6 +1534,8 @@ void nbd_client_put(NBDClient *client)
 
 static void client_close(NBDClient *client, bool negotiated)
 {
+    assert(qemu_in_main_thread());
+
     if (client->closing) {
         return;
     }
@@ -2938,8 +2943,15 @@ static coroutine_fn void nbd_trip(void *opaque)
     int ret;
     Error *local_err = NULL;
 
+    /*
+     * Note that nbd_client_put() and client_close() must be called from the
+     * main loop thread. Use aio_co_reschedule_self() to switch AioContext
+     * before calling these functions.
+     */
+
     trace_nbd_trip();
     if (client->closing) {
+        aio_co_reschedule_self(qemu_get_aio_context());
         nbd_client_put(client);
         return;
     }
@@ -2949,6 +2961,7 @@ static coroutine_fn void nbd_trip(void *opaque)
          * We're switching between AIO contexts. Don't attempt to receive a new
          * request and kick the main context which may be waiting for us.
          */
+        aio_co_reschedule_self(qemu_get_aio_context());
         nbd_client_put(client);
         client->recv_coroutine = NULL;
         aio_wait_kick();
@@ -3013,6 +3026,8 @@ static coroutine_fn void nbd_trip(void *opaque)
     qio_channel_set_cork(client->ioc, false);
 done:
     nbd_request_put(req);
+
+    aio_co_reschedule_self(qemu_get_aio_context());
     nbd_client_put(client);
     return;
 
@@ -3021,6 +3036,8 @@ disconnect:
         error_reportf_err(local_err, "Disconnect client, due to: ");
     }
     nbd_request_put(req);
+
+    aio_co_reschedule_self(qemu_get_aio_context());
     client_close(client, true);
     nbd_client_put(client);
 }
-- 
2.43.0




* [PATCH 6/6] nbd/server: introduce NBDClient->lock to protect fields
  2023-12-21  1:48 [PATCH 0/6] qemu-iotests fixes for Kevin's block tree Stefan Hajnoczi
                   ` (4 preceding siblings ...)
  2023-12-21  1:49 ` [PATCH 5/6] nbd/server: only traverse NBDExport->clients from main loop thread Stefan Hajnoczi
@ 2023-12-21  1:49 ` Stefan Hajnoczi
  2023-12-21  7:26   ` Paolo Bonzini
  2023-12-21 10:45   ` Kevin Wolf
  5 siblings, 2 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-12-21  1:49 UTC (permalink / raw)
  To: Kevin Wolf, qemu-devel
  Cc: Stefan Hajnoczi, Leonardo Bras, qemu-block, Fam Zheng,
	Paolo Bonzini, Vladimir Sementsov-Ogievskiy, Fabiano Rosas,
	Eric Blake, Hanna Reitz, Juan Quintela, Peter Xu

NBDClient has a number of fields that are accessed by both the export
AioContext and the main loop thread. When the AioContext lock is removed
these fields will need another form of protection.

Add NBDClient->lock and protect fields that are accessed by both
threads. Also add assertions where possible and otherwise add doc
comments stating assumptions about which thread runs the code and which
locks are held.

Note this patch moves the client->recv_coroutine assertion from
nbd_co_receive_request() to nbd_trip() where client->lock is held.
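
For illustration, the locking pattern used throughout this patch looks
roughly like the sketch below (nbd_client_set_quiescing() is a hypothetical
helper shown only to illustrate the shape; the actual change open-codes the
guard in nbd_drained_begin(), nbd_drained_end() and the other callers):

    /*
     * Fields annotated "protected by lock" are only read or written with
     * client->lock held, from either the export AioContext or the main
     * loop thread.
     */
    static void nbd_client_set_quiescing(NBDClient *client, bool value)
    {
        WITH_QEMU_LOCK_GUARD(&client->lock) {
            client->quiescing = value;
        }
    }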

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 nbd/server.c | 128 +++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 94 insertions(+), 34 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 527fbdab4a..4008ec7df9 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -125,23 +125,25 @@ struct NBDClient {
     int refcount; /* atomic */
     void (*close_fn)(NBDClient *client, bool negotiated);
 
+    QemuMutex lock;
+
     NBDExport *exp;
     QCryptoTLSCreds *tlscreds;
     char *tlsauthz;
     QIOChannelSocket *sioc; /* The underlying data channel */
     QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
 
-    Coroutine *recv_coroutine;
+    Coroutine *recv_coroutine; /* protected by lock */
 
     CoMutex send_lock;
     Coroutine *send_coroutine;
 
-    bool read_yielding;
-    bool quiescing;
+    bool read_yielding; /* protected by lock */
+    bool quiescing; /* protected by lock */
 
     QTAILQ_ENTRY(NBDClient) next;
-    int nb_requests;
-    bool closing;
+    int nb_requests; /* protected by lock */
+    bool closing; /* protected by lock */
 
     uint32_t check_align; /* If non-zero, check for aligned client requests */
 
@@ -1415,11 +1417,18 @@ nbd_read_eof(NBDClient *client, void *buffer, size_t size, Error **errp)
 
         len = qio_channel_readv(client->ioc, &iov, 1, errp);
         if (len == QIO_CHANNEL_ERR_BLOCK) {
-            client->read_yielding = true;
+            WITH_QEMU_LOCK_GUARD(&client->lock) {
+                if (client->quiescing) {
+                    return -EAGAIN;
+                }
+                client->read_yielding = true;
+            }
             qio_channel_yield(client->ioc, G_IO_IN);
-            client->read_yielding = false;
-            if (client->quiescing) {
-                return -EAGAIN;
+            WITH_QEMU_LOCK_GUARD(&client->lock) {
+                client->read_yielding = false;
+                if (client->quiescing) {
+                    return -EAGAIN;
+                }
             }
             continue;
         } else if (len < 0) {
@@ -1528,6 +1537,7 @@ void nbd_client_put(NBDClient *client)
             blk_exp_unref(&client->exp->common);
         }
         g_free(client->contexts.bitmaps);
+        qemu_mutex_destroy(&client->lock);
         g_free(client);
     }
 }
@@ -1536,11 +1546,13 @@ static void client_close(NBDClient *client, bool negotiated)
 {
     assert(qemu_in_main_thread());
 
-    if (client->closing) {
-        return;
-    }
+    WITH_QEMU_LOCK_GUARD(&client->lock) {
+        if (client->closing) {
+            return;
+        }
 
-    client->closing = true;
+        client->closing = true;
+    }
 
     /* Force requests to finish.  They will drop their own references,
      * then we'll close the socket and free the NBDClient.
@@ -1554,6 +1566,7 @@ static void client_close(NBDClient *client, bool negotiated)
     }
 }
 
+/* Runs in export AioContext with client->lock held */
 static NBDRequestData *nbd_request_get(NBDClient *client)
 {
     NBDRequestData *req;
@@ -1566,6 +1579,7 @@ static NBDRequestData *nbd_request_get(NBDClient *client)
     return req;
 }
 
+/* Runs in export AioContext with client->lock held */
 static void nbd_request_put(NBDRequestData *req)
 {
     NBDClient *client = req->client;
@@ -1589,14 +1603,18 @@ static void blk_aio_attached(AioContext *ctx, void *opaque)
     NBDExport *exp = opaque;
     NBDClient *client;
 
+    assert(qemu_in_main_thread());
+
     trace_nbd_blk_aio_attached(exp->name, ctx);
 
     exp->common.ctx = ctx;
 
     QTAILQ_FOREACH(client, &exp->clients, next) {
-        assert(client->nb_requests == 0);
-        assert(client->recv_coroutine == NULL);
-        assert(client->send_coroutine == NULL);
+        WITH_QEMU_LOCK_GUARD(&client->lock) {
+            assert(client->nb_requests == 0);
+            assert(client->recv_coroutine == NULL);
+            assert(client->send_coroutine == NULL);
+        }
     }
 }
 
@@ -1604,6 +1622,8 @@ static void blk_aio_detach(void *opaque)
 {
     NBDExport *exp = opaque;
 
+    assert(qemu_in_main_thread());
+
     trace_nbd_blk_aio_detach(exp->name, exp->common.ctx);
 
     exp->common.ctx = NULL;
@@ -1614,8 +1634,12 @@ static void nbd_drained_begin(void *opaque)
     NBDExport *exp = opaque;
     NBDClient *client;
 
+    assert(qemu_in_main_thread());
+
     QTAILQ_FOREACH(client, &exp->clients, next) {
-        client->quiescing = true;
+        WITH_QEMU_LOCK_GUARD(&client->lock) {
+            client->quiescing = true;
+        }
     }
 }
 
@@ -1624,9 +1648,13 @@ static void nbd_drained_end(void *opaque)
     NBDExport *exp = opaque;
     NBDClient *client;
 
+    assert(qemu_in_main_thread());
+
     QTAILQ_FOREACH(client, &exp->clients, next) {
-        client->quiescing = false;
-        nbd_client_receive_next_request(client);
+        WITH_QEMU_LOCK_GUARD(&client->lock) {
+            client->quiescing = false;
+            nbd_client_receive_next_request(client);
+        }
     }
 }
 
@@ -1635,17 +1663,21 @@ static bool nbd_drained_poll(void *opaque)
     NBDExport *exp = opaque;
     NBDClient *client;
 
+    assert(qemu_in_main_thread());
+
     QTAILQ_FOREACH(client, &exp->clients, next) {
-        if (client->nb_requests != 0) {
-            /*
-             * If there's a coroutine waiting for a request on nbd_read_eof()
-             * enter it here so we don't depend on the client to wake it up.
-             */
-            if (client->recv_coroutine != NULL && client->read_yielding) {
-                qio_channel_wake_read(client->ioc);
+        WITH_QEMU_LOCK_GUARD(&client->lock) {
+            if (client->nb_requests != 0) {
+                /*
+                 * If there's a coroutine waiting for a request on nbd_read_eof()
+                 * enter it here so we don't depend on the client to wake it up.
+                 */
+                if (client->recv_coroutine != NULL && client->read_yielding) {
+                    qio_channel_wake_read(client->ioc);
+                }
+
+                return true;
             }
-
-            return true;
         }
     }
 
@@ -1656,6 +1688,8 @@ static void nbd_eject_notifier(Notifier *n, void *data)
 {
     NBDExport *exp = container_of(n, NBDExport, eject_notifier);
 
+    assert(qemu_in_main_thread());
+
     blk_exp_request_shutdown(&exp->common);
 }
 
@@ -2541,7 +2575,6 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
     int ret;
 
     g_assert(qemu_in_coroutine());
-    assert(client->recv_coroutine == qemu_coroutine_self());
     ret = nbd_receive_request(client, request, errp);
     if (ret < 0) {
         return ret;
@@ -2950,7 +2983,11 @@ static coroutine_fn void nbd_trip(void *opaque)
      */
 
     trace_nbd_trip();
+
+    qemu_mutex_lock(&client->lock);
+
     if (client->closing) {
+        qemu_mutex_unlock(&client->lock);
         aio_co_reschedule_self(qemu_get_aio_context());
         nbd_client_put(client);
         return;
@@ -2961,15 +2998,24 @@ static coroutine_fn void nbd_trip(void *opaque)
          * We're switching between AIO contexts. Don't attempt to receive a new
          * request and kick the main context which may be waiting for us.
          */
-        aio_co_reschedule_self(qemu_get_aio_context());
-        nbd_client_put(client);
         client->recv_coroutine = NULL;
+        qemu_mutex_unlock(&client->lock);
         aio_wait_kick();
+
+        aio_co_reschedule_self(qemu_get_aio_context());
+        nbd_client_put(client);
         return;
     }
 
     req = nbd_request_get(client);
-    ret = nbd_co_receive_request(req, &request, &local_err);
+
+    do {
+        assert(client->recv_coroutine == qemu_coroutine_self());
+        qemu_mutex_unlock(&client->lock);
+        ret = nbd_co_receive_request(req, &request, &local_err);
+        qemu_mutex_lock(&client->lock);
+    } while (ret == -EAGAIN && !client->quiescing);
+
     client->recv_coroutine = NULL;
 
     if (client->closing) {
@@ -2981,11 +3027,13 @@ static coroutine_fn void nbd_trip(void *opaque)
     }
 
     if (ret == -EAGAIN) {
-        assert(client->quiescing);
         goto done;
     }
 
     nbd_client_receive_next_request(client);
+
+    qemu_mutex_unlock(&client->lock);
+
     if (ret == -EIO) {
         goto disconnect;
     }
@@ -3024,8 +3072,10 @@ static coroutine_fn void nbd_trip(void *opaque)
     }
 
     qio_channel_set_cork(client->ioc, false);
+    qemu_mutex_lock(&client->lock);
 done:
     nbd_request_put(req);
+    qemu_mutex_unlock(&client->lock);
 
     aio_co_reschedule_self(qemu_get_aio_context());
     nbd_client_put(client);
@@ -3035,13 +3085,20 @@ disconnect:
     if (local_err) {
         error_reportf_err(local_err, "Disconnect client, due to: ");
     }
+
+    qemu_mutex_lock(&client->lock);
     nbd_request_put(req);
+    qemu_mutex_unlock(&client->lock);
 
     aio_co_reschedule_self(qemu_get_aio_context());
     client_close(client, true);
     nbd_client_put(client);
 }
 
+/*
+ * Runs in export AioContext and main loop thread. Caller must hold
+ * client->lock.
+ */
 static void nbd_client_receive_next_request(NBDClient *client)
 {
     if (!client->recv_coroutine && client->nb_requests < MAX_NBD_REQUESTS &&
@@ -3067,7 +3124,9 @@ static coroutine_fn void nbd_co_client_start(void *opaque)
         return;
     }
 
-    nbd_client_receive_next_request(client);
+    WITH_QEMU_LOCK_GUARD(&client->lock) {
+        nbd_client_receive_next_request(client);
+    }
 }
 
 /*
@@ -3084,6 +3143,7 @@ void nbd_client_new(QIOChannelSocket *sioc,
     Coroutine *co;
 
     client = g_new0(NBDClient, 1);
+    qemu_mutex_init(&client->lock);
     client->refcount = 1;
     client->tlscreds = tlscreds;
     if (tlscreds) {
-- 
2.43.0




* Re: [PATCH 5/6] nbd/server: only traverse NBDExport->clients from main loop thread
  2023-12-21  1:49 ` [PATCH 5/6] nbd/server: only traverse NBDExport->clients from main loop thread Stefan Hajnoczi
@ 2023-12-21  7:23   ` Paolo Bonzini
  2023-12-21 14:27     ` Stefan Hajnoczi
  2024-01-02 15:32   ` Eric Blake
  1 sibling, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2023-12-21  7:23 UTC (permalink / raw)
  To: Stefan Hajnoczi, Kevin Wolf, qemu-devel
  Cc: Leonardo Bras, qemu-block, Fam Zheng,
	Vladimir Sementsov-Ogievskiy, Fabiano Rosas, Eric Blake,
	Hanna Reitz, Juan Quintela, Peter Xu

On 12/21/23 02:49, Stefan Hajnoczi wrote:
> The NBD clients list is currently accessed from both the export
> AioContext and the main loop thread. When the AioContext lock is removed
> there will be nothing protecting the clients list.
> 
> Adding a lock around the clients list is tricky because NBDClient
> structs are refcounted and may be freed from the export AioContext or
> the main loop thread. nbd_export_request_shutdown() -> client_close() ->
> nbd_client_put() is also tricky because the list lock would be held
> while indirectly dropping references to NDBClients.
> 
> A simpler approach is to only allow nbd_client_put() and client_close()
> calls from the main loop thread. Then the NBD clients list is only
> accessed from the main loop thread and no fancy locking is needed.
> 
> nbd_trip() just needs to reschedule itself in the main loop AioContext
> before calling nbd_client_put() and client_close(). This costs more CPU
> cycles per NBD request but is needed for thread-safety when the
> AioContext lock is removed.
> 
> Note that nbd_client_get() can still be called from either thread, so
> make NBDClient->refcount atomic.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   nbd/server.c | 23 ++++++++++++++++++++---
>   1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 0b09ccc8dc..527fbdab4a 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -122,7 +122,7 @@ struct NBDMetaContexts {
>   };
>   
>   struct NBDClient {
> -    int refcount;
> +    int refcount; /* atomic */
>       void (*close_fn)(NBDClient *client, bool negotiated);
>   
>       NBDExport *exp;
> @@ -1501,14 +1501,17 @@ static int coroutine_fn nbd_receive_request(NBDClient *client, NBDRequest *reque
>   
>   #define MAX_NBD_REQUESTS 16
>   
> +/* Runs in export AioContext and main loop thread */
>   void nbd_client_get(NBDClient *client)
>   {
> -    client->refcount++;
> +    qatomic_inc(&client->refcount);
>   }
>   
>   void nbd_client_put(NBDClient *client)
>   {
> -    if (--client->refcount == 0) {
> +    assert(qemu_in_main_thread());
> +
> +    if (qatomic_fetch_dec(&client->refcount) == 1) {
>           /* The last reference should be dropped by client->close,
>            * which is called by client_close.
>            */
> @@ -1531,6 +1534,8 @@ void nbd_client_put(NBDClient *client)
>   
>   static void client_close(NBDClient *client, bool negotiated)
>   {
> +    assert(qemu_in_main_thread());
> +
>       if (client->closing) {
>           return;
>       }
> @@ -2938,8 +2943,15 @@ static coroutine_fn void nbd_trip(void *opaque)
>       int ret;
>       Error *local_err = NULL;
>   
> +    /*
> +     * Note that nbd_client_put() and client_close() must be called from the
> +     * main loop thread. Use aio_co_reschedule_self() to switch AioContext
> +     * before calling these functions.
> +     */
> +
>       trace_nbd_trip();
>       if (client->closing) {
> +        aio_co_reschedule_self(qemu_get_aio_context());
>           nbd_client_put(client);
>           return;
>       }
> @@ -2949,6 +2961,7 @@ static coroutine_fn void nbd_trip(void *opaque)
>            * We're switching between AIO contexts. Don't attempt to receive a new
>            * request and kick the main context which may be waiting for us.
>            */
> +        aio_co_reschedule_self(qemu_get_aio_context());
>           nbd_client_put(client);
>           client->recv_coroutine = NULL;
>           aio_wait_kick();
> @@ -3013,6 +3026,8 @@ static coroutine_fn void nbd_trip(void *opaque)
>       qio_channel_set_cork(client->ioc, false);
>   done:
>       nbd_request_put(req);
> +
> +    aio_co_reschedule_self(qemu_get_aio_context());
>       nbd_client_put(client);
>       return;

This is very expensive to do on every NBD receive, considering that it really
can happen only when closing (see the assertion in nbd_client_put).

In Linux there is a common pattern of "if refcount could go to zero, take
a lock before doing the decrement".  We can do something similar with "if
refcount could go to zero, move to main iothread before doing the decrement":

diff --git a/nbd/server.c b/nbd/server.c
index 895cf0a7525..aec306923d8 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1529,6 +1529,21 @@ void nbd_client_put(NBDClient *client)
      }
  }
  
+static bool nbd_client_put_nonzero(NBDClient *client)
+{
+    int old = qatomic_read(&client->refcount);
+    int expected;
+    do {
+        if (old == 1) {
+            return false;
+        }
+        expected = old;
+        old = qatomic_cmpxchg(&client->refcount, expected, expected - 1);
+    } while (old != expected);
+
+    return true;
+}
+
  static void client_close(NBDClient *client, bool negotiated)
  {
      if (client->closing) {
@@ -2936,15 +2951,14 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
  static coroutine_fn void nbd_trip(void *opaque)
  {
      NBDClient *client = opaque;
-    NBDRequestData *req;
+    NBDRequestData *req = NULL;
      NBDRequest request = { 0 };    /* GCC thinks it can be used uninitialized */
      int ret;
      Error *local_err = NULL;
  
      trace_nbd_trip();
      if (client->closing) {
-        nbd_client_put(client);
-        return;
+        goto done;
      }
  
      if (client->quiescing) {
@@ -2952,10 +2966,9 @@ static coroutine_fn void nbd_trip(void *opaque)
           * We're switching between AIO contexts. Don't attempt to receive a new
           * request and kick the main context which may be waiting for us.
           */
-        nbd_client_put(client);
          client->recv_coroutine = NULL;
          aio_wait_kick();
-        return;
+        goto done;
      }
  
      req = nbd_request_get(client);
@@ -3015,8 +3028,13 @@ static coroutine_fn void nbd_trip(void *opaque)
  
      qio_channel_set_cork(client->ioc, false);
  done:
-    nbd_request_put(req);
-    nbd_client_put(client);
+    if (req) {
+        nbd_request_put(req);
+    }
+    if (!nbd_client_put_nonzero(client)) {
+        aio_co_reschedule_self(qemu_get_aio_context());
+        nbd_client_put(client);
+    }
      return;
  
  disconnect:

I think adding the "if (req)" should also simplify the addition of the lock a little bit.

Paolo




* Re: [PATCH 6/6] nbd/server: introduce NBDClient->lock to protect fields
  2023-12-21  1:49 ` [PATCH 6/6] nbd/server: introduce NBDClient->lock to protect fields Stefan Hajnoczi
@ 2023-12-21  7:26   ` Paolo Bonzini
  2023-12-21 11:56     ` Stefan Hajnoczi
  2023-12-21 10:45   ` Kevin Wolf
  1 sibling, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2023-12-21  7:26 UTC (permalink / raw)
  To: Stefan Hajnoczi, Kevin Wolf, qemu-devel
  Cc: Leonardo Bras, qemu-block, Fam Zheng,
	Vladimir Sementsov-Ogievskiy, Fabiano Rosas, Eric Blake,
	Hanna Reitz, Juan Quintela, Peter Xu

On 12/21/23 02:49, Stefan Hajnoczi wrote:
>       nbd_client_receive_next_request(client);
> +
> +    qemu_mutex_unlock(&client->lock);
> +
>       if (ret == -EIO) {
>           goto disconnect;
>       }

I think I slightly prefer it if disconnect is reached with the lock taken,
for consistency with the "done" label. It does not complicate the code,
because you can just move qio_channel_set_cork() and replace:

> @@ -3024,8 +3072,10 @@ static coroutine_fn void nbd_trip(void *opaque)
>       }
>   
>       qio_channel_set_cork(client->ioc, false);
> +    qemu_mutex_lock(&client->lock);

with:

+    qio_channel_set_cork(client->ioc, false);
+    qemu_mutex_lock(&client->lock);

      if (ret < 0) {
          error_prepend(&local_err, "Failed to send reply: ");
          goto disconnect;
      }

      /*
       * We must disconnect after NBD_CMD_WRITE or BLOCK_STATUS with
       * payload if we did not read the payload.
       */
      if (!req->complete) {
          error_setg(&local_err, "Request handling failed in intermediate state");
          goto disconnect;
      }
-    qio_channel_set_cork(client->ioc, false);
  done:

Thanks,

Paolo

>   done:
>       nbd_request_put(req);
> +    qemu_mutex_unlock(&client->lock);




* Re: [PATCH 4/6] nbd/server: avoid per-NBDRequest nbd_client_get/put()
  2023-12-21  1:49 ` [PATCH 4/6] nbd/server: avoid per-NBDRequest nbd_client_get/put() Stefan Hajnoczi
@ 2023-12-21  7:27   ` Paolo Bonzini
  0 siblings, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2023-12-21  7:27 UTC (permalink / raw)
  To: Stefan Hajnoczi, Kevin Wolf, qemu-devel
  Cc: Leonardo Bras, qemu-block, Fam Zheng,
	Vladimir Sementsov-Ogievskiy, Fabiano Rosas, Eric Blake,
	Hanna Reitz, Juan Quintela, Peter Xu

On 12/21/23 02:49, Stefan Hajnoczi wrote:
> nbd_trip() processes a single NBD request from start to finish and holds
> an NBDClient reference throughout. NBDRequest does not outlive the scope
> of nbd_trip(). Therefore it is unnecessary to ref/unref NBDClient for
> each NBDRequest.
> 
> Removing these nbd_client_get()/nbd_client_put() calls will make
> thread-safety easier in the commits that follow.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   nbd/server.c | 3 ---
>   1 file changed, 3 deletions(-)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 895cf0a752..0b09ccc8dc 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1557,7 +1557,6 @@ static NBDRequestData *nbd_request_get(NBDClient *client)
>       client->nb_requests++;
>   
>       req = g_new0(NBDRequestData, 1);
> -    nbd_client_get(client);
>       req->client = client;
>       return req;
>   }
> @@ -1578,8 +1577,6 @@ static void nbd_request_put(NBDRequestData *req)
>       }
>   
>       nbd_client_receive_next_request(client);
> -
> -    nbd_client_put(client);
>   }
>   
>   static void blk_aio_attached(AioContext *ctx, void *opaque)

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>





* Re: [PATCH 3/6] fixup scsi: only access SCSIDevice->requests from one thread
  2023-12-21  1:49 ` [PATCH 3/6] fixup scsi: only access SCSIDevice->requests from one thread Stefan Hajnoczi
@ 2023-12-21  7:29   ` Paolo Bonzini
  0 siblings, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2023-12-21  7:29 UTC (permalink / raw)
  To: Stefan Hajnoczi, Kevin Wolf, qemu-devel
  Cc: Leonardo Bras, qemu-block, Fam Zheng,
	Vladimir Sementsov-Ogievskiy, Fabiano Rosas, Eric Blake,
	Hanna Reitz, Juan Quintela, Peter Xu

On 12/21/23 02:49, Stefan Hajnoczi wrote:
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   hw/scsi/scsi-bus.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
> index df68a44b6a..5b08cbf60a 100644
> --- a/hw/scsi/scsi-bus.c
> +++ b/hw/scsi/scsi-bus.c
> @@ -127,7 +127,8 @@ static void scsi_device_for_each_req_async_bh(void *opaque)
>        */
>       ctx = blk_get_aio_context(s->conf.blk);
>       if (ctx != qemu_get_current_aio_context()) {
> -        aio_bh_schedule_oneshot(ctx, scsi_device_for_each_req_async_bh, data);
> +        aio_bh_schedule_oneshot(ctx, scsi_device_for_each_req_async_bh,
> +                                g_steal_pointer(&data));
>           return;
>       }
>   

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>




* Re: [PATCH 6/6] nbd/server: introduce NBDClient->lock to protect fields
  2023-12-21  1:49 ` [PATCH 6/6] nbd/server: introduce NBDClient->lock to protect fields Stefan Hajnoczi
  2023-12-21  7:26   ` Paolo Bonzini
@ 2023-12-21 10:45   ` Kevin Wolf
  2023-12-21 14:14     ` Stefan Hajnoczi
  1 sibling, 1 reply; 16+ messages in thread
From: Kevin Wolf @ 2023-12-21 10:45 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Leonardo Bras, qemu-block, Fam Zheng, Paolo Bonzini,
	Vladimir Sementsov-Ogievskiy, Fabiano Rosas, Eric Blake,
	Hanna Reitz, Juan Quintela, Peter Xu

On 21.12.2023 at 02:49, Stefan Hajnoczi wrote:
> NBDClient has a number of fields that are accessed by both the export
> AioContext and the main loop thread. When the AioContext lock is removed
> these fields will need another form of protection.
> 
> Add NBDClient->lock and protect fields that are accessed by both
> threads. Also add assertions where possible and otherwise add doc
> comments stating assumptions about which thread runs the code and which
> locks are held.
> 
> Note this patch moves the client->recv_coroutine assertion from
> nbd_co_receive_request() to nbd_trip() where client->lock is held.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  nbd/server.c | 128 +++++++++++++++++++++++++++++++++++++--------------
>  1 file changed, 94 insertions(+), 34 deletions(-)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 527fbdab4a..4008ec7df9 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -125,23 +125,25 @@ struct NBDClient {
>      int refcount; /* atomic */
>      void (*close_fn)(NBDClient *client, bool negotiated);
>  
> +    QemuMutex lock;
> +
>      NBDExport *exp;
>      QCryptoTLSCreds *tlscreds;
>      char *tlsauthz;
>      QIOChannelSocket *sioc; /* The underlying data channel */
>      QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
>  
> -    Coroutine *recv_coroutine;
> +    Coroutine *recv_coroutine; /* protected by lock */
>  
>      CoMutex send_lock;
>      Coroutine *send_coroutine;
>  
> -    bool read_yielding;
> -    bool quiescing;
> +    bool read_yielding; /* protected by lock */
> +    bool quiescing; /* protected by lock */
>  
>      QTAILQ_ENTRY(NBDClient) next;
> -    int nb_requests;
> -    bool closing;
> +    int nb_requests; /* protected by lock */
> +    bool closing; /* protected by lock */
>  
>      uint32_t check_align; /* If non-zero, check for aligned client requests */
>  
> @@ -1415,11 +1417,18 @@ nbd_read_eof(NBDClient *client, void *buffer, size_t size, Error **errp)
>  
>          len = qio_channel_readv(client->ioc, &iov, 1, errp);
>          if (len == QIO_CHANNEL_ERR_BLOCK) {
> -            client->read_yielding = true;
> +            WITH_QEMU_LOCK_GUARD(&client->lock) {
> +                if (client->quiescing) {
> +                    return -EAGAIN;
> +                }

Why did you add another client->quiescing check here?

If it is to address a race, I think you only made the window a bit
smaller, but between releasing the lock and yielding, the field could
still change, so drain needs to handle this case anyway.

> +                client->read_yielding = true;
> +            }
>              qio_channel_yield(client->ioc, G_IO_IN);
> -            client->read_yielding = false;
> -            if (client->quiescing) {
> -                return -EAGAIN;
> +            WITH_QEMU_LOCK_GUARD(&client->lock) {
> +                client->read_yielding = false;
> +                if (client->quiescing) {
> +                    return -EAGAIN;
> +                }
>              }
>              continue;
>          } else if (len < 0) {
> @@ -1528,6 +1537,7 @@ void nbd_client_put(NBDClient *client)
>              blk_exp_unref(&client->exp->common);
>          }
>          g_free(client->contexts.bitmaps);
> +        qemu_mutex_destroy(&client->lock);
>          g_free(client);
>      }
>  }
> @@ -1536,11 +1546,13 @@ static void client_close(NBDClient *client, bool negotiated)
>  {
>      assert(qemu_in_main_thread());
>  
> -    if (client->closing) {
> -        return;
> -    }
> +    WITH_QEMU_LOCK_GUARD(&client->lock) {
> +        if (client->closing) {
> +            return;
> +        }
>  
> -    client->closing = true;
> +        client->closing = true;
> +    }
>  
>      /* Force requests to finish.  They will drop their own references,
>       * then we'll close the socket and free the NBDClient.
> @@ -1554,6 +1566,7 @@ static void client_close(NBDClient *client, bool negotiated)
>      }
>  }
>  
> +/* Runs in export AioContext with client->lock held */
>  static NBDRequestData *nbd_request_get(NBDClient *client)
>  {
>      NBDRequestData *req;
> @@ -1566,6 +1579,7 @@ static NBDRequestData *nbd_request_get(NBDClient *client)
>      return req;
>  }
>  
> +/* Runs in export AioContext with client->lock held */
>  static void nbd_request_put(NBDRequestData *req)
>  {
>      NBDClient *client = req->client;
> @@ -1589,14 +1603,18 @@ static void blk_aio_attached(AioContext *ctx, void *opaque)
>      NBDExport *exp = opaque;
>      NBDClient *client;
>  
> +    assert(qemu_in_main_thread());
> +
>      trace_nbd_blk_aio_attached(exp->name, ctx);
>  
>      exp->common.ctx = ctx;
>  
>      QTAILQ_FOREACH(client, &exp->clients, next) {
> -        assert(client->nb_requests == 0);
> -        assert(client->recv_coroutine == NULL);
> -        assert(client->send_coroutine == NULL);
> +        WITH_QEMU_LOCK_GUARD(&client->lock) {
> +            assert(client->nb_requests == 0);
> +            assert(client->recv_coroutine == NULL);
> +            assert(client->send_coroutine == NULL);
> +        }
>      }
>  }
>  
> @@ -1604,6 +1622,8 @@ static void blk_aio_detach(void *opaque)
>  {
>      NBDExport *exp = opaque;
>  
> +    assert(qemu_in_main_thread());
> +
>      trace_nbd_blk_aio_detach(exp->name, exp->common.ctx);
>  
>      exp->common.ctx = NULL;
> @@ -1614,8 +1634,12 @@ static void nbd_drained_begin(void *opaque)
>      NBDExport *exp = opaque;
>      NBDClient *client;
>  
> +    assert(qemu_in_main_thread());
> +
>      QTAILQ_FOREACH(client, &exp->clients, next) {
> -        client->quiescing = true;
> +        WITH_QEMU_LOCK_GUARD(&client->lock) {
> +            client->quiescing = true;
> +        }
>      }
>  }
>  
> @@ -1624,9 +1648,13 @@ static void nbd_drained_end(void *opaque)
>      NBDExport *exp = opaque;
>      NBDClient *client;
>  
> +    assert(qemu_in_main_thread());
> +
>      QTAILQ_FOREACH(client, &exp->clients, next) {
> -        client->quiescing = false;
> -        nbd_client_receive_next_request(client);
> +        WITH_QEMU_LOCK_GUARD(&client->lock) {
> +            client->quiescing = false;
> +            nbd_client_receive_next_request(client);
> +        }
>      }
>  }
>  
> @@ -1635,17 +1663,21 @@ static bool nbd_drained_poll(void *opaque)
>      NBDExport *exp = opaque;
>      NBDClient *client;
>  
> +    assert(qemu_in_main_thread());
> +
>      QTAILQ_FOREACH(client, &exp->clients, next) {
> -        if (client->nb_requests != 0) {
> -            /*
> -             * If there's a coroutine waiting for a request on nbd_read_eof()
> -             * enter it here so we don't depend on the client to wake it up.
> -             */
> -            if (client->recv_coroutine != NULL && client->read_yielding) {
> -                qio_channel_wake_read(client->ioc);
> +        WITH_QEMU_LOCK_GUARD(&client->lock) {
> +            if (client->nb_requests != 0) {
> +                /*
> +                 * If there's a coroutine waiting for a request on nbd_read_eof()
> +                 * enter it here so we don't depend on the client to wake it up.
> +                 */
> +                if (client->recv_coroutine != NULL && client->read_yielding) {
> +                    qio_channel_wake_read(client->ioc);
> +                }

This is where the race from above becomes relevant.

Let's first look at calling qio_channel_wake_read() a tiny bit too
early: Without any locking in qio_channel_yield(), we could catch the
read coroutine between setting ioc->read_coroutine and actually
yielding. In this case we would call aio_co_wake() on a coroutine that
is still running in a different thread. Since it's in a different
thread, we only schedule it instead of entering it directly, and that
just works. The coroutine will immediately be reentered, which is exactly
what we want.

Even earlier calls of qio_channel_wake_read() (i.e. between setting
client->read_yielding and setting ioc->read_coroutine) don't actively
hurt; they just don't do anything if no read is in flight. (This is the
same case as if the nbd_trip() coroutine didn't even set
client->read_yielding yet, just that the check you added above can't
catch it.)

So if nbd_read_eof() didn't yield yet, we don't wake it here, but we
still return true, so the next drained_poll call will try again.

This is good in principle, but it depends on waking up the main thread
when we have made progress. So we have to call aio_wait_kick() between
setting ioc->read_coroutine and yielding to make this work. What we may
actually get indirectly is an aio_notify() through setting FD
handlers if all implementations of qio_channel_set_aio_fd_handler()
actually do that. I suppose this could be enough?

Anyway, if my result after thinking really hard about this is "I can't
rule out that it's correct", maybe it would be better to just run this
code in the export AioContext instead so that we don't have to think
about all the subtleties and know that the nbd_co_trip() coroutine is at
a yield point?

> +
> +                return true;
>              }
> -
> -            return true;
>          }
>      }
>  
> @@ -1656,6 +1688,8 @@ static void nbd_eject_notifier(Notifier *n, void *data)
>  {
>      NBDExport *exp = container_of(n, NBDExport, eject_notifier);
>  
> +    assert(qemu_in_main_thread());
> +
>      blk_exp_request_shutdown(&exp->common);
>  }
>  
> @@ -2541,7 +2575,6 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
>      int ret;
>  
>      g_assert(qemu_in_coroutine());
> -    assert(client->recv_coroutine == qemu_coroutine_self());
>      ret = nbd_receive_request(client, request, errp);
>      if (ret < 0) {
>          return ret;
> @@ -2950,7 +2983,11 @@ static coroutine_fn void nbd_trip(void *opaque)
>       */
>  
>      trace_nbd_trip();
> +
> +    qemu_mutex_lock(&client->lock);
> +
>      if (client->closing) {
> +        qemu_mutex_unlock(&client->lock);
>          aio_co_reschedule_self(qemu_get_aio_context());
>          nbd_client_put(client);
>          return;
> @@ -2961,15 +2998,24 @@ static coroutine_fn void nbd_trip(void *opaque)
>           * We're switching between AIO contexts. Don't attempt to receive a new
>           * request and kick the main context which may be waiting for us.
>           */
> -        aio_co_reschedule_self(qemu_get_aio_context());
> -        nbd_client_put(client);
>          client->recv_coroutine = NULL;
> +        qemu_mutex_unlock(&client->lock);
>          aio_wait_kick();
> +
> +        aio_co_reschedule_self(qemu_get_aio_context());
> +        nbd_client_put(client);
>          return;
>      }
>  
>      req = nbd_request_get(client);
> -    ret = nbd_co_receive_request(req, &request, &local_err);
> +
> +    do {
> +        assert(client->recv_coroutine == qemu_coroutine_self());
> +        qemu_mutex_unlock(&client->lock);
> +        ret = nbd_co_receive_request(req, &request, &local_err);
> +        qemu_mutex_lock(&client->lock);
> +    } while (ret == -EAGAIN && !client->quiescing);

I think this deserves a comment to say that the loop is only about the
drain case without polling, where drained_end has already happened before
we reach this point, so we may not terminate the coroutine any more
because nothing would restart it.

>      client->recv_coroutine = NULL;

As soon as we're past this, the nbd_client_receive_next_request() called
by drained_end will create a new coroutine, so we don't have to be
careful about the same case after this.

Kevin




* Re: [PATCH 6/6] nbd/server: introduce NBDClient->lock to protect fields
  2023-12-21  7:26   ` Paolo Bonzini
@ 2023-12-21 11:56     ` Stefan Hajnoczi
  0 siblings, 0 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-12-21 11:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, qemu-devel, Leonardo Bras, qemu-block, Fam Zheng,
	Vladimir Sementsov-Ogievskiy, Fabiano Rosas, Eric Blake,
	Hanna Reitz, Juan Quintela, Peter Xu


On Thu, Dec 21, 2023 at 08:26:58AM +0100, Paolo Bonzini wrote:
> On 12/21/23 02:49, Stefan Hajnoczi wrote:
> >       nbd_client_receive_next_request(client);
> > +
> > +    qemu_mutex_unlock(&client->lock);
> > +
> >       if (ret == -EIO) {
> >           goto disconnect;
> >       }
> 
> I think I slightly prefer if disconnect is reached with lock taken, for
> consistency with the "done" label.  It does not complicate the code,
> because you can just move qio_channel_set_cork() and replace:

Yes, that makes the code easier to follow. Will fix in v2.

Stefan



* Re: [PATCH 6/6] nbd/server: introduce NBDClient->lock to protect fields
  2023-12-21 10:45   ` Kevin Wolf
@ 2023-12-21 14:14     ` Stefan Hajnoczi
  0 siblings, 0 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-12-21 14:14 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, Leonardo Bras, qemu-block, Fam Zheng, Paolo Bonzini,
	Vladimir Sementsov-Ogievskiy, Fabiano Rosas, Eric Blake,
	Hanna Reitz, Juan Quintela, Peter Xu


On Thu, Dec 21, 2023 at 11:45:36AM +0100, Kevin Wolf wrote:
> On 21.12.2023 at 02:49, Stefan Hajnoczi wrote:
> > NBDClient has a number of fields that are accessed by both the export
> > AioContext and the main loop thread. When the AioContext lock is removed
> > these fields will need another form of protection.
> > 
> > Add NBDClient->lock and protect fields that are accessed by both
> > threads. Also add assertions where possible and otherwise add doc
> > comments stating assumptions about which thread runs the code and which
> > locks are held.
> > 
> > Note this patch moves the client->recv_coroutine assertion from
> > nbd_co_receive_request() to nbd_trip() where client->lock is held.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  nbd/server.c | 128 +++++++++++++++++++++++++++++++++++++--------------
> >  1 file changed, 94 insertions(+), 34 deletions(-)
> > 
> > diff --git a/nbd/server.c b/nbd/server.c
> > index 527fbdab4a..4008ec7df9 100644
> > --- a/nbd/server.c
> > +++ b/nbd/server.c
> > @@ -125,23 +125,25 @@ struct NBDClient {
> >      int refcount; /* atomic */
> >      void (*close_fn)(NBDClient *client, bool negotiated);
> >  
> > +    QemuMutex lock;
> > +
> >      NBDExport *exp;
> >      QCryptoTLSCreds *tlscreds;
> >      char *tlsauthz;
> >      QIOChannelSocket *sioc; /* The underlying data channel */
> >      QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
> >  
> > -    Coroutine *recv_coroutine;
> > +    Coroutine *recv_coroutine; /* protected by lock */
> >  
> >      CoMutex send_lock;
> >      Coroutine *send_coroutine;
> >  
> > -    bool read_yielding;
> > -    bool quiescing;
> > +    bool read_yielding; /* protected by lock */
> > +    bool quiescing; /* protected by lock */
> >  
> >      QTAILQ_ENTRY(NBDClient) next;
> > -    int nb_requests;
> > -    bool closing;
> > +    int nb_requests; /* protected by lock */
> > +    bool closing; /* protected by lock */
> >  
> >      uint32_t check_align; /* If non-zero, check for aligned client requests */
> >  
> > @@ -1415,11 +1417,18 @@ nbd_read_eof(NBDClient *client, void *buffer, size_t size, Error **errp)
> >  
> >          len = qio_channel_readv(client->ioc, &iov, 1, errp);
> >          if (len == QIO_CHANNEL_ERR_BLOCK) {
> > -            client->read_yielding = true;
> > +            WITH_QEMU_LOCK_GUARD(&client->lock) {
> > +                if (client->quiescing) {
> > +                    return -EAGAIN;
> > +                }
> 
> Why did you add another client->quiescing check here?
> 
> If it is to address a race, I think you only made the window a bit
> smaller, but between releasing the lock and yielding, the field could
> still change, so drain needs to handle this case anyway.

I added it for consistency/symmetry with nbd_trip(), which checks
client->quiescing after acquiring client->lock, but I didn't have any
specific scenario in mind. I'll drop this.

I agree that it does not prevent races. .drained_begin() +
.drained_poll() can run after client->lock is released and before
qio_channel_yield() takes effect. In that case we miss client->quiescing
and still have the race where no wake occurs because
qio_channel_wake_read() sees ioc->read_coroutine == NULL.

> > +                client->read_yielding = true;
> > +            }
> >              qio_channel_yield(client->ioc, G_IO_IN);
> > -            client->read_yielding = false;
> > -            if (client->quiescing) {
> > -                return -EAGAIN;
> > +            WITH_QEMU_LOCK_GUARD(&client->lock) {
> > +                client->read_yielding = false;
> > +                if (client->quiescing) {
> > +                    return -EAGAIN;
> > +                }
> >              }
> >              continue;
> >          } else if (len < 0) {
> > @@ -1528,6 +1537,7 @@ void nbd_client_put(NBDClient *client)
> >              blk_exp_unref(&client->exp->common);
> >          }
> >          g_free(client->contexts.bitmaps);
> > +        qemu_mutex_destroy(&client->lock);
> >          g_free(client);
> >      }
> >  }
> > @@ -1536,11 +1546,13 @@ static void client_close(NBDClient *client, bool negotiated)
> >  {
> >      assert(qemu_in_main_thread());
> >  
> > -    if (client->closing) {
> > -        return;
> > -    }
> > +    WITH_QEMU_LOCK_GUARD(&client->lock) {
> > +        if (client->closing) {
> > +            return;
> > +        }
> >  
> > -    client->closing = true;
> > +        client->closing = true;
> > +    }
> >  
> >      /* Force requests to finish.  They will drop their own references,
> >       * then we'll close the socket and free the NBDClient.
> > @@ -1554,6 +1566,7 @@ static void client_close(NBDClient *client, bool negotiated)
> >      }
> >  }
> >  
> > +/* Runs in export AioContext with client->lock held */
> >  static NBDRequestData *nbd_request_get(NBDClient *client)
> >  {
> >      NBDRequestData *req;
> > @@ -1566,6 +1579,7 @@ static NBDRequestData *nbd_request_get(NBDClient *client)
> >      return req;
> >  }
> >  
> > +/* Runs in export AioContext with client->lock held */
> >  static void nbd_request_put(NBDRequestData *req)
> >  {
> >      NBDClient *client = req->client;
> > @@ -1589,14 +1603,18 @@ static void blk_aio_attached(AioContext *ctx, void *opaque)
> >      NBDExport *exp = opaque;
> >      NBDClient *client;
> >  
> > +    assert(qemu_in_main_thread());
> > +
> >      trace_nbd_blk_aio_attached(exp->name, ctx);
> >  
> >      exp->common.ctx = ctx;
> >  
> >      QTAILQ_FOREACH(client, &exp->clients, next) {
> > -        assert(client->nb_requests == 0);
> > -        assert(client->recv_coroutine == NULL);
> > -        assert(client->send_coroutine == NULL);
> > +        WITH_QEMU_LOCK_GUARD(&client->lock) {
> > +            assert(client->nb_requests == 0);
> > +            assert(client->recv_coroutine == NULL);
> > +            assert(client->send_coroutine == NULL);
> > +        }
> >      }
> >  }
> >  
> > @@ -1604,6 +1622,8 @@ static void blk_aio_detach(void *opaque)
> >  {
> >      NBDExport *exp = opaque;
> >  
> > +    assert(qemu_in_main_thread());
> > +
> >      trace_nbd_blk_aio_detach(exp->name, exp->common.ctx);
> >  
> >      exp->common.ctx = NULL;
> > @@ -1614,8 +1634,12 @@ static void nbd_drained_begin(void *opaque)
> >      NBDExport *exp = opaque;
> >      NBDClient *client;
> >  
> > +    assert(qemu_in_main_thread());
> > +
> >      QTAILQ_FOREACH(client, &exp->clients, next) {
> > -        client->quiescing = true;
> > +        WITH_QEMU_LOCK_GUARD(&client->lock) {
> > +            client->quiescing = true;
> > +        }
> >      }
> >  }
> >  
> > @@ -1624,9 +1648,13 @@ static void nbd_drained_end(void *opaque)
> >      NBDExport *exp = opaque;
> >      NBDClient *client;
> >  
> > +    assert(qemu_in_main_thread());
> > +
> >      QTAILQ_FOREACH(client, &exp->clients, next) {
> > -        client->quiescing = false;
> > -        nbd_client_receive_next_request(client);
> > +        WITH_QEMU_LOCK_GUARD(&client->lock) {
> > +            client->quiescing = false;
> > +            nbd_client_receive_next_request(client);
> > +        }
> >      }
> >  }
> >  
> > @@ -1635,17 +1663,21 @@ static bool nbd_drained_poll(void *opaque)
> >      NBDExport *exp = opaque;
> >      NBDClient *client;
> >  
> > +    assert(qemu_in_main_thread());
> > +
> >      QTAILQ_FOREACH(client, &exp->clients, next) {
> > -        if (client->nb_requests != 0) {
> > -            /*
> > -             * If there's a coroutine waiting for a request on nbd_read_eof()
> > -             * enter it here so we don't depend on the client to wake it up.
> > -             */
> > -            if (client->recv_coroutine != NULL && client->read_yielding) {
> > -                qio_channel_wake_read(client->ioc);
> > +        WITH_QEMU_LOCK_GUARD(&client->lock) {
> > +            if (client->nb_requests != 0) {
> > +                /*
> > +                 * If there's a coroutine waiting for a request on nbd_read_eof()
> > +                 * enter it here so we don't depend on the client to wake it up.
> > +                 */
> > +                if (client->recv_coroutine != NULL && client->read_yielding) {
> > +                    qio_channel_wake_read(client->ioc);
> > +                }
> 
> This is where the race from above becomes relevant.
> 
> Let's first look at calling qio_channel_wake_read() a tiny bit too
> early: Without any locking in qio_channel_yield(), we could catch the
> read coroutine between setting ioc->read_coroutine and before actually
> yielding. In this case we would call aio_co_wake() on a coroutine that
> is still running in a different thread. Since it's in a different
> thread, we only schedule it instead entering it directly, and that just
> works. The coroutine will immediately be reentered, which is exactly
> what we want.
> 
> Even earlier calls of qio_channel_wake_read() (i.e. between setting
> client->read_yielding and setting ioc->read_coroutine) don't actively
> hurt, they just don't do anything if no read is in flight. (This is the
> same case as if the nbd_trip() coroutine didn't even set
> client->read_yielding yet, just that the check you added above can't
> catch it.)
> 
> So if nbd_read_eof() didn't yield yet, we don't wake it here, but we
> still return true, so the next drained_poll call will try again.
> 
> This is good in principle, but it depends on waking up the main thread
> when we made progress. So we have to call aio_wait_kick() between
> setting ioc->read_coroutine and yielding to make this work. What we
> actually may get indirectly is an aio_notify() through setting FD
> handlers if all implementations of qio_channel_set_aio_fd_handler()
> actually do that. I suppose this could be enough?

qio_channel_set_aio_fd_handler() calls aio_notify() on the export's
AioContext, but it does not wake the main loop AioContext when an
IOThread is in use, so I don't think it helps here.

The state where qio_channel_wake_read() misses that nbd_trip() is
yielding looks like this:

client->nb_requests > 0
client->recv_coroutine = nbd_trip() coroutine
client->quiescing = true
client->read_yielding = true
ioc->read_coroutine = NULL

The main loop thread is waiting for activity and nbd_trip() enters
qemu_coroutine_yield(). There is no progress until the main loop thread
resumes (which can be triggered by the export AioContext completing NBD
I/O too).

I guess the race isn't immediately apparent because there is usually
some event loop activity that hides the problem.
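
If it helps, here is a tiny standalone reproducer of that interleaving.
All of the names are invented and the semaphores only force the bad
ordering; nothing below is QEMU code:

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdbool.h>
    #include <stdio.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static bool read_yielding;      /* protected by lock */
    static bool waiter_registered;  /* stands in for ioc->read_coroutine */
    static sem_t in_window, checked, wake;

    /* Models nbd_read_eof(): announce the yield, then register for wakeup. */
    static void *waiter_fn(void *arg)
    {
        (void)arg;

        pthread_mutex_lock(&lock);
        read_yielding = true;
        pthread_mutex_unlock(&lock);

        sem_post(&in_window);      /* window: yield announced, not registered */
        sem_wait(&checked);        /* the poller has already looked by now */

        waiter_registered = true;  /* analogue of setting ioc->read_coroutine */
        sem_wait(&wake);           /* nobody will post this: stuck */
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        bool woke = false;

        sem_init(&in_window, 0, 0);
        sem_init(&checked, 0, 0);
        sem_init(&wake, 0, 0);
        pthread_create(&t, NULL, waiter_fn, NULL);

        /* Models nbd_drained_poll(): wake only if a waiter is registered. */
        sem_wait(&in_window);
        pthread_mutex_lock(&lock);
        if (read_yielding && waiter_registered) {
            sem_post(&wake);
            woke = true;
        }
        pthread_mutex_unlock(&lock);
        sem_post(&checked);

        printf("poller %s the waiter\n", woke ? "woke" : "missed");
        return 0;   /* exiting main tears down the stuck waiter thread */
    }

Compile with -pthread; it always prints "missed" because the check runs
inside the window, before the waiter has registered itself.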

> Anyway, if my result after thinking really hard about this is "I can't
> rule out that it's correct", maybe it would be better to just run this
> code in the export AioContext instead so that we don't have to think
> about all the subtleties and know that the nbd_co_trip() coroutine is at
> a yield point?

Agreed.

> > +
> > +                return true;
> >              }
> > -
> > -            return true;
> >          }
> >      }
> >  
> > @@ -1656,6 +1688,8 @@ static void nbd_eject_notifier(Notifier *n, void *data)
> >  {
> >      NBDExport *exp = container_of(n, NBDExport, eject_notifier);
> >  
> > +    assert(qemu_in_main_thread());
> > +
> >      blk_exp_request_shutdown(&exp->common);
> >  }
> >  
> > @@ -2541,7 +2575,6 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
> >      int ret;
> >  
> >      g_assert(qemu_in_coroutine());
> > -    assert(client->recv_coroutine == qemu_coroutine_self());
> >      ret = nbd_receive_request(client, request, errp);
> >      if (ret < 0) {
> >          return ret;
> > @@ -2950,7 +2983,11 @@ static coroutine_fn void nbd_trip(void *opaque)
> >       */
> >  
> >      trace_nbd_trip();
> > +
> > +    qemu_mutex_lock(&client->lock);
> > +
> >      if (client->closing) {
> > +        qemu_mutex_unlock(&client->lock);
> >          aio_co_reschedule_self(qemu_get_aio_context());
> >          nbd_client_put(client);
> >          return;
> > @@ -2961,15 +2998,24 @@ static coroutine_fn void nbd_trip(void *opaque)
> >           * We're switching between AIO contexts. Don't attempt to receive a new
> >           * request and kick the main context which may be waiting for us.
> >           */
> > -        aio_co_reschedule_self(qemu_get_aio_context());
> > -        nbd_client_put(client);
> >          client->recv_coroutine = NULL;
> > +        qemu_mutex_unlock(&client->lock);
> >          aio_wait_kick();
> > +
> > +        aio_co_reschedule_self(qemu_get_aio_context());
> > +        nbd_client_put(client);
> >          return;
> >      }
> >  
> >      req = nbd_request_get(client);
> > -    ret = nbd_co_receive_request(req, &request, &local_err);
> > +
> > +    do {
> > +        assert(client->recv_coroutine == qemu_coroutine_self());
> > +        qemu_mutex_unlock(&client->lock);
> > +        ret = nbd_co_receive_request(req, &request, &local_err);
> > +        qemu_mutex_lock(&client->lock);
> > +    } while (ret == -EAGAIN && !client->quiescing);
> 
> I think this deserves a comment to say that the loop is only about the
> drain case without polling where drained_end has already happened before
> we reach this point, so we may not terminate the coroutine any more
> because nothing would restart it.

Sounds good, I'll add a comment in the next revision.
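
Something along these lines, perhaps (exact wording still to be settled;
the comment text is mine, the loop body is copied from the patch above):

    /*
     * nbd_co_receive_request() returns -EAGAIN when nbd_drained_begin()
     * interrupts the receive. If the drain section has already ended by
     * the time we get back here, nothing will spawn a new recv coroutine
     * for this client, so retry instead of terminating the coroutine.
     */
    do {
        assert(client->recv_coroutine == qemu_coroutine_self());
        qemu_mutex_unlock(&client->lock);
        ret = nbd_co_receive_request(req, &request, &local_err);
        qemu_mutex_lock(&client->lock);
    } while (ret == -EAGAIN && !client->quiescing);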

> >      client->recv_coroutine = NULL;
> 
> As soon as we're past this, the nbd_client_receive_next_request() called
> by drained_end will create a new coroutine, so we don't have to be
> careful about the same case after this.
> 
> Kevin
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/6] nbd/server: only traverse NBDExport->clients from main loop thread
  2023-12-21  7:23   ` Paolo Bonzini
@ 2023-12-21 14:27     ` Stefan Hajnoczi
  0 siblings, 0 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-12-21 14:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, qemu-devel, Leonardo Bras, qemu-block, Fam Zheng,
	Vladimir Sementsov-Ogievskiy, Fabiano Rosas, Eric Blake,
	Hanna Reitz, Juan Quintela, Peter Xu

On Thu, Dec 21, 2023 at 08:23:15AM +0100, Paolo Bonzini wrote:
> On 12/21/23 02:49, Stefan Hajnoczi wrote:
> > The NBD clients list is currently accessed from both the export
> > AioContext and the main loop thread. When the AioContext lock is removed
> > there will be nothing protecting the clients list.
> > 
> > Adding a lock around the clients list is tricky because NBDClient
> > structs are refcounted and may be freed from the export AioContext or
> > the main loop thread. nbd_export_request_shutdown() -> client_close() ->
> > nbd_client_put() is also tricky because the list lock would be held
> > while indirectly dropping references to NBDClients.
> > 
> > A simpler approach is to only allow nbd_client_put() and client_close()
> > calls from the main loop thread. Then the NBD clients list is only
> > accessed from the main loop thread and no fancy locking is needed.
> > 
> > nbd_trip() just needs to reschedule itself in the main loop AioContext
> > before calling nbd_client_put() and client_close(). This costs more CPU
> > cycles per NBD request but is needed for thread-safety when the
> > AioContext lock is removed.
> > 
> > Note that nbd_client_get() can still be called from either thread, so
> > make NBDClient->refcount atomic.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >   nbd/server.c | 23 ++++++++++++++++++++---
> >   1 file changed, 20 insertions(+), 3 deletions(-)
> > 
> > diff --git a/nbd/server.c b/nbd/server.c
> > index 0b09ccc8dc..527fbdab4a 100644
> > --- a/nbd/server.c
> > +++ b/nbd/server.c
> > @@ -122,7 +122,7 @@ struct NBDMetaContexts {
> >   };
> >   struct NBDClient {
> > -    int refcount;
> > +    int refcount; /* atomic */
> >       void (*close_fn)(NBDClient *client, bool negotiated);
> >       NBDExport *exp;
> > @@ -1501,14 +1501,17 @@ static int coroutine_fn nbd_receive_request(NBDClient *client, NBDRequest *reque
> >   #define MAX_NBD_REQUESTS 16
> > +/* Runs in export AioContext and main loop thread */
> >   void nbd_client_get(NBDClient *client)
> >   {
> > -    client->refcount++;
> > +    qatomic_inc(&client->refcount);
> >   }
> >   void nbd_client_put(NBDClient *client)
> >   {
> > -    if (--client->refcount == 0) {
> > +    assert(qemu_in_main_thread());
> > +
> > +    if (qatomic_fetch_dec(&client->refcount) == 1) {
> >           /* The last reference should be dropped by client->close,
> >            * which is called by client_close.
> >            */
> > @@ -1531,6 +1534,8 @@ void nbd_client_put(NBDClient *client)
> >   static void client_close(NBDClient *client, bool negotiated)
> >   {
> > +    assert(qemu_in_main_thread());
> > +
> >       if (client->closing) {
> >           return;
> >       }
> > @@ -2938,8 +2943,15 @@ static coroutine_fn void nbd_trip(void *opaque)
> >       int ret;
> >       Error *local_err = NULL;
> > +    /*
> > +     * Note that nbd_client_put() and client_close() must be called from the
> > +     * main loop thread. Use aio_co_reschedule_self() to switch AioContext
> > +     * before calling these functions.
> > +     */
> > +
> >       trace_nbd_trip();
> >       if (client->closing) {
> > +        aio_co_reschedule_self(qemu_get_aio_context());
> >           nbd_client_put(client);
> >           return;
> >       }
> > @@ -2949,6 +2961,7 @@ static coroutine_fn void nbd_trip(void *opaque)
> >            * We're switching between AIO contexts. Don't attempt to receive a new
> >            * request and kick the main context which may be waiting for us.
> >            */
> > +        aio_co_reschedule_self(qemu_get_aio_context());
> >           nbd_client_put(client);
> >           client->recv_coroutine = NULL;
> >           aio_wait_kick();
> > @@ -3013,6 +3026,8 @@ static coroutine_fn void nbd_trip(void *opaque)
> >       qio_channel_set_cork(client->ioc, false);
> >   done:
> >       nbd_request_put(req);
> > +
> > +    aio_co_reschedule_self(qemu_get_aio_context());
> >       nbd_client_put(client);
> >       return;
> 
> This is very expensive to do on every NBD receive, considering that it really
> can happen only when closing (see the assertion in nbd_client_put).
> 
> In Linux there is a common pattern of "if refcount could go to zero, take
> a lock before doing the decrement".  We can do something similar with "if
> refcount could go to zero, move to main iothread before doing the decrement":

Nice suggestion, thanks!
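
For comparison, the same "decrement unless it would hit zero" idea written
with plain C11 atomics (a standalone sketch with made-up names, not QEMU
code):

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Drop a reference only if it is not the last one; the caller deals
     * with the final reference separately (e.g. after hopping to the
     * main loop thread). */
    static bool ref_put_nonzero(atomic_int *refcount)
    {
        int old = atomic_load(refcount);

        do {
            if (old == 1) {
                return false;   /* last reference: leave it to the caller */
            }
            /* On failure the CAS updates 'old' with the current value. */
        } while (!atomic_compare_exchange_weak(refcount, &old, old - 1));

        return true;
    }

    int main(void)
    {
        atomic_int refcount = 2;

        printf("%d\n", ref_put_nonzero(&refcount));  /* 1: dropped 2 -> 1 */
        printf("%d\n", ref_put_nonzero(&refcount));  /* 0: last reference */
        printf("%d\n", atomic_load(&refcount));      /* still 1 */
        return 0;
    }

qatomic_cmpxchg() is a compare-and-swap that returns the old value, so the
shape of the loop above carries over directly.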

> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 895cf0a7525..aec306923d8 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1529,6 +1529,21 @@ void nbd_client_put(NBDClient *client)
>      }
>  }
> +static bool nbd_client_put_nonzero(NBDClient *client)
> +{
> +    int old = qatomic_read(&client->refcount);
> +    int expected;
> +    do {
> +        if (old == 1) {
> +            return false;
> +        }
> +        expected = old;
> +        old = qatomic_cmpxchg(&client->refcount, expected, expected - 1);
> +    } while (old != expected);
> +
> +    return true;
> +}
> +
>  static void client_close(NBDClient *client, bool negotiated)
>  {
>      if (client->closing) {
> @@ -2936,15 +2951,14 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
>  static coroutine_fn void nbd_trip(void *opaque)
>  {
>      NBDClient *client = opaque;
> -    NBDRequestData *req;
> +    NBDRequestData *req = NULL;
>      NBDRequest request = { 0 };    /* GCC thinks it can be used uninitialized */
>      int ret;
>      Error *local_err = NULL;
>      trace_nbd_trip();
>      if (client->closing) {
> -        nbd_client_put(client);
> -        return;
> +        goto done;
>      }
>      if (client->quiescing) {
> @@ -2952,10 +2966,9 @@ static coroutine_fn void nbd_trip(void *opaque)
>           * We're switching between AIO contexts. Don't attempt to receive a new
>           * request and kick the main context which may be waiting for us.
>           */
> -        nbd_client_put(client);
>          client->recv_coroutine = NULL;
>          aio_wait_kick();
> -        return;
> +        goto done;
>      }
>      req = nbd_request_get(client);
> @@ -3015,8 +3028,13 @@ static coroutine_fn void nbd_trip(void *opaque)
>      qio_channel_set_cork(client->ioc, false);
>  done:
> -    nbd_request_put(req);
> -    nbd_client_put(client);
> +    if (req) {
> +        nbd_request_put(req);
> +    }
> +    if (!nbd_client_put_nonzero(client)) {
> +        aio_co_reschedule_self(qemu_get_aio_context());
> +        nbd_client_put(client);
> +    }
>      return;
>  disconnect:
> 
> I think adding the "if (req)" should also simplify a little bit the addition of the lock.
> 
> Paolo
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/6] nbd/server: only traverse NBDExport->clients from main loop thread
  2023-12-21  1:49 ` [PATCH 5/6] nbd/server: only traverse NBDExport->clients from main loop thread Stefan Hajnoczi
  2023-12-21  7:23   ` Paolo Bonzini
@ 2024-01-02 15:32   ` Eric Blake
  1 sibling, 0 replies; 16+ messages in thread
From: Eric Blake @ 2024-01-02 15:32 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, qemu-devel, Leonardo Bras, qemu-block, Fam Zheng,
	Paolo Bonzini, Vladimir Sementsov-Ogievskiy, Fabiano Rosas,
	Hanna Reitz, Juan Quintela, Peter Xu

On Wed, Dec 20, 2023 at 08:49:02PM -0500, Stefan Hajnoczi wrote:
> The NBD clients list is currently accessed from both the export
> AioContext and the main loop thread. When the AioContext lock is removed
> there will be nothing protecting the clients list.
> 
> Adding a lock around the clients list is tricky because NBDClient
> structs are refcounted and may be freed from the export AioContext or
> the main loop thread. nbd_export_request_shutdown() -> client_close() ->
> nbd_client_put() is also tricky because the list lock would be held
> while indirectly dropping references to NBDClients.
> 
> A simpler approach is to only allow nbd_client_put() and client_close()
> calls from the main loop thread. Then the NBD clients list is only
> accessed from the main loop thread and no fancy locking is needed.
> 
> nbd_trip() just needs to reschedule itself in the main loop AioContext
> before calling nbd_client_put() and client_close(). This costs more CPU
> cycles per NBD request but is needed for thread-safety when the
> AioContext lock is removed.

Late review (now that this is already in), but this is a bit
misleading.  The CPU penalty is only incurred for NBD_CMD_DISC or
after detection of a protocol error (that is, only when the connection
is being shut down), and not on every NBD request.

Thanks for working on this!

> 
> Note that nbd_client_get() can still be called from either thread, so
> make NBDClient->refcount atomic.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  nbd/server.c | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2024-01-02 15:33 UTC | newest]

Thread overview: 16+ messages
2023-12-21  1:48 [PATCH 0/6] qemu-iotests fixes for Kevin's block tree Stefan Hajnoczi
2023-12-21  1:48 ` [PATCH 1/6] fixup block-coroutine-wrapper: use qemu_get_current_aio_context() Stefan Hajnoczi
2023-12-21  1:48 ` [PATCH 2/6] fixup block: remove AioContext locking Stefan Hajnoczi
2023-12-21  1:49 ` [PATCH 3/6] fixup scsi: only access SCSIDevice->requests from one thread Stefan Hajnoczi
2023-12-21  7:29   ` Paolo Bonzini
2023-12-21  1:49 ` [PATCH 4/6] nbd/server: avoid per-NBDRequest nbd_client_get/put() Stefan Hajnoczi
2023-12-21  7:27   ` Paolo Bonzini
2023-12-21  1:49 ` [PATCH 5/6] nbd/server: only traverse NBDExport->clients from main loop thread Stefan Hajnoczi
2023-12-21  7:23   ` Paolo Bonzini
2023-12-21 14:27     ` Stefan Hajnoczi
2024-01-02 15:32   ` Eric Blake
2023-12-21  1:49 ` [PATCH 6/6] nbd/server: introduce NBDClient->lock to protect fields Stefan Hajnoczi
2023-12-21  7:26   ` Paolo Bonzini
2023-12-21 11:56     ` Stefan Hajnoczi
2023-12-21 10:45   ` Kevin Wolf
2023-12-21 14:14     ` Stefan Hajnoczi
