* [PATCH for-9.0 0/2] nbd: Fix server crash on reset with iothreads
From: Kevin Wolf @ 2024-03-14 16:58 UTC
  To: qemu-block
  Cc: kwolf, stefanha, hreitz, eblake, aliang, qemu-devel, qemu-stable

Kevin Wolf (2):
  nbd/server: Fix race in draining the export
  iotests: Add test for reset/AioContext switches with NBD exports

 nbd/server.c                                  | 15 ++---
 tests/qemu-iotests/tests/iothreads-nbd-export | 66 +++++++++++++++++++
 .../tests/iothreads-nbd-export.out            | 19 ++++++
 3 files changed, 92 insertions(+), 8 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/iothreads-nbd-export
 create mode 100644 tests/qemu-iotests/tests/iothreads-nbd-export.out

-- 
2.44.0




* [PATCH for-9.0 1/2] nbd/server: Fix race in draining the export
From: Kevin Wolf @ 2024-03-14 16:58 UTC
  To: qemu-block
  Cc: kwolf, stefanha, hreitz, eblake, aliang, qemu-devel, qemu-stable

When draining an NBD export, nbd_drained_begin() first sets
client->quiescing so that nbd_client_receive_next_request() won't start
any new request coroutines. Then nbd_drained_poll() tries to make sure
that we wait for any existing request coroutines by checking that
client->nb_requests has become 0.
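
In sketch form, these two drain callbacks cooperate roughly like this (a
simplified illustration of the mechanism just described, not the exact
upstream code):

    /* Sketch only: simplified drain callbacks for an NBD export */
    static void nbd_drained_begin(void *opaque)
    {
        NBDExport *exp = opaque;
        NBDClient *client;

        QTAILQ_FOREACH(client, &exp->clients, next) {
            client->quiescing = true;  /* stop starting new coroutines */
        }
    }

    static bool nbd_drained_poll(void *opaque)
    {
        NBDExport *exp = opaque;
        NBDClient *client;

        QTAILQ_FOREACH(client, &exp->clients, next) {
            if (client->nb_requests != 0) {
                return true;   /* still busy, keep polling */
            }
        }
        return false;          /* all requests done, drain completes */
    }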

However, there is a small window between creating a new request
coroutine and increasing client->nb_requests. If a coroutine is in this
state, it won't be waited for and drain returns too early.
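
The pre-patch flow (the lines this patch removes below) shows the window
concretely:

    /* Simplified excerpt of the old flow */
    static void nbd_client_receive_next_request(NBDClient *client)
    {
        if (!client->recv_coroutine && client->nb_requests < MAX_NBD_REQUESTS &&
            !client->quiescing) {
            nbd_client_get(client);
            /* The coroutine is created and scheduled here... */
            client->recv_coroutine = qemu_coroutine_create(nbd_trip, client);
            aio_co_schedule(client->exp->common.ctx, client->recv_coroutine);
        }
    }

    static coroutine_fn void nbd_trip(void *opaque)
    {
        NBDClient *client = opaque;
        /* ...but client->nb_requests is only increased once the coroutine
         * actually runs. A drain that hits this window sees
         * nb_requests == 0 and finishes while nbd_trip is still pending. */
        NBDRequestData *req = nbd_request_get(client);
        /* ... handle the request ... */
    }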

In the context of switching to a different AioContext, this means that
blk_aio_attached() will see client->recv_coroutine != NULL and fail its
assertion.
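
The assertion that fires is essentially of this shape (a simplified sketch
of the attach handler, not the literal code):

    /* Sketch only: AioContext attach handler that trips after the
     * too-early drain */
    static void blk_aio_attached(AioContext *ctx, void *opaque)
    {
        NBDExport *exp = opaque;
        NBDClient *client;

        exp->common.ctx = ctx;

        QTAILQ_FOREACH(client, &exp->clients, next) {
            /* Drain is expected to have finished all request coroutines;
             * with the race above, recv_coroutine can still be set here. */
            assert(client->recv_coroutine == NULL);
        }
    }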

Fix this by increasing client->nb_requests immediately when starting the
coroutine. Doing this after checking whether we should create a new
coroutine is okay because client->lock is held.

Cc: qemu-stable@nongnu.org
Fixes: fd6afc501a019682d1b8468b562355a2887087bd
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 nbd/server.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 941832f178..c3484cc1eb 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -3007,8 +3007,8 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
 /* Owns a reference to the NBDClient passed as opaque.  */
 static coroutine_fn void nbd_trip(void *opaque)
 {
-    NBDClient *client = opaque;
-    NBDRequestData *req = NULL;
+    NBDRequestData *req = opaque;
+    NBDClient *client = req->client;
     NBDRequest request = { 0 };    /* GCC thinks it can be used uninitialized */
     int ret;
     Error *local_err = NULL;
@@ -3037,8 +3037,6 @@ static coroutine_fn void nbd_trip(void *opaque)
         goto done;
     }
 
-    req = nbd_request_get(client);
-
     /*
      * nbd_co_receive_request() returns -EAGAIN when nbd_drained_begin() has
      * set client->quiescing but by the time we get back nbd_drained_end() may
@@ -3112,9 +3110,7 @@ static coroutine_fn void nbd_trip(void *opaque)
     }
 
 done:
-    if (req) {
-        nbd_request_put(req);
-    }
+    nbd_request_put(req);
 
     qemu_mutex_unlock(&client->lock);
 
@@ -3143,10 +3139,13 @@ disconnect:
  */
 static void nbd_client_receive_next_request(NBDClient *client)
 {
+    NBDRequestData *req;
+
     if (!client->recv_coroutine && client->nb_requests < MAX_NBD_REQUESTS &&
         !client->quiescing) {
         nbd_client_get(client);
-        client->recv_coroutine = qemu_coroutine_create(nbd_trip, client);
+        req = nbd_request_get(client);
+        client->recv_coroutine = qemu_coroutine_create(nbd_trip, req);
         aio_co_schedule(client->exp->common.ctx, client->recv_coroutine);
     }
 }
-- 
2.44.0




* [PATCH for-9.0 2/2] iotests: Add test for reset/AioContext switches with NBD exports
From: Kevin Wolf @ 2024-03-14 16:58 UTC
  To: qemu-block
  Cc: kwolf, stefanha, hreitz, eblake, aliang, qemu-devel, qemu-stable

This replicates the scenario in which the bug was reported.
Unfortunately this relies on actually executing a guest (so that the
firmware initialises the virtio-blk device and moves it to its
configured iothread), so this can't use the qtest accelerator like most
other test cases do. I tried to find a simpler way to trigger the bug, but
couldn't find one.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/qemu-iotests/tests/iothreads-nbd-export | 66 +++++++++++++++++++
 .../tests/iothreads-nbd-export.out            | 19 ++++++
 2 files changed, 85 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/iothreads-nbd-export
 create mode 100644 tests/qemu-iotests/tests/iothreads-nbd-export.out

diff --git a/tests/qemu-iotests/tests/iothreads-nbd-export b/tests/qemu-iotests/tests/iothreads-nbd-export
new file mode 100755
index 0000000000..63cac8fdbf
--- /dev/null
+++ b/tests/qemu-iotests/tests/iothreads-nbd-export
@@ -0,0 +1,66 @@
+#!/usr/bin/env python3
+# group: rw quick
+#
+# Copyright (C) 2024 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+# Creator/Owner: Kevin Wolf <kwolf@redhat.com>
+
+import asyncio
+import iotests
+import qemu
+import time
+
+iotests.script_initialize(supported_fmts=['qcow2'],
+                          supported_platforms=['linux'])
+
+with iotests.FilePath('disk1.img') as path, \
+     iotests.FilePath('nbd.sock', base_dir=iotests.sock_dir) as nbd_sock, \
+     qemu.machine.QEMUMachine(iotests.qemu_prog) as vm:
+
+    img_size = '10M'
+
+    iotests.log('Preparing disk...')
+    iotests.qemu_img_create('-f', iotests.imgfmt, path, img_size)
+    vm.add_args('-blockdev', f'file,node-name=disk-file,filename={path}')
+    vm.add_args('-blockdev', f'qcow2,node-name=disk,file=disk-file')
+    vm.add_args('-object', 'iothread,id=iothread0')
+    vm.add_args('-device', 'virtio-blk,drive=disk,iothread=iothread0,share-rw=on')
+
+    iotests.log('Launching VM...')
+    vm.add_args('-accel', 'kvm', '-accel', 'tcg')
+    #vm.add_args('-accel', 'qtest')
+    vm.launch()
+
+    iotests.log('Exporting to NBD...')
+    iotests.log(vm.qmp('nbd-server-start',
+                       addr={'type': 'unix', 'data': {'path': nbd_sock}}))
+    iotests.log(vm.qmp('block-export-add', type='nbd', id='exp0',
+                       node_name='disk', writable=True))
+
+    iotests.log('Connecting qemu-img...')
+    qemu_io = iotests.QemuIoInteractive('-f', 'raw',
+                                        f'nbd+unix:///disk?socket={nbd_sock}')
+
+    iotests.log('Moving the NBD export to a different iothread...')
+    for i in range(0, 10):
+        iotests.log(vm.qmp('system_reset'))
+        time.sleep(0.1)
+
+    iotests.log('Checking that it is still alive...')
+    iotests.log(vm.qmp('query-status'))
+
+    qemu_io.close()
+    vm.shutdown()
diff --git a/tests/qemu-iotests/tests/iothreads-nbd-export.out b/tests/qemu-iotests/tests/iothreads-nbd-export.out
new file mode 100644
index 0000000000..bc514e35e5
--- /dev/null
+++ b/tests/qemu-iotests/tests/iothreads-nbd-export.out
@@ -0,0 +1,19 @@
+Preparing disk...
+Launching VM...
+Exporting to NBD...
+{"return": {}}
+{"return": {}}
+Connecting qemu-img...
+Moving the NBD export to a different iothread...
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+Checking that it is still alive...
+{"return": {"running": true, "status": "running"}}
-- 
2.44.0




* Re: [PATCH for-9.0 0/2] nbd: Fix server crash on reset with iothreads
From: Stefan Hajnoczi @ 2024-03-18 13:28 UTC
  To: Kevin Wolf; +Cc: qemu-block, hreitz, eblake, aliang, qemu-devel, qemu-stable

On Thu, Mar 14, 2024 at 05:58:23PM +0100, Kevin Wolf wrote:
> Kevin Wolf (2):
>   nbd/server: Fix race in draining the export
>   iotests: Add test for reset/AioContext switches with NBD exports
> 
>  nbd/server.c                                  | 15 ++---
>  tests/qemu-iotests/tests/iothreads-nbd-export | 66 +++++++++++++++++++
>  .../tests/iothreads-nbd-export.out            | 19 ++++++
>  3 files changed, 92 insertions(+), 8 deletions(-)
>  create mode 100755 tests/qemu-iotests/tests/iothreads-nbd-export
>  create mode 100644 tests/qemu-iotests/tests/iothreads-nbd-export.out
> 
> -- 
> 2.44.0
> 

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH for-9.0 1/2] nbd/server: Fix race in draining the export
From: Michael Tokarev @ 2024-03-19 16:49 UTC
  To: Kevin Wolf, qemu-block
  Cc: stefanha, hreitz, eblake, aliang, qemu-devel, qemu-stable

14.03.2024 19:58, Kevin Wolf wrote:
> When draining an NBD export, nbd_drained_begin() first sets
> client->quiescing so that nbd_client_receive_next_request() won't start
> any new request coroutines. Then nbd_drained_poll() tries to make sure
> that we wait for any existing request coroutines by checking that
> client->nb_requests has become 0.
> 
> However, there is a small window between creating a new request
> coroutine and increasing client->nb_requests. If a coroutine is in this
> state, it won't be waited for and drain returns too early.
> 
> In the context of switching to a different AioContext, this means that
> blk_aio_attached() will see client->recv_coroutine != NULL and fail its
> assertion.
> 
> Fix this by increasing client->nb_requests immediately when starting the
> coroutine. Doing this after checking whether we should create a new
> coroutine is okay because client->lock is held.
> 
> Cc: qemu-stable@nongnu.org
> Fixes: fd6afc501a019682d1b8468b562355a2887087bd
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Kevin, Stefan,

This change in master, which is Cc'ed to stable, touches (refines) exactly
the same areas as f816310d0c32c "nbd/server: only traverse NBDExport->clients
from main loop thread", which is not (yet?) in stable, neither in 7.2 nor in
8.2.

Also, 7075d235114b4 "nbd/server: introduce NBDClient->lock to protect fields"
touches one of the places too.

I can try to construct something out of the two, but I think it is better
if one of you does that - if this seems a good thing to do at all. The
result would definitely be much saner than my own attempt.

Or we can just pick f816310d0c32c and 7075d235114b4 into stable too. When
I evaluated f816310d0c32c for stable before, I thought it wasn't needed
there because the AioContext lock hasn't been removed in 8.2 yet, and I
hadn't thought about 7075d235114b4 at all. All three apply cleanly and the
result passes check-block, but it feels like a bit too much for stable.

What do you think?

Thanks,

/mjt

> diff --git a/nbd/server.c b/nbd/server.c
> index 941832f178..c3484cc1eb 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -3007,8 +3007,8 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
>   /* Owns a reference to the NBDClient passed as opaque.  */
>   static coroutine_fn void nbd_trip(void *opaque)
>   {
> -    NBDClient *client = opaque;
> -    NBDRequestData *req = NULL;
> +    NBDRequestData *req = opaque;
> +    NBDClient *client = req->client;
>       NBDRequest request = { 0 };    /* GCC thinks it can be used uninitialized */
>       int ret;
>       Error *local_err = NULL;
> @@ -3037,8 +3037,6 @@ static coroutine_fn void nbd_trip(void *opaque)
>           goto done;
>       }
>   
> -    req = nbd_request_get(client);
> -
>       /*
>        * nbd_co_receive_request() returns -EAGAIN when nbd_drained_begin() has
>        * set client->quiescing but by the time we get back nbd_drained_end() may
> @@ -3112,9 +3110,7 @@ static coroutine_fn void nbd_trip(void *opaque)
>       }
>   
>   done:
> -    if (req) {
> -        nbd_request_put(req);
> -    }
> +    nbd_request_put(req);
>   
>       qemu_mutex_unlock(&client->lock);
>   
> @@ -3143,10 +3139,13 @@ disconnect:
>    */
>   static void nbd_client_receive_next_request(NBDClient *client)
>   {
> +    NBDRequestData *req;
> +
>       if (!client->recv_coroutine && client->nb_requests < MAX_NBD_REQUESTS &&
>           !client->quiescing) {
>           nbd_client_get(client);
> -        client->recv_coroutine = qemu_coroutine_create(nbd_trip, client);
> +        req = nbd_request_get(client);
> +        client->recv_coroutine = qemu_coroutine_create(nbd_trip, req);
>           aio_co_schedule(client->exp->common.ctx, client->recv_coroutine);
>       }
>   }


