From: Sergio Lopez <slp@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] nbd/server: attach client channel to the export's AioContext
Date: Thu, 12 Sep 2019 12:30:08 +0200 [thread overview]
Message-ID: <87y2ytx0wf.fsf@redhat.com> (raw)
In-Reply-To: <20190912081414.GC5383@linux.fritz.box>
[-- Attachment #1: Type: text/plain, Size: 3922 bytes --]
Kevin Wolf <kwolf@redhat.com> writes:
> Am 11.09.2019 um 23:33 hat Eric Blake geschrieben:
>> On 9/11/19 12:21 PM, Eric Blake wrote:
>> > On 9/11/19 11:15 AM, Sergio Lopez wrote:
>> >> On creation, the export's AioContext is set to the same one as the
>> >> BlockBackend, while the AioContext in the client QIOChannel is left
>> >> untouched.
>> >>
>> >> As a result, when using data-plane, nbd_client_receive_next_request()
>> >> schedules coroutines in the IOThread AioContext, while the client's
>> >> QIOChannel is serviced from the main_loop, potentially triggering the
>> >> assertion at qio_channel_restart_[read|write].
>> >>
>> >> To fix this, as soon we have the export corresponding to the client,
>> >> we call qio_channel_attach_aio_context() to attach the QIOChannel
>> >> context to the export's AioContext. This matches with the logic in
>> >> blk_aio_attached().
>> >>
>> >> RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1748253
>> >> Signed-off-by: Sergio Lopez <slp@redhat.com>
>> >> ---
>> >> nbd/server.c | 2 ++
>> >> 1 file changed, 2 insertions(+)
>> >
>> > I'd like a second opinion from Kevin, but the description makes sense to
>> > me. I'm happy to queue this through my NBD tree.
>> >
>> > Reviewed-by: Eric Blake <eblake@redhat.com>
>>
>> I tried to test this patch, but even with it applied, I still got an
>> aio-context crasher by attempting an nbd-server-start, nbd-server-add,
>> nbd-server-stop (intentionally skipping the nbd-server-remove step) on a
>> domain using iothreads, with a backtrace of:
>>
>> #0 0x00007ff09d070e35 in raise () from target:/lib64/libc.so.6
>> #1 0x00007ff09d05b895 in abort () from target:/lib64/libc.so.6
>> #2 0x000055dd03b9ab86 in error_exit (err=1, msg=0x55dd03d59fb0
>> <__func__.15769> "qemu_mutex_unlock_impl")
>> at util/qemu-thread-posix.c:36
>> #3 0x000055dd03b9adcf in qemu_mutex_unlock_impl (mutex=0x55dd062d5090,
>> file=0x55dd03d59041 "util/async.c",
>> line=523) at util/qemu-thread-posix.c:96
>> #4 0x000055dd03b93433 in aio_context_release (ctx=0x55dd062d5030) at
>> util/async.c:523
>> #5 0x000055dd03ac421b in bdrv_do_drained_begin (bs=0x55dd0673a2d0,
>> recursive=false, parent=0x0,
>> ignore_bds_parents=false, poll=true) at block/io.c:428
>> #6 0x000055dd03ac4299 in bdrv_drained_begin (bs=0x55dd0673a2d0) at
>> block/io.c:434
>> #7 0x000055dd03aafb54 in blk_drain (blk=0x55dd06a3ec40) at
>> block/block-backend.c:1605
>> #8 0x000055dd03aae054 in blk_remove_bs (blk=0x55dd06a3ec40) at
>> block/block-backend.c:800
>> #9 0x000055dd03aad54a in blk_delete (blk=0x55dd06a3ec40) at
>> block/block-backend.c:420
>> #10 0x000055dd03aad7d6 in blk_unref (blk=0x55dd06a3ec40) at
>> block/block-backend.c:475
>> #11 0x000055dd03aecb68 in nbd_export_put (exp=0x55dd0726f920) at
>> nbd/server.c:1666
>> #12 0x000055dd03aec8fe in nbd_export_close (exp=0x55dd0726f920) at
>> nbd/server.c:1616
>> #13 0x000055dd03aecbf1 in nbd_export_close_all () at nbd/server.c:1689
>> #14 0x000055dd03748845 in qmp_nbd_server_stop (errp=0x7ffcdf3cb4e8) at
>> blockdev-nbd.c:233
>> ...
>>
>> Does that sound familiar to what you were seeing? Does it mean we
>> missed another spot where the context is set incorrectly?
>
> I think nbd_export_close_all() or one of the NBD functions called by it
> needs to take the AioContext lock of the associated BlockBackend.
>
> The crash is because AIO_POLL_WHILE() wants to temporarily drop the lock
> that we're not even holding.
Yes, I think locking the context during the "if (exp->blk) {" block at
nbd/server.c:1646 should do the trick.
On the other hand, I wonder if there is any situation in which calling
to blk_unref() without locking the context could be safe. If there isn't
any, perhaps we should assert that the lock is held if blk->ctx != NULL
to catch this kind of bugs earlier?
Sergio.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
next prev parent reply other threads:[~2019-09-12 10:35 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-09-11 16:15 [Qemu-devel] [PATCH] nbd/server: attach client channel to the export's AioContext Sergio Lopez
2019-09-11 17:21 ` Eric Blake
2019-09-11 21:33 ` Eric Blake
2019-09-11 22:03 ` Eric Blake
2019-09-12 6:37 ` Sergio Lopez
2019-09-16 21:11 ` Eric Blake
2019-09-12 8:14 ` Kevin Wolf
2019-09-12 10:30 ` Sergio Lopez [this message]
2019-09-12 11:31 ` Kevin Wolf
2019-09-16 22:30 ` Eric Blake
2019-09-12 8:23 ` Kevin Wolf
2019-09-12 10:13 ` Sergio Lopez
2019-09-12 10:25 ` Kevin Wolf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87y2ytx0wf.fsf@redhat.com \
--to=slp@redhat.com \
--cc=kwolf@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).