From: "Michael S. Tsirkin" <mst@redhat.com>
To: Denis Plotnikov <den-plotnikov@yandex-team.ru>
Cc: Kevin Wolf <kwolf@redhat.com>,
qemu-block@nongnu.org, qemu-devel@nongnu.org, mreitz@redhat.com,
yc-core@yandex-team.ru, raphael.norwitz@nutanix.com
Subject: Re: [PATCH v3 2/3] vhost-user-blk: perform immediate cleanup if disconnect on initialization
Date: Wed, 21 Apr 2021 15:59:51 -0400 [thread overview]
Message-ID: <20210421155929-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <88b9912e-97d3-f18a-b3cc-7891f3c55e3a@yandex-team.ru>
On Wed, Apr 21, 2021 at 07:13:24PM +0300, Denis Plotnikov wrote:
>
> On 21.04.2021 18:24, Kevin Wolf wrote:
> > Am 25.03.2021 um 16:12 hat Denis Plotnikov geschrieben:
> > > Commit 4bcad76f4c39 ("vhost-user-blk: delay vhost_user_blk_disconnect")
> > > introduced postponing vhost_dev cleanup aiming to eliminate qemu aborts
> > > because of connection problems with vhost-blk daemon.
> > >
> > > However, it introdues a new problem. Now, any communication errors
> > > during execution of vhost_dev_init() called by vhost_user_blk_device_realize()
> > > lead to qemu abort on assert in vhost_dev_get_config().
> > >
> > > This happens because vhost_user_blk_disconnect() is postponed but
> > > it should have dropped s->connected flag by the time
> > > vhost_user_blk_device_realize() performs a new connection opening.
> > > On the connection opening, vhost_dev initialization in
> > > vhost_user_blk_connect() relies on s->connection flag and
> > > if it's not dropped, it skips vhost_dev initialization and returns
> > > with success. Then, vhost_user_blk_device_realize()'s execution flow
> > > goes to vhost_dev_get_config() where it's aborted on the assert.
> > >
> > > To fix the problem this patch adds immediate cleanup on device
> > > initialization(in vhost_user_blk_device_realize()) using different
> > > event handlers for initialization and operation introduced in the
> > > previous patch.
> > > On initialization (in vhost_user_blk_device_realize()) we fully
> > > control the initialization process. At that point, nobody can use the
> > > device since it isn't initialized and we don't need to postpone any
> > > cleanups, so we can do cleaup right away when there is a communication
> > > problem with the vhost-blk daemon.
> > > On operation we leave it as is, since the disconnect may happen when
> > > the device is in use, so the device users may want to use vhost_dev's data
> > > to do rollback before vhost_dev is re-initialized (e.g. in vhost_dev_set_log()).
> > >
> > > Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
> > > Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
> > I think there is something wrong with this patch.
> >
> > I'm debugging an error case, specifically num-queues being larger in
> > QEMU that in the vhost-user-blk export. Before this patch, it has just
> > an unfriendly error message:
> >
> > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=vhost1,id=blk1,iommu_platform=off,disable-legacy=on,num-queues=4: Unexpected end-of-file before all data were read
> > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=vhost1,id=blk1,iommu_platform=off,disable-legacy=on,num-queues=4: Failed to read msg header. Read 0 instead of 12. Original request 24.
> > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=vhost1,id=blk1,iommu_platform=off,disable-legacy=on,num-queues=4: vhost-user-blk: get block config failed
> > qemu-system-x86_64: Failed to set msg fds.
> > qemu-system-x86_64: vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
> >
> > After the patch, it crashes:
> >
> > #0 0x0000555555d0a4bd in vhost_user_read_cb (source=0x5555568f4690, condition=(G_IO_IN | G_IO_HUP), opaque=0x7fffffffcbf0) at ../hw/virtio/vhost-user.c:313
> > #1 0x0000555555d950d3 in qio_channel_fd_source_dispatch (source=0x555557c3f750, callback=0x555555d0a478 <vhost_user_read_cb>, user_data=0x7fffffffcbf0) at ../io/channel-watch.c:84
> > #2 0x00007ffff7b32a9f in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
> > #3 0x00007ffff7b84a98 in g_main_context_iterate.constprop () at /lib64/libglib-2.0.so.0
> > #4 0x00007ffff7b32163 in g_main_loop_run () at /lib64/libglib-2.0.so.0
> > #5 0x0000555555d0a724 in vhost_user_read (dev=0x555557bc62f8, msg=0x7fffffffcc50) at ../hw/virtio/vhost-user.c:402
> > #6 0x0000555555d0ee6b in vhost_user_get_config (dev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost-user.c:2133
> > #7 0x0000555555d56d46 in vhost_dev_get_config (hdev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost.c:1566
> > #8 0x0000555555cdd150 in vhost_user_blk_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcf90) at ../hw/block/vhost-user-blk.c:510
> > #9 0x0000555555d08f6d in virtio_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcff0) at ../hw/virtio/virtio.c:3660
> >
> > The problem is that vhost_user_read_cb() still accesses dev->opaque even
> > though the device has been cleaned up meanwhile when the connection was
> > closed (the vhost_user_blk_disconnect() added by this patch), so it's
> > NULL now. This problem was actually mentioned in the comment that is
> > removed by this patch.
> >
> > I tried to fix this by making vhost_user_read() cope with the fact that
> > the device might have been cleaned up meanwhile, but then I'm running
> > into the next set of problems.
> >
> > The first is that retrying is pointless, the error condition is in the
> > configuration, it will never change.
> >
> > The other is that after many repetitions of the same error message, I
> > got a crash where the device is cleaned up a second time in
> > vhost_dev_init() and the virtqueues are already NULL.
> >
> > So it seems to me that erroring out during the initialisation phase
> > makes a lot more sense than retrying.
> >
> > Kevin
>
> But without the patch there will be another problem which the patch actually
> addresses.
>
> It seems to me that there is a case when the retrying is useless and this is
> exactly your case -- we'll never get a proper configuration.
>
> What if we get rid of the re-connection and give the only try to realize the
> device? Than we don't need case separating for initialization and operation
> of device, correct? But I don't familiar with the cases where the reconnect
> is needed? Do you know something it?
Reconnect is for when server is restarted while we are talking to it.
> Denis
>
> >
> > > hw/block/vhost-user-blk.c | 48 +++++++++++++++++++--------------------
> > > 1 file changed, 24 insertions(+), 24 deletions(-)
> > >
> > > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> > > index 1af95ec6aae7..4e215f71f152 100644
> > > --- a/hw/block/vhost-user-blk.c
> > > +++ b/hw/block/vhost-user-blk.c
> > > @@ -402,38 +402,38 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
> > > break;
> > > case CHR_EVENT_CLOSED:
> > > /*
> > > - * A close event may happen during a read/write, but vhost
> > > - * code assumes the vhost_dev remains setup, so delay the
> > > - * stop & clear. There are two possible paths to hit this
> > > - * disconnect event:
> > > - * 1. When VM is in the RUN_STATE_PRELAUNCH state. The
> > > - * vhost_user_blk_device_realize() is a caller.
> > > - * 2. In tha main loop phase after VM start.
> > > - *
> > > - * For p2 the disconnect event will be delayed. We can't
> > > - * do the same for p1, because we are not running the loop
> > > - * at this moment. So just skip this step and perform
> > > - * disconnect in the caller function.
> > > - *
> > > - * TODO: maybe it is a good idea to make the same fix
> > > - * for other vhost-user devices.
> > > + * Closing the connection should happen differently on device
> > > + * initialization and operation stages.
> > > + * On initalization, we want to re-start vhost_dev initialization
> > > + * from the very beginning right away when the connection is closed,
> > > + * so we clean up vhost_dev on each connection closing.
> > > + * On operation, we want to postpone vhost_dev cleanup to let the
> > > + * other code perform its own cleanup sequence using vhost_dev data
> > > + * (e.g. vhost_dev_set_log).
> > > */
> > > if (realized) {
> > > + /*
> > > + * A close event may happen during a read/write, but vhost
> > > + * code assumes the vhost_dev remains setup, so delay the
> > > + * stop & clear.
> > > + */
> > > AioContext *ctx = qemu_get_current_aio_context();
> > > qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, NULL, NULL,
> > > NULL, NULL, false);
> > > aio_bh_schedule_oneshot(ctx, vhost_user_blk_chr_closed_bh, opaque);
> > > - }
> > > - /*
> > > - * Move vhost device to the stopped state. The vhost-user device
> > > - * will be clean up and disconnected in BH. This can be useful in
> > > - * the vhost migration code. If disconnect was caught there is an
> > > - * option for the general vhost code to get the dev state without
> > > - * knowing its type (in this case vhost-user).
> > > - */
> > > - s->dev.started = false;
> > > + /*
> > > + * Move vhost device to the stopped state. The vhost-user device
> > > + * will be clean up and disconnected in BH. This can be useful in
> > > + * the vhost migration code. If disconnect was caught there is an
> > > + * option for the general vhost code to get the dev state without
> > > + * knowing its type (in this case vhost-user).
> > > + */
> > > + s->dev.started = false;
> > > + } else {
> > > + vhost_user_blk_disconnect(dev);
> > > + }
> > > break;
> > > case CHR_EVENT_BREAK:
> > > case CHR_EVENT_MUX_IN:
> > > --
> > > 2.25.1
> > >
next prev parent reply other threads:[~2021-04-21 20:02 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-25 15:12 [PATCH v3 0/3] vhost-user-blk: fix bug on device disconnection during initialization Denis Plotnikov
2021-03-25 15:12 ` [PATCH v3 1/3] vhost-user-blk: use different event handlers on initialization Denis Plotnikov
2021-03-25 15:12 ` [PATCH v3 2/3] vhost-user-blk: perform immediate cleanup if disconnect " Denis Plotnikov
2021-04-21 15:24 ` Kevin Wolf
2021-04-21 16:13 ` Denis Plotnikov
2021-04-21 19:59 ` Michael S. Tsirkin [this message]
2021-04-22 8:09 ` Denis Plotnikov
2021-04-22 9:31 ` Kevin Wolf
2021-03-25 15:12 ` [PATCH v3 3/3] vhost-user-blk: add immediate cleanup on shutdown Denis Plotnikov
2021-03-29 13:44 ` [PATCH v3 0/3] vhost-user-blk: fix bug on device disconnection during initialization Denis Plotnikov
2021-04-01 9:21 ` [BUG FIX][PATCH " Denis Plotnikov
2021-04-01 19:01 ` Valentin Sinitsyn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210421155929-mutt-send-email-mst@kernel.org \
--to=mst@redhat.com \
--cc=den-plotnikov@yandex-team.ru \
--cc=kwolf@redhat.com \
--cc=mreitz@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=raphael.norwitz@nutanix.com \
--cc=yc-core@yandex-team.ru \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).