Re: [PATCH v3 2/3] vhost-user-blk: perform immediate cleanup if disconnect on initialization

From: Kevin Wolf <kwolf@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org, mreitz@redhat.com,
	Denis Plotnikov <den-plotnikov@yandex-team.ru>,
	yc-core@yandex-team.ru, raphael.norwitz@nutanix.com
Subject: Re: [PATCH v3 2/3] vhost-user-blk: perform immediate cleanup if disconnect on initialization
Date: Thu, 22 Apr 2021 11:31:55 +0200	[thread overview]
Message-ID: <YIFCi1vJR4jpMJ1r@merkur.fritz.box> (raw)
In-Reply-To: <20210421155929-mutt-send-email-mst@kernel.org>

Am 21.04.2021 um 21:59 hat Michael S. Tsirkin geschrieben:
> On Wed, Apr 21, 2021 at 07:13:24PM +0300, Denis Plotnikov wrote:
> > 
> > On 21.04.2021 18:24, Kevin Wolf wrote:
> > > Am 25.03.2021 um 16:12 hat Denis Plotnikov geschrieben:
> > > > Commit 4bcad76f4c39 ("vhost-user-blk: delay vhost_user_blk_disconnect")
> > > > introduced postponing vhost_dev cleanup aiming to eliminate qemu aborts
> > > > because of connection problems with vhost-blk daemon.
> > > > 
> > > > However, it introdues a new problem. Now, any communication errors
> > > > during execution of vhost_dev_init() called by vhost_user_blk_device_realize()
> > > > lead to qemu abort on assert in vhost_dev_get_config().
> > > > 
> > > > This happens because vhost_user_blk_disconnect() is postponed but
> > > > it should have dropped s->connected flag by the time
> > > > vhost_user_blk_device_realize() performs a new connection opening.
> > > > On the connection opening, vhost_dev initialization in
> > > > vhost_user_blk_connect() relies on s->connection flag and
> > > > if it's not dropped, it skips vhost_dev initialization and returns
> > > > with success. Then, vhost_user_blk_device_realize()'s execution flow
> > > > goes to vhost_dev_get_config() where it's aborted on the assert.
> > > > 
> > > > To fix the problem this patch adds immediate cleanup on device
> > > > initialization(in vhost_user_blk_device_realize()) using different
> > > > event handlers for initialization and operation introduced in the
> > > > previous patch.
> > > > On initialization (in vhost_user_blk_device_realize()) we fully
> > > > control the initialization process. At that point, nobody can use the
> > > > device since it isn't initialized and we don't need to postpone any
> > > > cleanups, so we can do cleaup right away when there is a communication
> > > > problem with the vhost-blk daemon.
> > > > On operation we leave it as is, since the disconnect may happen when
> > > > the device is in use, so the device users may want to use vhost_dev's data
> > > > to do rollback before vhost_dev is re-initialized (e.g. in vhost_dev_set_log()).
> > > > 
> > > > Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
> > > > Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
> > > I think there is something wrong with this patch.
> > > 
> > > I'm debugging an error case, specifically num-queues being larger in
> > > QEMU that in the vhost-user-blk export. Before this patch, it has just
> > > an unfriendly error message:
> > > 
> > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=vhost1,id=blk1,iommu_platform=off,disable-legacy=on,num-queues=4: Unexpected end-of-file before all data were read
> > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=vhost1,id=blk1,iommu_platform=off,disable-legacy=on,num-queues=4: Failed to read msg header. Read 0 instead of 12. Original request 24.
> > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=vhost1,id=blk1,iommu_platform=off,disable-legacy=on,num-queues=4: vhost-user-blk: get block config failed
> > > qemu-system-x86_64: Failed to set msg fds.
> > > qemu-system-x86_64: vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
> > > 
> > > After the patch, it crashes:
> > > 
> > > #0  0x0000555555d0a4bd in vhost_user_read_cb (source=0x5555568f4690, condition=(G_IO_IN | G_IO_HUP), opaque=0x7fffffffcbf0) at ../hw/virtio/vhost-user.c:313
> > > #1  0x0000555555d950d3 in qio_channel_fd_source_dispatch (source=0x555557c3f750, callback=0x555555d0a478 <vhost_user_read_cb>, user_data=0x7fffffffcbf0) at ../io/channel-watch.c:84
> > > #2  0x00007ffff7b32a9f in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
> > > #3  0x00007ffff7b84a98 in g_main_context_iterate.constprop () at /lib64/libglib-2.0.so.0
> > > #4  0x00007ffff7b32163 in g_main_loop_run () at /lib64/libglib-2.0.so.0
> > > #5  0x0000555555d0a724 in vhost_user_read (dev=0x555557bc62f8, msg=0x7fffffffcc50) at ../hw/virtio/vhost-user.c:402
> > > #6  0x0000555555d0ee6b in vhost_user_get_config (dev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost-user.c:2133
> > > #7  0x0000555555d56d46 in vhost_dev_get_config (hdev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost.c:1566
> > > #8  0x0000555555cdd150 in vhost_user_blk_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcf90) at ../hw/block/vhost-user-blk.c:510
> > > #9  0x0000555555d08f6d in virtio_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcff0) at ../hw/virtio/virtio.c:3660
> > > 
> > > The problem is that vhost_user_read_cb() still accesses dev->opaque even
> > > though the device has been cleaned up meanwhile when the connection was
> > > closed (the vhost_user_blk_disconnect() added by this patch), so it's
> > > NULL now. This problem was actually mentioned in the comment that is
> > > removed by this patch.
> > > 
> > > I tried to fix this by making vhost_user_read() cope with the fact that
> > > the device might have been cleaned up meanwhile, but then I'm running
> > > into the next set of problems.
> > > 
> > > The first is that retrying is pointless, the error condition is in the
> > > configuration, it will never change.
> > > 
> > > The other is that after many repetitions of the same error message, I
> > > got a crash where the device is cleaned up a second time in
> > > vhost_dev_init() and the virtqueues are already NULL.
> > > 
> > > So it seems to me that erroring out during the initialisation phase
> > > makes a lot more sense than retrying.
> > > 
> > > Kevin
> > 
> > But without the patch there will be another problem which the patch actually
> > addresses.
> > 
> > It seems to me that there is a case when the retrying is useless and this is
> > exactly your case -- we'll never get a proper configuration.
> > 
> > What if we get rid of the re-connection and give the only try to realize the
> > device? Than we don't need case separating for initialization and operation
> > of device, correct? But I don't familiar with the cases where the reconnect
> > is needed? Do you know something it?
> 
> Reconnect is for when server is restarted while we are talking to it.

Sure, and that makes sense once the device has actually been realized.
But do we really have a use case for reconnecting during initialisation?
Commit 77542d43149 seems to just have added both at once, without
explaining in the commit message what the reason for this decision is.

The problem here is that we can't seem to tell what is a fatal error and
what is just a server restart. Fatal errors just seem much more likely
during device initialisation and turning them into endless loops in
vhost_user_blk_device_realize() is probably not a great idea.

Either way, we have a use-after-free in the error case and that needs
fixing once we have decided on the intended behaviour.

(The real fix for my num-queues error is of course checking the number
explicitly instead of making invalid requests, and this would hide the
bug we're discussing here, but I assume that more error cases like this
exist.)

Unrelated to the bug, I also wonder why the reconnection logic was
implemented specifically in vhost-user-blk. Isn't this something that
should apply to all vhost-user devices? Possibly with some callbacks for
device specific logic?

Kevin