Re: Re: Re: [PATCH 0/3] Introduce vhost-vdpa block device

From: Yongji Xie <xieyongji@bytedance.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: kwolf@redhat.com, "Michael S. Tsirkin" <mst@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	qemu-devel@nongnu.org,
	Raphael Norwitz <raphael.norwitz@nutanix.com>,
	"Zhu, Lingshan" <lingshan.zhu@intel.com>,
	mreitz@redhat.com, changpeng.liu@intel.com,
	Stefano Garzarella <sgarzare@redhat.com>
Subject: Re: Re: Re: [PATCH 0/3] Introduce vhost-vdpa block device
Date: Wed, 28 Apr 2021 19:27:03 +0800	[thread overview]
Message-ID: <CACycT3sV_Bv63Y5pvYV=e2-B5c3Lky2UWWc4tk=DDN0tKs=6Ng@mail.gmail.com> (raw)
In-Reply-To: <YIgfh632Evrdid12@stefanha-x1.localdomain>

On Tue, Apr 27, 2021 at 10:28 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Tue, Apr 27, 2021 at 06:24:55PM +0800, Yongji Xie wrote:
> > On Mon, Apr 26, 2021 at 11:34 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > >
> > > On Thu, Apr 08, 2021 at 06:12:49PM +0800, Xie Yongji wrote:
> > > > Since we already have some ways to emulate vDPA block device
> > > > in kernel[1] or userspace[2]. This series tries to introduce a
> > > > new vhost-vdpa block device for that. To use it, we can add
> > > > something like:
> > > >
> > > > qemu-system-x86_64 \
> > > >     -device vhost-vdpa-blk-pci,vdpa-dev=/dev/vhost-vdpa-0
> > >
> > > This device is similar to vhost-user-blk. QEMU does not see it as a
> > > block device so storage migration, I/O throttling, image formats, etc
> > > are not supported. Stefano Garzarella and I discussed how vdpa-blk
> > > devices could integrate more deeply with QEMU's block layer. The QEMU
> > > block layer could be enabled only when necessary and otherwise bypassed
> > > for maximum performance.
> > >
> >
> > Do you mean we can make use of the shadow virtqueue for the slow path
> > (I/O will go through the block layer) and add a fast path (like what
> > we do now) to bypass the block layer?
>
> Yes.
>
> > > This alternative approach is similar to how vhost-net is implemented in
> > > QEMU. A BlockDriver would handle the vdpa-blk device and the regular
> > > virtio-blk-pci device would still be present. The virtqueues could be
> > > delegated to the vdpa-blk device in order to bypass the QEMU block
> > > layer.
> > >
> > > I wanted to mention this since it's likely that this kind of vdpa-blk
> > > device implementation will be posted in the future and you might be
> > > interested. It makes live migration with non-shared storage possible,
> > > for example.
> > >
> >
> > That would be nice, I'm looking forward to it!
> >
> > So do you think whether it's necessary to continue this approach?
> > Looks like we don't need a vhost-vdpa-blk device any more in the new
> > approach.
>
> There is no code for the vdpa-blk BlockDriver yet and the implementation
> will be more complex than this patch series (more risk the feature could
> be delayed).
>
> If you need vdpa-blk as soon as possible then this patch series may be a
> pragmatic solution. As long as it doesn't limit the vdpa-blk interface
> in a way that prevents the BlockDriver implementation then I think QEMU
> could support both.
>
> In the long term the vdpa-blk BlockDriver could replace -device
> vdpa-blk-pci since the performance should be identical and it supports
> more use cases.
>
> It's up to you. You're welcome to wait for the BlockDriver, work
> together with Stefano on the BlockDriver, or continue with your patch
> series.
>

I like the new idea! Let me wait for it.

> > > An issue with vhost-user-blk is that the ownership of qdev properties
> > > and VIRTIO Configuration Space fields was not clearly defined. Some
> > > things are handled by QEMU's vhost-user-blk code, some things are
> > > handled by the vhost-user device backend, and some things are negotiated
> > > between both entities. This patch series follows the existing
> > > vhost-user-blk approach, which I think will show serious issues as the
> > > device is more widely used and whenever virtio-blk or the implementation
> > > is extended with new features. It is very hard to provide backwards
> > > compatibility with the current approach where the ownership of qdev
> > > properties and VIRTIO Configuration Space fields is ad-hoc and largely
> > > undefined.
> > >
> > > Since vDPA has VIRTIO Configuration Space APIs, I suggest that the
> > > vhost-vDPA device controls the entire configuration space. QEMU should
> > > simply forward accesses between the guest and vhost-vdpa.
> > >
> >
> > Does this already achieved by vhost_ops->vhost_get_config？ And I want
> > to know how to handle the endianness issue if qemu just simply does
> > forwarding and doesn't care about the content of config space.
>
> QEMU just copies bytes betwen the driver and the device via
> vdpa_config_ops->get/set_config(). I don't think it needs to worry about
> endianness in VIRTIO Configuration Space access code or did I miss
> something?
>
> My understanding is that vDPA currently supports same-endian Legacy and
> little-endian Modern devices. Cross-endian Legacy devices are currently
> not supported because there is no API to set endianness.
>

OK, I get you.

> If such an API is added in the future, then QEMU can use it to tell the
> vDPA device about guest endianness. QEMU still won't need to modify
> Configuration Space data itself.
>
> > > Regarding qdev properties, it's a little trickier because QEMU needs to
> > > do the emulated VIRTIO device setup (allocating virtqueues, setting
> > > their sizes, etc). Can QEMU query this information from the vDPA device?
> > > If not, which qdev properties are read-only and must match the
> > > configuration of the vDPA device and which are read-write and can
> > > control the vDPA device?
> > >
> >
> > Yes, that's an issue. We have to make sure the number of virtqueues
> > and their size set by qemu is not greater than hardware limitation.
> > Now I think we can query the max queue size, but looks like we don't
> > have an interface to query the max number of virtqueues.
>
> Okay, this is something that Jason and Stefano can discuss more. Ideally
> the QEMU command-line does not need to duplicate information about the
> vDPA device. The vdpa management tool
> (https://github.com/shemminger/iproute2/tree/main/vdpa) and VDUSE
> virtio-blk device will probably be where the main device configuration
> happens.
>
> As a starting point, can you describe how your VDUSE virtio-blk device
> is configured? Does it have a command-line/config file/RPC API to set
> the number of virtqueues, block size, etc?
>

Yes, we have a command-line to set those properties, something like:

qemu-storage-daemon \
      --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
      --monitor chardev=charmonitor \
      --blockdev
driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0
\
      --export type=vduse-blk,id=test,node-name=disk0,writable=on,name=vduse-null,num-queues=16,queue-size=128

Thanks,
Yongji