All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yongji Xie <xieyongji@bytedance.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: kwolf@redhat.com, "Michael S. Tsirkin" <mst@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	qemu-devel@nongnu.org,
	Raphael Norwitz <raphael.norwitz@nutanix.com>,
	"Zhu, Lingshan" <lingshan.zhu@intel.com>,
	mreitz@redhat.com, changpeng.liu@intel.com,
	Stefano Garzarella <sgarzare@redhat.com>
Subject: Re: Re: Re: [PATCH 0/3] Introduce vhost-vdpa block device
Date: Wed, 28 Apr 2021 19:27:03 +0800	[thread overview]
Message-ID: <CACycT3sV_Bv63Y5pvYV=e2-B5c3Lky2UWWc4tk=DDN0tKs=6Ng@mail.gmail.com> (raw)
In-Reply-To: <YIgfh632Evrdid12@stefanha-x1.localdomain>

On Tue, Apr 27, 2021 at 10:28 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Tue, Apr 27, 2021 at 06:24:55PM +0800, Yongji Xie wrote:
> > On Mon, Apr 26, 2021 at 11:34 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > >
> > > On Thu, Apr 08, 2021 at 06:12:49PM +0800, Xie Yongji wrote:
> > > > Since we already have some ways to emulate vDPA block device
> > > > in kernel[1] or userspace[2]. This series tries to introduce a
> > > > new vhost-vdpa block device for that. To use it, we can add
> > > > something like:
> > > >
> > > > qemu-system-x86_64 \
> > > >     -device vhost-vdpa-blk-pci,vdpa-dev=/dev/vhost-vdpa-0
> > >
> > > This device is similar to vhost-user-blk. QEMU does not see it as a
> > > block device so storage migration, I/O throttling, image formats, etc
> > > are not supported. Stefano Garzarella and I discussed how vdpa-blk
> > > devices could integrate more deeply with QEMU's block layer. The QEMU
> > > block layer could be enabled only when necessary and otherwise bypassed
> > > for maximum performance.
> > >
> >
> > Do you mean we can make use of the shadow virtqueue for the slow path
> > (I/O will go through the block layer) and add a fast path (like what
> > we do now) to bypass the block layer?
>
> Yes.
>
> > > This alternative approach is similar to how vhost-net is implemented in
> > > QEMU. A BlockDriver would handle the vdpa-blk device and the regular
> > > virtio-blk-pci device would still be present. The virtqueues could be
> > > delegated to the vdpa-blk device in order to bypass the QEMU block
> > > layer.
> > >
> > > I wanted to mention this since it's likely that this kind of vdpa-blk
> > > device implementation will be posted in the future and you might be
> > > interested. It makes live migration with non-shared storage possible,
> > > for example.
> > >
> >
> > That would be nice, I'm looking forward to it!
> >
> > So do you think whether it's necessary to continue this approach?
> > Looks like we don't need a vhost-vdpa-blk device any more in the new
> > approach.
>
> There is no code for the vdpa-blk BlockDriver yet and the implementation
> will be more complex than this patch series (more risk the feature could
> be delayed).
>
> If you need vdpa-blk as soon as possible then this patch series may be a
> pragmatic solution. As long as it doesn't limit the vdpa-blk interface
> in a way that prevents the BlockDriver implementation then I think QEMU
> could support both.
>
> In the long term the vdpa-blk BlockDriver could replace -device
> vdpa-blk-pci since the performance should be identical and it supports
> more use cases.
>
> It's up to you. You're welcome to wait for the BlockDriver, work
> together with Stefano on the BlockDriver, or continue with your patch
> series.
>

I like the new idea! Let me wait for it.

> > > An issue with vhost-user-blk is that the ownership of qdev properties
> > > and VIRTIO Configuration Space fields was not clearly defined. Some
> > > things are handled by QEMU's vhost-user-blk code, some things are
> > > handled by the vhost-user device backend, and some things are negotiated
> > > between both entities. This patch series follows the existing
> > > vhost-user-blk approach, which I think will show serious issues as the
> > > device is more widely used and whenever virtio-blk or the implementation
> > > is extended with new features. It is very hard to provide backwards
> > > compatibility with the current approach where the ownership of qdev
> > > properties and VIRTIO Configuration Space fields is ad-hoc and largely
> > > undefined.
> > >
> > > Since vDPA has VIRTIO Configuration Space APIs, I suggest that the
> > > vhost-vDPA device controls the entire configuration space. QEMU should
> > > simply forward accesses between the guest and vhost-vdpa.
> > >
> >
> > Does this already achieved by vhost_ops->vhost_get_config? And I want
> > to know how to handle the endianness issue if qemu just simply does
> > forwarding and doesn't care about the content of config space.
>
> QEMU just copies bytes betwen the driver and the device via
> vdpa_config_ops->get/set_config(). I don't think it needs to worry about
> endianness in VIRTIO Configuration Space access code or did I miss
> something?
>
> My understanding is that vDPA currently supports same-endian Legacy and
> little-endian Modern devices. Cross-endian Legacy devices are currently
> not supported because there is no API to set endianness.
>

OK, I get you.

> If such an API is added in the future, then QEMU can use it to tell the
> vDPA device about guest endianness. QEMU still won't need to modify
> Configuration Space data itself.
>
> > > Regarding qdev properties, it's a little trickier because QEMU needs to
> > > do the emulated VIRTIO device setup (allocating virtqueues, setting
> > > their sizes, etc). Can QEMU query this information from the vDPA device?
> > > If not, which qdev properties are read-only and must match the
> > > configuration of the vDPA device and which are read-write and can
> > > control the vDPA device?
> > >
> >
> > Yes, that's an issue. We have to make sure the number of virtqueues
> > and their size set by qemu is not greater than hardware limitation.
> > Now I think we can query the max queue size, but looks like we don't
> > have an interface to query the max number of virtqueues.
>
> Okay, this is something that Jason and Stefano can discuss more. Ideally
> the QEMU command-line does not need to duplicate information about the
> vDPA device. The vdpa management tool
> (https://github.com/shemminger/iproute2/tree/main/vdpa) and VDUSE
> virtio-blk device will probably be where the main device configuration
> happens.
>
> As a starting point, can you describe how your VDUSE virtio-blk device
> is configured? Does it have a command-line/config file/RPC API to set
> the number of virtqueues, block size, etc?
>

Yes, we have a command-line to set those properties, something like:

qemu-storage-daemon \
      --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
      --monitor chardev=charmonitor \
      --blockdev
driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0
\
      --export type=vduse-blk,id=test,node-name=disk0,writable=on,name=vduse-null,num-queues=16,queue-size=128

Thanks,
Yongji


  reply	other threads:[~2021-04-28 11:28 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-08 10:12 [PATCH 0/3] Introduce vhost-vdpa block device Xie Yongji
2021-04-08 10:12 ` [PATCH 1/3] vhost-vdpa: Remove redundant declaration of address_space_memory Xie Yongji
2021-04-08 10:19   ` Philippe Mathieu-Daudé
2021-04-08 10:12 ` [PATCH 2/3] vhost-blk: Add vhost-blk-common abstraction Xie Yongji
2021-04-08 23:21   ` Raphael Norwitz
2021-04-09  2:38     ` Yongji Xie
2021-04-26 14:49   ` Stefan Hajnoczi
2021-04-27 10:26     ` Yongji Xie
2021-04-08 10:12 ` [PATCH 3/3] vhost-vdpa-blk: Introduce vhost-vdpa-blk host device Xie Yongji
2021-04-09  6:02   ` Jason Wang
2021-04-09  8:17     ` Yongji Xie
2021-04-12  7:14       ` Jason Wang
2021-04-12  7:51         ` Yongji Xie
2021-04-26 15:05   ` Stefan Hajnoczi
2021-04-27 10:33     ` Yongji Xie
2021-04-26 15:33 ` [PATCH 0/3] Introduce vhost-vdpa block device Stefan Hajnoczi
2021-04-27 10:24   ` Yongji Xie
2021-04-27 14:28     ` Stefan Hajnoczi
2021-04-28 11:27       ` Yongji Xie [this message]
2021-04-28 12:21         ` Stefano Garzarella

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACycT3sV_Bv63Y5pvYV=e2-B5c3Lky2UWWc4tk=DDN0tKs=6Ng@mail.gmail.com' \
    --to=xieyongji@bytedance.com \
    --cc=changpeng.liu@intel.com \
    --cc=jasowang@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=lingshan.zhu@intel.com \
    --cc=mreitz@redhat.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=raphael.norwitz@nutanix.com \
    --cc=sgarzare@redhat.com \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.