Re: [RFC] vhost: new rte_vhost API proposal

From: "Stojaczyk, DariuszX" <dariuszx.stojaczyk@intel.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	Maxime Coquelin <maxime.coquelin@redhat.com>,
	"Bie, Tiwei" <tiwei.bie@intel.com>,
	"Tetsuya Mukawa" <mtetsuyah@gmail.com>,
	Thomas Monjalon <thomas@monjalon.net>,
	"yliu@fridaylinux.org" <yliu@fridaylinux.org>,
	"Liu, Changpeng" <changpeng.liu@intel.com>
Subject: Re: [RFC] vhost: new rte_vhost API proposal
Date: Fri, 18 May 2018 07:51:51 +0000	[thread overview]
Message-ID: <FBE7E039FA50BF47A673AD0BD3CD56A8449A2C91@HASMSX105.ger.corp.intel.com> (raw)
In-Reply-To: <20180511100531.GA19894@stefanha-x1.localdomain>

> -----Original Message-----
> From: Stefan Hajnoczi [mailto:stefanha@redhat.com]
> Sent: Friday, May 11, 2018 6:06 PM
> On Fri, May 11, 2018 at 05:55:45AM +0000, Stojaczyk, DariuszX wrote:
> > > -----Original Message-----
> > > From: Stefan Hajnoczi [mailto:stefanha@redhat.com]
> > > Sent: Friday, May 11, 2018 12:37 AM
> > > On Thu, May 10, 2018 at 03:22:53PM +0200, Dariusz Stojaczyk wrote:
> > > > rte_virtio would offer both vhost and virtio driver APIs. These two
> > > > have a lot of common code for vhost-user handling or PCI access for
> > > > initiator/virtio-vhost-user (and possibly vDPA) so there's little
> > > > sense to keep target and initiator code separated between different
> > > > libs. Of course, the APIs would be separate - only some parts of
> > > > the code would be shared.
> > >
> > > The API below seems to be for vhost backends (aka slaves).
> rte_virtio_*
> > > is a misnomer because vhost and virtio are two different things.  This
> > > is not for implementing virtio devices, it's specifically for vhost
> > > devices.
> >
> > I agree it's named a bit off if we're talking about vhost. My idea was to
> introduce a generic library for userspace Virtio processing and that's
> where the name came from. Even when you use just the vhost API that's
> introduced here, you are only required to implement vring processing,
> config access, and possibly interrupt handling, all of which are typical
> Virtio things. The vhost logic is hidden inside.
> 
> No, the vhost logic is not hidden: there is custom_msg() and the whole
> tgt_ops struct is an abstraction of the vhost protocol, not virtio.
> 
> It sounds like you're hoping to create a single API that can support
> both vhost and virtio access.  For example, one "net" device backend
> implementation using rte_virtio can be accessed via vhost or virtio.
> 
> This won't work because vhost and virtio are not equivalent.  vhost-net
> devices don't implement the virtio-net config space and they only have a
> subset of the virtqueues.  vhost-net devices support special vhost
> messages that don't exist in virtio-net.
> 
> Additionally, the virtio and vhost-user specifications are independent
> and make no promise of a 1:1 mapping.  They have the freedom to
> change
> in ways which will break any abstraction you come up with today.
> 
> I hope it will be possible to unify the two in the future, but that
> needs to happen at the spec level first, before trying to unify them in
> code.
> 
> This is why I'm belaboring the point that vhost should not be confused
> with virtio.  Each needs to be separate and clearly identified to avoid
> confusion.
> 	

Ok, I'm convinced now. Thanks for the explanation. I'll name the lib rte_vhost2 in v2.

> >
> > >
> > > Vhost does not offer the full virtio device model - otherwise it would
> > > be just another transport in the VIRTIO specification.  Instead vhost is
> > > a protocol for vhost devices, which are subsets of virtio devices.
> > >
> > > I suggest calling it rte_vhost2 since it's basically a new, incompatible
> > > rte_vhost API.
> >
> > Rte_vhost2 sounds better for what we have now, but would that name
> be still valid once we add a true Virtio driver functionality? (I believe it's
> called Virtio PMD in DPDK right now). That driver would reuse a lot of the
> vhost code for PCI and vhost-user, so it makes some sense to put these
> two together.
> >
> > I don't think rte_vhost2 is a permanent name anyway, so maybe we
> could call it like so for now, and rename it later once I introduce that
> additional Virtio functionality? Would that work?
> 
> The natural layering for is that vhost depends on virtio.  Virtio header
> files (feature bits, config space layout, vring layout) and the vring
> API can be reused by vhost.
> 
> Virtio doesn't need knowledge of virtio though and the two can be in
> separate packages without code duplication.
> 
> That said, it doesn't really matter whether there are rte_virtio +
> rte_vhost2 packages or a single rte_virtio package, as long as the
> function and struct names for vhost interfaces contain the name "vhost"
> so they cannot be confused with virtio.
> 
> > > > + * or before actions that require changing vhost device/vq state.
> > > > + */
> > > > + void (*queue_stop)(struct rte_virtio_dev *vdev, struct
> rte_virtio_vq
> > > *vq);
> > > > + /** Device disconnected. All queues are guaranteed to be stopped
> by
> > > now */
> > > > + void (*device_destroy)(struct rte_virtio_dev *vdev);
> > > > + /**
> > > > + * Custom message handler. `vdev` and `vq` can be NULL. This is
> called
> > > > + * for backend-specific actions. The `id` should be prefixed by the
> > >
> > > Since vdev can be NULL, does this mean custom_msg() may be invoked
> at
> > > any time during the lifecycle and even before/after
> > > device_create()/device_destroy()?
> >
> > Theoretically. I was thinking of some poorly-written backends notifying
> they're out of internal resources, but I agree it's just poor. I'll remove the
> `vdev can be NULL` part.
> 
> Okay, I wasn't suggesting it's bad, I just wanted the docs to state at
> which points in the lifecycle this callback can be invoked.
> 
> > > > + */
> > > > + void (*custom_msg)(struct rte_virtio_dev *vdev, struct
> rte_virtio_vq
> > > *vq,
> > > > +   char *id, void *ctx);
> > >
> > > What is the purpose of id and why is it char* instead of const char*?
> >
> > Ack, It should be const. (same thing goes to every other char* in this
> patch)
> >
> > For example vhost-crypto introduces two new vhost-user messages for
> initializing and destroying crypto session. The underlying vhost-crypto
> vhost-user backend after receiving such message could execute this
> callback as follows:
> >
> > struct my_crypto_data *data = calloc();
> > [...]
> > Ops->custom_msg(vdev, NULL, "crypto_sess_init", data);
> 
> So it's necessary to modify rte_virtio vhost code when implementing new
> device backends with custom messages?
> 
> It seems like rte_virtio needs to have knowledge of how to parse any
> custom messages :(.  It would be cleaner for rte_virtio to have no
> knowledge of device-specific messages.
> 
> And how does the device backend reply to custom messages?
>

The library would send proper response after receiving rte_virtio_tgt_cb_complete(). If it needs additional data from the user, there's the `ctx` field in custom_msg that he [the user] can write into.

However, I started to work on the implementation and came to conclusion that it's unnecessarily difficult to implement new Vhost device backends this way. I've changed the custom_msg callback to parse raw Vhost-user messages now. Still, new Vhost-user messages are usually a type of a protocol extension negotiated by a protocol feature flag, and protocol extensions should be implemented inside the lib in my opinion. If a protocol extension changes existing message rather than introduces new one, we'll *need* to implement it inside the lib. 

Both solutions have their good and bad points.
I'm sending v2 in a couple minutes, maybe it'll help us decide which one is better.

> > > > +/**
> > > > + * Unregisters a vhost target asynchronously.
> > >
> > > How are existing device instances affected?
> >
> > Ack. How about:
> > All active queues will be stopped and all devices destroyed.
> >
> > This is analogous to what rte_vhost has now.
> 
> Sounds good.
> 
> Stefan

Regards,
D.