From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: virtio-dev-return-5419-cohuck=redhat.com@lists.oasis-open.org
Sender:
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
Received: from lists.oasis-open.org (oasis.ws5.connectedcommunity.org [10.110.1.242])
    by lists.oasis-open.org (Postfix) with ESMTP id C0F92985E09
    for ; Tue, 12 Feb 2019 20:15:59 +0000 (UTC)
MIME-Version: 1.0
References: <20190204101316.4e3e6rj32suwdmur@sirius.home.kraxel.org>
    <20190211092943-mutt-send-email-mst@kernel.org>
    <20190212112547.GC2715@work-vm>
    <20190212144741.60083682.cohuck@redhat.com>
    <20190212090121-mutt-send-email-mst@kernel.org>
    <20190212125701-mutt-send-email-mst@kernel.org>
    <20190212140706-mutt-send-email-mst@kernel.org>
In-Reply-To: <20190212140706-mutt-send-email-mst@kernel.org>
From: Frank Yang
Date: Tue, 12 Feb 2019 12:15:45 -0800
Message-ID:
Content-Type: multipart/alternative; boundary="000000000000d5b1bc0581b817c3"
Subject: Re: [virtio-dev] Memory sharing device
To: "Michael S. Tsirkin"
Cc: Cornelia Huck, "Dr. David Alan Gilbert", Roman Kiryanov,
    Gerd Hoffmann, Stefan Hajnoczi, virtio-dev@lists.oasis-open.org,
    Greg Hartman
List-ID:

--000000000000d5b1bc0581b817c3
Content-Type: text/plain; charset="UTF-8"

On Tue, Feb 12, 2019 at 11:15 AM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Tue, Feb 12, 2019 at 11:01:21AM -0800, Frank Yang wrote:
> >
> > On Tue, Feb 12, 2019 at 10:22 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> >     On Tue, Feb 12, 2019 at 07:56:58AM -0800, Frank Yang wrote:
> >     > Stepping back to standardization and portability concerns, it is
> >     > also not necessarily desirable to use general pipes to do what we
> >     > want, because even though that device exists and is part of the
> >     > spec already, that results in _de-facto_ non-portability.
> >
> >     That's not different from e.g. TCP.
> >
> >     > If we had some kind of spec to enumerate such 'user-defined'
> >     > devices, at least we can have _de-jure_ non-portability; an
> >     > enumerated device doesn't work as advertised.
> >
> >     I am not sure distinguishing between different types of non
> >     portability will be in scope for virtio. Actually having devices
> >     that are portable would be.
> >
> > The device itself is portable; the user-defined drivers that run on
> > them will work or not depending on negotiating device IDs.
> >
> >     ...
> >
> >     > Note that virtio-serial/virtio-vsock is not considered because
> >     > they do not standardize the set of devices that operate on top of
> >     > them, but in practice, are often used for fully general devices.
> >     > Spec-wise, this is not a great situation because we would still
> >     > have potentially non portable device implementations where there
> >     > is no standard mechanism to determine whether or not things are
> >     > portable.
> >
> >     Well it's easy to add an enumeration on top of sockets, and several
> >     well known solutions exist. There's an advantage to just reusing
> >     these.
> >
> > Sure, but there are many unique features/desirable properties of
> > having the virtio meta device because (as explained in the spec) there
> > are limitations to network/socket based communication.
> >
> >     > virtio-user provides a device enumeration mechanism
> >     > to better control this.
> >
> >     We'll have to see what it all looks like. For virtio pci transport
> >     it's important that you can reason about the device at a basic
> >     level based on its PCI ID, and that is quite fundamental.
> >
> > The spec contains more details; basically the device itself is always
> > portable, and there is a configuration protocol to negotiate whether a
> > particular use of the device is available. This is similar to PCI, but
> > with more defined ways to operate the device in terms of callbacks in
> > shared libraries on the host.
> >
> >     Maybe what you are looking for is a new virtio transport then?
> >
> > Perhaps, something like virtio host memory transport? But at the same
> > time, it needs to interact with shared memory which is best set as a
> > PCI device. Can we mix transport types? In any case, the analog of
> > "PCI ID"'s here (the vendor/device/version numbers) are meaningful,
> > with the contract being that the user of the device needs to match on
> > vendor/device id and negotiate on version number.
>
> Virtio is fundamentally using feature bits not versions.
> It's been pretty successful in maintaining compatibility
> across a wide range of hypervisor/guest revisions.

The proposed device should be compatible with all hypervisor/guest
revisions as is, without feature bits.

> > What are the advantages of defining a new virtio transport type?
> > It would be something that has the IDs, and be able to handle
> > resolving offsets to physical addresses to host memory addresses, in
> > addition to dispatching to callbacks on the host. But it would be
> > effectively equivalent to having a new virtio device type with device
> > ID enumeration, right?
>
> Under virtio PCI Device IDs are all defined in virtio spec.
> If you want your own ID scheme you want an alternative transport.
> But now what you describe looks kind of like vhost pci to me.

Yet, vhost is still not a good fit, due to reliance on sockets/network
functionality. It looks like we want to build something that has aspects
in common with other virtio devices, but there is no combination of
existing virtio devices that works.

   - the locality of virtio-fs, but not specialized to fuse
   - the device id enumeration from virtio pci, but not in the pci id space
   - the general communication of virtio-vsock, but not specialized to
     sockets and allowing host memory-based transport

So, I still think it should be a new device (or transport if that works
better).

> >     > In addition, for performance considerations in applications such
> >     > as graphics and media, virtio-serial/virtio-vsock have the
> >     > overhead of sending actual traffic through the virtqueue, while
> >     > an approach based on shared memory can result in having fewer
> >     > copies and virtqueue messages. virtio-serial is also limited in
> >     > being specialized for console forwarding and having a cap on the
> >     > number of clients. virtio-vsock is also not optimal in its choice
> >     > of sockets API for transport; shared memory cannot be used,
> >     > arbitrary strings can be passed without a designation of the
> >     > device/driver being run de-facto, and the guest must have
> >     > additional machinery to handle socket APIs. In addition, on the
> >     > host, sockets are only dependable on Linux, with less predictable
> >     > behavior from Windows/macOS regarding Unix sockets. Waiting for
> >     > socket traffic on the host also requires a poll() loop, which is
> >     > suboptimal for latency. With virtio-user, only the bare set of
> >     > standard driver calls (open/close/ioctl/mmap/read) is needed, and
> >     > RAM is a more universal transport abstraction. We also explicitly
> >     > spec out callbacks on host that are triggered by virtqueue
> >     > messages, which results in lower latency and makes it easy to
> >     > dispatch to a particular device implementation without polling.
> >
> >     open/close/mmap/read seem to make sense. ioctl gives one pause.
> >
> > ioctl would be to send ping messages, but I'm not fixated on that
> > choice. write() is also a possibility to send ping messages; I
> > preferred ioctl() because it should be clear that it's a control
> > message not a data message.
>
> Yes if ioctls supported are white-listed and not blindly passed through
> (e.g. send a ping message), then it does not matter.

There would be one whitelisted ioctl: IOCTL_PING, with a struct close to
what is specified in the proposed spec, with the instance handle field
populated by the kernel.

> >     Given open/close this begins to look a bit like virtio-fs.
> >     Have you looked at that?
> >
> > That's an interesting possibility since virtio-fs maps host pointers
> > as well, which fits our use cases. Another alternative is to add the
> > features unique about virtio-user to virtio-fs: device enumeration,
> > memory sharing operations, operation in terms of callbacks on the
> > host. However, it doesn't seem like a good fit due to being
> > specialized to filesystem operations.
>
> Well everything is a file :)

Not according to virtio-fs; it specializes in fuse support in the guest
kernel. However, it would work for me if the features unique to
virtio-user were added to it and the requirement to use fuse was dropped.

> >     --
> >     MST

--000000000000d5b1bc0581b817c3
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable


--000000000000d5b1bc0581b817c3--