On Mon, May 04, 2020 at 10:49:11AM -0700, John G Johnson wrote:
> 
> 
> > On May 4, 2020, at 2:45 AM, Stefan Hajnoczi wrote:
> > 
> > On Fri, May 01, 2020 at 04:28:25PM +0100, Daniel P. Berrangé wrote:
> >> On Fri, May 01, 2020 at 03:01:01PM +0000, Felipe Franciosi wrote:
> >>> Hi,
> >>> 
> >>>> On Apr 30, 2020, at 4:20 PM, Thanos Makatos wrote:
> >>>> 
> >>>>>>> More importantly, considering:
> >>>>>>> a) Marc-André's comments about data alignment etc., and
> >>>>>>> b) the possibility to run the server on another guest or host,
> >>>>>>> we won't be able to use native VFIO types. If we do want to support
> >>>>>>> that then we'll have to redefine all data formats, similar to
> >>>>>>> https://github.com/qemu/qemu/blob/master/docs/interop/vhost-user.rst .
> >>>>>>> 
> >>>>>>> So the protocol will be more like an enhanced version of the
> >>>>>>> Vhost-user protocol than VFIO. I'm fine with either direction (VFIO
> >>>>>>> vs. enhanced Vhost-user), so we need to decide before proceeding as
> >>>>>>> the request format is substantially different.
> >>>>>> 
> >>>>>> Regarding the ability to use the protocol on non-AF_UNIX sockets, we
> >>>>>> can support this future use case without unnecessarily complicating
> >>>>>> the protocol by defining the C structs and stating that data alignment
> >>>>>> and endianness for the non AF_UNIX case must be the one used by GCC on
> >>>>>> a x86_64 bit machine, or can be overridden as required.
> >>>>> 
> >>>>> Defining it to be x86_64 semantics is effectively saying "we're not
> >>>>> going to do anything and it is up to other arch maintainers to fix the
> >>>>> inevitable portability problems that arise".
> >>>> 
> >>>> Pretty much.
> >>>> 
> >>>>> Since this is a new protocol should we take the opportunity to model it
> >>>>> explicitly in some common standard RPC protocol language. This would
> >>>>> have the benefit of allowing implementors to use off the shelf APIs for
> >>>>> their wire protocol marshalling, and eliminate questions about
> >>>>> endianness and alignment across architectures.
> >>>> 
> >>>> The problem is that we haven't defined the scope very well. My initial
> >>>> impression was that we should use the existing VFIO structs and
> >>>> constants, however that's impossible if we're to support non AF_UNIX.
> >>>> We need consensus on this, we're open to ideas how to do this.
> >>> 
> >>> Thanos has a point.
> >>> 
> >>> From https://wiki.qemu.org/Features/MultiProcessQEMU, which I believe
> >>> was written by Stefan, I read:
> >>> 
> >>>> Inventing a new device emulation protocol from scratch has many
> >>>> disadvantages. VFIO could be used as the protocol to avoid reinventing
> >>>> the wheel ...
> >>> 
> >>> At the same time, this appears to be incompatible with the (new?)
> >>> requirement of supporting device emulation which may run in non-VFIO
> >>> compliant OSs or even across OSs (ie. via TCP or similar).
> >> 
> >> To be clear, I don't have any opinion on whether we need to support
> >> cross-OS/TCP or not.
> >> 
> >> I'm merely saying that if we do decide to support cross-OS/TCP, then
> >> I think we need a more explicitly modelled protocol, instead of relying
> >> on serialization of C structs.
> >> 
> >> There could be benefits to an explicitly modelled protocol, even for
> >> local only usage, if we want to more easily support non-C languages
> >> doing serialization, but again I don't have a strong opinion on whether
> >> that's necessary to worry about or not.
> >> 
> >> So I guess largely the question boils down to setting the scope of
> >> what we want to be able to achieve in terms of RPC endpoints.
> > 
> > The protocol relies on both file descriptor passing and memory
> > mapping. These are hard to achieve with networking.
> > 
> > I think the closest would be using RDMA to accelerate memory access and
> > switching to a network notification mechanism instead of eventfd.
> > 
> > Sooner or later someone will probably try this. I don't think it makes
> > sense to define this transport in detail now if there are no users, but
> > we should try to make it possible to add it in the future, if necessary.
> > 
> > Another use case that is interesting and not yet directly addressed is:
> > how can another VM play the role of the device? This is important in
> > compute cloud environments where everything is a VM and running a
> > process on the host is not possible.
> 
> Cross-VM is not a lot different from networking. You can’t
> use AF_UNIX; and AF_VSOCK and AF_INET do not support FD passing.
> You’d either have to add FD passing to AF_VSOCK, which will have
> some security issues, or fall back to message passing that will
> degrade performance.

In the approach where vfio-user terminates in the device VMM and the
device guest uses a new virtio-vhost-user style device, we can continue
to use AF_UNIX with file descriptor passing on the host. The vfio-user
protocol doesn't need to be extended like it would for AF_VSOCK/AF_INET.

  Driver guest                             Device guest
       ^                                        ^
       | PCI device            virtio-vfio-user |
       v                                        v
  Driver VMM <---- vfio-user AF_UNIX ----> Device VMM

It does not require changing the vfio-user protocol because the driver
VMM is talking to a regular vfio-user device process that happens to be
the device VMM.

The trick is that the device VMM makes the shared memory accessible as
VIRTIO shared memory regions (already in the VIRTIO spec) and eventfds
as VIRTIO doorbells/interrupts (proposed but not yet added to the VIRTIO
spec). This allows the device guest to directly access these resources
so it can DMA to the driver guest's RAM, inject interrupts, and receive
doorbell notifications.

Stefan
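
[Editorial illustration of the "explicit C structs" option debated above.
This is only a minimal sketch of what a fixed-layout, fixed-endianness
message header could look like; every name in it (vfio_user_hdr,
VFIO_USER_DMA_MAP, vfio_user_hdr_to_wire) is hypothetical and not taken
from any existing vfio-user header. The point is that fixed-width fields,
a packed layout, and explicit little-endian conversion remove the
dependency on "whatever GCC does on x86_64".]

#include <stdint.h>
#include <string.h>
#include <endian.h>   /* htole16()/htole32() (glibc) */

enum {
    VFIO_USER_DMA_MAP = 1,   /* made-up command number for illustration */
};

struct vfio_user_hdr {           /* hypothetical header layout */
    uint16_t msg_id;             /* matches a reply to its request */
    uint16_t command;            /* e.g. VFIO_USER_DMA_MAP */
    uint32_t msg_size;           /* header + payload size in bytes */
    uint32_t flags;              /* request/reply/error flags */
    uint32_t error;              /* errno value in an error reply */
} __attribute__((packed));

/* Serialize the header into little-endian wire order regardless of the
 * host CPU's endianness or the compiler's default struct padding. */
static void vfio_user_hdr_to_wire(const struct vfio_user_hdr *hdr,
                                  uint8_t *buf)
{
    struct vfio_user_hdr le = {
        .msg_id   = htole16(hdr->msg_id),
        .command  = htole16(hdr->command),
        .msg_size = htole32(hdr->msg_size),
        .flags    = htole32(hdr->flags),
        .error    = htole32(hdr->error),
    };

    memcpy(buf, &le, sizeof(le));
}

[The IDL approach Daniel suggests would instead generate this kind of
marshalling code from a protocol description, which is the trade-off
being weighed in the thread.]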