* Outline for VHOST_USER_PROTOCOL_F_VDPA
@ 2020-09-28  9:25 Stefan Hajnoczi
  2020-09-28 11:21 ` Marc-André Lureau
  2020-09-29  6:09 ` Michael S. Tsirkin
  0 siblings, 2 replies; 15+ messages in thread
From: Stefan Hajnoczi @ 2020-09-28  9:25 UTC (permalink / raw)
  To: maxime.coquelin, jasowang, lulu, Michael S. Tsirkin, tiwei.bie,
	changpeng.liu, raphael.norwitz, Felipe Franciosi,
	marcandre.lureau, kraxel, Nikos Dragazis, Daniele Buono
  Cc: qemu-devel


Hi,
Thanks for the positive responses to the initial discussion about
introducing VHOST_USER_PROTOCOL_F_VDPA to use vDPA semantics and bring
the full VIRTIO device model to vhost-user:
https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg05181.html

Below is an inlined version of the more detailed explanation I posted on
my blog yesterday[1]. Email is better for discussion.

If anyone wants to tackle this, please reply so others can stay
up-to-date.

Stefan

[1] http://blog.vmsplice.net/2020/09/on-unifying-vhost-user-and-virtio.html
---
The recent development of a Linux driver framework called VIRTIO Data
Path Acceleration (vDPA) has laid the groundwork for exciting new
vhost-user features. The implications of vDPA have not yet rippled
through the community so I want to share my thoughts on how the
vhost-user protocol can take advantage of new ideas from vDPA.

This post is aimed at developers and assumes familiarity with the
vhost-user protocol and VIRTIO. No knowledge of vDPA is required.

vDPA helps with writing drivers for hardware that supports VIRTIO
offload. Its goal is to enable hybrid hardware/software VIRTIO devices,
but as a nice side-effect it has overcome limitations in the kernel
vhost interface. It turns out that applying ideas from vDPA to the
vhost-user protocol solves the same issues there. In this article I'll
show how extending the vhost-user protocol with vDPA has the following
benefits:

* Allows existing VIRTIO device emulation code to act as a vhost-user
  device backend.
* Removes the need for shim devices in the virtual machine monitor (VMM).
* Replaces undocumented conventions with a well-defined device model.

These things can be done while reusing existing vhost-user and VIRTIO
code. In fact, this is especially good news for existing codebases like
QEMU because they already have a wealth of vhost-user and VIRTIO code
that can now finally be connected together!

Let's look at the advantages of extending vhost-user with vDPA first and
then discuss how to do it.

Why extend vhost-user with vDPA?
================================
Reusing VIRTIO emulation code for vhost-user backends
-----------------------------------------------------
It is a common misconception that a vhost device is a VIRTIO device.
VIRTIO devices are defined in the VIRTIO specification and consist of a
configuration space, virtqueues, and a device lifecycle that includes
feature negotiation. A vhost device is a subset of the corresponding
VIRTIO device. The exact subset depends on the device type, and some
vhost devices are closer to the full functionality of their
corresponding VIRTIO device than others. The most well-known example is
that vhost-net devices have rx/tx virtqueues but lack the virtio-net
control virtqueue. Also, the configuration space and device lifecycle
are only partially available to vhost devices.

This difference makes it impossible to use a VIRTIO device as a
vhost-user device and vice versa. There is an impedance mismatch and
missing functionality. That's a shame because existing VIRTIO device
emulation code is mature and duplicating it to provide vhost-user
backends creates additional work.

If there was a way to reuse existing VIRTIO device emulation code it
would be easier to move to a multi-process architecture in QEMU. Want to
run --netdev user,id=netdev0 --device virtio-net-pci,netdev=netdev0 in a
separate, sandboxed process? Easy, run it as a vhost-user-net device
instead of as virtio-net.

Making VMM device shims optional
--------------------------------
Today each new vhost device type requires a shim device in the VMM. QEMU
has --device vhost-user-blk-pci, --device vhost-user-input-pci, and so
on. Why can't there be a single --device vhost-user device?

This limitation is a consequence of the fact that vhost devices are not
full VIRTIO devices. In fact, a vhost device does not even have a way to
report its device type (net, blk, scsi, etc). Therefore it is impossible
for today's VMMs to offer a generic device. Each vhost device type
requires a shim device.

In some cases a shim device is desirable because it allows the VMM to
handle some aspects of the device instead of passing everything through
to the vhost device backend. But requiring shims by default involves
lots of tedious boilerplate code and prevents new device types from
being used by older VMMs.

Providing a well-defined device model in vhost-user
---------------------------------------------------
Although vhost-user works well for users, it is difficult for developers
to learn and extend. The protocol does not have a well-defined device
model. Each device type has its own undocumented set of protocol
messages. For example, the vhost-user-blk device uses the configuration
space whereas most other device types do not use it at all.

Since protocol use is not fully documented in the specification,
developers might resort to reading Linux, QEMU, and DPDK code in order
to figure out how to make their devices work. They typically have to
debug vhost-user protocol messages and adjust their code until it
appears to work. Hopefully the critical bugs are caught before the code
ships. This trial-and-error approach makes it hard to produce
high-quality vhost-user implementations.

Although the protocol specification can certainly be cleaned up, the
problem is more fundamental. vhost-user badly needs a well-defined
device model so that protocol usage is clear and uniform for all device
types. The best way to do that is to simply adopt the VIRTIO device
model. The VIRTIO specification already defines the device lifecycle and
details of the device types. By making vhost-user devices full VIRTIO
devices there is no need for additional vhost device specifications. The
vhost-user specification just becomes a transport for the established
VIRTIO device model. Luckily that is effectively what vDPA has done for
kernel vhost ioctls.

How to do this in QEMU
======================
The following QEMU changes are needed to implement vhost-user vDPA
support. Below I will focus on vhost-user-net but most of the work is
generic and benefits all device types.

Import vDPA ioctls into vhost-user
----------------------------------
vDPA extends the Linux vhost ioctl interface. It uses a subset of vhost
ioctls and adds new vDPA-specific ioctls that are implemented in the
vhost_vdpa.ko kernel module. These new ioctls enable the full VIRTIO
device model, including device IDs, the status register, configuration
space, and so on.
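
As a concrete (and hedged) illustration, this is roughly how a process
can drive those ioctls against a vhost-vdpa character device. The ioctl
names follow my reading of <linux/vhost.h> from the vhost_vdpa work and
may differ in detail:

/*
 * Sketch only: assumes a /dev/vhost-vdpa-0 device node and the
 * VHOST_VDPA_* ioctls as I remember them from <linux/vhost.h>.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>
#include <linux/virtio_config.h>

static int vdpa_probe(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    uint32_t device_id = 0;   /* VIRTIO device ID, e.g. 1 for net */
    uint8_t status = 0;

    ioctl(fd, VHOST_VDPA_GET_DEVICE_ID, &device_id);
    ioctl(fd, VHOST_VDPA_GET_STATUS, &status);

    /* Drive the VIRTIO device lifecycle through the status register */
    status |= VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER;
    ioctl(fd, VHOST_VDPA_SET_STATUS, &status);

    printf("device id %u, status 0x%x\n", device_id, status);
    return fd;
}

The point is that the full device lifecycle (device ID, status register,
configuration space) is visible through this interface, which is exactly
what vhost-user lacks today.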

In theory vhost-user could be fixed without vDPA, but it would involve
effectively adding the same set of functionality that vDPA has already
added onto kernel vhost. Reusing the vDPA ioctls allows VMMs to support
both kernel vDPA and vhost-user with minimal code duplication.

This can be done by adding a VHOST_USER_PROTOCOL_F_VDPA feature bit to
the vhost-user protocol. If both the vhost-user frontend and backend
support vDPA then all vDPA messages are available. Otherwise they can
either fall back on legacy vhost-user behavior or drop the connection.
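
To make the negotiation concrete, here is a minimal sketch from the
frontend's point of view. Everything in it is hypothetical: the feature
bit does not exist yet, and the connection type and helper functions
stand in for whatever the VMM already has:

#include <stdbool.h>
#include <stdint.h>

struct vhost_user_conn;                                  /* hypothetical */
uint64_t vhost_user_get_protocol_features(struct vhost_user_conn *c);
void vhost_user_set_protocol_features(struct vhost_user_conn *c, uint64_t f);

#define VHOST_USER_PROTOCOL_F_VDPA 16  /* bit number is illustrative only */

static bool negotiate_vdpa(struct vhost_user_conn *c)
{
    uint64_t features = vhost_user_get_protocol_features(c);

    if (!(features & (1ULL << VHOST_USER_PROTOCOL_F_VDPA))) {
        /* Legacy backend: fall back on existing vhost-user behavior
         * or drop the connection, depending on policy. */
        return false;
    }

    vhost_user_set_protocol_features(c, 1ULL << VHOST_USER_PROTOCOL_F_VDPA);
    return true;   /* the vDPA message set may now be used */
}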

The vhost-user specification could be split into a legacy section and a
modern vDPA-enabled section. The modern protocol would drop the
vhost-user messages that vDPA does not need, simplifying the protocol
for new developers while allowing existing implementations to support
both with minimal changes.

One detail is that vDPA does not use the memory table mechanism for
sharing memory. Instead it relies on the richer IOMMU message family
that is optional in vhost-user today. This approach can be adopted in
vhost-user too, making the IOMMU code path standard for all
implementations and dropping the memory table mechanism.
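
For illustration, an IOMMU mapping update with the existing message
family boils down to filling in a struct vhost_iotlb_msg and sending it
as VHOST_USER_IOTLB_MSG. The structure layout follows my reading of
<linux/vhost.h>; send_iotlb_msg() is a stand-in for whatever the VMM
uses to send messages on the vhost-user socket:

#include <stdint.h>
#include <linux/vhost.h>   /* struct vhost_iotlb_msg, VHOST_IOTLB_UPDATE */

void send_iotlb_msg(const struct vhost_iotlb_msg *msg);   /* hypothetical */

static void map_region(uint64_t iova, uint64_t size, void *host_addr)
{
    struct vhost_iotlb_msg msg = {
        .iova = iova,                     /* guest/IOVA address */
        .size = size,
        .uaddr = (uintptr_t)host_addr,    /* address in the shared mapping */
        .perm = VHOST_ACCESS_RW,
        .type = VHOST_IOTLB_UPDATE,
    };

    send_iotlb_msg(&msg);
}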

Add vhost-user vDPA to the vhost code
-------------------------------------
QEMU's hw/virtio/vhost*.c code supports kernel vhost, vhost-user, and
kernel vDPA. A vhost-user vDPA mode must be added to implement the new
protocol. It can be implemented as a combination of the vhost-user and
kernel vDPA code already in QEMU. Most likely the existing vhost-user
code can simply be extended to enable vDPA support if the backend
supports it.

Only small changes to hw/net/virtio-net.c and net/vhost-user.c are
needed to use vhost-user vDPA with net devices. At that point QEMU can
connect to a vhost-user-net device backend and use vDPA extensions.

Add a vhost-user vDPA VIRTIO transport
--------------------------------------
Next a vhost-user-net device backend can be put together using QEMU's
virtio-net emulation. A translation layer is needed between the
vhost-user protocol and the VirtIODevice type in QEMU. This can be done
by implementing a new VIRTIO transport alongside the existing pci, mmio,
and ccw transports. The transport processes vhost-user protocol messages
from the UNIX domain socket and invokes VIRTIO device emulation code
inside QEMU. It acts as a VIRTIO bus so that virtio-net-device,
virtio-blk-device, and other device types can be plugged in.

This piece is the most involved but the vhost-user protocol
communication part was already implemented in the virtio-vhost-user
prototype that I worked on previously. Most of the communication code
can be reused and the remainder is implementing the VirtioBusClass
interface.
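
To give a feel for the shape of this transport, here is a rough
QEMU-style skeleton of the bus class. The exact VirtioBusClass callbacks
and their signatures are from memory and may not match the current tree,
so treat this as an outline rather than working code:

#include "qemu/osdep.h"
#include "hw/virtio/virtio-bus.h"

#define TYPE_VHOST_USER_VDPA_BUS "vhost-user-vdpa-bus"

static void vu_vdpa_notify(DeviceState *d, uint16_t vector)
{
    /* Translate a VirtIODevice interrupt request into a vhost-user
     * message (or a call eventfd signal) back to the frontend. */
}

static void vu_vdpa_device_plugged(DeviceState *d, Error **errp)
{
    /* A virtio-*-device was plugged into this bus: start handling
     * protocol messages from the UNIX domain socket and route them
     * to the VirtIODevice. */
}

static void vu_vdpa_bus_class_init(ObjectClass *klass, void *data)
{
    VirtioBusClass *k = VIRTIO_BUS_CLASS(klass);

    k->notify = vu_vdpa_notify;
    k->device_plugged = vu_vdpa_device_plugged;
    /* plus configuration space access, virtqueue state save/load, ... */
}

static const TypeInfo vu_vdpa_bus_info = {
    .name = TYPE_VHOST_USER_VDPA_BUS,
    .parent = TYPE_VIRTIO_BUS,
    .class_init = vu_vdpa_bus_class_init,
};

static void vu_vdpa_register_types(void)
{
    type_register_static(&vu_vdpa_bus_info);
}

type_init(vu_vdpa_register_types)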

To summarize, a new transport acts as the vhost-user device backend and
invokes QEMU VirtIODevice methods in response to vhost-user protocol
messages. The syntax would be something like --device
virtio-net-device,id=virtio-net0 --device
vhost-user-backend,device=virtio-net0,addr.type=unix,addr.path=/tmp/vhost-user-net.sock.

Where this converges with multi-process QEMU
--------------------------------------------
At this point QEMU can run ad-hoc vhost-user backends using existing
VIRTIO device models. It is possible to go further by creating a
qemu-dev launcher executable that implements the vhost-user spec's
"Backend program conventions". This way a minimal device emulator
executable hosts the device instead of a full system emulator.

The requirements for this are similar to the multi-process QEMU effort,
which aims to run QEMU devices as separate processes. One of the main
open questions is how to design build system and Kconfig support for
building minimal device emulator executables.

In the case of vhost-user-net the qemu-dev-vhost-user-net executable
would contain virtio-net-device, vhost-user-backend, any netdevs the
user wishes to include, a QMP monitor, and a vhost-user backend
command-line interface.
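
To make that concrete, launching such a backend might look roughly like
this. The executable name is made up and, apart from --socket-path
(which I believe the spec's backend program conventions define), the
options are only meant to illustrate the idea:

  qemu-dev-vhost-user-net \
      --socket-path=/tmp/vhost-user-net.sock \
      --netdev user,id=netdev0 \
      --device virtio-net-device,netdev=netdev0 \
      --qmp unix:/tmp/qmp.sock,server,nowait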

Where does this leave us? QEMU's existing VIRTIO device models can be
used as vhost-user devices and run in separate processes from the VMM.
It's a great way of reusing code and having the flexibility to deploy it
in the way that makes most sense for the intended use case.

Conclusion
==========
The vhost-user protocol can be simplified by adopting the vhost vDPA
ioctls that have recently been introduced in Linux. This turns
vhost-user into a VIRTIO transport and vhost-user devices become full
VIRTIO devices. Existing VIRTIO device emulation code can then be reused
in vhost-user device backends.


* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-28  9:25 Outline for VHOST_USER_PROTOCOL_F_VDPA Stefan Hajnoczi
@ 2020-09-28 11:21 ` Marc-André Lureau
  2020-09-28 15:32   ` Stefan Hajnoczi
  2020-09-29  6:09 ` Michael S. Tsirkin
  1 sibling, 1 reply; 15+ messages in thread
From: Marc-André Lureau @ 2020-09-28 11:21 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: lulu, tiwei.bie, Michael S. Tsirkin, Jason Wang, qemu-devel,
	Raphael Norwitz, Coquelin, Maxime, Hoffmann, Gerd,
	Felipe Franciosi, Nikos Dragazis, Liu, Changpeng, Daniele Buono

Hi

On Mon, Sep 28, 2020 at 1:25 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> Where this converges with multi-process QEMU
> --------------------------------------------
> At this point QEMU can run ad-hoc vhost-user backends using existing
> VIRTIO device models. It is possible to go further by creating a
> qemu-dev launcher executable that implements the vhost-user spec's
> "Backend program conventions". This way a minimal device emulator
> executable hosts the device instead of a full system emulator.
>
> The requirements for this are similar to the multi-process QEMU effort,
> which aims to run QEMU devices as separate processes. One of the main
> open questions is how to design build system and Kconfig support for
> building minimal device emulator executables.
>
> In the case of vhost-user-net the qemu-dev-vhost-user-net executable
> would contain virtio-net-device, vhost-user-backend, any netdevs the
> user wishes to include, a QMP monitor, and a vhost-user backend
> command-line interface.
>
> Where does this leave us? QEMU's existing VIRTIO device models can be
> used as vhost-user devices and run in separate processes from the VMM.
> It's a great way of reusing code and having the flexibility to deploy it
> in the way that makes most sense for the intended use case.
>

My understanding is that this would only be able to expose virtio
devices from external processes. But vfio-user could expose more kinds
of devices, including the virtio devices.

Shouldn't we focus on vfio-user now, as the general out-of-process
device solution?




* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-28 11:21 ` Marc-André Lureau
@ 2020-09-28 15:32   ` Stefan Hajnoczi
  2020-10-12  2:56     ` Jason Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Hajnoczi @ 2020-09-28 15:32 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: lulu, tiwei.bie, Michael S. Tsirkin, Jason Wang, qemu-devel,
	Raphael Norwitz, Coquelin, Maxime, Hoffmann, Gerd,
	Felipe Franciosi, Nikos Dragazis, Liu, Changpeng, Daniele Buono


On Mon, Sep 28, 2020 at 03:21:56PM +0400, Marc-André Lureau wrote:
> On Mon, Sep 28, 2020 at 1:25 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > Where this converges with multi-process QEMU
> > --------------------------------------------
> > At this point QEMU can run ad-hoc vhost-user backends using existing
> > VIRTIO device models. It is possible to go further by creating a
> > qemu-dev launcher executable that implements the vhost-user spec's
> > "Backend program conventions". This way a minimal device emulator
> > executable hosts the device instead of a full system emulator.
> >
> > The requirements for this are similar to the multi-process QEMU effort,
> > which aims to run QEMU devices as separate processes. One of the main
> > open questions is how to design build system and Kconfig support for
> > building minimal device emulator executables.
> >
> > In the case of vhost-user-net the qemu-dev-vhost-user-net executable
> > would contain virtio-net-device, vhost-user-backend, any netdevs the
> > user wishes to include, a QMP monitor, and a vhost-user backend
> > command-line interface.
> >
> > Where does this leave us? QEMU's existing VIRTIO device models can be
> > used as vhost-user devices and run in separate processes from the VMM.
> > It's a great way of reusing code and having the flexibility to deploy it
> > in the way that makes most sense for the intended use case.
> >
> 
> My understanding is that this would only be able to expose virtio
> devices from external processes. But vfio-user could expose more kinds
> of devices, including the virtio devices.
> 
> Shouldn't we focus on vfio-user now, as the general out-of-process
> device solution?

Eventually vfio-user can replace vhost-user. However, vfio-user
development will take longer so for anyone already comfortable with
vhost-user I think extending the protocol with vDPA ioctls is
attractive.

Maybe we can get more organized around vfio-user and make progress
quicker?

Stefan


* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-28  9:25 Outline for VHOST_USER_PROTOCOL_F_VDPA Stefan Hajnoczi
  2020-09-28 11:21 ` Marc-André Lureau
@ 2020-09-29  6:09 ` Michael S. Tsirkin
  2020-09-29  8:57   ` Stefan Hajnoczi
  1 sibling, 1 reply; 15+ messages in thread
From: Michael S. Tsirkin @ 2020-09-29  6:09 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: lulu, tiwei.bie, jasowang, qemu-devel, raphael.norwitz,
	maxime.coquelin, kraxel, Felipe Franciosi, marcandre.lureau,
	Nikos Dragazis, changpeng.liu, Daniele Buono

Thanks for the post!
I have one comment:

On Mon, Sep 28, 2020 at 10:25:37AM +0100, Stefan Hajnoczi wrote:
> Why extend vhost-user with vDPA?
> ================================
> Reusing VIRTIO emulation code for vhost-user backends
> -----------------------------------------------------
> It is a common misconception that a vhost device is a VIRTIO device.
> VIRTIO devices are defined in the VIRTIO specification and consist of a
> configuration space, virtqueues, and a device lifecycle that includes
> feature negotiation. A vhost device is a subset of the corresponding
> VIRTIO device. The exact subset depends on the device type, and some
> vhost devices are closer to the full functionality of their
> corresponding VIRTIO device than others. The most well-known example is
> that vhost-net devices have rx/tx virtqueues but lack the virtio-net
> control virtqueue. Also, the configuration space and device lifecycle
> are only partially available to vhost devices.
> 
> This difference makes it impossible to use a VIRTIO device as a
> vhost-user device and vice versa. There is an impedance mismatch and
> missing functionality. That's a shame because existing VIRTIO device
> emulation code is mature and duplicating it to provide vhost-user
> backends creates additional work.


The biggest issue facing vhost-user and absent in vdpa is
backend disconnect handling. This is the reason control path
is kept under QEMU control: we do not need any logic to
restore control path data, and we can verify a new backend
is consistent with old one.

> If there was a way to reuse existing VIRTIO device emulation code it
> would be easier to move to a multi-process architecture in QEMU. Want to
> run --netdev user,id=netdev0 --device virtio-net-pci,netdev=netdev0 in a
> separate, sandboxed process? Easy, run it as a vhost-user-net device
> instead of as virtio-net.

Given vhost-user is using a socket, and given there's an elaborate
protocol due to need for backwards compatibility, it seems safer to
have vhost-user interface in a separate process too.


-- 
MST




* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-29  6:09 ` Michael S. Tsirkin
@ 2020-09-29  8:57   ` Stefan Hajnoczi
  2020-09-29 10:04     ` Michael S. Tsirkin
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Hajnoczi @ 2020-09-29  8:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: lulu, tiwei.bie, jasowang, qemu-devel, raphael.norwitz,
	maxime.coquelin, kraxel, Felipe Franciosi, marcandre.lureau,
	Nikos Dragazis, changpeng.liu, Daniele Buono


On Tue, Sep 29, 2020 at 02:09:55AM -0400, Michael S. Tsirkin wrote:
> On Mon, Sep 28, 2020 at 10:25:37AM +0100, Stefan Hajnoczi wrote:
> > Why extend vhost-user with vDPA?
> > ================================
> > Reusing VIRTIO emulation code for vhost-user backends
> > -----------------------------------------------------
> > It is a common misconception that a vhost device is a VIRTIO device.
> > VIRTIO devices are defined in the VIRTIO specification and consist of a
> > configuration space, virtqueues, and a device lifecycle that includes
> > feature negotiation. A vhost device is a subset of the corresponding
> > VIRTIO device. The exact subset depends on the device type, and some
> > vhost devices are closer to the full functionality of their
> > corresponding VIRTIO device than others. The most well-known example is
> > that vhost-net devices have rx/tx virtqueues but lack the virtio-net
> > control virtqueue. Also, the configuration space and device lifecycle
> > are only partially available to vhost devices.
> > 
> > This difference makes it impossible to use a VIRTIO device as a
> > vhost-user device and vice versa. There is an impedance mismatch and
> > missing functionality. That's a shame because existing VIRTIO device
> > emulation code is mature and duplicating it to provide vhost-user
> > backends creates additional work.
> 
> 
> The biggest issue facing vhost-user and absent in vdpa is
> backend disconnect handling. This is the reason control path
> is kept under QEMU control: we do not need any logic to
> restore control path data, and we can verify a new backend
> is consistent with old one.

I don't think using vhost-user with vDPA changes that. The VMM still
needs to emulate a virtio-pci/ccw/mmio device that the guest interfaces
with. If the device backend goes offline it's possible to restore that
state upon reconnection. What have I missed?

Regarding reconnection in general, it currently seems like a partially
solved problem in vhost-user. There is the "Inflight I/O tracking"
mechanism in the spec and some wording about reconnecting the socket,
but in practice I wouldn't expect all device types, VMMs, or device
backends to actually support reconnection. This is an area where a
uniform solution would be very welcome too.

There was discussion about recovering state in muser. The original idea
was for the muser kernel module to host state that persists across
device backend restart. That way the device backend can go away
temporarily and resume without guest intervention.

Then when the vfio-user discussion started the idea morphed into simply
keeping a tmpfs file for each device instance (no special muser.ko
support needed anymore). This allows the device backend to resume
without losing state. In practice a programming framework is needed to
make this easy and safe to use but it boils down to a tmpfs mmap.
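
Very roughly, and with the path, state layout, and error handling made
up for the sake of the example, the core of it is just:

#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Device state kept in a tmpfs-backed file so a restarted backend can
 * pick it up again. The fields are only an example. */
struct device_state {
    uint32_t vq_last_avail_idx[2];
    uint8_t status;
};

static struct device_state *open_state(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);  /* path on a tmpfs mount */
    if (fd < 0)
        return NULL;

    if (ftruncate(fd, sizeof(struct device_state)) < 0) {
        close(fd);
        return NULL;
    }

    struct device_state *s = mmap(NULL, sizeof(*s), PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
    close(fd);   /* the mapping remains valid after close */
    return s == MAP_FAILED ? NULL : s;
}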

> > If there was a way to reuse existing VIRTIO device emulation code it
> > would be easier to move to a multi-process architecture in QEMU. Want to
> > run --netdev user,id=netdev0 --device virtio-net-pci,netdev=netdev0 in a
> > separate, sandboxed process? Easy, run it as a vhost-user-net device
> > instead of as virtio-net.
> 
> Given vhost-user is using a socket, and given there's an elaborate
> protocol due to need for backwards compatibility, it seems safer to
> have vhost-user interface in a separate process too.

Right, with vhost-user only the virtqueue processing is done in the
device backend. The VMM still has to do the virtio transport emulation
(pci, ccw, mmio) and vhost-user connection lifecycle, which is complex.

Going back to Marc-André's point, why don't we focus on vfio-user so the
entire device can be moved out of the VMM?

Stefan


* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-29  8:57   ` Stefan Hajnoczi
@ 2020-09-29 10:04     ` Michael S. Tsirkin
  2020-09-29 18:38       ` Stefan Hajnoczi
  0 siblings, 1 reply; 15+ messages in thread
From: Michael S. Tsirkin @ 2020-09-29 10:04 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: lulu, tiwei.bie, jasowang, qemu-devel, raphael.norwitz,
	maxime.coquelin, kraxel, Felipe Franciosi, marcandre.lureau,
	Nikos Dragazis, changpeng.liu, Daniele Buono

On Tue, Sep 29, 2020 at 09:57:51AM +0100, Stefan Hajnoczi wrote:
> On Tue, Sep 29, 2020 at 02:09:55AM -0400, Michael S. Tsirkin wrote:
> > On Mon, Sep 28, 2020 at 10:25:37AM +0100, Stefan Hajnoczi wrote:
> > > Why extend vhost-user with vDPA?
> > > ================================
> > > Reusing VIRTIO emulation code for vhost-user backends
> > > -----------------------------------------------------
> > > It is a common misconception that a vhost device is a VIRTIO device.
> > > VIRTIO devices are defined in the VIRTIO specification and consist of a
> > > configuration space, virtqueues, and a device lifecycle that includes
> > > feature negotiation. A vhost device is a subset of the corresponding
> > > VIRTIO device. The exact subset depends on the device type, and some
> > > vhost devices are closer to the full functionality of their
> > > corresponding VIRTIO device than others. The most well-known example is
> > > that vhost-net devices have rx/tx virtqueues but lack the virtio-net
> > > control virtqueue. Also, the configuration space and device lifecycle
> > > are only partially available to vhost devices.
> > > 
> > > This difference makes it impossible to use a VIRTIO device as a
> > > vhost-user device and vice versa. There is an impedance mismatch and
> > > missing functionality. That's a shame because existing VIRTIO device
> > > emulation code is mature and duplicating it to provide vhost-user
> > > backends creates additional work.
> > 
> > 
> > The biggest issue facing vhost-user and absent in vdpa is
> > backend disconnect handling. This is the reason control path
> > is kept under QEMU control: we do not need any logic to
> > restore control path data, and we can verify a new backend
> > is consistent with old one.
> 
> I don't think using vhost-user with vDPA changes that. The VMM still
> needs to emulate a virtio-pci/ccw/mmio device that the guest interfaces
> with. If the device backend goes offline it's possible to restore that
> state upon reconnection. What have I missed?

The need to maintain the state in a way that is robust
against backend disconnects and can be restored.

> Regarding reconnection in general, it currently seems like a partially
> solved problem in vhost-user. There is the "Inflight I/O tracking"
> mechanism in the spec and some wording about reconnecting the socket,
> but in practice I wouldn't expect all device types, VMMs, or device
> backends to actually support reconnection. This is an area where a
> uniform solution would be very welcome too.

I'm not aware of big issues. What are they?

> There was discussion about recovering state in muser. The original idea
> was for the muser kernel module to host state that persists across
> device backend restart. That way the device backend can go away
> temporarily and resume without guest intervention.
> 
> Then when the vfio-user discussion started the idea morphed into simply
> keeping a tmpfs file for each device instance (no special muser.ko
> support needed anymore). This allows the device backend to resume
> without losing state. In practice a programming framework is needed to
> make this easy and safe to use but it boils down to a tmpfs mmap.
> 
> > > If there was a way to reuse existing VIRTIO device emulation code it
> > > would be easier to move to a multi-process architecture in QEMU. Want to
> > > run --netdev user,id=netdev0 --device virtio-net-pci,netdev=netdev0 in a
> > > separate, sandboxed process? Easy, run it as a vhost-user-net device
> > > instead of as virtio-net.
> > 
> > Given vhost-user is using a socket, and given there's an elaborate
> > protocol due to need for backwards compatibility, it seems safer to
> > have vhost-user interface in a separate process too.
> 
> Right, with vhost-user only the virtqueue processing is done in the
> device backend. The VMM still has to do the virtio transport emulation
> (pci, ccw, mmio) and vhost-user connection lifecycle, which is complex.

IIUC all vfio user does is add another protocol in the VMM,
and move code out of VMM to backend.

Architecturally I don't see why it's safer.

Something like multi-process patches seems like a way to
add defence in depth by having a process in the middle,
outside both VMM and backend.

> Going back to Marc-André's point, why don't we focus on vfio-user so the
> entire device can be moved out of the VMM?
> 
> Stefan

The fact that vfio-user adds a kernel component is one issue.

-- 
MST




* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-29 10:04     ` Michael S. Tsirkin
@ 2020-09-29 18:38       ` Stefan Hajnoczi
  2020-09-30  8:07         ` Michael S. Tsirkin
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Hajnoczi @ 2020-09-29 18:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: lulu, tiwei.bie, jasowang, qemu-devel, raphael.norwitz,
	maxime.coquelin, kraxel, Felipe Franciosi, marcandre.lureau,
	Nikos Dragazis, changpeng.liu, Daniele Buono


On Tue, Sep 29, 2020 at 06:04:34AM -0400, Michael S. Tsirkin wrote:
> On Tue, Sep 29, 2020 at 09:57:51AM +0100, Stefan Hajnoczi wrote:
> > On Tue, Sep 29, 2020 at 02:09:55AM -0400, Michael S. Tsirkin wrote:
> > > On Mon, Sep 28, 2020 at 10:25:37AM +0100, Stefan Hajnoczi wrote:
> > > > Why extend vhost-user with vDPA?
> > > > ================================
> > > > Reusing VIRTIO emulation code for vhost-user backends
> > > > -----------------------------------------------------
> > > > It is a common misconception that a vhost device is a VIRTIO device.
> > > > VIRTIO devices are defined in the VIRTIO specification and consist of a
> > > > configuration space, virtqueues, and a device lifecycle that includes
> > > > feature negotiation. A vhost device is a subset of the corresponding
> > > > VIRTIO device. The exact subset depends on the device type, and some
> > > > vhost devices are closer to the full functionality of their
> > > > corresponding VIRTIO device than others. The most well-known example is
> > > > that vhost-net devices have rx/tx virtqueues but lack the virtio-net
> > > > control virtqueue. Also, the configuration space and device lifecycle
> > > > are only partially available to vhost devices.
> > > > 
> > > > This difference makes it impossible to use a VIRTIO device as a
> > > > vhost-user device and vice versa. There is an impedance mismatch and
> > > > missing functionality. That's a shame because existing VIRTIO device
> > > > emulation code is mature and duplicating it to provide vhost-user
> > > > backends creates additional work.
> > > 
> > > 
> > > The biggest issue facing vhost-user and absent in vdpa is
> > > backend disconnect handling. This is the reason control path
> > > is kept under QEMU control: we do not need any logic to
> > > restore control path data, and we can verify a new backend
> > > is consistent with old one.
> > 
> > I don't think using vhost-user with vDPA changes that. The VMM still
> > needs to emulate a virtio-pci/ccw/mmio device that the guest interfaces
> > with. If the device backend goes offline it's possible to restore that
> > state upon reconnection. What have I missed?
> 
> The need to maintain the state in a way that is robust
> against backend disconnects and can be restored.

QEMU is only bypassed for virtqueue accesses. Everything else still
goes through the virtio-pci emulation in QEMU (VIRTIO configuration
space, status register). vDPA doesn't change this.

Existing vhost-user messages can be kept if they are useful (e.g.
virtqueue state tracking). So I think the situation is no different than
with the existing vhost-user protocol.

> > Regarding reconnection in general, it currently seems like a partially
> > solved problem in vhost-user. There is the "Inflight I/O tracking"
> > mechanism in the spec and some wording about reconnecting the socket,
> > but in practice I wouldn't expect all device types, VMMs, or device
> > backends to actually support reconnection. This is an area where a
> > uniform solution would be very welcome too.
> 
> I'm not aware of big issues. What are they?

I think "Inflight I/O tracking" can only be used when request processing
is idempotent? In other words, it can only be used when submitting the
same request multiple times is safe.

A silly example where this recovery mechanism cannot be used is if a
device has a persistent counter that is incremented by the request. The
guest can't be sure that the counter will be incremented exactly once.

Another example: devices that support requests with compare-and-swap
semantics cannot use this mechanism. During recovery the compare will
fail if the request was just completing when the backend crashed.

Do I understand the limitations of this mechanism correctly? It doesn't
seem general and I doubt it can be applied to all existing device types.

> > There was discussion about recovering state in muser. The original idea
> > was for the muser kernel module to host state that persists across
> > device backend restart. That way the device backend can go away
> > temporarily and resume without guest intervention.
> > 
> > Then when the vfio-user discussion started the idea morphed into simply
> > keeping a tmpfs file for each device instance (no special muser.ko
> > support needed anymore). This allows the device backend to resume
> > without losing state. In practice a programming framework is needed to
> > make this easy and safe to use but it boils down to a tmpfs mmap.
> > 
> > > > If there was a way to reuse existing VIRTIO device emulation code it
> > > > would be easier to move to a multi-process architecture in QEMU. Want to
> > > > run --netdev user,id=netdev0 --device virtio-net-pci,netdev=netdev0 in a
> > > > separate, sandboxed process? Easy, run it as a vhost-user-net device
> > > > instead of as virtio-net.
> > > 
> > > Given vhost-user is using a socket, and given there's an elaborate
> > > protocol due to need for backwards compatibility, it seems safer to
> > > have vhost-user interface in a separate process too.
> > 
> > Right, with vhost-user only the virtqueue processing is done in the
> > device backend. The VMM still has to do the virtio transport emulation
> > (pci, ccw, mmio) and vhost-user connection lifecycle, which is complex.
> 
> IIUC all vfio user does is add another protocol in the VMM,
> and move code out of VMM to backend.
> 
> Architecturally I don't see why it's safer.

It eliminates one layer of device emulation (virtio-pci). Fewer
registers to emulate means a smaller attack surface.

It's possible to take things further, maybe with the proposed ioregionfd
mechanism, where the VMM's KVM_RUN loop no longer handles MMIO/PIO
exits. A separate process can handle them. Maybe some platform devices
need CPU state access though.

BTW I think the goal of removing as much emulation from the VMM as
possible is interesting.

Did you have some other approach in mind to remove the PCI and
virtio-pci device from the VMM?

> Something like multi-process patches seems like a way to
> add defence in depth by having a process in the middle,
> outside both VMM and backend.

There is no third process in mpqemu. The VMM uses a UNIX domain socket
to communicate directly with the device backend. There is a PCI "proxy"
device in the VMM that does this communication when the guest accesses
registers. The device backend has a PCI "remote" host controller that a
PCIDevice instance is plugged into and the UNIX domain socket protocol
commands are translated into PCIDevice operations.

This is exactly the same as vfio-user. The only difference is that
vfio-user uses an existing set of commands, whereas mpqemu defines a new
protocol that will eventually need to provide equivalent functionality.

> > Going back to Marc-André's point, why don't we focus on vfio-user so the
> > entire device can be moved out of the VMM?
> > 
> > Stefan
> 
> The fact that vfio-user adds a kernel component is one issue.

vfio-user only needs a UNIX domain socket. The muser.ko kernel module
that was discussed after last KVM Forum is not used by vfio-user.

Stefan


* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-29 18:38       ` Stefan Hajnoczi
@ 2020-09-30  8:07         ` Michael S. Tsirkin
  2020-09-30 14:57           ` Stefan Hajnoczi
  2020-10-12  3:52           ` Jason Wang
  0 siblings, 2 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2020-09-30  8:07 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: lulu, tiwei.bie, jasowang, qemu-devel, raphael.norwitz,
	maxime.coquelin, kraxel, Felipe Franciosi, marcandre.lureau,
	Nikos Dragazis, changpeng.liu, Daniele Buono

On Tue, Sep 29, 2020 at 07:38:24PM +0100, Stefan Hajnoczi wrote:
> On Tue, Sep 29, 2020 at 06:04:34AM -0400, Michael S. Tsirkin wrote:
> > On Tue, Sep 29, 2020 at 09:57:51AM +0100, Stefan Hajnoczi wrote:
> > > On Tue, Sep 29, 2020 at 02:09:55AM -0400, Michael S. Tsirkin wrote:
> > > > On Mon, Sep 28, 2020 at 10:25:37AM +0100, Stefan Hajnoczi wrote:
> > > > > Why extend vhost-user with vDPA?
> > > > > ================================
> > > > > Reusing VIRTIO emulation code for vhost-user backends
> > > > > -----------------------------------------------------
> > > > > It is a common misconception that a vhost device is a VIRTIO device.
> > > > > VIRTIO devices are defined in the VIRTIO specification and consist of a
> > > > > configuration space, virtqueues, and a device lifecycle that includes
> > > > > feature negotiation. A vhost device is a subset of the corresponding
> > > > > VIRTIO device. The exact subset depends on the device type, and some
> > > > > vhost devices are closer to the full functionality of their
> > > > > corresponding VIRTIO device than others. The most well-known example is
> > > > > that vhost-net devices have rx/tx virtqueues but lack the virtio-net
> > > > > control virtqueue. Also, the configuration space and device lifecycle
> > > > > are only partially available to vhost devices.
> > > > > 
> > > > > This difference makes it impossible to use a VIRTIO device as a
> > > > > vhost-user device and vice versa. There is an impedance mismatch and
> > > > > missing functionality. That's a shame because existing VIRTIO device
> > > > > emulation code is mature and duplicating it to provide vhost-user
> > > > > backends creates additional work.
> > > > 
> > > > 
> > > > The biggest issue facing vhost-user and absent in vdpa is
> > > > backend disconnect handling. This is the reason control path
> > > > is kept under QEMU control: we do not need any logic to
> > > > restore control path data, and we can verify a new backend
> > > > is consistent with old one.
> > > 
> > > I don't think using vhost-user with vDPA changes that. The VMM still
> > > needs to emulate a virtio-pci/ccw/mmio device that the guest interfaces
> > > with. If the device backend goes offline it's possible to restore that
> > > state upon reconnection. What have I missed?
> > 
> > The need to maintain the state in a way that is robust
> > against backend disconnects and can be restored.
> 
> QEMU is only bypassed for virtqueue accesses. Everything else still
> goes through the virtio-pci emulation in QEMU (VIRTIO configuration
> space, status register). vDPA doesn't change this.
> 
> Existing vhost-user messages can be kept if they are useful (e.g.
> virtqueue state tracking). So I think the situation is no different than
> with the existing vhost-user protocol.
> 
> > > Regarding reconnection in general, it currently seems like a partially
> > > solved problem in vhost-user. There is the "Inflight I/O tracking"
> > > mechanism in the spec and some wording about reconnecting the socket,
> > > but in practice I wouldn't expect all device types, VMMs, or device
> > > backends to actually support reconnection. This is an area where a
> > > uniform solution would be very welcome too.
> > 
> > I'm not aware of big issues. What are they?
> 
> I think "Inflight I/O tracking" can only be used when request processing
> is idempotent? In other words, it can only be used when submitting the
> same request multiple times is safe.


Not inherently it just does not attempt to address this problem.


Inflight tracking only tries to address issues on the guest side,
that is, making sure the same buffer is used exactly once.

> A silly example where this recovery mechanism cannot be used is if a
> device has a persistent counter that is incremented by the request. The
> guest can't be sure that the counter will be incremented exactly once.
> 
> Another example: devices that support requests with compare-and-swap
> semantics cannot use this mechanism. During recovery the compare will
> fail if the request was just completing when the backend crashed.
> 
> Do I understand the limitations of this mechanism correctly? It doesn't
> seem general and I doubt it can be applied to all existing device types.

Devices with any kind of atomicity guarantees will have to use some
internal mechanism (e.g. a log?) to ensure internal consistency; that is
out of scope for inflight tracking.



> > > There was discussion about recovering state in muser. The original idea
> > > was for the muser kernel module to host state that persists across
> > > device backend restart. That way the device backend can go away
> > > temporarily and resume without guest intervention.
> > > 
> > > Then when the vfio-user discussion started the idea morphed into simply
> > > keeping a tmpfs file for each device instance (no special muser.ko
> > > support needed anymore). This allows the device backend to resume
> > > without losing state. In practice a programming framework is needed to
> > > make this easy and safe to use but it boils down to a tmpfs mmap.
> > > 
> > > > > If there was a way to reuse existing VIRTIO device emulation code it
> > > > > would be easier to move to a multi-process architecture in QEMU. Want to
> > > > > run --netdev user,id=netdev0 --device virtio-net-pci,netdev=netdev0 in a
> > > > > separate, sandboxed process? Easy, run it as a vhost-user-net device
> > > > > instead of as virtio-net.
> > > > 
> > > > Given vhost-user is using a socket, and given there's an elaborate
> > > > protocol due to need for backwards compatibility, it seems safer to
> > > > have vhost-user interface in a separate process too.
> > > 
> > > Right, with vhost-user only the virtqueue processing is done in the
> > > device backend. The VMM still has to do the virtio transport emulation
> > > (pci, ccw, mmio) and vhost-user connection lifecycle, which is complex.
> > 
> > IIUC all vfio user does is add another protocol in the VMM,
> > and move code out of VMM to backend.
> > 
> > Architecturally I don't see why it's safer.
> 
> It eliminates one layer of device emulation (virtio-pci). Fewer
> registers to emulate means a smaller attack surface.

Well, it does not eliminate it as such, it moves it to the backend,
which in a variety of setups is actually a more sensitive place, as the
backend can do things like access host storage/network that the VMM can
be prevented from doing.

> It's possible to take things further, maybe with the proposed ioregionfd
> mechanism, where the VMM's KVM_RUN loop no longer handles MMIO/PIO
> exits. A separate process can handle them. Maybe some platform devices
> need CPU state access though.
> 
> BTW I think the goal of removing as much emulation from the VMM as
> possible is interesting.
> 
> Did you have some other approach in mind to remove the PCI and
> virtio-pci device from the VMM?

Architecturally, I think we can have 3 processes:


VMM -- guest device emulation -- host backend


to me this looks like increasing our defence in depth strength,
as opposed to just shifting things around ...




> > Something like multi-process patches seems like a way to
> > add defence in depth by having a process in the middle,
> > outside both VMM and backend.
> 
> There is no third process in mpqemu. The VMM uses a UNIX domain socket
> to communicate directly with the device backend. There is a PCI "proxy"
> device in the VMM that does this communication when the guest accesses
> registers. The device backend has a PCI "remote" host controller that a
> PCIDevice instance is plugged into and the UNIX domain socket protocol
> commands are translated into PCIDevice operations.

Yes, but does anything prevent us from further splitting the backend
up into an emulation part and a host-side part?


> This is exactly the same as vfio-user. The only difference is that
> vfio-user uses an existing set of commands, whereas mpqemu defines a new
> protocol that will eventually need to provide equivalent functionality.
>
> > > Going back to Marc-André's point, why don't we focus on vfio-user so the
> > > entire device can be moved out of the VMM?
> > > 
> > > Stefan
> > 
> > The fact that vfio-user adds a kernel component is one issue.
> 
> vfio-user only needs a UNIX domain socket. The muser.ko kernel module
> that was discussed after last KVM Forum is not used by vfio-user.
> 
> Stefan

Sorry I will need to go and read the doc which I didn't yet, sorry
about that.

-- 
MST




* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-30  8:07         ` Michael S. Tsirkin
@ 2020-09-30 14:57           ` Stefan Hajnoczi
  2020-09-30 15:31             ` Michael S. Tsirkin
                               ` (2 more replies)
  2020-10-12  3:52           ` Jason Wang
  1 sibling, 3 replies; 15+ messages in thread
From: Stefan Hajnoczi @ 2020-09-30 14:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: lulu, tiwei.bie, jasowang, qemu-devel, raphael.norwitz,
	maxime.coquelin, kraxel, Felipe Franciosi, marcandre.lureau,
	Nikos Dragazis, changpeng.liu, Daniele Buono


On Wed, Sep 30, 2020 at 04:07:59AM -0400, Michael S. Tsirkin wrote:
> On Tue, Sep 29, 2020 at 07:38:24PM +0100, Stefan Hajnoczi wrote:
> > On Tue, Sep 29, 2020 at 06:04:34AM -0400, Michael S. Tsirkin wrote:
> > > On Tue, Sep 29, 2020 at 09:57:51AM +0100, Stefan Hajnoczi wrote:
> > > > On Tue, Sep 29, 2020 at 02:09:55AM -0400, Michael S. Tsirkin wrote:
> > > > > On Mon, Sep 28, 2020 at 10:25:37AM +0100, Stefan Hajnoczi wrote:
> > > > > > Why extend vhost-user with vDPA?
> > > > > > ================================
> > > > > > Reusing VIRTIO emulation code for vhost-user backends
> > > > > > -----------------------------------------------------
> > > > > > It is a common misconception that a vhost device is a VIRTIO device.
> > > > > > VIRTIO devices are defined in the VIRTIO specification and consist of a
> > > > > > configuration space, virtqueues, and a device lifecycle that includes
> > > > > > feature negotiation. A vhost device is a subset of the corresponding
> > > > > > VIRTIO device. The exact subset depends on the device type, and some
> > > > > > vhost devices are closer to the full functionality of their
> > > > > > corresponding VIRTIO device than others. The most well-known example is
> > > > > > that vhost-net devices have rx/tx virtqueues but lack the virtio-net
> > > > > > control virtqueue. Also, the configuration space and device lifecycle
> > > > > > are only partially available to vhost devices.
> > > > > > 
> > > > > > This difference makes it impossible to use a VIRTIO device as a
> > > > > > vhost-user device and vice versa. There is an impedance mismatch and
> > > > > > missing functionality. That's a shame because existing VIRTIO device
> > > > > > emulation code is mature and duplicating it to provide vhost-user
> > > > > > backends creates additional work.
> > > > > 
> > > > > 
> > > > > The biggest issue facing vhost-user and absent in vdpa is
> > > > > backend disconnect handling. This is the reason control path
> > > > > is kept under QEMU control: we do not need any logic to
> > > > > restore control path data, and we can verify a new backend
> > > > > is consistent with old one.
> > > > 
> > > > I don't think using vhost-user with vDPA changes that. The VMM still
> > > > needs to emulate a virtio-pci/ccw/mmio device that the guest interfaces
> > > > with. If the device backend goes offline it's possible to restore that
> > > > state upon reconnection. What have I missed?
> > > 
> > > The need to maintain the state in a way that is robust
> > > against backend disconnects and can be restored.
> > 
> > QEMU is only bypassed for virtqueue accesses. Everything else still
> > goes through the virtio-pci emulation in QEMU (VIRTIO configuration
> > space, status register). vDPA doesn't change this.
> > 
> > Existing vhost-user messages can be kept if they are useful (e.g.
> > virtqueue state tracking). So I think the situation is no different than
> > with the existing vhost-user protocol.
> > 
> > > > Regarding reconnection in general, it currently seems like a partially
> > > > solved problem in vhost-user. There is the "Inflight I/O tracking"
> > > > mechanism in the spec and some wording about reconnecting the socket,
> > > > but in practice I wouldn't expect all device types, VMMs, or device
> > > > backends to actually support reconnection. This is an area where a
> > > > uniform solution would be very welcome too.
> > > 
> > > I'm not aware of big issues. What are they?
> > 
> > I think "Inflight I/O tracking" can only be used when request processing
> > is idempotent? In other words, it can only be used when submitting the
> > same request multiple times is safe.
> 
> 
> Not inherently it just does not attempt to address this problem.
> 
> 
> Inflight tracking only tries to address issues on the guest side,
> that is, making sure the same buffer is used exactly once.
> 
> > A silly example where this recovery mechanism cannot be used is if a
> > device has a persistent counter that is incremented by the request. The
> > guest can't be sure that the counter will be incremented exactly once.
> > 
> > Another example: devices that support requests with compare-and-swap
> > semantics cannot use this mechanism. During recovery the compare will
> > fail if the request was just completing when the backend crashed.
> > 
> > Do I understand the limitations of this mechanism correctly? It doesn't
> > seem general and I doubt it can be applied to all existing device types.
> 
> Device with any kind of atomicity guarantees will
> have to use some internal mechanism (e.g. log?) to ensure
> internal consistency, that is out of scope for tracking.

Rant warning, but probably useful to think about for future vhost-user
and vfio-user development... :)

IMO "Inflight I/O tracking" is best placed into libvhost-user instead of
the vhost-user protocol. Here is why:

QEMU's vhost-user code actually does nothing with the inflight data
except passing it back to the reconnected vhost-user device backend and
migrating it as an opaque blob.

The fact that it's opaque to QEMU is a warning sign. QEMU is simply a
mechanism for stashing a blob of data. Stashing data is generic
functionality and not specific to vhost-user devices. One could argue
it's convenient to have the inflight data available to QEMU for
reconnection, but as you said, device backends may still need to
maintain additional state.

It's not clear why the opaque inflight data is within the scope of
vhost-user while additional device backend data is outside it. This is
why I think "Inflight I/O tracking" shouldn't be part of the protocol.

"Inflight I/O tracking" should be a utility API in libvhost-user instead
of a vhost-user protocol feature. That way the backend can stash any
additional data it needs along with the virtqueues. There needs to be
device state save/load support in the vhost-user protocol but eventually
we'll need that anyway because some backends are stateful.
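
Purely to illustrate what I mean (this is not an existing libvhost-user
API, just a hypothetical shape for one):

#include <stdbool.h>
#include <stddef.h>

/* Hypothetical utility API: the backend registers the memory it wants
 * preserved and the library stashes it (e.g. in a tmpfs file keyed by
 * instance_id) across backend restarts. */
typedef struct VuStateRegion {
    const char *name;    /* e.g. "inflight" or "device-private" */
    void *data;
    size_t size;
} VuStateRegion;

/* Register a region to be preserved across backend restarts. */
bool vu_state_register(const char *instance_id, VuStateRegion *region);

/* After restart, repopulate a previously registered region, if present. */
bool vu_state_restore(const char *instance_id, VuStateRegion *region);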

> > > > There was discussion about recovering state in muser. The original idea
> > > > was for the muser kernel module to host state that persists across
> > > > device backend restart. That way the device backend can go away
> > > > temporarily and resume without guest intervention.
> > > > 
> > > > Then when the vfio-user discussion started the idea morphed into simply
> > > > keeping a tmpfs file for each device instance (no special muser.ko
> > > > support needed anymore). This allows the device backend to resume
> > > > without losing state. In practice a programming framework is needed to
> > > > make this easy and safe to use but it boils down to a tmpfs mmap.
> > > > 
> > > > > > If there was a way to reuse existing VIRTIO device emulation code it
> > > > > > would be easier to move to a multi-process architecture in QEMU. Want to
> > > > > > run --netdev user,id=netdev0 --device virtio-net-pci,netdev=netdev0 in a
> > > > > > separate, sandboxed process? Easy, run it as a vhost-user-net device
> > > > > > instead of as virtio-net.
> > > > > 
> > > > > Given vhost-user is using a socket, and given there's an elaborate
> > > > > protocol due to need for backwards compatibility, it seems safer to
> > > > > have vhost-user interface in a separate process too.
> > > > 
> > > > Right, with vhost-user only the virtqueue processing is done in the
> > > > device backend. The VMM still has to do the virtio transport emulation
> > > > (pci, ccw, mmio) and vhost-user connection lifecycle, which is complex.
> > > 
> > > IIUC all vfio user does is add another protocol in the VMM,
> > > and move code out of VMM to backend.
> > > 
> > > Architecturally I don't see why it's safer.
> > 
> > It eliminates one layer of device emulation (virtio-pci). Fewer
> > registers to emulate means a smaller attack surface.
> 
> Well it does not eliminate it as such, it moves it to the backend.
> Which in a variety of setups is actually a more sensitive
> place as the backend can do things like access host
> storage/network which VMM can be prevented from doing.
> 
> > It's possible to take things further, maybe with the proposed ioregionfd
> > mechanism, where the VMM's KVM_RUN loop no longer handles MMIO/PIO
> > exits. A separate process can handle them. Maybe some platform devices
> > need CPU state access though.
> > 
> > BTW I think the goal of removing as much emulation from the VMM as
> > possible is interesting.
> > 
> > Did you have some other approach in mind to remove the PCI and
> > virtio-pci device from the VMM?
> 
> Architecturally, I think we can have 3 processes:
> 
> 
> VMM -- guest device emulation -- host backend
> 
> 
> to me this looks like increasing our defence in depth strength,
> as opposed to just shifting things around ...

Cool idea.

Performance will be hard because there is separation between the guest
device emulation and the host backend.

There is also more communication code involved, which might make it hard
to change the guest device emulation <-> host backend interfaces.

These are the challenges I see but it would be awesome to run guest
device emulation in a tightly sandboxed environment that has almost no
syscalls available.

> > > Something like multi-process patches seems like a way to
> > > add defence in depth by having a process in the middle,
> > > outside both VMM and backend.
> > 
> > There is no third process in mpqemu. The VMM uses a UNIX domain socket
> > to communicate directly with the device backend. There is a PCI "proxy"
> > device in the VMM that does this communication when the guest accesses
> > registers. The device backend has a PCI "remote" host controller that a
> > PCIDevice instance is plugged into and the UNIX domain socket protocol
> > commands are translated into PCIDevice operations.
> 
> Yes, but does anything prevent us from further splitting the backend
> up to emulation part and host side part?

See above.

Stefan


* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-30 14:57           ` Stefan Hajnoczi
@ 2020-09-30 15:31             ` Michael S. Tsirkin
  2020-09-30 15:34             ` Michael S. Tsirkin
  2020-10-01  7:28             ` Gerd Hoffmann
  2 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2020-09-30 15:31 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: lulu, tiwei.bie, jasowang, qemu-devel, raphael.norwitz,
	maxime.coquelin, kraxel, Felipe Franciosi, marcandre.lureau,
	Nikos Dragazis, changpeng.liu, Daniele Buono

On Wed, Sep 30, 2020 at 03:57:52PM +0100, Stefan Hajnoczi wrote:
> IMO "Inflight I/O tracking" is best placed into libvhost-user instead of
> the vhost-user protocol.

Oh I agree qemu does nothing with it. The reason we have it defined in
the spec is to facilitate compatibility across backends.
I have zero confidence in backend developers being able to support
e.g. cross-version migration consistently, and lots of backends
do not use libvhost-user.




* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-30 14:57           ` Stefan Hajnoczi
  2020-09-30 15:31             ` Michael S. Tsirkin
@ 2020-09-30 15:34             ` Michael S. Tsirkin
  2020-10-01  7:28             ` Gerd Hoffmann
  2 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2020-09-30 15:34 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: lulu, tiwei.bie, jasowang, qemu-devel, raphael.norwitz,
	maxime.coquelin, kraxel, Felipe Franciosi, marcandre.lureau,
	Nikos Dragazis, changpeng.liu, Daniele Buono

On Wed, Sep 30, 2020 at 03:57:52PM +0100, Stefan Hajnoczi wrote:
> > Architecturally, I think we can have 3 processes:
> > 
> > 
> > VMM -- guest device emulation -- host backend
> > 
> > 
> > to me this looks like increasing our defence in depth strength,
> > as opposed to just shifting things around ...
> 
> Cool idea.
> 
> Performance will be hard because there is separation between the guest
> device emulation and the host backend.

Absolutely. As a tradeoff we could put some data path things in the
backend; e.g. for virtio it is practical to have the control path in
the emulation layer and the data path in the backend.
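
To illustrate the split with a purely hypothetical dispatcher (every
name below is made up, this is only a sketch of the idea): the middle
process would answer control-path requests itself and only forward
data-path setup to the host backend.

/* Illustrative sketch: the "guest device emulation" process keeps the
 * virtio control path local and forwards only data-path setup to the
 * host backend.  All names are hypothetical. */
#include <stdint.h>

enum req_type {
    REQ_SET_FEATURES,   /* control path */
    REQ_CONFIG_ACCESS,  /* control path */
    REQ_SET_STATUS,     /* control path */
    REQ_SET_VRING_ADDR, /* data path */
    REQ_VRING_KICK,     /* data path */
};

struct dev_request {
    enum req_type type;
    uint64_t payload;
};

/* Placeholders for whatever IPC the two halves end up using */
void handle_locally(struct dev_request *req);
void forward_to_backend(int backend_fd, struct dev_request *req);

static void dispatch(int backend_fd, struct dev_request *req)
{
    switch (req->type) {
    case REQ_SET_FEATURES:
    case REQ_CONFIG_ACCESS:
    case REQ_SET_STATUS:
        handle_locally(req);                 /* emulation layer keeps control path */
        break;
    case REQ_SET_VRING_ADDR:
    case REQ_VRING_KICK:
        forward_to_backend(backend_fd, req); /* backend owns the data path */
        break;
    }
}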

-- 
MST




* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-30 14:57           ` Stefan Hajnoczi
  2020-09-30 15:31             ` Michael S. Tsirkin
  2020-09-30 15:34             ` Michael S. Tsirkin
@ 2020-10-01  7:28             ` Gerd Hoffmann
  2020-10-01 15:13               ` Stefan Hajnoczi
  2 siblings, 1 reply; 15+ messages in thread
From: Gerd Hoffmann @ 2020-10-01  7:28 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: lulu, tiwei.bie, Michael S. Tsirkin, jasowang, qemu-devel,
	raphael.norwitz, maxime.coquelin, Felipe Franciosi,
	marcandre.lureau, Nikos Dragazis, changpeng.liu, Daniele Buono

  Hi,

> > Architecturally, I think we can have 3 processes:
> > 
> > VMM -- guest device emulation -- host backend
> > 
> > to me this looks like increasing our defence in depth strength,
> > as opposed to just shifting things around ...
> 
> Cool idea.

Isn't that exactly what we can do once the multi-process qemu patches
have landed, at least for block devices?  With "VMM" being main qemu,
"guest device emulation" being offloaded to one (or more) remote qemu
process(es), and qemu-storage-daemon being the host backend?

take care,
  Gerd




* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-10-01  7:28             ` Gerd Hoffmann
@ 2020-10-01 15:13               ` Stefan Hajnoczi
  0 siblings, 0 replies; 15+ messages in thread
From: Stefan Hajnoczi @ 2020-10-01 15:13 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: lulu, tiwei.bie, Michael S. Tsirkin, jasowang, qemu-devel,
	raphael.norwitz, maxime.coquelin, Felipe Franciosi,
	marcandre.lureau, Nikos Dragazis, changpeng.liu, Daniele Buono

[-- Attachment #1: Type: text/plain, Size: 1471 bytes --]

On Thu, Oct 01, 2020 at 09:28:37AM +0200, Gerd Hoffmann wrote:
>   Hi,
> 
> > > Architecturally, I think we can have 3 processes:
> > > 
> > > VMM -- guest device emulation -- host backend
> > > 
> > > to me this looks like increasing our defence in depth strength,
> > > as opposed to just shifting things around ...
> > 
> > Cool idea.
> 
> Isn't that exactly what we can do once the multi-process qemu patches
> have landed, at least for block devices?  With "VMM" being main qemu,
> "guest device emulation" being offloaded to one (or more) remote qemu
> process(es), and qemu-storage-daemon being the host backend?

Status of mpqemu: the current mpqemu patch series has limited
functionality (so that we can merge it sooner rather than later). Don't
expect to use it with arbitrary PCI devices yet, only the LSI SCSI
controller.

In mpqemu (and vfio-user) QEMU handles all MMIO/PIO accesses by
forwarding them to the device emulation process. Therefore QEMU is still
involved to an extent. This can be fixed with ioeventfd for doorbells,
the proposed ioregionfd mechanism for MMIO/PIO, and vfio-user mmap
regions for RAM-backed device memory.
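
For the doorbell part the kernel interface already exists today. A
minimal sketch of registering one (error handling omitted; the doorbell
address and access size are placeholders):

/* Sketch: wire a guest MMIO doorbell directly to an eventfd via
 * KVM_IOEVENTFD so the write never causes a userspace exit in the VMM.
 * The resulting fd can then be passed to the device emulation process
 * (e.g. over SCM_RIGHTS, as vhost-user does for vring kick fds). */
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>

static int add_doorbell_ioeventfd(int vm_fd, uint64_t doorbell_gpa)
{
    int efd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
    struct kvm_ioeventfd ioe;

    memset(&ioe, 0, sizeof(ioe));
    ioe.addr  = doorbell_gpa;  /* guest-physical address of the doorbell */
    ioe.len   = 4;             /* trigger on 4-byte writes */
    ioe.fd    = efd;
    ioe.flags = 0;             /* MMIO; any written value counts */

    ioctl(vm_fd, KVM_IOEVENTFD, &ioe);  /* error handling omitted */
    return efd;                /* hand this fd to the device process */
}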

However, QEMU itself still emulates the PCI controller. This means
PCI configuration space and other device operations still go to QEMU. In
order to fully move emulation out of QEMU we'd need to do something more
drastic, and I think this is what we're discussing in this sub-thread.

Stefan



* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-28 15:32   ` Stefan Hajnoczi
@ 2020-10-12  2:56     ` Jason Wang
  0 siblings, 0 replies; 15+ messages in thread
From: Jason Wang @ 2020-10-12  2:56 UTC (permalink / raw)
  To: Stefan Hajnoczi, Marc-André Lureau
  Cc: lulu, tiwei.bie, Michael S. Tsirkin, qemu-devel, Raphael Norwitz,
	Coquelin, Maxime, Hoffmann, Gerd, Felipe Franciosi,
	Nikos Dragazis, Liu, Changpeng, Daniele Buono


On 2020/9/28 11:32 PM, Stefan Hajnoczi wrote:
> On Mon, Sep 28, 2020 at 03:21:56PM +0400, Marc-André Lureau wrote:
>> On Mon, Sep 28, 2020 at 1:25 PM Stefan Hajnoczi <stefanha@redhat.com wrote:
>>> Where this converges with multi-process QEMU
>>> --------------------------------------------
>>> At this point QEMU can run ad-hoc vhost-user backends using existing
>>> VIRTIO device models. It is possible to go further by creating a
>>> qemu-dev launcher executable that implements the vhost-user spec's
>>> "Backend program conventions". This way a minimal device emulator
>>> executable hosts the device instead of a full system emulator.
>>>
>>> The requirements for this are similar to the multi-process QEMU effort,
>>> which aims to run QEMU devices as separate processes. One of the main
>>> open questions is how to design build system and Kconfig support for
>>> building minimal device emulator executables.
>>>
>>> In the case of vhost-user-net the qemu-dev-vhost-user-net executable
>>> would contain virtio-net-device, vhost-user-backend, any netdevs the
>>> user wishes to include, a QMP monitor, and a vhost-user backend
>>> command-line interface.
>>>
>>> Where does this leave us? QEMU's existing VIRTIO device models can be
>>> used as vhost-user devices and run in a separate processes from the VMM.
>>> It's a great way of reusing code and having the flexibility to deploy it
>>> in the way that makes most sense for the intended use case.
>>>
>> My understanding is that this would only be able to expose virtio
>> devices from external processes. But vfio-user could expose more kinds
>> of devices, including the virtio devices.
>>
>> Shouldn't we focus on vfio-user now, as the general out-of-process
>> device solution?


Similar question could be asked for vDPA(kernel) vs VFIO(kernel).


> Eventually vfio-user can replace vhost-user. However, vfio-user
> development will take longer so for anyone already comfortable with
> vhost-user I think extending the protocol with vDPA ioctls is
> attractive.


My understanding is that vhost-user may have these advantages:

- a well-defined interface; this helps a lot for e.g. live migration
(cross-migration among different vendors), backend disconnection, and
device failover, and there is no vendor lock-in
- a high-level abstraction that is not tied to a specific bus
implementation; a micro VM that wants to get rid of PCI can use the
MMIO transport

So it doesn't conflict with vfio(-user), which is more suitable for
vendor-specific device APIs.

Thanks


>
> Maybe we can get more organized around vfio-user and make progress
> quicker?
>
> Stefan




* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-30  8:07         ` Michael S. Tsirkin
  2020-09-30 14:57           ` Stefan Hajnoczi
@ 2020-10-12  3:52           ` Jason Wang
  1 sibling, 0 replies; 15+ messages in thread
From: Jason Wang @ 2020-10-12  3:52 UTC (permalink / raw)
  To: Michael S. Tsirkin, Stefan Hajnoczi
  Cc: lulu, tiwei.bie, qemu-devel, raphael.norwitz, maxime.coquelin,
	kraxel, Felipe Franciosi, marcandre.lureau, Nikos Dragazis,
	changpeng.liu, Daniele Buono


On 2020/9/30 4:07 PM, Michael S. Tsirkin wrote:
> On Tue, Sep 29, 2020 at 07:38:24PM +0100, Stefan Hajnoczi wrote:
>> On Tue, Sep 29, 2020 at 06:04:34AM -0400, Michael S. Tsirkin wrote:
>>> On Tue, Sep 29, 2020 at 09:57:51AM +0100, Stefan Hajnoczi wrote:
>>>> On Tue, Sep 29, 2020 at 02:09:55AM -0400, Michael S. Tsirkin wrote:
>>>>> On Mon, Sep 28, 2020 at 10:25:37AM +0100, Stefan Hajnoczi wrote:
>>>>>> Why extend vhost-user with vDPA?
>>>>>> ================================
>>>>>> Reusing VIRTIO emulation code for vhost-user backends
>>>>>> -----------------------------------------------------
>>>>>> It is a common misconception that a vhost device is a VIRTIO device.
>>>>>> VIRTIO devices are defined in the VIRTIO specification and consist of a
>>>>>> configuration space, virtqueues, and a device lifecycle that includes
>>>>>> feature negotiation. A vhost device is a subset of the corresponding
>>>>>> VIRTIO device. The exact subset depends on the device type, and some
>>>>>> vhost devices are closer to the full functionality of their
>>>>>> corresponding VIRTIO device than others. The most well-known example is
>>>>>> that vhost-net devices have rx/tx virtqueues but lack the virtio-net
>>>>>> control virtqueue. Also, the configuration space and device lifecycle
>>>>>> are only partially available to vhost devices.
>>>>>>
>>>>>> This difference makes it impossible to use a VIRTIO device as a
>>>>>> vhost-user device and vice versa. There is an impedance mismatch and
>>>>>> missing functionality. That's a shame because existing VIRTIO device
>>>>>> emulation code is mature and duplicating it to provide vhost-user
>>>>>> backends creates additional work.
>>>>> The biggest issue facing vhost-user and absent in vdpa is
>>>>> backend disconnect handling. This is the reason control path
>>>>> is kept under QEMU control: we do not need any logic to
>>>>> restore control path data, and we can verify a new backend
>>>>> is consistent with old one.
>>>> I don't think using vhost-user with vDPA changes that. The VMM still
>>>> needs to emulate a virtio-pci/ccw/mmio device that the guest interfaces
>>>> with. If the device backend goes offline it's possible to restore that
>>>> state upon reconnection. What have I missed?
>>> The need to maintain the state in a way that is robust
>>> against backend disconnects and can be restored.
>> QEMU is only bypassed for virtqueue accesses. Everything else still
>> goes through the virtio-pci emulation in QEMU (VIRTIO configuration
>> space, status register). vDPA doesn't change this.
>>
>> Existing vhost-user messages can be kept if they are useful (e.g.
>> virtqueue state tracking). So I think the situation is no different than
>> with the existing vhost-user protocol.
>>
>>>> Regarding reconnection in general, it currently seems like a partially
>>>> solved problem in vhost-user. There is the "Inflight I/O tracking"
>>>> mechanism in the spec and some wording about reconnecting the socket,
>>>> but in practice I wouldn't expect all device types, VMMs, or device
>>>> backends to actually support reconnection. This is an area where a
>>>> uniform solution would be very welcome too.
>>> I'm not aware of big issues. What are they?
>> I think "Inflight I/O tracking" can only be used when request processing
>> is idempotent? In other words, it can only be used when submitting the
>> same request multiple times is safe.
> Not inherently; it just does not attempt to address this problem.
>
>
> Inflight tracking only tries to address issues on the guest side,
> that is, making sure the same buffer is used exactly once.
>

As discussed, if we design the virtio ring carefully, there's probably
no need to use extra metadata for inflight tracking.

And I remember that the current inflight tracking doesn't support the
packed virtqueue.
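
For readers following along, the extra-metadata approach under
discussion looks roughly like this (heavily simplified; this is not the
actual vhost-user inflight region layout, and the 256-entry queue size
is an assumption):

/* Simplified illustration, not the real vhost-user inflight layout: the
 * backend marks a descriptor as in flight in a shared mmap'd table
 * before processing it and clears the mark only after the used ring
 * update, so a restarted backend can find and resubmit unfinished
 * requests. */
#include <stdint.h>

#define QUEUE_SIZE 256

struct inflight_table {
    uint8_t inflight[QUEUE_SIZE];  /* one flag per descriptor head */
};

static void request_start(struct inflight_table *t, uint16_t head)
{
    t->inflight[head] = 1;  /* survives a backend crash (shared memory) */
}

static void request_complete(struct inflight_table *t, uint16_t head)
{
    /* Clear only after pushing 'head' to the used ring: a crash in
     * between means the request may be resubmitted, which is why
     * reprocessing has to be safe (the idempotency concern above). */
    t->inflight[head] = 0;
}

static void reconnect_recover(struct inflight_table *t,
                              void (*resubmit)(uint16_t head))
{
    for (uint16_t head = 0; head < QUEUE_SIZE; head++) {
        if (t->inflight[head]) {
            resubmit(head);
        }
    }
}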

Thanks




