* Outline for VHOST_USER_PROTOCOL_F_VDPA
@ 2020-09-28  9:25 Stefan Hajnoczi
  2020-09-28 11:21 ` Marc-André Lureau
  2020-09-29  6:09 ` Michael S. Tsirkin
  0 siblings, 2 replies; 15+ messages in thread
From: Stefan Hajnoczi @ 2020-09-28  9:25 UTC (permalink / raw)
  To: maxime.coquelin, jasowang, lulu, Michael S. Tsirkin, tiwei.bie,
	changpeng.liu, raphael.norwitz, Felipe Franciosi,
	marcandre.lureau, kraxel, Nikos Dragazis, Daniele Buono
  Cc: qemu-devel


Hi,
Thanks for the positive responses to the initial discussion about
introducing VHOST_USER_PROTOCOL_F_VDPA to use vDPA semantics and bring
the full VIRTIO device model to vhost-user:
https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg05181.html

Below is an inlined version of the more detailed explanation I posted on
my blog yesterday[1]. Email is better for discussion.

If anyone wants to tackle this, please reply so others can stay
up-to-date.

Stefan

[1] http://blog.vmsplice.net/2020/09/on-unifying-vhost-user-and-virtio.html
---
The recent development of a Linux driver framework called VIRTIO Data
Path Acceleration (vDPA) has laid the groundwork for exciting new
vhost-user features. The implications of vDPA have not yet rippled
through the community so I want to share my thoughts on how the
vhost-user protocol can take advantage of new ideas from vDPA.

This post is aimed at developers and assumes familiarity with the
vhost-user protocol and VIRTIO. No knowledge of vDPA is required.

vDPA helps with writing drivers for hardware that supports VIRTIO
offload. Its goal is to enable hybrid hardware/software VIRTIO devices,
but as a nice side-effect it has overcome limitations in the kernel
vhost interface. It turns out that applying ideas from vDPA to the
vhost-user protocol solves the same issues there. In this article I'll
show how extending the vhost-user protocol with vDPA has the following
benefits:

* Allows existing VIRTIO device emulation code to act as a vhost-user
  device backend.
* Removes the need for shim devices in the virtual machine monitor (VMM).
* Replaces undocumented conventions with a well-defined device model.

These things can be done while reusing existing vhost-user and VIRTIO
code. In fact, this is especially good news for existing codebases like
QEMU because they already have a wealth of vhost-user and VIRTIO code
that can now finally be connected together!

Let's look at the advantages of extending vhost-user with vDPA first and
then discuss how to do it.

Why extend vhost-user with vDPA?
================================
Reusing VIRTIO emulation code for vhost-user backends
-----------------------------------------------------
It is a common misconception that a vhost device is a VIRTIO device.
VIRTIO devices are defined in the VIRTIO specification and consist of a
configuration space, virtqueues, and a device lifecycle that includes
feature negotiation. A vhost device is a subset of the corresponding
VIRTIO device. The exact subset depends on the device type, and some
vhost devices are closer to the full functionality of their
corresponding VIRTIO device than others. The most well-known example is
that vhost-net devices have rx/tx virtqueues but lack the virtio-net
control virtqueue. Also, the configuration space and device lifecycle
are only partially available to vhost devices.

This difference makes it impossible to use a VIRTIO device as a
vhost-user device and vice versa. There is an impedance mismatch and
missing functionality. That's a shame because existing VIRTIO device
emulation code is mature and duplicating it to provide vhost-user
backends creates additional work.

If there was a way to reuse existing VIRTIO device emulation code it
would be easier to move to a multi-process architecture in QEMU. Want to
run --netdev user,id=netdev0 --device virtio-net-pci,netdev=netdev0 in a
separate, sandboxed process? Easy, run it as a vhost-user-net device
instead of as virtio-net.

Making VMM device shims optional
--------------------------------
Today each new vhost device type requires a shim device in the VMM. QEMU
has --device vhost-user-blk-pci, --device vhost-user-input-pci, and so
on. Why can't there be a single --device vhost-user device?

This limitation is a consequence of the fact that vhost devices are not
full VIRTIO devices. In fact, a vhost device does not even have a way to
report its device type (net, blk, scsi, etc). Therefore it is impossible
for today's VMMs to offer a generic device. Each vhost device type
requires a shim device.

In some cases a shim device is desirable because it allows the VMM to
handle some aspects of the device instead of passing everything through
to the vhost device backend. But requiring shims by default involves
lots of tedious boilerplate code and prevents new device types from
being used by older VMMs.

Providing a well-defined device model in vhost-user
---------------------------------------------------
Although vhost-user works well for users, it is difficult for developers
to learn and extend. The protocol does not have a well-defined device
model. Each device type has its own undocumented set of protocol
messages. For example, the vhost-user-blk device uses the configuration
space whereas most other device types do not use it at all.

Since protocol use is not fully documented in the specification,
developers might resort to reading Linux, QEMU, and DPDK code in order
to figure out how to make their devices work. They typically have to
debug vhost-user protocol messages and adjust their code until it
appears to work. Hopefully the critical bugs are caught before the code
ships. This trial-and-error approach makes it hard to produce
high-quality vhost-user implementations.

Although the protocol specification can certainly be cleaned up, the
problem is more fundamental. vhost-user badly needs a well-defined
device model so that protocol usage is clear and uniform for all device
types. The best way to do that is to simply adopt the VIRTIO device
model. The VIRTIO specification already defines the device lifecycle and
details of the device types. By making vhost-user devices full VIRTIO
devices there is no need for additional vhost device specifications. The
vhost-user specification just becomes a transport for the established
VIRTIO device model. Luckily that is effectively what vDPA has done for
kernel vhost ioctls.

How to do this in QEMU
======================
The following QEMU changes are needed to implement vhost-user vDPA
support. Below I will focus on vhost-user-net but most of the work is
generic and benefits all device types.

Import vDPA ioctls into vhost-user
----------------------------------
vDPA extends the Linux vhost ioctl interface. It uses a subset of vhost
ioctls and adds new vDPA-specific ioctls that are implemented in the
vhost_vdpa.ko kernel module. These new ioctls enable the full VIRTIO
device model, including device IDs, the status register, configuration
space, and so on.
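
As a concrete (and hedged) illustration, this is roughly how a process
can drive those ioctls against a vhost-vdpa character device. The ioctl
names follow my reading of <linux/vhost.h> from the vhost_vdpa work and
may differ in detail:

/*
 * Sketch only: assumes a /dev/vhost-vdpa-0 device node and the
 * VHOST_VDPA_* ioctls as I remember them from <linux/vhost.h>.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>
#include <linux/virtio_config.h>

static int vdpa_probe(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    uint32_t device_id = 0;   /* VIRTIO device ID, e.g. 1 for net */
    uint8_t status = 0;

    ioctl(fd, VHOST_VDPA_GET_DEVICE_ID, &device_id);
    ioctl(fd, VHOST_VDPA_GET_STATUS, &status);

    /* Drive the VIRTIO device lifecycle through the status register */
    status |= VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER;
    ioctl(fd, VHOST_VDPA_SET_STATUS, &status);

    printf("device id %u, status 0x%x\n", device_id, status);
    return fd;
}

The point is that the full device lifecycle (device ID, status register,
configuration space) is visible through this interface, which is exactly
what vhost-user lacks today.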

In theory vhost-user could be fixed without vDPA, but it would involve
effectively adding the same set of functionality that vDPA has already
added onto kernel vhost. Reusing the vDPA ioctls allows VMMs to support
both kernel vDPA and vhost-user with minimal code duplication.

This can be done by adding a VHOST_USER_PROTOCOL_F_VDPA feature bit to
the vhost-user protocol. If both the vhost-user frontend and backend
support vDPA then all vDPA messages are available. Otherwise they can
either fall back on legacy vhost-user behavior or drop the connection.
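
To make the negotiation concrete, here is a minimal sketch from the
frontend's point of view. Everything in it is hypothetical: the feature
bit does not exist yet, and the connection type and helper functions
stand in for whatever the VMM already has:

#include <stdbool.h>
#include <stdint.h>

struct vhost_user_conn;                                  /* hypothetical */
uint64_t vhost_user_get_protocol_features(struct vhost_user_conn *c);
void vhost_user_set_protocol_features(struct vhost_user_conn *c, uint64_t f);

#define VHOST_USER_PROTOCOL_F_VDPA 16  /* bit number is illustrative only */

static bool negotiate_vdpa(struct vhost_user_conn *c)
{
    uint64_t features = vhost_user_get_protocol_features(c);

    if (!(features & (1ULL << VHOST_USER_PROTOCOL_F_VDPA))) {
        /* Legacy backend: fall back on existing vhost-user behavior
         * or drop the connection, depending on policy. */
        return false;
    }

    vhost_user_set_protocol_features(c, 1ULL << VHOST_USER_PROTOCOL_F_VDPA);
    return true;   /* the vDPA message set may now be used */
}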

The vhost-user specification could be split into a legacy section and a
modern vDPA-enabled section. The modern protocol would drop the
vhost-user messages that vDPA does not need, simplifying the protocol
for new developers while allowing existing implementations to support
both with minimal changes.

One detail is that vDPA does not use the memory table mechanism for
sharing memory. Instead it relies on the richer IOMMU message family
that is optional in vhost-user today. This approach can be adopted in
vhost-user too, making the IOMMU code path standard for all
implementations and dropping the memory table mechanism.
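
For illustration, an IOMMU mapping update with the existing message
family boils down to filling in a struct vhost_iotlb_msg and sending it
as VHOST_USER_IOTLB_MSG. The structure layout follows my reading of
<linux/vhost.h>; send_iotlb_msg() is a stand-in for whatever the VMM
uses to send messages on the vhost-user socket:

#include <stdint.h>
#include <linux/vhost.h>   /* struct vhost_iotlb_msg, VHOST_IOTLB_UPDATE */

void send_iotlb_msg(const struct vhost_iotlb_msg *msg);   /* hypothetical */

static void map_region(uint64_t iova, uint64_t size, void *host_addr)
{
    struct vhost_iotlb_msg msg = {
        .iova = iova,                     /* guest/IOVA address */
        .size = size,
        .uaddr = (uintptr_t)host_addr,    /* address in the shared mapping */
        .perm = VHOST_ACCESS_RW,
        .type = VHOST_IOTLB_UPDATE,
    };

    send_iotlb_msg(&msg);
}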

Add vhost-user vDPA to the vhost code
-------------------------------------
QEMU's hw/virtio/vhost*.c code supports kernel vhost, vhost-user, and
kernel vDPA. A vhost-user vDPA mode must be added to implement the new
protocol. It can be implemented as a combination of the vhost-user and
kernel vDPA code already in QEMU. Most likely the existing vhost-user
code can simply be extended to enable vDPA support if the backend
supports it.

Only small changes to hw/net/virtio-net.c and net/vhost-user.c are
needed to use vhost-user vDPA with net devices. At that point QEMU can
connect to a vhost-user-net device backend and use vDPA extensions.

Add a vhost-user vDPA VIRTIO transport
--------------------------------------
Next a vhost-user-net device backend can be put together using QEMU's
virtio-net emulation. A translation layer is needed between the
vhost-user protocol and the VirtIODevice type in QEMU. This can be done
by implementing a new VIRTIO transport alongside the existing pci, mmio,
and ccw transports. The transport processes vhost-user protocol messages
from the UNIX domain socket and invokes VIRTIO device emulation code
inside QEMU. It acts as a VIRTIO bus so that virtio-net-device,
virtio-blk-device, and other device types can be plugged in.

This piece is the most involved but the vhost-user protocol
communication part was already implemented in the virtio-vhost-user
prototype that I worked on previously. Most of the communication code
can be reused and the remainder is implementing the VirtioBusClass
interface.
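
To give a feel for the shape of this transport, here is a rough
QEMU-style skeleton of the bus class. The exact VirtioBusClass callbacks
and their signatures are from memory and may not match the current tree,
so treat this as an outline rather than working code:

#include "qemu/osdep.h"
#include "hw/virtio/virtio-bus.h"

#define TYPE_VHOST_USER_VDPA_BUS "vhost-user-vdpa-bus"

static void vu_vdpa_notify(DeviceState *d, uint16_t vector)
{
    /* Translate a VirtIODevice interrupt request into a vhost-user
     * message (or a call eventfd signal) back to the frontend. */
}

static void vu_vdpa_device_plugged(DeviceState *d, Error **errp)
{
    /* A virtio-*-device was plugged into this bus: start handling
     * protocol messages from the UNIX domain socket and route them
     * to the VirtIODevice. */
}

static void vu_vdpa_bus_class_init(ObjectClass *klass, void *data)
{
    VirtioBusClass *k = VIRTIO_BUS_CLASS(klass);

    k->notify = vu_vdpa_notify;
    k->device_plugged = vu_vdpa_device_plugged;
    /* plus configuration space access, virtqueue state save/load, ... */
}

static const TypeInfo vu_vdpa_bus_info = {
    .name = TYPE_VHOST_USER_VDPA_BUS,
    .parent = TYPE_VIRTIO_BUS,
    .class_init = vu_vdpa_bus_class_init,
};

static void vu_vdpa_register_types(void)
{
    type_register_static(&vu_vdpa_bus_info);
}

type_init(vu_vdpa_register_types)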

To summarize, a new transport acts as the vhost-user device backend and
invokes QEMU VirtIODevice methods in response to vhost-user protocol
messages. The syntax would be something like --device
virtio-net-device,id=virtio-net0 --device
vhost-user-backend,device=virtio-net0,addr.type=unix,addr.path=/tmp/vhost-user-net.sock.

Where this converges with multi-process QEMU
--------------------------------------------
At this point QEMU can run ad-hoc vhost-user backends using existing
VIRTIO device models. It is possible to go further by creating a
qemu-dev launcher executable that implements the vhost-user spec's
"Backend program conventions". This way a minimal device emulator
executable hosts the device instead of a full system emulator.

The requirements for this are similar to the multi-process QEMU effort,
which aims to run QEMU devices as separate processes. One of the main
open questions is how to design build system and Kconfig support for
building minimal device emulator executables.

In the case of vhost-user-net the qemu-dev-vhost-user-net executable
would contain virtio-net-device, vhost-user-backend, any netdevs the
user wishes to include, a QMP monitor, and a vhost-user backend
command-line interface.
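
To make that concrete, launching such a backend might look roughly like
this. The executable name is made up and, apart from --socket-path
(which I believe the spec's backend program conventions define), the
options are only meant to illustrate the idea:

  qemu-dev-vhost-user-net \
      --socket-path=/tmp/vhost-user-net.sock \
      --netdev user,id=netdev0 \
      --device virtio-net-device,netdev=netdev0 \
      --qmp unix:/tmp/qmp.sock,server,nowait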

Where does this leave us? QEMU's existing VIRTIO device models can be
used as vhost-user devices and run in separate processes from the VMM.
It's a great way of reusing code and having the flexibility to deploy it
in the way that makes most sense for the intended use case.

Conclusion
==========
The vhost-user protocol can be simplified by adopting the vhost vDPA
ioctls that have recently been introduced in Linux. This turns
vhost-user into a VIRTIO transport and vhost-user devices become full
VIRTIO devices. Existing VIRTIO device emulation code can then be reused
in vhost-user device backends.


* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-28  9:25 Outline for VHOST_USER_PROTOCOL_F_VDPA Stefan Hajnoczi
@ 2020-09-28 11:21 ` Marc-André Lureau
  2020-09-28 15:32   ` Stefan Hajnoczi
  2020-09-29  6:09 ` Michael S. Tsirkin
  1 sibling, 1 reply; 15+ messages in thread
From: Marc-André Lureau @ 2020-09-28 11:21 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: lulu, tiwei.bie, Michael S. Tsirkin, Jason Wang, qemu-devel,
	Raphael Norwitz, Coquelin, Maxime, Hoffmann, Gerd,
	Felipe Franciosi, Nikos Dragazis, Liu, Changpeng, Daniele Buono

Hi

On Mon, Sep 28, 2020 at 1:25 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> Where this converges with multi-process QEMU
> --------------------------------------------
> At this point QEMU can run ad-hoc vhost-user backends using existing
> VIRTIO device models. It is possible to go further by creating a
> qemu-dev launcher executable that implements the vhost-user spec's
> "Backend program conventions". This way a minimal device emulator
> executable hosts the device instead of a full system emulator.
>
> The requirements for this are similar to the multi-process QEMU effort,
> which aims to run QEMU devices as separate processes. One of the main
> open questions is how to design build system and Kconfig support for
> building minimal device emulator executables.
>
> In the case of vhost-user-net the qemu-dev-vhost-user-net executable
> would contain virtio-net-device, vhost-user-backend, any netdevs the
> user wishes to include, a QMP monitor, and a vhost-user backend
> command-line interface.
>
> Where does this leave us? QEMU's existing VIRTIO device models can be
> used as vhost-user devices and run in separate processes from the VMM.
> It's a great way of reusing code and having the flexibility to deploy it
> in the way that makes most sense for the intended use case.
>

My understanding is that this would only be able to expose virtio
devices from external processes. But vfio-user could expose more kinds
of devices, including the virtio devices.

Shouldn't we focus on vfio-user now, as the general out-of-process
device solution?




* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-28 11:21 ` Marc-André Lureau
@ 2020-09-28 15:32   ` Stefan Hajnoczi
  2020-10-12  2:56     ` Jason Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Hajnoczi @ 2020-09-28 15:32 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: lulu, tiwei.bie, Michael S. Tsirkin, Jason Wang, qemu-devel,
	Raphael Norwitz, Coquelin, Maxime, Hoffmann, Gerd,
	Felipe Franciosi, Nikos Dragazis, Liu, Changpeng, Daniele Buono


On Mon, Sep 28, 2020 at 03:21:56PM +0400, Marc-André Lureau wrote:
> On Mon, Sep 28, 2020 at 1:25 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > Where this converges with multi-process QEMU
> > --------------------------------------------
> > At this point QEMU can run ad-hoc vhost-user backends using existing
> > VIRTIO device models. It is possible to go further by creating a
> > qemu-dev launcher executable that implements the vhost-user spec's
> > "Backend program conventions". This way a minimal device emulator
> > executable hosts the device instead of a full system emulator.
> >
> > The requirements for this are similar to the multi-process QEMU effort,
> > which aims to run QEMU devices as separate processes. One of the main
> > open questions is how to design build system and Kconfig support for
> > building minimal device emulator executables.
> >
> > In the case of vhost-user-net the qemu-dev-vhost-user-net executable
> > would contain virtio-net-device, vhost-user-backend, any netdevs the
> > user wishes to include, a QMP monitor, and a vhost-user backend
> > command-line interface.
> >
> > Where does this leave us? QEMU's existing VIRTIO device models can be
> > used as vhost-user devices and run in separate processes from the VMM.
> > It's a great way of reusing code and having the flexibility to deploy it
> > in the way that makes most sense for the intended use case.
> >
> 
> My understanding is that this would only be able to expose virtio
> devices from external processes. But vfio-user could expose more kinds
> of devices, including the virtio devices.
> 
> Shouldn't we focus on vfio-user now, as the general out-of-process
> device solution?

Eventually vfio-user can replace vhost-user. However, vfio-user
development will take longer so for anyone already comfortable with
vhost-user I think extending the protocol with vDPA ioctls is
attractive.

Maybe we can get more organized around vfio-user and make progress
quicker?

Stefan


* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-28  9:25 Outline for VHOST_USER_PROTOCOL_F_VDPA Stefan Hajnoczi
  2020-09-28 11:21 ` Marc-André Lureau
@ 2020-09-29  6:09 ` Michael S. Tsirkin
  2020-09-29  8:57   ` Stefan Hajnoczi
  1 sibling, 1 reply; 15+ messages in thread
From: Michael S. Tsirkin @ 2020-09-29  6:09 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: lulu, tiwei.bie, jasowang, qemu-devel, raphael.norwitz,
	maxime.coquelin, kraxel, Felipe Franciosi, marcandre.lureau,
	Nikos Dragazis, changpeng.liu, Daniele Buono

Thanks for the post!
I have one comment:

On Mon, Sep 28, 2020 at 10:25:37AM +0100, Stefan Hajnoczi wrote:
> Why extend vhost-user with vDPA?
> ================================
> Reusing VIRTIO emulation code for vhost-user backends
> -----------------------------------------------------
> It is a common misconception that a vhost device is a VIRTIO device.
> VIRTIO devices are defined in the VIRTIO specification and consist of a
> configuration space, virtqueues, and a device lifecycle that includes
> feature negotiation. A vhost device is a subset of the corresponding
> VIRTIO device. The exact subset depends on the device type, and some
> vhost devices are closer to the full functionality of their
> corresponding VIRTIO device than others. The most well-known example is
> that vhost-net devices have rx/tx virtqueues but lack the virtio-net
> control virtqueue. Also, the configuration space and device lifecycle
> are only partially available to vhost devices.
> 
> This difference makes it impossible to use a VIRTIO device as a
> vhost-user device and vice versa. There is an impedance mismatch and
> missing functionality. That's a shame because existing VIRTIO device
> emulation code is mature and duplicating it to provide vhost-user
> backends creates additional work.


The biggest issue facing vhost-user and absent in vdpa is
backend disconnect handling. This is the reason control path
is kept under QEMU control: we do not need any logic to
restore control path data, and we can verify a new backend
is consistent with old one.

> If there was a way to reuse existing VIRTIO device emulation code it
> would be easier to move to a multi-process architecture in QEMU. Want to
> run --netdev user,id=netdev0 --device virtio-net-pci,netdev=netdev0 in a
> separate, sandboxed process? Easy, run it as a vhost-user-net device
> instead of as virtio-net.

Given vhost-user is using a socket, and given there's an elaborate
protocol due to need for backwards compatibility, it seems safer to
have vhost-user interface in a separate process too.


-- 
MST




* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-29  6:09 ` Michael S. Tsirkin
@ 2020-09-29  8:57   ` Stefan Hajnoczi
  2020-09-29 10:04     ` Michael S. Tsirkin
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Hajnoczi @ 2020-09-29  8:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: lulu, tiwei.bie, jasowang, qemu-devel, raphael.norwitz,
	maxime.coquelin, kraxel, Felipe Franciosi, marcandre.lureau,
	Nikos Dragazis, changpeng.liu, Daniele Buono


On Tue, Sep 29, 2020 at 02:09:55AM -0400, Michael S. Tsirkin wrote:
> On Mon, Sep 28, 2020 at 10:25:37AM +0100, Stefan Hajnoczi wrote:
> > Why extend vhost-user with vDPA?
> > ================================
> > Reusing VIRTIO emulation code for vhost-user backends
> > -----------------------------------------------------
> > It is a common misconception that a vhost device is a VIRTIO device.
> > VIRTIO devices are defined in the VIRTIO specification and consist of a
> > configuration space, virtqueues, and a device lifecycle that includes
> > feature negotiation. A vhost device is a subset of the corresponding
> > VIRTIO device. The exact subset depends on the device type, and some
> > vhost devices are closer to the full functionality of their
> > corresponding VIRTIO device than others. The most well-known example is
> > that vhost-net devices have rx/tx virtqueues but lack the virtio-net
> > control virtqueue. Also, the configuration space and device lifecycle
> > are only partially available to vhost devices.
> > 
> > This difference makes it impossible to use a VIRTIO device as a
> > vhost-user device and vice versa. There is an impedance mismatch and
> > missing functionality. That's a shame because existing VIRTIO device
> > emulation code is mature and duplicating it to provide vhost-user
> > backends creates additional work.
> 
> 
> The biggest issue facing vhost-user and absent in vdpa is
> backend disconnect handling. This is the reason control path
> is kept under QEMU control: we do not need any logic to
> restore control path data, and we can verify a new backend
> is consistent with old one.

I don't think using vhost-user with vDPA changes that. The VMM still
needs to emulate a virtio-pci/ccw/mmio device that the guest interfaces
with. If the device backend goes offline it's possible to restore that
state upon reconnection. What have I missed?

Regarding reconnection in general, it currently seems like a partially
solved problem in vhost-user. There is the "Inflight I/O tracking"
mechanism in the spec and some wording about reconnecting the socket,
but in practice I wouldn't expect all device types, VMMs, or device
backends to actually support reconnection. This is an area where a
uniform solution would be very welcome too.

There was discussion about recovering state in muser. The original idea
was for the muser kernel module to host state that persists across
device backend restart. That way the device backend can go away
temporarily and resume without guest intervention.

Then when the vfio-user discussion started the idea morphed into simply
keeping a tmpfs file for each device instance (no special muser.ko
support needed anymore). This allows the device backend to resume
without losing state. In practice a programming framework is needed to
make this easy and safe to use but it boils down to a tmpfs mmap.
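
Very roughly, and with the path, state layout, and error handling made
up for the sake of the example, the core of it is just:

#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Device state kept in a tmpfs-backed file so a restarted backend can
 * pick it up again. The fields are only an example. */
struct device_state {
    uint32_t vq_last_avail_idx[2];
    uint8_t status;
};

static struct device_state *open_state(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);  /* path on a tmpfs mount */
    if (fd < 0)
        return NULL;

    if (ftruncate(fd, sizeof(struct device_state)) < 0) {
        close(fd);
        return NULL;
    }

    struct device_state *s = mmap(NULL, sizeof(*s), PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
    close(fd);   /* the mapping remains valid after close */
    return s == MAP_FAILED ? NULL : s;
}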

> > If there was a way to reuse existing VIRTIO device emulation code it
> > would be easier to move to a multi-process architecture in QEMU. Want to
> > run --netdev user,id=netdev0 --device virtio-net-pci,netdev=netdev0 in a
> > separate, sandboxed process? Easy, run it as a vhost-user-net device
> > instead of as virtio-net.
> 
> Given vhost-user is using a socket, and given there's an elaborate
> protocol due to need for backwards compatibility, it seems safer to
> have vhost-user interface in a separate process too.

Right, with vhost-user only the virtqueue processing is done in the
device backend. The VMM still has to do the virtio transport emulation
(pci, ccw, mmio) and vhost-user connection lifecycle, which is complex.

Going back to Marc-André's point, why don't we focus on vfio-user so the
entire device can be moved out of the VMM?

Stefan


* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-29  8:57   ` Stefan Hajnoczi
@ 2020-09-29 10:04     ` Michael S. Tsirkin
  2020-09-29 18:38       ` Stefan Hajnoczi
  0 siblings, 1 reply; 15+ messages in thread
From: Michael S. Tsirkin @ 2020-09-29 10:04 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: lulu, tiwei.bie, jasowang, qemu-devel, raphael.norwitz,
	maxime.coquelin, kraxel, Felipe Franciosi, marcandre.lureau,
	Nikos Dragazis, changpeng.liu, Daniele Buono

On Tue, Sep 29, 2020 at 09:57:51AM +0100, Stefan Hajnoczi wrote:
> On Tue, Sep 29, 2020 at 02:09:55AM -0400, Michael S. Tsirkin wrote:
> > On Mon, Sep 28, 2020 at 10:25:37AM +0100, Stefan Hajnoczi wrote:
> > > Why extend vhost-user with vDPA?
> > > ================================
> > > Reusing VIRTIO emulation code for vhost-user backends
> > > -----------------------------------------------------
> > > It is a common misconception that a vhost device is a VIRTIO device.
> > > VIRTIO devices are defined in the VIRTIO specification and consist of a
> > > configuration space, virtqueues, and a device lifecycle that includes
> > > feature negotiation. A vhost device is a subset of the corresponding
> > > VIRTIO device. The exact subset depends on the device type, and some
> > > vhost devices are closer to the full functionality of their
> > > corresponding VIRTIO device than others. The most well-known example is
> > > that vhost-net devices have rx/tx virtqueues but lack the virtio-net
> > > control virtqueue. Also, the configuration space and device lifecycle
> > > are only partially available to vhost devices.
> > > 
> > > This difference makes it impossible to use a VIRTIO device as a
> > > vhost-user device and vice versa. There is an impedance mismatch and
> > > missing functionality. That's a shame because existing VIRTIO device
> > > emulation code is mature and duplicating it to provide vhost-user
> > > backends creates additional work.
> > 
> > 
> > The biggest issue facing vhost-user and absent in vdpa is
> > backend disconnect handling. This is the reason control path
> > is kept under QEMU control: we do not need any logic to
> > restore control path data, and we can verify a new backend
> > is consistent with old one.
> 
> I don't think using vhost-user with vDPA changes that. The VMM still
> needs to emulate a virtio-pci/ccw/mmio device that the guest interfaces
> with. If the device backend goes offline it's possible to restore that
> state upon reconnection. What have I missed?

The need to maintain the state in a way that is robust
against backend disconnects and can be restored.

> Regarding reconnection in general, it currently seems like a partially
> solved problem in vhost-user. There is the "Inflight I/O tracking"
> mechanism in the spec and some wording about reconnecting the socket,
> but in practice I wouldn't expect all device types, VMMs, or device
> backends to actually support reconnection. This is an area where a
> uniform solution would be very welcome too.

I'm not aware of big issues. What are they?

> There was discussion about recovering state in muser. The original idea
> was for the muser kernel module to host state that persists across
> device backend restart. That way the device backend can go away
> temporarily and resume without guest intervention.
> 
> Then when the vfio-user discussion started the idea morphed into simply
> keeping a tmpfs file for each device instance (no special muser.ko
> support needed anymore). This allows the device backend to resume
> without losing state. In practice a programming framework is needed to
> make this easy and safe to use but it boils down to a tmpfs mmap.
> 
> > > If there was a way to reuse existing VIRTIO device emulation code it
> > > would be easier to move to a multi-process architecture in QEMU. Want to
> > > run --netdev user,id=netdev0 --device virtio-net-pci,netdev=netdev0 in a
> > > separate, sandboxed process? Easy, run it as a vhost-user-net device
> > > instead of as virtio-net.
> > 
> > Given vhost-user is using a socket, and given there's an elaborate
> > protocol due to need for backwards compatibility, it seems safer to
> > have vhost-user interface in a separate process too.
> 
> Right, with vhost-user only the virtqueue processing is done in the
> device backend. The VMM still has to do the virtio transport emulation
> (pci, ccw, mmio) and vhost-user connection lifecycle, which is complex.

IIUC all vfio user does is add another protocol in the VMM,
and move code out of VMM to backend.

Architecturally I don't see why it's safer.

Something like multi-process patches seems like a way to
add defence in depth by having a process in the middle,
outside both VMM and backend.

> Going back to Marc-André's point, why don't we focus on vfio-user so the
> entire device can be moved out of the VMM?
> 
> Stefan

The fact that vfio-user adds a kernel component is one issue.

-- 
MST




* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-29 10:04     ` Michael S. Tsirkin
@ 2020-09-29 18:38       ` Stefan Hajnoczi
  2020-09-30  8:07         ` Michael S. Tsirkin
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Hajnoczi @ 2020-09-29 18:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: lulu, tiwei.bie, jasowang, qemu-devel, raphael.norwitz,
	maxime.coquelin, kraxel, Felipe Franciosi, marcandre.lureau,
	Nikos Dragazis, changpeng.liu, Daniele Buono


On Tue, Sep 29, 2020 at 06:04:34AM -0400, Michael S. Tsirkin wrote:
> On Tue, Sep 29, 2020 at 09:57:51AM +0100, Stefan Hajnoczi wrote:
> > On Tue, Sep 29, 2020 at 02:09:55AM -0400, Michael S. Tsirkin wrote:
> > > On Mon, Sep 28, 2020 at 10:25:37AM +0100, Stefan Hajnoczi wrote:
> > > > Why extend vhost-user with vDPA?
> > > > ================================
> > > > Reusing VIRTIO emulation code for vhost-user backends
> > > > -----------------------------------------------------
> > > > It is a common misconception that a vhost device is a VIRTIO device.
> > > > VIRTIO devices are defined in the VIRTIO specification and consist of a
> > > > configuration space, virtqueues, and a device lifecycle that includes
> > > > feature negotiation. A vhost device is a subset of the corresponding
> > > > VIRTIO device. The exact subset depends on the device type, and some
> > > > vhost devices are closer to the full functionality of their
> > > > corresponding VIRTIO device than others. The most well-known example is
> > > > that vhost-net devices have rx/tx virtqueues but lack the virtio-net
> > > > control virtqueue. Also, the configuration space and device lifecycle
> > > > are only partially available to vhost devices.
> > > > 
> > > > This difference makes it impossible to use a VIRTIO device as a
> > > > vhost-user device and vice versa. There is an impedance mismatch and
> > > > missing functionality. That's a shame because existing VIRTIO device
> > > > emulation code is mature and duplicating it to provide vhost-user
> > > > backends creates additional work.
> > > 
> > > 
> > > The biggest issue facing vhost-user and absent in vdpa is
> > > backend disconnect handling. This is the reason control path
> > > is kept under QEMU control: we do not need any logic to
> > > restore control path data, and we can verify a new backend
> > > is consistent with old one.
> > 
> > I don't think using vhost-user with vDPA changes that. The VMM still
> > needs to emulate a virtio-pci/ccw/mmio device that the guest interfaces
> > with. If the device backend goes offline it's possible to restore that
> > state upon reconnection. What have I missed?
> 
> The need to maintain the state in a way that is robust
> against backend disconnects and can be restored.

QEMU is only bypassed for virtqueue accesses. Everything else still
goes through the virtio-pci emulation in QEMU (VIRTIO configuration
space, status register). vDPA doesn't change this.

Existing vhost-user messages can be kept if they are useful (e.g.
virtqueue state tracking). So I think the situation is no different than
with the existing vhost-user protocol.

> > Regarding reconnection in general, it currently seems like a partially
> > solved problem in vhost-user. There is the "Inflight I/O tracking"
> > mechanism in the spec and some wording about reconnecting the socket,
> > but in practice I wouldn't expect all device types, VMMs, or device
> > backends to actually support reconnection. This is an area where a
> > uniform solution would be very welcome too.
> 
> I'm not aware of big issues. What are they?

I think "Inflight I/O tracking" can only be used when request processing
is idempotent? In other words, it can only be used when submitting the
same request multiple times is safe.

A silly example where this recovery mechanism cannot be used is if a
device has a persistent counter that is incremented by the request. The
guest can't be sure that the counter will be incremented exactly once.

Another example: devices that support requests with compare-and-swap
semantics cannot use this mechanism. During recovery the compare will
fail if the request was just completing when the backend crashed.

Do I understand the limitations of this mechanism correctly? It doesn't
seem general and I doubt it can be applied to all existing device types.

> > There was discussion about recovering state in muser. The original idea
> > was for the muser kernel module to host state that persists across
> > device backend restart. That way the device backend can go away
> > temporarily and resume without guest intervention.
> > 
> > Then when the vfio-user discussion started the idea morphed into simply
> > keeping a tmpfs file for each device instance (no special muser.ko
> > support needed anymore). This allows the device backend to resume
> > without losing state. In practice a programming framework is needed to
> > make this easy and safe to use but it boils down to a tmpfs mmap.
> > 
> > > > If there was a way to reuse existing VIRTIO device emulation code it
> > > > would be easier to move to a multi-process architecture in QEMU. Want to
> > > > run --netdev user,id=netdev0 --device virtio-net-pci,netdev=netdev0 in a
> > > > separate, sandboxed process? Easy, run it as a vhost-user-net device
> > > > instead of as virtio-net.
> > > 
> > > Given vhost-user is using a socket, and given there's an elaborate
> > > protocol due to need for backwards compatibility, it seems safer to
> > > have vhost-user interface in a separate process too.
> > 
> > Right, with vhost-user only the virtqueue processing is done in the
> > device backend. The VMM still has to do the virtio transport emulation
> > (pci, ccw, mmio) and vhost-user connection lifecycle, which is complex.
> 
> IIUC all vfio user does is add another protocol in the VMM,
> and move code out of VMM to backend.
> 
> Architecturally I don't see why it's safer.

It eliminates one layer of device emulation (virtio-pci). Fewer
registers to emulate means a smaller attack surface.

It's possible to take things further, maybe with the proposed ioregionfd
mechanism, where the VMM's KVM_RUN loop no longer handles MMIO/PIO
exits. A separate process can handle them. Maybe some platform devices
need CPU state access though.

BTW I think the goal of removing as much emulation from the VMM as
possible is interesting.

Did you have some other approach in mind to remove the PCI and
virtio-pci device from the VMM?

> Something like multi-process patches seems like a way to
> add defence in depth by having a process in the middle,
> outside both VMM and backend.

There is no third process in mpqemu. The VMM uses a UNIX domain socket
to communicate directly with the device backend. There is a PCI "proxy"
device in the VMM that does this communication when the guest accesses
registers. The device backend has a PCI "remote" host controller that a
PCIDevice instance is plugged into and the UNIX domain socket protocol
commands are translated into PCIDevice operations.

This is exactly the same as vfio-user. The only difference is that
vfio-user uses an existing set of commands, whereas mpqemu defines a new
protocol that will eventually need to provide equivalent functionality.

> > Going back to Marc-André's point, why don't we focus on vfio-user so the
> > entire device can be moved out of the VMM?
> > 
> > Stefan
> 
> The fact that vfio-user adds a kernel component is one issue.

vfio-user only needs a UNIX domain socket. The muser.ko kernel module
that was discussed after last KVM Forum is not used by vfio-user.

Stefan


* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-29 18:38       ` Stefan Hajnoczi
@ 2020-09-30  8:07         ` Michael S. Tsirkin
  2020-09-30 14:57           ` Stefan Hajnoczi
  2020-10-12  3:52           ` Jason Wang
  0 siblings, 2 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2020-09-30  8:07 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: lulu, tiwei.bie, jasowang, qemu-devel, raphael.norwitz,
	maxime.coquelin, kraxel, Felipe Franciosi, marcandre.lureau,
	Nikos Dragazis, changpeng.liu, Daniele Buono

On Tue, Sep 29, 2020 at 07:38:24PM +0100, Stefan Hajnoczi wrote:
> On Tue, Sep 29, 2020 at 06:04:34AM -0400, Michael S. Tsirkin wrote:
> > On Tue, Sep 29, 2020 at 09:57:51AM +0100, Stefan Hajnoczi wrote:
> > > On Tue, Sep 29, 2020 at 02:09:55AM -0400, Michael S. Tsirkin wrote:
> > > > On Mon, Sep 28, 2020 at 10:25:37AM +0100, Stefan Hajnoczi wrote:
> > > > > Why extend vhost-user with vDPA?
> > > > > ================================
> > > > > Reusing VIRTIO emulation code for vhost-user backends
> > > > > -----------------------------------------------------
> > > > > It is a common misconception that a vhost device is a VIRTIO device.
> > > > > VIRTIO devices are defined in the VIRTIO specification and consist of a
> > > > > configuration space, virtqueues, and a device lifecycle that includes
> > > > > feature negotiation. A vhost device is a subset of the corresponding
> > > > > VIRTIO device. The exact subset depends on the device type, and some
> > > > > vhost devices are closer to the full functionality of their
> > > > > corresponding VIRTIO device than others. The most well-known example is
> > > > > that vhost-net devices have rx/tx virtqueues but lack the virtio-net
> > > > > control virtqueue. Also, the configuration space and device lifecycle
> > > > > are only partially available to vhost devices.
> > > > > 
> > > > > This difference makes it impossible to use a VIRTIO device as a
> > > > > vhost-user device and vice versa. There is an impedance mismatch and
> > > > > missing functionality. That's a shame because existing VIRTIO device
> > > > > emulation code is mature and duplicating it to provide vhost-user
> > > > > backends creates additional work.
> > > > 
> > > > 
> > > > The biggest issue facing vhost-user and absent in vdpa is
> > > > backend disconnect handling. This is the reason control path
> > > > is kept under QEMU control: we do not need any logic to
> > > > restore control path data, and we can verify a new backend
> > > > is consistent with old one.
> > > 
> > > I don't think using vhost-user with vDPA changes that. The VMM still
> > > needs to emulate a virtio-pci/ccw/mmio device that the guest interfaces
> > > with. If the device backend goes offline it's possible to restore that
> > > state upon reconnection. What have I missed?
> > 
> > The need to maintain the state in a way that is robust
> > against backend disconnects and can be restored.
> 
> QEMU is only bypassed for virtqueue accesses. Everything else still
> goes through the virtio-pci emulation in QEMU (VIRTIO configuration
> space, status register). vDPA doesn't change this.
> 
> Existing vhost-user messages can be kept if they are useful (e.g.
> virtqueue state tracking). So I think the situation is no different than
> with the existing vhost-user protocol.
> 
> > > Regarding reconnection in general, it currently seems like a partially
> > > solved problem in vhost-user. There is the "Inflight I/O tracking"
> > > mechanism in the spec and some wording about reconnecting the socket,
> > > but in practice I wouldn't expect all device types, VMMs, or device
> > > backends to actually support reconnection. This is an area where a
> > > uniform solution would be very welcome too.
> > 
> > I'm not aware of big issues. What are they?
> 
> I think "Inflight I/O tracking" can only be used when request processing
> is idempotent? In other words, it can only be used when submitting the
> same request multiple times is safe.


Not inherently it just does not attempt to address this problem.


Inflight tracking only tries to address issues on the guest side,
that is, making sure the same buffer is used exactly once.

> A silly example where this recovery mechanism cannot be used is if a
> device has a persistent counter that is incremented by the request. The
> guest can't be sure that the counter will be incremented exactly once.
> 
> Another example: devices that support requests with compare-and-swap
> semantics cannot use this mechanism. During recovery the compare will
> fail if the request was just completing when the backend crashed.
> 
> Do I understand the limitations of this mechanism correctly? It doesn't
> seem general and I doubt it can be applied to all existing device types.

Devices with any kind of atomicity guarantees will have to use some
internal mechanism (e.g. a log?) to ensure internal consistency; that is
out of scope for inflight tracking.



> > > There was discussion about recovering state in muser. The original idea
> > > was for the muser kernel module to host state that persists across
> > > device backend restart. That way the device backend can go away
> > > temporarily and resume without guest intervention.
> > > 
> > > Then when the vfio-user discussion started the idea morphed into simply
> > > keeping a tmpfs file for each device instance (no special muser.ko
> > > support needed anymore). This allows the device backend to resume
> > > without losing state. In practice a programming framework is needed to
> > > make this easy and safe to use but it boils down to a tmpfs mmap.
> > > 
> > > > > If there was a way to reuse existing VIRTIO device emulation code it
> > > > > would be easier to move to a multi-process architecture in QEMU. Want to
> > > > > run --netdev user,id=netdev0 --device virtio-net-pci,netdev=netdev0 in a
> > > > > separate, sandboxed process? Easy, run it as a vhost-user-net device
> > > > > instead of as virtio-net.
> > > > 
> > > > Given vhost-user is using a socket, and given there's an elaborate
> > > > protocol due to need for backwards compatibility, it seems safer to
> > > > have vhost-user interface in a separate process too.
> > > 
> > > Right, with vhost-user only the virtqueue processing is done in the
> > > device backend. The VMM still has to do the virtio transport emulation
> > > (pci, ccw, mmio) and vhost-user connection lifecycle, which is complex.
> > 
> > IIUC all vfio user does is add another protocol in the VMM,
> > and move code out of VMM to backend.
> > 
> > Architecturally I don't see why it's safer.
> 
> It eliminates one layer of device emulation (virtio-pci). Fewer
> registers to emulate means a smaller attack surface.

Well, it does not eliminate it as such, it moves it to the backend,
which in a variety of setups is actually a more sensitive place, as the
backend can do things like access host storage/network that the VMM can
be prevented from doing.

> It's possible to take things further, maybe with the proposed ioregionfd
> mechanism, where the VMM's KVM_RUN loop no longer handles MMIO/PIO
> exits. A separate process can handle them. Maybe some platform devices
> need CPU state access though.
> 
> BTW I think the goal of removing as much emulation from the VMM as
> possible is interesting.
> 
> Did you have some other approach in mind to remove the PCI and
> virtio-pci device from the VMM?

Architecturally, I think we can have 3 processes:


VMM -- guest device emulation -- host backend


to me this looks like increasing our defence in depth strength,
as opposed to just shifting things around ...




> > Something like multi-process patches seems like a way to
> > add defence in depth by having a process in the middle,
> > outside both VMM and backend.
> 
> There is no third process in mpqemu. The VMM uses a UNIX domain socket
> to communicate directly with the device backend. There is a PCI "proxy"
> device in the VMM that does this communication when the guest accesses
> registers. The device backend has a PCI "remote" host controller that a
> PCIDevice instance is plugged into and the UNIX domain socket protocol
> commands are translated into PCIDevice operations.

Yes, but does anything prevent us from further splitting the backend
up into an emulation part and a host-side part?


> This is exactly the same as vfio-user. The only difference is that
> vfio-user uses an existing set of commands, whereas mpqemu defines a new
> protocol that will eventually need to provide equivalent functionality.
>
> > > Going back to Marc-André's point, why don't we focus on vfio-user so the
> > > entire device can be moved out of the VMM?
> > > 
> > > Stefan
> > 
> > The fact that vfio-user adds a kernel component is one issue.
> 
> vfio-user only needs a UNIX domain socket. The muser.ko kernel module
> that was discussed after last KVM Forum is not used by vfio-user.
> 
> Stefan

Sorry I will need to go and read the doc which I didn't yet, sorry
about that.

-- 
MST




* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-30  8:07         ` Michael S. Tsirkin
@ 2020-09-30 14:57           ` Stefan Hajnoczi
  2020-09-30 15:31             ` Michael S. Tsirkin
                               ` (2 more replies)
  2020-10-12  3:52           ` Jason Wang
  1 sibling, 3 replies; 15+ messages in thread
From: Stefan Hajnoczi @ 2020-09-30 14:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: lulu, tiwei.bie, jasowang, qemu-devel, raphael.norwitz,
	maxime.coquelin, kraxel, Felipe Franciosi, marcandre.lureau,
	Nikos Dragazis, changpeng.liu, Daniele Buono


On Wed, Sep 30, 2020 at 04:07:59AM -0400, Michael S. Tsirkin wrote:
> On Tue, Sep 29, 2020 at 07:38:24PM +0100, Stefan Hajnoczi wrote:
> > On Tue, Sep 29, 2020 at 06:04:34AM -0400, Michael S. Tsirkin wrote:
> > > On Tue, Sep 29, 2020 at 09:57:51AM +0100, Stefan Hajnoczi wrote:
> > > > On Tue, Sep 29, 2020 at 02:09:55AM -0400, Michael S. Tsirkin wrote:
> > > > > On Mon, Sep 28, 2020 at 10:25:37AM +0100, Stefan Hajnoczi wrote:
> > > > > > Why extend vhost-user with vDPA?
> > > > > > ================================
> > > > > > Reusing VIRTIO emulation code for vhost-user backends
> > > > > > -----------------------------------------------------
> > > > > > It is a common misconception that a vhost device is a VIRTIO device.
> > > > > > VIRTIO devices are defined in the VIRTIO specification and consist of a
> > > > > > configuration space, virtqueues, and a device lifecycle that includes
> > > > > > feature negotiation. A vhost device is a subset of the corresponding
> > > > > > VIRTIO device. The exact subset depends on the device type, and some
> > > > > > vhost devices are closer to the full functionality of their
> > > > > > corresponding VIRTIO device than others. The most well-known example is
> > > > > > that vhost-net devices have rx/tx virtqueues but lack the virtio-net
> > > > > > control virtqueue. Also, the configuration space and device lifecycle
> > > > > > are only partially available to vhost devices.
> > > > > > 
> > > > > > This difference makes it impossible to use a VIRTIO device as a
> > > > > > vhost-user device and vice versa. There is an impedance mismatch and
> > > > > > missing functionality. That's a shame because existing VIRTIO device
> > > > > > emulation code is mature and duplicating it to provide vhost-user
> > > > > > backends creates additional work.
> > > > > 
> > > > > 
> > > > > The biggest issue facing vhost-user and absent in vdpa is
> > > > > backend disconnect handling. This is the reason control path
> > > > > is kept under QEMU control: we do not need any logic to
> > > > > restore control path data, and we can verify a new backend
> > > > > is consistent with old one.
> > > > 
> > > > I don't think using vhost-user with vDPA changes that. The VMM still
> > > > needs to emulate a virtio-pci/ccw/mmio device that the guest interfaces
> > > > with. If the device backend goes offline it's possible to restore that
> > > > state upon reconnection. What have I missed?
> > > 
> > > The need to maintain the state in a way that is robust
> > > against backend disconnects and can be restored.
> > 
> > QEMU is only bypassed for virtqueue accesses. Everything else still
> > goes through the virtio-pci emulation in QEMU (VIRTIO configuration
> > space, status register). vDPA doesn't change this.
> > 
> > Existing vhost-user messages can be kept if they are useful (e.g.
> > virtqueue state tracking). So I think the situation is no different than
> > with the existing vhost-user protocol.
> > 
> > > > Regarding reconnection in general, it currently seems like a partially
> > > > solved problem in vhost-user. There is the "Inflight I/O tracking"
> > > > mechanism in the spec and some wording about reconnecting the socket,
> > > > but in practice I wouldn't expect all device types, VMMs, or device
> > > > backends to actually support reconnection. This is an area where a
> > > > uniform solution would be very welcome too.
> > > 
> > > I'm not aware of big issues. What are they?
> > 
> > I think "Inflight I/O tracking" can only be used when request processing
> > is idempotent? In other words, it can only be used when submitting the
> > same request multiple times is safe.
> 
> 
> Not inherently it just does not attempt to address this problem.
> 
> 
> Inflight tracking only tries to address issues on the guest side,
> that is, making sure the same buffer is used exactly once.
> 
> > A silly example where this recovery mechanism cannot be used is if a
> > device has a persistent counter that is incremented by the request. The
> > guest can't be sure that the counter will be incremented exactly once.
> > 
> > Another example: devices that support requests with compare-and-swap
> > semantics cannot use this mechanism. During recovery the compare will
> > fail if the request was just completing when the backend crashed.
> > 
> > Do I understand the limitations of this mechanism correctly? It doesn't
> > seem general and I doubt it can be applied to all existing device types.
> 
> Device with any kind of atomicity guarantees will
> have to use some internal mechanism (e.g. log?) to ensure
> internal consistency, that is out of scope for tracking.

Rant warning, but probably useful to think about for future vhost-user
and vfio-user development... :)

IMO "Inflight I/O tracking" is best placed into libvhost-user instead of
the vhost-user protocol. Here is why:

QEMU's vhost-user code actually does nothing with the inflight data
except passing it back to the reconnected vhost-user device backend and
migrating it as an opaque blob.

The fact that it's opaque to QEMU is a warning sign. QEMU is simply a
mechanism for stashing a blob of data. Stashing data is generic
functionality and not specific to vhost-user devices. One could argue
it's convenient to have the inflight data available to QEMU for
reconnection, but as you said, device backends may still need to
maintain additional state.

It's not clear why the opaque inflight data is within the scope of
vhost-user while additional device backend data is outside it. This is
why I think "Inflight I/O tracking" shouldn't be part of the protocol.

"Inflight I/O tracking" should be a utility API in libvhost-user instead
of a vhost-user protocol feature. That way the backend can stash any
additional data it needs along with the virtqueues. There needs to be
device state save/load support in the vhost-user protocol but eventually
we'll need that anyway because some backends are stateful.
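
Purely to illustrate what I mean (this is not an existing libvhost-user
API, just a hypothetical shape for one):

#include <stdbool.h>
#include <stddef.h>

/* Hypothetical utility API: the backend registers the memory it wants
 * preserved and the library stashes it (e.g. in a tmpfs file keyed by
 * instance_id) across backend restarts. */
typedef struct VuStateRegion {
    const char *name;    /* e.g. "inflight" or "device-private" */
    void *data;
    size_t size;
} VuStateRegion;

/* Register a region to be preserved across backend restarts. */
bool vu_state_register(const char *instance_id, VuStateRegion *region);

/* After restart, repopulate a previously registered region, if present. */
bool vu_state_restore(const char *instance_id, VuStateRegion *region);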

> > > > There was discussion about recovering state in muser. The original idea
> > > > was for the muser kernel module to host state that persists across
> > > > device backend restart. That way the device backend can go away
> > > > temporarily and resume without guest intervention.
> > > > 
> > > > Then when the vfio-user discussion started the idea morphed into simply
> > > > keeping a tmpfs file for each device instance (no special muser.ko
> > > > support needed anymore). This allows the device backend to resume
> > > > without losing state. In practice a programming framework is needed to
> > > > make this easy and safe to use but it boils down to a tmpfs mmap.
> > > > 
> > > > > > If there was a way to reuse existing VIRTIO device emulation code it
> > > > > > would be easier to move to a multi-process architecture in QEMU. Want to
> > > > > > run --netdev user,id=netdev0 --device virtio-net-pci,netdev=netdev0 in a
> > > > > > separate, sandboxed process? Easy, run it as a vhost-user-net device
> > > > > > instead of as virtio-net.
> > > > > 
> > > > > Given vhost-user is using a socket, and given there's an elaborate
> > > > > protocol due to need for backwards compatibility, it seems safer to
> > > > > have vhost-user interface in a separate process too.
> > > > 
> > > > Right, with vhost-user only the virtqueue processing is done in the
> > > > device backend. The VMM still has to do the virtio transport emulation
> > > > (pci, ccw, mmio) and vhost-user connection lifecycle, which is complex.
> > > 
> > > IIUC all vfio user does is add another protocol in the VMM,
> > > and move code out of VMM to backend.
> > > 
> > > Architecturally I don't see why it's safer.
> > 
> > It eliminates one layer of device emulation (virtio-pci). Fewer
> > registers to emulate means a smaller attack surface.
> 
> Well it does not eliminate it as such, it moves it to the backend.
> Which in a variety of setups is actually a more sensitive
> place as the backend can do things like access host
> storage/network which VMM can be prevented from doing.
> 
> > It's possible to take things further, maybe with the proposed ioregionfd
> > mechanism, where the VMM's KVM_RUN loop no longer handles MMIO/PIO
> > exits. A separate process can handle them. Maybe some platform devices
> > need CPU state access though.
> > 
> > BTW I think the goal of removing as much emulation from the VMM as
> > possible is interesting.
> > 
> > Did you have some other approach in mind to remove the PCI and
> > virtio-pci device from the VMM?
> 
> Architecturally, I think we can have 3 processes:
> 
> 
> VMM -- guest device emulation -- host backend
> 
> 
> to me this looks like increasing our defence in depth strength,
> as opposed to just shifting things around ...

Cool idea.

Performance will be hard because there is separation between the guest
device emulation and the host backend.

There is also more communication code involved, which might make it hard
to change the guest device emulation <-> host backend interfaces.

These are the challenges I see but it would be awesome to run guest
device emulation in a tightly sandboxed environment that has almost no
syscalls available.

> > > Something like multi-process patches seems like a way to
> > > add defence in depth by having a process in the middle,
> > > outside both VMM and backend.
> > 
> > There is no third process in mpqemu. The VMM uses a UNIX domain socket
> > to communicate directly with the device backend. There is a PCI "proxy"
> > device in the VMM that does this communication when the guest accesses
> > registers. The device backend has a PCI "remote" host controller that a
> > PCIDevice instance is plugged into and the UNIX domain socket protocol
> > commands are translated into PCIDevice operations.
> 
> Yes, but does anything prevent us from further splitting the backend
> up to emulation part and host side part?

See above.

Stefan


* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-30 14:57           ` Stefan Hajnoczi
@ 2020-09-30 15:31             ` Michael S. Tsirkin
  2020-09-30 15:34             ` Michael S. Tsirkin
  2020-10-01  7:28             ` Gerd Hoffmann
  2 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2020-09-30 15:31 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: lulu, tiwei.bie, jasowang, qemu-devel, raphael.norwitz,
	maxime.coquelin, kraxel, Felipe Franciosi, marcandre.lureau,
	Nikos Dragazis, changpeng.liu, Daniele Buono

On Wed, Sep 30, 2020 at 03:57:52PM +0100, Stefan Hajnoczi wrote:
> IMO "Inflight I/O tracking" is best placed into libvhost-user instead of
> the vhost-user protocol.

Oh I agree qemu does nothing with it. The reason we have it defined in
the spec is to facilitate compatibility across backends.
I have zero confidence in backend developers being able to support
e.g. cross-version migration consistently, and lots of backends
do not use libvhost-user.




* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-30 14:57           ` Stefan Hajnoczi
  2020-09-30 15:31             ` Michael S. Tsirkin
@ 2020-09-30 15:34             ` Michael S. Tsirkin
  2020-10-01  7:28             ` Gerd Hoffmann
  2 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2020-09-30 15:34 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: lulu, tiwei.bie, jasowang, qemu-devel, raphael.norwitz,
	maxime.coquelin, kraxel, Felipe Franciosi, marcandre.lureau,
	Nikos Dragazis, changpeng.liu, Daniele Buono

On Wed, Sep 30, 2020 at 03:57:52PM +0100, Stefan Hajnoczi wrote:
> > Architecturally, I think we can have 3 processes:
> > 
> > 
> > VMM -- guest device emulation -- host backend
> > 
> > 
> > to me this looks like increasing our defence in depth strength,
> > as opposed to just shifting things around ...
> 
> Cool idea.
> 
> Performance will be hard because there is separation between the guest
> device emulation and the host backend.

Absolutely. As a tradeoff we could put some data path things in the
backend; e.g. for virtio it is practical to have the control path in
the emulation layer and the data path in the backend.
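
To illustrate the split with a purely hypothetical dispatcher (every
name below is made up, this is only a sketch of the idea): the middle
process would answer control-path requests itself and only forward
data-path setup to the host backend.

/* Illustrative sketch: the "guest device emulation" process keeps the
 * virtio control path local and forwards only data-path setup to the
 * host backend.  All names are hypothetical. */
#include <stdint.h>

enum req_type {
    REQ_SET_FEATURES,   /* control path */
    REQ_CONFIG_ACCESS,  /* control path */
    REQ_SET_STATUS,     /* control path */
    REQ_SET_VRING_ADDR, /* data path */
    REQ_VRING_KICK,     /* data path */
};

struct dev_request {
    enum req_type type;
    uint64_t payload;
};

/* Placeholders for whatever IPC the two halves end up using */
void handle_locally(struct dev_request *req);
void forward_to_backend(int backend_fd, struct dev_request *req);

static void dispatch(int backend_fd, struct dev_request *req)
{
    switch (req->type) {
    case REQ_SET_FEATURES:
    case REQ_CONFIG_ACCESS:
    case REQ_SET_STATUS:
        handle_locally(req);                 /* emulation layer keeps control path */
        break;
    case REQ_SET_VRING_ADDR:
    case REQ_VRING_KICK:
        forward_to_backend(backend_fd, req); /* backend owns the data path */
        break;
    }
}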

-- 
MST




* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-30 14:57           ` Stefan Hajnoczi
  2020-09-30 15:31             ` Michael S. Tsirkin
  2020-09-30 15:34             ` Michael S. Tsirkin
@ 2020-10-01  7:28             ` Gerd Hoffmann
  2020-10-01 15:13               ` Stefan Hajnoczi
  2 siblings, 1 reply; 15+ messages in thread
From: Gerd Hoffmann @ 2020-10-01  7:28 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: lulu, tiwei.bie, Michael S. Tsirkin, jasowang, qemu-devel,
	raphael.norwitz, maxime.coquelin, Felipe Franciosi,
	marcandre.lureau, Nikos Dragazis, changpeng.liu, Daniele Buono

  Hi,

> > Architecturally, I think we can have 3 processes:
> > 
> > VMM -- guest device emulation -- host backend
> > 
> > to me this looks like increasing our defence in depth strength,
> > as opposed to just shifting things around ...
> 
> Cool idea.

Isn't that exactly what we can do once the multi-process qemu patches
have landed, at least for block devices?  With "VMM" being main qemu,
"guest device emulation" being offloaded to one (or more) remote qemu
process(es), and qemu-storage-daemon being the host backend?

take care,
  Gerd




* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-10-01  7:28             ` Gerd Hoffmann
@ 2020-10-01 15:13               ` Stefan Hajnoczi
  0 siblings, 0 replies; 15+ messages in thread
From: Stefan Hajnoczi @ 2020-10-01 15:13 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: lulu, tiwei.bie, Michael S. Tsirkin, jasowang, qemu-devel,
	raphael.norwitz, maxime.coquelin, Felipe Franciosi,
	marcandre.lureau, Nikos Dragazis, changpeng.liu, Daniele Buono

[-- Attachment #1: Type: text/plain, Size: 1471 bytes --]

On Thu, Oct 01, 2020 at 09:28:37AM +0200, Gerd Hoffmann wrote:
>   Hi,
> 
> > > Architecturally, I think we can have 3 processes:
> > > 
> > > VMM -- guest device emulation -- host backend
> > > 
> > > to me this looks like increasing our defence in depth strength,
> > > as opposed to just shifting things around ...
> > 
> > Cool idea.
> 
> Isn't that exactly what we can do once the multi-process qemu patches
> have landed, at least for block devices?  With "VMM" being main qemu,
> "guest device emulation" being offloaded to one (or more) remote qemu
> process(es), and qemu-storage-daemon being the host backend?

Status of mpqemu: the current mpqemu patch series has limited
functionality (so that we can merge it sooner rather than later). Don't
expect to use it with arbitrary PCI devices yet, only the LSI SCSI
controller.

In mpqemu (and vfio-user) QEMU handles all MMIO/PIO accesses by
forwarding them to the device emulation process. Therefore QEMU is still
involved to an extent. This can be fixed with ioeventfd for doorbells,
the proposed ioregionfd mechanism for MMIO/PIO, and vfio-user mmap
regions for RAM-backed device memory.
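
For the doorbell part the kernel interface already exists today. A
minimal sketch of registering one (error handling omitted; the doorbell
address and access size are placeholders):

/* Sketch: wire a guest MMIO doorbell directly to an eventfd via
 * KVM_IOEVENTFD so the write never causes a userspace exit in the VMM.
 * The resulting fd can then be passed to the device emulation process
 * (e.g. over SCM_RIGHTS, as vhost-user does for vring kick fds). */
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>

static int add_doorbell_ioeventfd(int vm_fd, uint64_t doorbell_gpa)
{
    int efd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
    struct kvm_ioeventfd ioe;

    memset(&ioe, 0, sizeof(ioe));
    ioe.addr  = doorbell_gpa;  /* guest-physical address of the doorbell */
    ioe.len   = 4;             /* trigger on 4-byte writes */
    ioe.fd    = efd;
    ioe.flags = 0;             /* MMIO; any written value counts */

    ioctl(vm_fd, KVM_IOEVENTFD, &ioe);  /* error handling omitted */
    return efd;                /* hand this fd to the device process */
}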

However, QEMU itself still emulates the PCI controller. This means
PCI configuration space and other device operations still go to QEMU. In
order to fully move emulation out of QEMU we'd need to do something more
drastic, and I think this is what we're discussing in this sub-thread.

Stefan



* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-28 15:32   ` Stefan Hajnoczi
@ 2020-10-12  2:56     ` Jason Wang
  0 siblings, 0 replies; 15+ messages in thread
From: Jason Wang @ 2020-10-12  2:56 UTC (permalink / raw)
  To: Stefan Hajnoczi, Marc-André Lureau
  Cc: lulu, tiwei.bie, Michael S. Tsirkin, qemu-devel, Raphael Norwitz,
	Coquelin, Maxime, Hoffmann, Gerd, Felipe Franciosi,
	Nikos Dragazis, Liu, Changpeng, Daniele Buono


On 2020/9/28 11:32 PM, Stefan Hajnoczi wrote:
> On Mon, Sep 28, 2020 at 03:21:56PM +0400, Marc-André Lureau wrote:
>> On Mon, Sep 28, 2020 at 1:25 PM Stefan Hajnoczi <stefanha@redhat.com wrote:
>>> Where this converges with multi-process QEMU
>>> --------------------------------------------
>>> At this point QEMU can run ad-hoc vhost-user backends using existing
>>> VIRTIO device models. It is possible to go further by creating a
>>> qemu-dev launcher executable that implements the vhost-user spec's
>>> "Backend program conventions". This way a minimal device emulator
>>> executable hosts the device instead of a full system emulator.
>>>
>>> The requirements for this are similar to the multi-process QEMU effort,
>>> which aims to run QEMU devices as separate processes. One of the main
>>> open questions is how to design build system and Kconfig support for
>>> building minimal device emulator executables.
>>>
>>> In the case of vhost-user-net the qemu-dev-vhost-user-net executable
>>> would contain virtio-net-device, vhost-user-backend, any netdevs the
>>> user wishes to include, a QMP monitor, and a vhost-user backend
>>> command-line interface.
>>>
>>> Where does this leave us? QEMU's existing VIRTIO device models can be
>>> used as vhost-user devices and run in a separate processes from the VMM.
>>> It's a great way of reusing code and having the flexibility to deploy it
>>> in the way that makes most sense for the intended use case.
>>>
>> My understanding is that this would only be able to expose virtio
>> devices from external processes. But vfio-user could expose more kinds
>> of devices, including the virtio devices.
>>
>> Shouldn't we focus on vfio-user now, as the general out-of-process
>> device solution?


Similar question could be asked for vDPA(kernel) vs VFIO(kernel).


> Eventually vfio-user can replace vhost-user. However, vfio-user
> development will take longer so for anyone already comfortable with
> vhost-user I think extending the protocol with vDPA ioctls is
> attractive.


My understanding is that vhost-user may have these advantages:

- a well-defined interface; this helps a lot for e.g. live migration
(cross-migration among different vendors), backend disconnection, and
device failover, and there is no vendor lock-in
- a high-level abstraction that is not tied to a specific bus
implementation; a micro VM that wants to get rid of PCI can use the
MMIO transport

So it doesn't conflict with vfio(-user), which is more suitable for
vendor-specific device APIs.

Thanks


>
> Maybe we can get more organized around vfio-user and make progress
> quicker?
>
> Stefan




* Re: Outline for VHOST_USER_PROTOCOL_F_VDPA
  2020-09-30  8:07         ` Michael S. Tsirkin
  2020-09-30 14:57           ` Stefan Hajnoczi
@ 2020-10-12  3:52           ` Jason Wang
  1 sibling, 0 replies; 15+ messages in thread
From: Jason Wang @ 2020-10-12  3:52 UTC (permalink / raw)
  To: Michael S. Tsirkin, Stefan Hajnoczi
  Cc: lulu, tiwei.bie, qemu-devel, raphael.norwitz, maxime.coquelin,
	kraxel, Felipe Franciosi, marcandre.lureau, Nikos Dragazis,
	changpeng.liu, Daniele Buono


On 2020/9/30 4:07 PM, Michael S. Tsirkin wrote:
> On Tue, Sep 29, 2020 at 07:38:24PM +0100, Stefan Hajnoczi wrote:
>> On Tue, Sep 29, 2020 at 06:04:34AM -0400, Michael S. Tsirkin wrote:
>>> On Tue, Sep 29, 2020 at 09:57:51AM +0100, Stefan Hajnoczi wrote:
>>>> On Tue, Sep 29, 2020 at 02:09:55AM -0400, Michael S. Tsirkin wrote:
>>>>> On Mon, Sep 28, 2020 at 10:25:37AM +0100, Stefan Hajnoczi wrote:
>>>>>> Why extend vhost-user with vDPA?
>>>>>> ================================
>>>>>> Reusing VIRTIO emulation code for vhost-user backends
>>>>>> -----------------------------------------------------
>>>>>> It is a common misconception that a vhost device is a VIRTIO device.
>>>>>> VIRTIO devices are defined in the VIRTIO specification and consist of a
>>>>>> configuration space, virtqueues, and a device lifecycle that includes
>>>>>> feature negotiation. A vhost device is a subset of the corresponding
>>>>>> VIRTIO device. The exact subset depends on the device type, and some
>>>>>> vhost devices are closer to the full functionality of their
>>>>>> corresponding VIRTIO device than others. The most well-known example is
>>>>>> that vhost-net devices have rx/tx virtqueues but lack the virtio-net
>>>>>> control virtqueue. Also, the configuration space and device lifecycle
>>>>>> are only partially available to vhost devices.
>>>>>>
>>>>>> This difference makes it impossible to use a VIRTIO device as a
>>>>>> vhost-user device and vice versa. There is an impedance mismatch and
>>>>>> missing functionality. That's a shame because existing VIRTIO device
>>>>>> emulation code is mature and duplicating it to provide vhost-user
>>>>>> backends creates additional work.
>>>>> The biggest issue facing vhost-user and absent in vdpa is
>>>>> backend disconnect handling. This is the reason control path
>>>>> is kept under QEMU control: we do not need any logic to
>>>>> restore control path data, and we can verify a new backend
>>>>> is consistent with old one.
>>>> I don't think using vhost-user with vDPA changes that. The VMM still
>>>> needs to emulate a virtio-pci/ccw/mmio device that the guest interfaces
>>>> with. If the device backend goes offline it's possible to restore that
>>>> state upon reconnection. What have I missed?
>>> The need to maintain the state in a way that is robust
>>> against backend disconnects and can be restored.
>> QEMU is only bypassed for virtqueue accesses. Everything else still
>> goes through the virtio-pci emulation in QEMU (VIRTIO configuration
>> space, status register). vDPA doesn't change this.
>>
>> Existing vhost-user messages can be kept if they are useful (e.g.
>> virtqueue state tracking). So I think the situation is no different than
>> with the existing vhost-user protocol.
>>
>>>> Regarding reconnection in general, it currently seems like a partially
>>>> solved problem in vhost-user. There is the "Inflight I/O tracking"
>>>> mechanism in the spec and some wording about reconnecting the socket,
>>>> but in practice I wouldn't expect all device types, VMMs, or device
>>>> backends to actually support reconnection. This is an area where a
>>>> uniform solution would be very welcome too.
>>> I'm not aware of big issues. What are they?
>> I think "Inflight I/O tracking" can only be used when request processing
>> is idempotent? In other words, it can only be used when submitting the
>> same request multiple times is safe.
> Not inherently; it just does not attempt to address this problem.
>
>
> Inflight tracking only tries to address issues on the guest side,
> that is, making sure the same buffer is used exactly once.
>

As discussed, if we design the virtio ring carefully, there's probably
no need to use extra metadata for inflight tracking.

And I remember that the current inflight tracking doesn't support the
packed virtqueue.
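
For readers following along, the extra-metadata approach under
discussion looks roughly like this (heavily simplified; this is not the
actual vhost-user inflight region layout, and the 256-entry queue size
is an assumption):

/* Simplified illustration, not the real vhost-user inflight layout: the
 * backend marks a descriptor as in flight in a shared mmap'd table
 * before processing it and clears the mark only after the used ring
 * update, so a restarted backend can find and resubmit unfinished
 * requests. */
#include <stdint.h>

#define QUEUE_SIZE 256

struct inflight_table {
    uint8_t inflight[QUEUE_SIZE];  /* one flag per descriptor head */
};

static void request_start(struct inflight_table *t, uint16_t head)
{
    t->inflight[head] = 1;  /* survives a backend crash (shared memory) */
}

static void request_complete(struct inflight_table *t, uint16_t head)
{
    /* Clear only after pushing 'head' to the used ring: a crash in
     * between means the request may be resubmitted, which is why
     * reprocessing has to be safe (the idempotency concern above). */
    t->inflight[head] = 0;
}

static void reconnect_recover(struct inflight_table *t,
                              void (*resubmit)(uint16_t head))
{
    for (uint16_t head = 0; head < QUEUE_SIZE; head++) {
        if (t->inflight[head]) {
            resubmit(head);
        }
    }
}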

Thanks




