* [Qemu-devel] Towards an ivshmem 2.0?
@ 2017-01-16  8:36 Jan Kiszka
  2017-01-16 12:41 ` Marc-André Lureau
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Jan Kiszka @ 2017-01-16  8:36 UTC (permalink / raw)
  To: qemu-devel, Jailhouse

[-- Attachment #1: Type: text/plain, Size: 4640 bytes --]

Hi,

some of you may know that we are using a shared memory device similar to
ivshmem in the partitioning hypervisor Jailhouse [1].

We started out compatible with the original ivshmem that QEMU
implements, but we quickly deviated in some details, and even more so in
recent months. Some of the deviations are related to making the
implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is
aiming at safety-critical systems and, therefore, at a small code base.
Other changes address deficits in the original design, like missing
life-cycle management.

Now the question is whether there is interest in defining a common new
revision of this device and maybe also of some protocols used on top,
such as virtual network links. Ideally, this would enable us to share
Linux drivers. We will definitely go for upstreaming at least a network
driver such as [2], a UIO driver and maybe also a serial port/console.

I've attached a first draft of the specification of our new ivshmem
device. A working implementation can be found in the wip/ivshmem2 branch
of Jailhouse [3], the corresponding ivshmem-net driver in [4].

Deviations from the original design:

- Only two peers per link

  This simplifies the implementation and also the interfaces (think of
  life-cycle management in a multi-peer environment). Moreover, we do
  not have an urgent use case for multiple peers, and thus also no
  reference protocol that could be used in such setups. If someone
  else happens to share such a protocol, it would be possible to discuss
  potential extensions and their implications.

- Side-band registers to discover and configure shared memory regions

  This was one of the first changes: We removed the memory regions from
  the PCI BARs and gave them special configuration space registers. By
  now, these registers are embedded in a PCI capability. The reasons are
  that Jailhouse does not allow relocating the regions in guest address
  space (but other hypervisors may do so if they like) and that we now have
  up to three of them.

- Changed PCI base class code to 0xff (unspecified class)

  This allows us to define our own sub-classes and interfaces. That is
  now exploited for specifying the shared memory protocol the two
  connected peers should use. It also allows the Linux drivers to match
  on that.

- INTx interrupts support is back

  This is needed on target platforms without MSI controllers, i.e.
  without the required guest support; namely, some PCI-less ARM SoCs
  required the reintroduction. While doing this, we also took care of
  keeping the MMIO registers free of privileged controls so that a
  guest OS can map them safely into a guest userspace application.

And then there are some extensions of the original ivshmem:

- Multiple shared memory regions, including unidirectional ones

  It is now possible to expose up to three different shared memory
  regions: The first one is read/writable for both sides. The second
  region is read/writable for the local peer and read-only for the
  remote peer (useful for output queues). And the third is read-only
  locally but read/writable remotely (i.e. for input queues).
  Unidirectional regions prevent the receiver of some data from
  interfering with the sender while it is still building the message - a
  property that, we are sure, is useful not only for safety-critical
  communication.

- Life-cycle management via local and remote state

  Each device can now signal its own state in the form of a value to the
  remote side, which triggers an event there. Moreover, state changes
  done by the hypervisor to one peer are signalled to the other side.
  And we introduced a write-to-shared-memory mechanism for the
  respective remote state so that guests do not have to issue an MMIO
  access in order to check the state.

So, this is our proposal. It would be great to hear some opinions on
whether you see value in adding support for such an "ivshmem 2.0" device
to QEMU as well and expanding its ecosystem towards Linux upstream, maybe
also DPDK again. If you see problems in the new design w.r.t. what QEMU
provides so
far with its ivshmem device, let's discuss how to resolve them. Looking
forward to any feedback!

Jan

[1] https://github.com/siemens/jailhouse
[2]
http://git.kiszka.org/?p=linux.git;a=blob;f=drivers/net/ivshmem-net.c;h=0e770ca293a4aca14a55ac0e66871b09c82647af;hb=refs/heads/queues/jailhouse
[3] https://github.com/siemens/jailhouse/commits/wip/ivshmem2
[4]
http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/jailhouse-ivshmem2

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux

[-- Attachment #2: ivshmem-v2-specification.md --]
[-- Type: text/markdown, Size: 8840 bytes --]

IVSHMEM Device Specification
============================

The Inter-VM Shared Memory device provides the following features to its users:

- Interconnection between two peers

- Up to three shared memory regions per connection

    - one read/writable for both sides

    - two unidirectional, i.e. read/writable for one side and only readable for
      the other

- Event signaling via interrupt to the remote side

- Support for life-cycle management via state value exchange and interrupt
  notification on changes

- Free choice of protocol to be used on top

- Optional protocol type suggestion to both sides

- Unprivileged access to memory-mapped control and status registers feasible

- Discoverable and configurable via standard PCI mechanisms


Provider Model
--------------

In order to provide a consistent link between two peers, two instances of the
IVSHMEM device need to be configured, created and run by the provider according
to the following requirements:

- The instances of the device need to be accessible via PCI programming
  interfaces on both sides.

- If present, the first shared memory regions of both devices have to be of the
  same size and have to be backed by the same physical memory.

- If present, the second shared memory region has to be configured to be
  read/writable for the user of the device.

- If present, the third shared memory region has to be configured to be
  read-only for the user of the device.

- If the second shared memory region of one side is present, the third shared
  memory region of the other side needs to be present as well, both regions have
  to be of the same size, and both have to be backed by the same physical memory.

- Interrupt events triggered by one side have to be delivered to the other side,
  provided the receiving side has enabled the delivery.

- State register changes on one side have to be propagated to the other side.

- The value of the suggested protocol type needs to be identical on both sides.
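
The requirements above boil down to a pairing check between the two instance
configurations. The following is a minimal C sketch of such a check; the
structure and field names are illustrative only and not part of this
specification (access permissions are not modeled here):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-instance description as a provider might keep it. */
struct ivshmem_inst_cfg {
    uint64_t region_phys[3];   /* backing physical address, 0 if absent */
    uint64_t region_size[3];   /* region size, 0 if absent */
    uint8_t  protocol_type;    /* suggested protocol type, see Protocols */
};

/* Check the pairing requirements listed above for two instances a and b. */
static bool ivshmem_link_valid(const struct ivshmem_inst_cfg *a,
                               const struct ivshmem_inst_cfg *b)
{
    /* Region 0: same size and same backing memory on both sides. */
    if (a->region_size[0] != b->region_size[0] ||
        a->region_phys[0] != b->region_phys[0])
        return false;

    /* Region 1 of one side pairs with region 2 of the other side. */
    if (a->region_size[1] != b->region_size[2] ||
        a->region_phys[1] != b->region_phys[2] ||
        a->region_size[2] != b->region_size[1] ||
        a->region_phys[2] != b->region_phys[1])
        return false;

    /* The suggested protocol type must be identical. */
    return a->protocol_type == b->protocol_type;
}
```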


Programming Model
-----------------

An IVSHMEM device appears as a PCI device to its users. Unless otherwise noted,
it conforms to the PCI Local Bus Specification, Revision 3.0. As such, it is
discoverable via the PCI configuration space and provides a number of standard
and custom PCI configuration registers.

### Configuration Space Registers

#### Header Registers

Offset | Register               | Content
------:|:---------------------- |:-------------------------------------------
   00h | Vendor ID              | 1AF4h
   02h | Device ID              | 1110h
   04h | Command Register       | 0000h on reset, implementing bits 1, 2, 10
   06h | Status Register        | 0010h, static value (bit 3 not implemented)
   08h | Revision ID            | 00h
   09h | Class Code, Interface  | Protocol Revision, see [Protocols](#Protocols)
   0Ah | Class Code, Sub-Class  | Protocol Type, see [Protocols](#Protocols)
   0Bh | Class Code, Base Class | FFh
   0Eh | Header Type            | 00h
   10h | BAR 0 (with BAR 1)     | 64-bit MMIO register region
   18h | BAR 2 (with BAR 3)     | 64-bit MSI-X region
   2Ch | Subsystem Vendor ID    | 1AF4h or provider-specific value
   2Eh | Subsystem ID           | 1110h or provider-specific value
   34h | Capability Pointer     | First capability
   3Eh | Interrupt Pin          | 01h-04h, may be 00h if MSI-X is available

Other header registers may be left unimplemented. If not implemented, they
return 0 on read and ignore write accesses.
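
As an illustration of how the fixed IDs and the free-form class code interact,
a Linux driver could match on this header layout roughly as follows. This is a
sketch only; the table name is made up, and binding to protocol type 01h
(virtual Ethernet) is just an example:

```c
#include <linux/module.h>
#include <linux/pci.h>

/* Match vendor/device 1AF4h/1110h with base class FFh and sub-class 01h
 * (virtual Ethernet as the suggested protocol type); the class mask
 * deliberately ignores the protocol revision in the interface byte. */
static const struct pci_device_id ivshmem_net_ids[] = {
    { PCI_DEVICE(0x1af4, 0x1110),
      .class = (0xff << 16) | (0x01 << 8),
      .class_mask = 0xffff00 },
    { 0 }
};
MODULE_DEVICE_TABLE(pci, ivshmem_net_ids);
```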

#### Vendor Specific Capability (ID 09h)

Offset | Register         | Content
------:|:---------------- |:-------------------------------------------------
   00h | ID               | 09h
   01h | Next Capability  | Pointer to next capability or 00h
   02h | Length           | 34h
   03h | Flags            | Bit 0: Enable INTx (0 on reset), Bits 1-7: RsvdZ
   04h | Region Address 0 | 64-bit address of read-write region 0
   0Ch | Region Size 0    | 64-bit size of region 0
   14h | Region Address 1 | 64-bit address of unidirectional output region 1
   1Ch | Region Size 1    | 64-bit size of region 1
   24h | Region Address 2 | 64-bit address of unidirectional input region 2
   2Ch | Region Size 2    | 64-bit size of region 2

All registers are read-only, except for bit 0 of the Flags register and the
Region Address registers under certain conditions.

If an IVSHMEM device supports relocatable shared memory regions, Region Address
registers have to be implemented read-writable if the region has a non-zero
size. The reset value of the Region Address registers is 0 in that case. In
order to define the location of a region in the user's address space, bit 1 of
the Command register has to be cleared and the desired address has to be written
to the Region Address register.

If an IVSHMEM device does not support relocation of its shared memory regions,
the Region Address registers have to be implemented read-only. Region Address
registers of regions with non-zero size have to be pre-initialized by the
provider to report the location of the region in the user's address space.

A non-existent shared memory region has to report 0 in both its Region Address
and Region Size registers, and the Region Address register must be implemented
read-only.
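
For reference, the capability layout above maps to the following C structure,
and the 64-bit registers can be read as two 32-bit config space accesses. This
is a sketch: the struct and function names are illustrative, and `vndr_cap` is
assumed to come from `pci_find_capability(pdev, PCI_CAP_ID_VNDR)`:

```c
#include <linux/pci.h>
#include <linux/types.h>

/* Vendor-specific capability layout as defined above (length 34h). */
struct ivshmem_vndr_cap {
    u8 id;                     /* 09h */
    u8 next;                   /* pointer to next capability or 00h */
    u8 len;                    /* 34h */
    u8 flags;                  /* bit 0: enable INTx */
    struct {
        u64 addr;              /* 0 if the region does not exist */
        u64 size;
    } region[3];
} __packed;

/* Read the 64-bit address of region n from config space. */
static u64 ivshmem_region_addr(struct pci_dev *pdev, int vndr_cap, int n)
{
    u32 lo, hi;
    int off = vndr_cap + 0x04 + n * 0x10;

    pci_read_config_dword(pdev, off, &lo);
    pci_read_config_dword(pdev, off + 4, &hi);
    return ((u64)hi << 32) | lo;
}
```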

#### MSI-X Capability (ID 11h)

On platforms supporting MSI-X, the IVSHMEM device has to provide interrupt
delivery via this mechanism. In that case, the legacy INTx delivery mechanism
may not be available, and the Interrupt Pin configuration register returns 0.

The IVSHMEM device has no notion of pending interrupts. Therefore, reading from
the MSI-X Pending Bit Array will always return 0.

The corresponding MSI-X MMIO region is configured via BAR 2.
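
On the driver side, a Linux user of this device might request vectors with
MSI-X preferred and legacy INTx as a fallback, e.g. (sketch only, illustrative
function name):

```c
#include <linux/pci.h>

/* Request up to 'nvec' MSI-X vectors, falling back to a single legacy
 * INTx interrupt on platforms where MSI-X is not available. */
static int ivshmem_request_vectors(struct pci_dev *pdev, int nvec)
{
    return pci_alloc_irq_vectors(pdev, 1, nvec,
                                 PCI_IRQ_MSIX | PCI_IRQ_LEGACY);
}
```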
 

### MMIO Register Region

The IVSHMEM device provider has to ensure that the MMIO register region can be
mapped as one page into the address space of the user. Write accesses to region
offsets that are not backed by registers have to be ignored; read accesses have
to return 0. This enables the user to hand out the complete region, along with
the shared memory regions, to an unprivileged instance.

The region location in the user's physical address space is configured via BAR
0. The following table visualizes the region layout:

Offset | Register
------:|:------------------
   00h | ID
   04h | Doorbell
   08h | Local State
   0Ch | Remote State
   10h | Remote State Write
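
A minimal Linux-style sketch of accessing these registers over an
`ioremap()`'d BAR 0 follows; the offsets are taken from the table, while the
helper names are made up for illustration:

```c
#include <linux/io.h>
#include <linux/types.h>

#define IVSHMEM_REG_ID            0x00  /* read-only: 0 or 1 */
#define IVSHMEM_REG_DOORBELL      0x04  /* write-only: vector number */
#define IVSHMEM_REG_LSTATE        0x08  /* read/write: local state */
#define IVSHMEM_REG_RSTATE        0x0c  /* read-only: remote state */
#define IVSHMEM_REG_RSTATE_WRITE  0x10  /* remote-state-write control */

/* Trigger interrupt vector 'vector' on the remote device. */
static inline void ivshmem_kick(void __iomem *regs, u32 vector)
{
    writel(vector, regs + IVSHMEM_REG_DOORBELL);
}

/* Publish a new local state; this raises vector 0 on the remote device. */
static inline void ivshmem_set_state(void __iomem *regs, u32 state)
{
    writel(state, regs + IVSHMEM_REG_LSTATE);
}

/* Poll the peer's state; returns 0 if the peer is currently not present. */
static inline u32 ivshmem_peer_state(void __iomem *regs)
{
    return readl(regs + IVSHMEM_REG_RSTATE);
}
```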

#### ID Register (Offset 00h)

Read-only register that reports the ID of the device, 0 or 1. It is unique for
each of the two connected devices and remains unchanged over the lifetime of an
IVSHMEM device.

#### Doorbell Register (Offset 04h)

Write-only register that triggers an interrupt vector in the remote device if it
is enabled there. The vector number is defined by the value written to the
register. Writing an invalid vector number has no effect.

The behavior on reading from this register is undefined.

#### Local State Register (Offset 08h)

Read/write register that defines the state of the local device. Writing to this
register sets the state and triggers interrupt vector 0 on the remote device.
The user of the remote device can read the value written to this register from
the corresponding Remote State Register or from the shared memory address
defined remotely via the Remote State Write Register.

The value of this register after reset is 0.

#### Remote State Register (Offset 0Ch)

Read-only register that reports the current state of the remote device. If the
remote device is currently not present, 0 is returned.

#### Remote State Write Register (Offset 10h)

This register controls the writing of remote state changes to a shared memory
region at a defined offset. It enables the user to check its peer's state without
issuing a more costly MMIO register access.

The remote state is written once when enabling this feature and then on each
state change of the remote device. If the remote device disappears, 0 is
written.

Bits | Content
----:| -----------
   0 | Enable remote state write
   1 | 0: write to region 0, 1: write to region 1
2-63 | Write offset in selected region
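
A hedged sketch of composing a value for this register follows. It assumes the
offset is stored in place (i.e. it must be at least 4-byte aligned, with the
two flag bits occupying the low bits, as in BAR-style registers) rather than
shifted; this reading of the bit layout is an assumption about this draft, not
a confirmed behavior:

```c
#include <stdint.h>

#define IVSHMEM_RSTATE_WRITE_ENABLE   (1ULL << 0)
#define IVSHMEM_RSTATE_WRITE_REGION1  (1ULL << 1)

/* Compose the register value: 'offset' is the byte offset in the selected
 * region and is assumed to be 4-byte aligned (low two bits carry the flags). */
static uint64_t ivshmem_rstate_write_val(int region1, uint64_t offset)
{
    return offset | (region1 ? IVSHMEM_RSTATE_WRITE_REGION1 : 0) |
           IVSHMEM_RSTATE_WRITE_ENABLE;
}
```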


Protocols
---------

The IVSHMEM device shall enable both sides of a connection to agree on the
protocol used over the shared memory. For that purpose, the sub-class byte of
the Class Code register (offset 0Ah) of the two connected devices encodes a
protocol type suggestion for the users. The following type values are defined:

Protocol Type | Description
-------------:| ----------------------
          00h | Undefined type
          01h | Virtual Ethernet
          02h | Virtual serial port
      03h-7Fh | Reserved
      80h-FFh | User-defined protocols

The interface byte of the Class Code register (offset 09h) encodes the revision
of the protocol, starting with 0 for the first release.

Details of the protocol are not in the scope of this specification.
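
Since the Linux PCI core already collects the class code, a driver can recover
the suggested protocol type and revision without extra config space reads. A
sketch with an illustrative function name:

```c
#include <linux/pci.h>
#include <linux/types.h>

/* pci_dev->class holds base class << 16 | sub-class << 8 | interface. */
static void ivshmem_protocol(struct pci_dev *pdev, u8 *type, u8 *revision)
{
    *type     = (pdev->class >> 8) & 0xff;  /* e.g. 01h = virtual Ethernet */
    *revision = pdev->class & 0xff;         /* protocol revision, starts at 0 */
}
```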

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-16  8:36 [Qemu-devel] Towards an ivshmem 2.0? Jan Kiszka
@ 2017-01-16 12:41 ` Marc-André Lureau
  2017-01-16 13:10   ` Jan Kiszka
  2017-01-16 14:18 ` Stefan Hajnoczi
  2017-01-23 14:19 ` Markus Armbruster
  2 siblings, 1 reply; 29+ messages in thread
From: Marc-André Lureau @ 2017-01-16 12:41 UTC (permalink / raw)
  To: Jan Kiszka, qemu-devel, Jailhouse; +Cc: Wei Wang, Markus Armbruster

Hi

On Mon, Jan 16, 2017 at 12:37 PM Jan Kiszka <jan.kiszka@siemens.com> wrote:

> Hi,
>
> some of you may know that we are using a shared memory device similar to
> ivshmem in the partitioning hypervisor Jailhouse [1].
>
> We started as being compatible to the original ivshmem that QEMU
> implements, but we quickly deviated in some details, and in the recent
> months even more. Some of the deviations are related to making the
> implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is
> aiming at safety critical systems and, therefore, a small code base.
> Other changes address deficits in the original design, like missing
> life-cycle management.
>
> Now the question is if there is interest in defining a common new
> revision of this device and maybe also of some protocols used on top,
> such as virtual network links. Ideally, this would enable us to share
> Linux drivers. We will definitely go for upstreaming at least a network
> driver such as [2], a UIO driver and maybe also a serial port/console.
>
>
This sounds like duplicating efforts done with virtio and vhost-pci. Have
you looked at Wei Wang proposal?

I've attached a first draft of the specification of our new ivshmem
> device. A working implementation can be found in the wip/ivshmem2 branch
> of Jailhouse [3], the corresponding ivshmem-net driver in [4].
>

You don't have a qemu branch, right?


>
> Deviations from the original design:
>
> - Only two peers per link
>
>
sounds sane, that's also what vhost-pci aims for afaik


>   This simplifies the implementation and also the interfaces (think of
>   life-cycle management in a multi-peer environment). Moreover, we do
>   not have an urgent use case for multiple peers, thus also not
>   reference for a protocol that could be used in such setups. If someone
>   else happens to share such a protocol, it would be possible to discuss
>   potential extensions and their implications.
>
> - Side-band registers to discover and configure share memory regions
>
>   This was one of the first changes: We removed the memory regions from
>   the PCI BARs and gave them special configuration space registers. By
>   now, these registers are embedded in a PCI capability. The reasons are
>   that Jailhouse does not allow to relocate the regions in guest address
>   space (but other hypervisors may if they like to) and that we now have
>   up to three of them.
>

 Sorry, I can't comment on that.


> - Changed PCI base class code to 0xff (unspecified class)
>
>   This allows us to define our own sub classes and interfaces. That is
>   now exploited for specifying the shared memory protocol the two
>   connected peers should use. It also allows the Linux drivers to match
>   on that.
>
>
Why not, but it worries me that you are going to invent protocols similar
to virtio devices, aren't you?


> - INTx interrupts support is back
>
>   This is needed on target platforms without MSI controllers, i.e.
>   without the required guest support. Namely some PCI-less ARM SoCs
>   required the reintroduction. While doing this, we also took care of
>   keeping the MMIO registers free of privileged controls so that a
>   guest OS can map them safely into a guest userspace application.
>
>
Right, it's not completely removed from ivshmem qemu upstream, although it
should probably be allowed to setup a doorbell-ivshmem with msi=off (this
may be quite trivial to add back)


> And then there are some extensions of the original ivshmem:
>
> - Multiple shared memory regions, including unidirectional ones
>
>   It is now possible to expose up to three different shared memory
>   regions: The first one is read/writable for both sides. The second
>   region is read/writable for the local peer and read-only for the
>   remote peer (useful for output queues). And the third is read-only
>   locally but read/writable remotely (ie. for input queues).
>   Unidirectional regions prevent that the receiver of some data can
>   interfere with the sender while it is still building the message, a
>   property that is not only useful for safety critical communication,
>   we are sure.
>

Sounds like a good idea, and something we may want in virtio too

>
> - Life-cycle management via local and remote state
>
>   Each device can now signal its own state in form of a value to the
>   remote side, which triggers an event there. Moreover, state changes
>   done by the hypervisor to one peer are signalled to the other side.
>   And we introduced a write-to-shared-memory mechanism for the
>   respective remote state so that guests do not have to issue an MMIO
>   access in order to check the state.
>

There is also ongoing work to better support disconnect/reconnect in
virtio.


>
> So, this is our proposal. Would be great to hear some opinions if you
> see value in adding support for such an "ivshmem 2.0" device to QEMU as
> well and expand its ecosystem towards Linux upstream, maybe also DPDK
> again. If you see problems in the new design /wrt what QEMU provides so
> far with its ivshmem device, let's discuss how to resolve them. Looking
> forward to any feedback!
>
>
My feeling is that ivshmem is not being actively developed in qemu, but
rather virtio-based solutions (vhost-pci for vm2vm).

Jan
>
> [1] https://github.com/siemens/jailhouse
> [2]
>
> http://git.kiszka.org/?p=linux.git;a=blob;f=drivers/net/ivshmem-net.c;h=0e770ca293a4aca14a55ac0e66871b09c82647af;hb=refs/heads/queues/jailhouse
> [3] https://github.com/siemens/jailhouse/commits/wip/ivshmem2
> [4]
>
> http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/jailhouse-ivshmem2
>
> --
> Siemens AG, Corporate Technology, CT RDA ITP SES-DE
> Corporate Competence Center Embedded Linux
>
-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-16 12:41 ` Marc-André Lureau
@ 2017-01-16 13:10   ` Jan Kiszka
  2017-01-17  9:13     ` Wang, Wei W
  2017-01-17  9:59     ` Stefan Hajnoczi
  0 siblings, 2 replies; 29+ messages in thread
From: Jan Kiszka @ 2017-01-16 13:10 UTC (permalink / raw)
  To: Marc-André Lureau, qemu-devel, Jailhouse; +Cc: Wei Wang, Markus Armbruster

Hi Marc-André,

On 2017-01-16 13:41, Marc-André Lureau wrote:
> Hi
> 
> On Mon, Jan 16, 2017 at 12:37 PM Jan Kiszka <jan.kiszka@siemens.com
> <mailto:jan.kiszka@siemens.com>> wrote:
> 
>     Hi,
> 
>     some of you may know that we are using a shared memory device similar to
>     ivshmem in the partitioning hypervisor Jailhouse [1].
> 
>     We started as being compatible to the original ivshmem that QEMU
>     implements, but we quickly deviated in some details, and in the recent
>     months even more. Some of the deviations are related to making the
>     implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is
>     aiming at safety critical systems and, therefore, a small code base.
>     Other changes address deficits in the original design, like missing
>     life-cycle management.
> 
>     Now the question is if there is interest in defining a common new
>     revision of this device and maybe also of some protocols used on top,
>     such as virtual network links. Ideally, this would enable us to share
>     Linux drivers. We will definitely go for upstreaming at least a network
>     driver such as [2], a UIO driver and maybe also a serial port/console.
> 
> 
> This sounds like duplicating efforts done with virtio and vhost-pci.
> Have you looked at Wei Wang proposal?

I didn't follow it recently, but the original concept was about
introducing an IOMMU model to the picture, and that's complexity-wise a
no-go for us (we can do this whole thing in less than 500 lines, even
virtio itself is more complex). IIUC, the alternative to an IOMMU is
mapping the whole frontend VM memory into the backend VM - that's
security/safety-wise an absolute no-go.

> 
>     I've attached a first draft of the specification of our new ivshmem
>     device. A working implementation can be found in the wip/ivshmem2 branch
>     of Jailhouse [3], the corresponding ivshmem-net driver in [4].
> 
> 
> You don't have qemu branch, right?

Yes, not yet. I would look into creating a QEMU device model if there is
serious interest.

>  
> 
> 
>     Deviations from the original design:
> 
>     - Only two peers per link
> 
> 
> sound sane, that's also what vhost-pci aims to afaik
>  
> 
>       This simplifies the implementation and also the interfaces (think of
>       life-cycle management in a multi-peer environment). Moreover, we do
>       not have an urgent use case for multiple peers, thus also not
>       reference for a protocol that could be used in such setups. If someone
>       else happens to share such a protocol, it would be possible to discuss
>       potential extensions and their implications.
> 
>     - Side-band registers to discover and configure share memory regions
> 
>       This was one of the first changes: We removed the memory regions from
>       the PCI BARs and gave them special configuration space registers. By
>       now, these registers are embedded in a PCI capability. The reasons are
>       that Jailhouse does not allow to relocate the regions in guest address
>       space (but other hypervisors may if they like to) and that we now have
>       up to three of them.
> 
> 
>  Sorry, I can't comment on that.
> 
> 
>     - Changed PCI base class code to 0xff (unspecified class)
> 
>       This allows us to define our own sub classes and interfaces. That is
>       now exploited for specifying the shared memory protocol the two
>       connected peers should use. It also allows the Linux drivers to match
>       on that.
> 
> 
> Why not, but it worries me that you are going to invent protocols
> similar to virtio devices, aren't you?

That partly comes with the desire to simplify the transport (pure shared
memory). With ivshmem-net, we are at least reusing virtio rings and will
try to do this with the new (and faster) virtio ring format as well.

>  
> 
>     - INTx interrupts support is back
> 
>       This is needed on target platforms without MSI controllers, i.e.
>       without the required guest support. Namely some PCI-less ARM SoCs
>       required the reintroduction. While doing this, we also took care of
>       keeping the MMIO registers free of privileged controls so that a
>       guest OS can map them safely into a guest userspace application.
> 
> 
> Right, it's not completely removed from ivshmem qemu upstream, although
> it should probably be allowed to setup a doorbell-ivshmem with msi=off
> (this may be quite trivial to add back)
>  
> 
>     And then there are some extensions of the original ivshmem:
> 
>     - Multiple shared memory regions, including unidirectional ones
> 
>       It is now possible to expose up to three different shared memory
>       regions: The first one is read/writable for both sides. The second
>       region is read/writable for the local peer and read-only for the
>       remote peer (useful for output queues). And the third is read-only
>       locally but read/writable remotely (ie. for input queues).
>       Unidirectional regions prevent that the receiver of some data can
>       interfere with the sender while it is still building the message, a
>       property that is not only useful for safety critical communication,
>       we are sure.
> 
> 
> Sounds like a good idea, and something we may want in virtio too
> 
> 
>     - Life-cycle management via local and remote state
> 
>       Each device can now signal its own state in form of a value to the
>       remote side, which triggers an event there. Moreover, state changes
>       done by the hypervisor to one peer are signalled to the other side.
>       And we introduced a write-to-shared-memory mechanism for the
>       respective remote state so that guests do not have to issue an MMIO
>       access in order to check the state.
> 
> 
> There is also ongoing work to better support disconnect/reconnect in
> virtio.
>  
> 
> 
>     So, this is our proposal. Would be great to hear some opinions if you
>     see value in adding support for such an "ivshmem 2.0" device to QEMU as
>     well and expand its ecosystem towards Linux upstream, maybe also DPDK
>     again. If you see problems in the new design /wrt what QEMU provides so
>     far with its ivshmem device, let's discuss how to resolve them. Looking
>     forward to any feedback!
> 
> 
> My feeling is that ivshmem is not being actively developped in qemu, but
> rather virtio-based solutions (vhost-pci for vm2vm).

As pointed out, for us it's most important to keep the design simple -
even at the price of "reinventing" some drivers for upstream (at least,
we do not need two sets of drivers because our interface is fully
symmetric). I don't see yet how vhost-pci could achieve the same, but
I'm open to learn more!

Thanks,
Jan

> 
>     Jan
> 
>     [1] https://github.com/siemens/jailhouse
>     [2]
>     http://git.kiszka.org/?p=linux.git;a=blob;f=drivers/net/ivshmem-net.c;h=0e770ca293a4aca14a55ac0e66871b09c82647af;hb=refs/heads/queues/jailhouse
>     [3] https://github.com/siemens/jailhouse/commits/wip/ivshmem2
>     [4]
>     http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/jailhouse-ivshmem2
> 
>     --
>     Siemens AG, Corporate Technology, CT RDA ITP SES-DE
>     Corporate Competence Center Embedded Linux
> 
> -- 
> Marc-André Lureau

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-16  8:36 [Qemu-devel] Towards an ivshmem 2.0? Jan Kiszka
  2017-01-16 12:41 ` Marc-André Lureau
@ 2017-01-16 14:18 ` Stefan Hajnoczi
  2017-01-16 14:34   ` Jan Kiszka
  2017-01-23 14:19 ` Markus Armbruster
  2 siblings, 1 reply; 29+ messages in thread
From: Stefan Hajnoczi @ 2017-01-16 14:18 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: qemu-devel, Jailhouse

[-- Attachment #1: Type: text/plain, Size: 939 bytes --]

On Mon, Jan 16, 2017 at 09:36:51AM +0100, Jan Kiszka wrote:
> some of you may know that we are using a shared memory device similar to
> ivshmem in the partitioning hypervisor Jailhouse [1].
> 
> We started as being compatible to the original ivshmem that QEMU
> implements, but we quickly deviated in some details, and in the recent
> months even more. Some of the deviations are related to making the
> implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is
> aiming at safety critical systems and, therefore, a small code base.
> Other changes address deficits in the original design, like missing
> life-cycle management.

My first thought is "what about virtio?".  Can you share some background
on why ivshmem fits the use case better than virtio?

The reason I ask is because the ivshmem devices you define would have
parallels to existing virtio devices and this could lead to duplication.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-16 14:18 ` Stefan Hajnoczi
@ 2017-01-16 14:34   ` Jan Kiszka
  2017-01-17 10:00     ` Stefan Hajnoczi
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2017-01-16 14:34 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel, Jailhouse

On 2017-01-16 15:18, Stefan Hajnoczi wrote:
> On Mon, Jan 16, 2017 at 09:36:51AM +0100, Jan Kiszka wrote:
>> some of you may know that we are using a shared memory device similar to
>> ivshmem in the partitioning hypervisor Jailhouse [1].
>>
>> We started as being compatible to the original ivshmem that QEMU
>> implements, but we quickly deviated in some details, and in the recent
>> months even more. Some of the deviations are related to making the
>> implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is
>> aiming at safety critical systems and, therefore, a small code base.
>> Other changes address deficits in the original design, like missing
>> life-cycle management.
> 
> My first thought is "what about virtio?".  Can you share some background
> on why ivshmem fits the use case better than virtio?
> 
> The reason I ask is because the ivshmem devices you define would have
> parallels to existing virtio devices and this could lead to duplication.

virtio was created as an interface between a host and a guest. It has no
notion of direct (or even symmetric) connection between guests. With
ivshmem, we want to establish only a minimal host-guest interface. We
want to keep the host out of the business negotiating protocol details
between two connected guests.

So, the trade-off was between reusing existing virtio drivers - in the
best case, some changes would definitely have been required - and
requiring complex translation of virtio into a vm-to-vm model on the one
side and establishing a new driver ecosystem on much simpler host
services (500 LoC...). We went for the latter.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-16 13:10   ` Jan Kiszka
@ 2017-01-17  9:13     ` Wang, Wei W
  2017-01-17  9:46       ` Jan Kiszka
  2017-01-17  9:59     ` Stefan Hajnoczi
  1 sibling, 1 reply; 29+ messages in thread
From: Wang, Wei W @ 2017-01-17  9:13 UTC (permalink / raw)
  To: Jan Kiszka, Marc-André Lureau, qemu-devel, Jailhouse
  Cc: Markus Armbruster

Hi Jan,

On Monday, January 16, 2017 9:10 PM, Jan Kiszka wrote:
> On 2017-01-16 13:41, Marc-André Lureau wrote:
> > On Mon, Jan 16, 2017 at 12:37 PM Jan Kiszka <jan.kiszka@siemens.com
> > <mailto:jan.kiszka@siemens.com>> wrote:
> >     some of you may know that we are using a shared memory device similar to
> >     ivshmem in the partitioning hypervisor Jailhouse [1].
> >
> >     We started as being compatible to the original ivshmem that QEMU
> >     implements, but we quickly deviated in some details, and in the recent
> >     months even more. Some of the deviations are related to making the
> >     implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is
> >     aiming at safety critical systems and, therefore, a small code base.
> >     Other changes address deficits in the original design, like missing
> >     life-cycle management.
> >
> >     Now the question is if there is interest in defining a common new
> >     revision of this device and maybe also of some protocols used on top,
> >     such as virtual network links. Ideally, this would enable us to share
> >     Linux drivers. We will definitely go for upstreaming at least a network
> >     driver such as [2], a UIO driver and maybe also a serial port/console.
> >
> >
> > This sounds like duplicating efforts done with virtio and vhost-pci.
> > Have you looked at Wei Wang proposal?
> 
> I didn't follow it recently, but the original concept was about introducing an
> IOMMU model to the picture, and that's complexity-wise a no-go for us (we can
> do this whole thing in less than 500 lines, even virtio itself is more complex). IIUC,
> the alternative to an IOMMU is mapping the whole frontend VM memory into
> the backend VM - that's security/safety-wise an absolute no-go.

Though the virtio-based solution might be complex for you, a big advantage is that we have lots of people working to improve virtio. For example, the upcoming virtio 1.1 has vring improvements, so we can easily upgrade all the virtio-based solutions, such as vhost-pci, to take advantage of them. From the long-term perspective, I think this kind of complexity is worthwhile.

We further have security features (e.g. vIOMMU) that can be applied to vhost-pci.

> >
> >     Deviations from the original design:
> >
> >     - Only two peers per link
> >
> >
> > sound sane, that's also what vhost-pci aims to afaik
> >
> >
> >       This simplifies the implementation and also the interfaces (think of
> >       life-cycle management in a multi-peer environment). Moreover, we do
> >       not have an urgent use case for multiple peers, thus also not
> >       reference for a protocol that could be used in such setups. If someone
> >       else happens to share such a protocol, it would be possible to discuss
> >       potential extensions and their implications.
> >
> >     - Side-band registers to discover and configure share memory
> > regions
> >
> >       This was one of the first changes: We removed the memory regions from
> >       the PCI BARs and gave them special configuration space registers. By
> >       now, these registers are embedded in a PCI capability. The reasons are
> >       that Jailhouse does not allow to relocate the regions in guest address
> >       space (but other hypervisors may if they like to) and that we now have
> >       up to three of them.
> >
> >
> >  Sorry, I can't comment on that.
> >
> >
> >     - Changed PCI base class code to 0xff (unspecified class)
> >
> >       This allows us to define our own sub classes and interfaces. That is
> >       now exploited for specifying the shared memory protocol the two
> >       connected peers should use. It also allows the Linux drivers to match
> >       on that.
> >
> >
> > Why not, but it worries me that you are going to invent protocols
> > similar to virtio devices, aren't you?
> 
> That partly comes with the desire to simplify the transport (pure shared memory).
> With ivshmem-net, we are at least reusing virtio rings and will try to do this with
> the new (and faster) virtio ring format as well.
> 
> >
> >
> >     - INTx interrupts support is back
> >
> >       This is needed on target platforms without MSI controllers, i.e.
> >       without the required guest support. Namely some PCI-less ARM SoCs
> >       required the reintroduction. While doing this, we also took care of
> >       keeping the MMIO registers free of privileged controls so that a
> >       guest OS can map them safely into a guest userspace application.
> >
> >
> > Right, it's not completely removed from ivshmem qemu upstream,
> > although it should probably be allowed to setup a doorbell-ivshmem
> > with msi=off (this may be quite trivial to add back)
> >
> >
> >     And then there are some extensions of the original ivshmem:
> >
> >     - Multiple shared memory regions, including unidirectional ones
> >
> >       It is now possible to expose up to three different shared memory
> >       regions: The first one is read/writable for both sides. The second
> >       region is read/writable for the local peer and read-only for the
> >       remote peer (useful for output queues). And the third is read-only
> >       locally but read/writable remotely (ie. for input queues).
> >       Unidirectional regions prevent that the receiver of some data can
> >       interfere with the sender while it is still building the message, a
> >       property that is not only useful for safety critical communication,
> >       we are sure.
> >
> >
> > Sounds like a good idea, and something we may want in virtio too

Can you please explain more about the process of transferring a packet using the three different memory regions?
In the kernel implementation, the sk_buff can be allocated anywhere.

Btw, this looks similar to the memory access protection mechanism using EPTP switching:
Slide 25 http://www.linux-kvm.org/images/8/87/02x09-Aspen-Jun_Nakajima-KVM_as_the_NFV_Hypervisor.pdf
The missing right side of the figure is an alternative EPT, which gives full access permission to the small piece of security code.

> >
> >
> >     - Life-cycle management via local and remote state
> >
> >       Each device can now signal its own state in form of a value to the
> >       remote side, which triggers an event there. Moreover, state changes
> >       done by the hypervisor to one peer are signalled to the other side.
> >       And we introduced a write-to-shared-memory mechanism for the
> >       respective remote state so that guests do not have to issue an MMIO
> >       access in order to check the state.
> >
> >
> > There is also ongoing work to better support disconnect/reconnect in
> > virtio.
> >
> >
> >
> >     So, this is our proposal. Would be great to hear some opinions if you
> >     see value in adding support for such an "ivshmem 2.0" device to QEMU as
> >     well and expand its ecosystem towards Linux upstream, maybe also DPDK
> >     again. If you see problems in the new design /wrt what QEMU provides so
> >     far with its ivshmem device, let's discuss how to resolve them. Looking
> >     forward to any feedback!
> >
> >
> > My feeling is that ivshmem is not being actively developped in qemu,
> > but rather virtio-based solutions (vhost-pci for vm2vm).
> 
> As pointed out, for us it's most important to keep the design simple - even at the
> price of "reinventing" some drivers for upstream (at least, we do not need two
> sets of drivers because our interface is fully symmetric). I don't see yet how
> vhost-pci could achieve the same, but I'm open to learn more!

Maybe I didn’t fully understand this - "we do not need two sets of drivers because our interface is fully symmetric"?

The vhost-pci driver is a standalone network driver from the local guest point of view - it's no different than any other network driver in the guest. When talking about usage, it's used together with another VM's virtio device - would this be the "two sets of drivers" that you meant? I think this is pretty natural and reasonable, as it is essentially vm-to-vm communication. Furthermore, we are able to dynamically create/destroy and hot-plug in/out a vhost-pci device based on runtime requests.

Thanks for sharing your ideas.

Best,
Wei

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-17  9:13     ` Wang, Wei W
@ 2017-01-17  9:46       ` Jan Kiszka
  2017-01-20 11:54         ` Wang, Wei W
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2017-01-17  9:46 UTC (permalink / raw)
  To: Wang, Wei W, Marc-André Lureau, qemu-devel, Jailhouse
  Cc: Markus Armbruster

On 2017-01-17 10:13, Wang, Wei W wrote:
> Hi Jan,
> 
> On Monday, January 16, 2017 9:10 PM, Jan Kiszka wrote:
>> On 2017-01-16 13:41, Marc-André Lureau wrote:
>>> On Mon, Jan 16, 2017 at 12:37 PM Jan Kiszka <jan.kiszka@siemens.com
>>> <mailto:jan.kiszka@siemens.com>> wrote:
>>>     some of you may know that we are using a shared memory device similar to
>>>     ivshmem in the partitioning hypervisor Jailhouse [1].
>>>
>>>     We started as being compatible to the original ivshmem that QEMU
>>>     implements, but we quickly deviated in some details, and in the recent
>>>     months even more. Some of the deviations are related to making the
>>>     implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is
>>>     aiming at safety critical systems and, therefore, a small code base.
>>>     Other changes address deficits in the original design, like missing
>>>     life-cycle management.
>>>
>>>     Now the question is if there is interest in defining a common new
>>>     revision of this device and maybe also of some protocols used on top,
>>>     such as virtual network links. Ideally, this would enable us to share
>>>     Linux drivers. We will definitely go for upstreaming at least a network
>>>     driver such as [2], a UIO driver and maybe also a serial port/console.
>>>
>>>
>>> This sounds like duplicating efforts done with virtio and vhost-pci.
>>> Have you looked at Wei Wang proposal?
>>
>> I didn't follow it recently, but the original concept was about introducing an
>> IOMMU model to the picture, and that's complexity-wise a no-go for us (we can
>> do this whole thing in less than 500 lines, even virtio itself is more complex). IIUC,
>> the alternative to an IOMMU is mapping the whole frontend VM memory into
>> the backend VM - that's security/safety-wise an absolute no-go.
> 
> Though the virtio based solution might be complex for you, a big advantage is that we have lots of people working to improve virtio. For example, the upcoming virtio 1.1 has vring improvement, we can easily upgrade all the virtio based solutions, such as vhost-pci, to take advantage of this improvement. From the long term perspective, I think this kind of complexity is worthwhile.

We will adopt virtio 1.1 ring formats. That's one reason why there is
also still a bidirectional shared memory region: to host the new
descriptors (while keeping the payload safely in the unidirectional
regions).

> 
> We further have security features(e.g. vIOMMU) can be applied to vhost-pci.

As pointed out, this is way too complex for us. A complete vIOMMU model
would easily add a few thousand lines of code to a hypervisor that tries
to stay below 10k LoC. Each line costs a lot of money when going for
certification. Plus I'm not even sure that there will always be
performance benefits, but that's to be seen when both solutions have matured.

> 
>>>
>>>     Deviations from the original design:
>>>
>>>     - Only two peers per link
>>>
>>>
>>> sound sane, that's also what vhost-pci aims to afaik
>>>
>>>
>>>       This simplifies the implementation and also the interfaces (think of
>>>       life-cycle management in a multi-peer environment). Moreover, we do
>>>       not have an urgent use case for multiple peers, thus also not
>>>       reference for a protocol that could be used in such setups. If someone
>>>       else happens to share such a protocol, it would be possible to discuss
>>>       potential extensions and their implications.
>>>
>>>     - Side-band registers to discover and configure share memory
>>> regions
>>>
>>>       This was one of the first changes: We removed the memory regions from
>>>       the PCI BARs and gave them special configuration space registers. By
>>>       now, these registers are embedded in a PCI capability. The reasons are
>>>       that Jailhouse does not allow to relocate the regions in guest address
>>>       space (but other hypervisors may if they like to) and that we now have
>>>       up to three of them.
>>>
>>>
>>>  Sorry, I can't comment on that.
>>>
>>>
>>>     - Changed PCI base class code to 0xff (unspecified class)
>>>
>>>       This allows us to define our own sub classes and interfaces. That is
>>>       now exploited for specifying the shared memory protocol the two
>>>       connected peers should use. It also allows the Linux drivers to match
>>>       on that.
>>>
>>>
>>> Why not, but it worries me that you are going to invent protocols
>>> similar to virtio devices, aren't you?
>>
>> That partly comes with the desire to simplify the transport (pure shared memory).
>> With ivshmem-net, we are at least reusing virtio rings and will try to do this with
>> the new (and faster) virtio ring format as well.
>>
>>>
>>>
>>>     - INTx interrupts support is back
>>>
>>>       This is needed on target platforms without MSI controllers, i.e.
>>>       without the required guest support. Namely some PCI-less ARM SoCs
>>>       required the reintroduction. While doing this, we also took care of
>>>       keeping the MMIO registers free of privileged controls so that a
>>>       guest OS can map them safely into a guest userspace application.
>>>
>>>
>>> Right, it's not completely removed from ivshmem qemu upstream,
>>> although it should probably be allowed to setup a doorbell-ivshmem
>>> with msi=off (this may be quite trivial to add back)
>>>
>>>
>>>     And then there are some extensions of the original ivshmem:
>>>
>>>     - Multiple shared memory regions, including unidirectional ones
>>>
>>>       It is now possible to expose up to three different shared memory
>>>       regions: The first one is read/writable for both sides. The second
>>>       region is read/writable for the local peer and read-only for the
>>>       remote peer (useful for output queues). And the third is read-only
>>>       locally but read/writable remotely (ie. for input queues).
>>>       Unidirectional regions prevent that the receiver of some data can
>>>       interfere with the sender while it is still building the message, a
>>>       property that is not only useful for safety critical communication,
>>>       we are sure.
>>>
>>>
>>> Sounds like a good idea, and something we may want in virtio too
> 
> Can you please explain more about the process of transferring a packet using the three different memory regions?
> In the kernel implementation, the sk_buf can be allocated anywhere.

With shared memory-backed communication, you obviously will have to
copy to, and sometimes also from, the communication regions. But you no
longer have to flip any mappings (or even give up on secure isolation).

Why we have up to three regions: two unidirectional ones for payload,
one for shared control structures or custom protocols. See also above.

> 
> Btw, this looks similar to the memory access protection mechanism using EPTP switching:
> Slide 25 http://www.linux-kvm.org/images/8/87/02x09-Aspen-Jun_Nakajima-KVM_as_the_NFV_Hypervisor.pdf
> This missed right side of the figure is an alternative EPT, which gives a full access permission to the small piece of security code.

EPTP might be some nice optimization for scenarios where you have to
switch (but are its security problems resolved by now?), but a) we can
avoid switching and b) it's Intel-only while we need a generic solution
for all archs.

> 
>>>
>>>
>>>     - Life-cycle management via local and remote state
>>>
>>>       Each device can now signal its own state in form of a value to the
>>>       remote side, which triggers an event there. Moreover, state changes
>>>       done by the hypervisor to one peer are signalled to the other side.
>>>       And we introduced a write-to-shared-memory mechanism for the
>>>       respective remote state so that guests do not have to issue an MMIO
>>>       access in order to check the state.
>>>
>>>
>>> There is also ongoing work to better support disconnect/reconnect in
>>> virtio.
>>>
>>>
>>>
>>>     So, this is our proposal. Would be great to hear some opinions if you
>>>     see value in adding support for such an "ivshmem 2.0" device to QEMU as
>>>     well and expand its ecosystem towards Linux upstream, maybe also DPDK
>>>     again. If you see problems in the new design /wrt what QEMU provides so
>>>     far with its ivshmem device, let's discuss how to resolve them. Looking
>>>     forward to any feedback!
>>>
>>>
>>> My feeling is that ivshmem is not being actively developped in qemu,
>>> but rather virtio-based solutions (vhost-pci for vm2vm).
>>
>> As pointed out, for us it's most important to keep the design simple - even at the
>> price of "reinventing" some drivers for upstream (at least, we do not need two
>> sets of drivers because our interface is fully symmetric). I don't see yet how
>> vhost-pci could achieve the same, but I'm open to learn more!
> 
> Maybe I didn’t fully understand this - "we do not need two sets of drivers because our interface is fully symmetric"?

We have no backend/frontend drivers. While vhost-pci can reuse virtio
frontend drivers, it still requires new backend drivers. We use the same
drivers on both sides - it's just symmetric. That also simplifies
arguing over non-interference because both sides have equal capabilities.

> 
> The vhost-pci driver is a standalone network driver from the local guest point of view - it's no different than any other network drivers in the guest. When talking about usage,  it's used together with another VM's virtio device - would this be the "two sets of drivers" that you meant? I think this is pretty nature and reasonable, as it is essentially a vm-to-vm communication. Furthermore, we are able to dynamically create/destroy and hot-plug in/out a vhost-pci device based on runtime requests. 

Hotplugging works with shared memory devices as well. We don't use it
during runtime of the hypervisor due to safety constraints, but devices
show up and disappear in the root cell (the primary Linux) as the
hypervisor starts or stops.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-16 13:10   ` Jan Kiszka
  2017-01-17  9:13     ` Wang, Wei W
@ 2017-01-17  9:59     ` Stefan Hajnoczi
  2017-01-17 10:32       ` Jan Kiszka
  2017-01-29 11:56       ` msuchanek
  1 sibling, 2 replies; 29+ messages in thread
From: Stefan Hajnoczi @ 2017-01-17  9:59 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Marc-André Lureau, qemu-devel, Jailhouse, Wei Wang,
	Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 1486 bytes --]

On Mon, Jan 16, 2017 at 02:10:17PM +0100, Jan Kiszka wrote:
> On 2017-01-16 13:41, Marc-André Lureau wrote:
> > On Mon, Jan 16, 2017 at 12:37 PM Jan Kiszka <jan.kiszka@siemens.com
> > <mailto:jan.kiszka@siemens.com>> wrote:
> >     So, this is our proposal. Would be great to hear some opinions if you
> >     see value in adding support for such an "ivshmem 2.0" device to QEMU as
> >     well and expand its ecosystem towards Linux upstream, maybe also DPDK
> >     again. If you see problems in the new design /wrt what QEMU provides so
> >     far with its ivshmem device, let's discuss how to resolve them. Looking
> >     forward to any feedback!
> > 
> > 
> > My feeling is that ivshmem is not being actively developped in qemu, but
> > rather virtio-based solutions (vhost-pci for vm2vm).
> 
> As pointed out, for us it's most important to keep the design simple -
> even at the price of "reinventing" some drivers for upstream (at least,
> we do not need two sets of drivers because our interface is fully
> symmetric). I don't see yet how vhost-pci could achieve the same, but
> I'm open to learn more!

The concept of symmetry is nice but only applies for communications
channels like networking and serial.

It doesn't apply for I/O that is fundamentally asymmetric like disk I/O.

I just wanted to point this out because the lack of symmetry has also
bothered me about virtio, but it's actually impossible to achieve it for all
device types.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-16 14:34   ` Jan Kiszka
@ 2017-01-17 10:00     ` Stefan Hajnoczi
  0 siblings, 0 replies; 29+ messages in thread
From: Stefan Hajnoczi @ 2017-01-17 10:00 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: qemu-devel, Jailhouse

[-- Attachment #1: Type: text/plain, Size: 1889 bytes --]

On Mon, Jan 16, 2017 at 03:34:58PM +0100, Jan Kiszka wrote:
> On 2017-01-16 15:18, Stefan Hajnoczi wrote:
> > On Mon, Jan 16, 2017 at 09:36:51AM +0100, Jan Kiszka wrote:
> >> some of you may know that we are using a shared memory device similar to
> >> ivshmem in the partitioning hypervisor Jailhouse [1].
> >>
> >> We started as being compatible to the original ivshmem that QEMU
> >> implements, but we quickly deviated in some details, and in the recent
> >> months even more. Some of the deviations are related to making the
> >> implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is
> >> aiming at safety critical systems and, therefore, a small code base.
> >> Other changes address deficits in the original design, like missing
> >> life-cycle management.
> > 
> > My first thought is "what about virtio?".  Can you share some background
> > on why ivshmem fits the use case better than virtio?
> > 
> > The reason I ask is because the ivshmem devices you define would have
> > parallels to existing virtio devices and this could lead to duplication.
> 
> virtio was created as an interface between a host and a guest. It has no
> notion of direct (or even symmetric) connection between guests. With
> ivshmem, we want to establish only a minimal host-guest interface. We
> want to keep the host out of the business negotiating protocol details
> between two connected guests.
> 
> So, the trade-off was between reusing existing virtio drivers - in the
> best case, some changes would have been required definitely - and
> requiring complex translation of virtio into a vm-to-vm model on the one
> side and establishing a new driver ecosystem on much simpler host
> services (500 LoC...). We went for the latter.

Thanks.  I was going in the same direction about vhost-pci as
Marc-André.  Let's switch to his sub-thread.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-17  9:59     ` Stefan Hajnoczi
@ 2017-01-17 10:32       ` Jan Kiszka
  2017-01-29 11:56       ` msuchanek
  1 sibling, 0 replies; 29+ messages in thread
From: Jan Kiszka @ 2017-01-17 10:32 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Marc-André Lureau, qemu-devel, Jailhouse, Wei Wang,
	Markus Armbruster

On 2017-01-17 10:59, Stefan Hajnoczi wrote:
> On Mon, Jan 16, 2017 at 02:10:17PM +0100, Jan Kiszka wrote:
>> On 2017-01-16 13:41, Marc-André Lureau wrote:
>>> On Mon, Jan 16, 2017 at 12:37 PM Jan Kiszka <jan.kiszka@siemens.com
>>> <mailto:jan.kiszka@siemens.com>> wrote:
>>>     So, this is our proposal. Would be great to hear some opinions if you
>>>     see value in adding support for such an "ivshmem 2.0" device to QEMU as
>>>     well and expand its ecosystem towards Linux upstream, maybe also DPDK
>>>     again. If you see problems in the new design /wrt what QEMU provides so
>>>     far with its ivshmem device, let's discuss how to resolve them. Looking
>>>     forward to any feedback!
>>>
>>>
>>> My feeling is that ivshmem is not being actively developped in qemu, but
>>> rather virtio-based solutions (vhost-pci for vm2vm).
>>
>> As pointed out, for us it's most important to keep the design simple -
>> even at the price of "reinventing" some drivers for upstream (at least,
>> we do not need two sets of drivers because our interface is fully
>> symmetric). I don't see yet how vhost-pci could achieve the same, but
>> I'm open to learn more!
> 
> The concept of symmetry is nice but only applies for communications
> channels like networking and serial.
> 
> It doesn't apply for I/O that is fundamentally asymmetric like disk I/O.
> 
> I just wanted to point this out because lack symmetry has also bothered
> me about virtio but it's actually impossible to achieve it for all
> device types.

That's true. I'm not sure what is planned for vhost-pci. Our scope is
limited (though mass storage proxying could be interesting at some
point), plus there is the option to do X-over-network.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-17  9:46       ` Jan Kiszka
@ 2017-01-20 11:54         ` Wang, Wei W
  2017-01-20 16:37           ` Jan Kiszka
  0 siblings, 1 reply; 29+ messages in thread
From: Wang, Wei W @ 2017-01-20 11:54 UTC (permalink / raw)
  To: Jan Kiszka, Marc-André Lureau, qemu-devel, Jailhouse
  Cc: Markus Armbruster

On Tuesday, January 17, 2017 5:46 PM, Jan Kiszka wrote:
> On 2017-01-17 10:13, Wang, Wei W wrote:
> > Hi Jan,
> >
> > On Monday, January 16, 2017 9:10 PM, Jan Kiszka wrote:
> >> On 2017-01-16 13:41, Marc-André Lureau wrote:
> >>> On Mon, Jan 16, 2017 at 12:37 PM Jan Kiszka <jan.kiszka@siemens.com
> >>> <mailto:jan.kiszka@siemens.com>> wrote:
> >>>     some of you may know that we are using a shared memory device similar
> to
> >>>     ivshmem in the partitioning hypervisor Jailhouse [1].
> >>>
> >>>     We started as being compatible to the original ivshmem that QEMU
> >>>     implements, but we quickly deviated in some details, and in the recent
> >>>     months even more. Some of the deviations are related to making the
> >>>     implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is
> >>>     aiming at safety critical systems and, therefore, a small code base.
> >>>     Other changes address deficits in the original design, like missing
> >>>     life-cycle management.
> >>>
> >>>     Now the question is if there is interest in defining a common new
> >>>     revision of this device and maybe also of some protocols used on top,
> >>>     such as virtual network links. Ideally, this would enable us to share
> >>>     Linux drivers. We will definitely go for upstreaming at least a network
> >>>     driver such as [2], a UIO driver and maybe also a serial port/console.
> >>>
> >>>
> >>> This sounds like duplicating efforts done with virtio and vhost-pci.
> >>> Have you looked at Wei Wang proposal?
> >>
> >> I didn't follow it recently, but the original concept was about
> >> introducing an IOMMU model to the picture, and that's complexity-wise
> >> a no-go for us (we can do this whole thing in less than 500 lines,
> >> even virtio itself is more complex). IIUC, the alternative to an
> >> IOMMU is mapping the whole frontend VM memory into the backend VM -
> that's security/safety-wise an absolute no-go.
> >
> > Though the virtio based solution might be complex for you, a big advantage is
> that we have lots of people working to improve virtio. For example, the
> upcoming virtio 1.1 has vring improvement, we can easily upgrade all the virtio
> based solutions, such as vhost-pci, to take advantage of this improvement. From
> the long term perspective, I think this kind of complexity is worthwhile.
> 
> We will adopt virtio 1.1 ring formats. That's one reason why there is also still a
> bidirectional shared memory region: to host the new descriptors (while keeping
> the payload safely in the unidirectional regions).

The vring example I gave might be confusing, sorry about that. My point is that every part of virtio matures and gets improved over time. Personally, I find it helpful to have a new device developed and maintained within an active and popular model. Also, as new features are gradually added in the future, a simple device could become complex.

Here is a theoretical analysis of the performance:
The traditional shared memory mechanism, sharing an intermediate memory, requires 2 copies to get a packet transmitted. It's not just one more copy compared to the 1-copy solution; I think there are some more things we need to take into account:
1) there is extra ring operation overhead on both the sending and receiving side to access the shared memory (i.e. IVSHMEM);
2) an extra protocol is needed to use the shared memory;
3) the amount of shared memory allocated from the host = C(n,2) pieces, where n is the number of VMs. For example, for 20 VMs that all want to talk to each other, 190 pieces of memory have to be allocated from the host.

That being said, if people really want the 2-copy solution, we can also have vhost-pci support it that way as a new feature (not sure if you would be interested in collaborating on the project):
With the new feature added, the master VM sends only a piece of memory (equivalent to IVSHMEM, but allocated by the guest) to the slave over the vhost-user protocol, and the vhost-pci device on the slave side only hosts that piece of shared memory.

Best,
Wei

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-20 11:54         ` Wang, Wei W
@ 2017-01-20 16:37           ` Jan Kiszka
  2017-01-23  3:49             ` Wang, Wei W
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2017-01-20 16:37 UTC (permalink / raw)
  To: Wang, Wei W, Marc-André Lureau, qemu-devel, Jailhouse
  Cc: Markus Armbruster

On 2017-01-20 12:54, Wang, Wei W wrote:
> On Tuesday, January 17, 2017 5:46 PM, Jan Kiszka wrote:
>> On 2017-01-17 10:13, Wang, Wei W wrote:
>>> Hi Jan,
>>>
>>> On Monday, January 16, 2017 9:10 PM, Jan Kiszka wrote:
>>>> On 2017-01-16 13:41, Marc-André Lureau wrote:
>>>>> On Mon, Jan 16, 2017 at 12:37 PM Jan Kiszka <jan.kiszka@siemens.com
>>>>> <mailto:jan.kiszka@siemens.com>> wrote:
>>>>>     some of you may know that we are using a shared memory device similar
>> to
>>>>>     ivshmem in the partitioning hypervisor Jailhouse [1].
>>>>>
>>>>>     We started as being compatible to the original ivshmem that QEMU
>>>>>     implements, but we quickly deviated in some details, and in the recent
>>>>>     months even more. Some of the deviations are related to making the
>>>>>     implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is
>>>>>     aiming at safety critical systems and, therefore, a small code base.
>>>>>     Other changes address deficits in the original design, like missing
>>>>>     life-cycle management.
>>>>>
>>>>>     Now the question is if there is interest in defining a common new
>>>>>     revision of this device and maybe also of some protocols used on top,
>>>>>     such as virtual network links. Ideally, this would enable us to share
>>>>>     Linux drivers. We will definitely go for upstreaming at least a network
>>>>>     driver such as [2], a UIO driver and maybe also a serial port/console.
>>>>>
>>>>>
>>>>> This sounds like duplicating efforts done with virtio and vhost-pci.
>>>>> Have you looked at Wei Wang proposal?
>>>>
>>>> I didn't follow it recently, but the original concept was about
>>>> introducing an IOMMU model to the picture, and that's complexity-wise
>>>> a no-go for us (we can do this whole thing in less than 500 lines,
>>>> even virtio itself is more complex). IIUC, the alternative to an
>>>> IOMMU is mapping the whole frontend VM memory into the backend VM -
>> that's security/safety-wise an absolute no-go.
>>>
>>> Though the virtio based solution might be complex for you, a big advantage is
>> that we have lots of people working to improve virtio. For example, the
>> upcoming virtio 1.1 has vring improvement, we can easily upgrade all the virtio
>> based solutions, such as vhost-pci, to take advantage of this improvement. From
>> the long term perspective, I think this kind of complexity is worthwhile.
>>
>> We will adopt virtio 1.1 ring formats. That's one reason why there is also still a
>> bidirectional shared memory region: to host the new descriptors (while keeping
>> the payload safely in the unidirectional regions).
> 
> The vring example I gave might be confusing, sorry about  that. My point is that every part of virtio is getting matured and improved from time to time.  Personally, having a new device developed and maintained in an active and popular model is helpful. Also, as new features being gradually added in the future, a simple device could become complex. 

We can't afford to become more complex; that is the whole point.
Complexity shall go into the guest, not the hypervisor, when it is
really needed.

> 
> Having a theoretical analysis on the performance: 
> The traditional shared memory mechanism, sharing an intermediate memory, requires 2 copies to get the packet transmitted. It's not just one more copy compared to the 1-copy solution, I think some more things we may need to take into account:

1-copy (+ potential transfers to userspace, but that's the same for
everyone) is conceptually possible, definitely under stacks like DPDK.
However, Linux skbs are currently not prepared for picking up
shmem-backed packets; we already looked into this. Likely addressable,
though.

> 1) there are extra ring operation overhead  on both the sending and receiving side to access the shared memory (i.e. IVSHMEM);
> 2) extra protocol to use the shared memory;
> 3) the piece of allocated shared memory from the host = C(n,2), where n is the number of VMs. Like for 20 VMs who want to talk to each other, there will be 190 pieces of memory allocated from the host. 

Well, only if all VMs need to talk to all others directly. In real
setups, you would add direct links for heavy traffic and otherwise do
software switching. Moreover, only in static setups would those links
have to be backed by physical memory all the time.

Also, we didn't completely rule out a shmem bus with multiple peers
connected. That is just waiting for a strong use case - and then a
robust design, of course.

> 
> That being said, if people really want the 2-copy solution, we can also have vhost-pci support it that way as a new feature (not sure if you would be interested in collaborating on the project):
> With the new feature added, the master VM sends only a piece of memory (equivalent to IVSHMEM, but allocated by the guest) to the slave over vhost-user protocol, and the vhost-pci device on the slave side only hosts that piece of shared memory.

I'm all in for something that allows stripping down vhost-pci to
something that - while staying secure - is simple and /also/ allows
static configurations. But I'm not yet seeing that this would still be
virtio or vhost-pci.

What would be the minimal viable vhost-pci device set from your POV?
What would have to be provided by the hypervisor for that?

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-20 16:37           ` Jan Kiszka
@ 2017-01-23  3:49             ` Wang, Wei W
  2017-01-23 10:14               ` Måns Rullgård
  0 siblings, 1 reply; 29+ messages in thread
From: Wang, Wei W @ 2017-01-23  3:49 UTC (permalink / raw)
  To: Jan Kiszka, Marc-André Lureau, qemu-devel, Jailhouse
  Cc: Markus Armbruster

On Saturday, January 21, 2017 12:38 AM, Jan Kiszka wrote:
> On 2017-01-20 12:54, Wang, Wei W wrote:
> > On Tuesday, January 17, 2017 5:46 PM, Jan Kiszka wrote:
> >> On 2017-01-17 10:13, Wang, Wei W wrote:
> >>> Hi Jan,
> >>>
> >>> On Monday, January 16, 2017 9:10 PM, Jan Kiszka wrote:
> >>>> On 2017-01-16 13:41, Marc-André Lureau wrote:
> >>>>> On Mon, Jan 16, 2017 at 12:37 PM Jan Kiszka
> >>>>> <jan.kiszka@siemens.com <mailto:jan.kiszka@siemens.com>> wrote:
> >>>>>     some of you may know that we are using a shared memory device
> >>>>> similar
> >> to
> >>>>>     ivshmem in the partitioning hypervisor Jailhouse [1].
> >>>>>
> >>>>>     We started as being compatible to the original ivshmem that QEMU
> >>>>>     implements, but we quickly deviated in some details, and in the recent
> >>>>>     months even more. Some of the deviations are related to making the
> >>>>>     implementation simpler. The new ivshmem takes <500 LoC - Jailhouse
> is
> >>>>>     aiming at safety critical systems and, therefore, a small code base.
> >>>>>     Other changes address deficits in the original design, like missing
> >>>>>     life-cycle management.
> >>>>>
> >>>>>     Now the question is if there is interest in defining a common new
> >>>>>     revision of this device and maybe also of some protocols used on top,
> >>>>>     such as virtual network links. Ideally, this would enable us to share
> >>>>>     Linux drivers. We will definitely go for upstreaming at least a network
> >>>>>     driver such as [2], a UIO driver and maybe also a serial port/console.
> >>>>>
> >>>>>
> >>>>> This sounds like duplicating efforts done with virtio and vhost-pci.
> >>>>> Have you looked at Wei Wang proposal?
> >>>>
> >>>> I didn't follow it recently, but the original concept was about
> >>>> introducing an IOMMU model to the picture, and that's
> >>>> complexity-wise a no-go for us (we can do this whole thing in less
> >>>> than 500 lines, even virtio itself is more complex). IIUC, the
> >>>> alternative to an IOMMU is mapping the whole frontend VM memory
> >>>> into the backend VM -
> >> that's security/safety-wise an absolute no-go.
> >>>
> >>> Though the virtio based solution might be complex for you, a big
> >>> advantage is
> >> that we have lots of people working to improve virtio. For example,
> >> the upcoming virtio 1.1 has vring improvement, we can easily upgrade
> >> all the virtio based solutions, such as vhost-pci, to take advantage
> >> of this improvement. From the long term perspective, I think this kind of
> complexity is worthwhile.
> >>
> >> We will adopt virtio 1.1 ring formats. That's one reason why there is
> >> also still a bidirectional shared memory region: to host the new
> >> descriptors (while keeping the payload safely in the unidirectional regions).
> >
> > The vring example I gave might be confusing, sorry about  that. My point is
> that every part of virtio is getting matured and improved from time to time.
> Personally, having a new device developed and maintained in an active and
> popular model is helpful. Also, as new features being gradually added in the
> future, a simple device could become complex.
> 
> We can't afford becoming more complex, that is the whole point.
> Complexity shall go into the guest, not the hypervisor, when it is really needed.
> 
> >
> > Having a theoretical analysis on the performance:
> > The traditional shared memory mechanism, sharing an intermediate memory,
> requires 2 copies to get the packet transmitted. It's not just one more copy
> compared to the 1-copy solution, I think some more things we may need to take
> into account:
> 
> 1-copy (+potential transfers to userspace, but that's the same for
> everyone) is conceptually possible, definitely under stacks like DPDK.
> However, Linux skbs are currently not prepared for picking up shmem-backed
> packets, we already looked into this. Likely addressable, though.

Not sure how difficult it would be to get that change upstreamed to the kernel, but looking forward to seeing your solutions.
 
> > 1) there are extra ring operation overhead  on both the sending and
> > receiving side to access the shared memory (i.e. IVSHMEM);
> > 2) extra protocol to use the shared memory;
> > 3) the piece of allocated shared memory from the host = C(n,2), where n is the
> number of VMs. Like for 20 VMs who want to talk to each other, there will be
> 190 pieces of memory allocated from the host.
> 
> Well, only if all VMs need to talk to all others directly. On real setups, you would
> add direct links for heavy traffic and otherwise do software switching. Moreover,
> those links would only have to be backed by physical memory in static setups all
> the time.
> 
> Also, we didn't completely rule out a shmem bus with multiple peers connected.
> That's just looking for a strong use case - and then a robust design, of course.
> 
> >
> > That being said, if people really want the 2-copy solution, we can also have
> vhost-pci support it that way as a new feature (not sure if you would be
> interested in collaborating on the project):
> > With the new feature added, the master VM sends only a piece of memory
> (equivalent to IVSHMEM, but allocated by the guest) to the slave over vhost-user
> protocol, and the vhost-pci device on the slave side only hosts that piece of
> shared memory.
> 
> I'm all in for something that allows to strip down vhost-pci to something that -
> while staying secure - is simple and /also/ allows static configurations. But I'm
> not yet seeing that this would still be virtio or vhost-pci.
> 
> What would be the minimal viable vhost-pci device set from your POV?

For the static configuration option, I think it mainly needs the device emulation part of the current implementation, which currently has ~500 LOC. It would also need a new feature added to virtio_net, to let the virtio_net driver use the IVSHMEM region, and the same for the vhost-pci device.

I think most of the vhost-user protocol can be bypassed for this usage, because the device feature bits don’t need to be negotiated between the two devices, and the memory and vring info doesn’t need to be transferred. To support interrupts, we may still need vhost-user to share the irqfd.
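
[Editorial illustration, not part of the original mail: irqfd-based interrupt sharing boils down to passing an eventfd to the peer and writing to it. Below is a minimal, hedged user-space sketch of that signalling primitive; in a real setup the fd would be created by one side and handed over to the peer, e.g. over the vhost-user socket.]

  /* Hedged sketch: the eventfd signalling primitive that irqfd and the
   * ivshmem doorbell build on. Single-process demo only; real peers
   * would share the fd across process/VM boundaries. */
  #include <stdint.h>
  #include <stdio.h>
  #include <unistd.h>
  #include <sys/eventfd.h>

  int main(void)
  {
      int efd = eventfd(0, 0);          /* counter starts at 0 */
      uint64_t val = 1;

      if (efd < 0)
          return 1;

      write(efd, &val, sizeof(val));    /* sender "rings the doorbell" */
      read(efd, &val, sizeof(val));     /* receiver consumes the event */
      printf("consumed %llu event(s)\n", (unsigned long long)val);
      close(efd);
      return 0;
  }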

> What would have to be provided by the hypervisor for that?
> 

We don’t need any support from KVM; for the QEMU part, please see above.

Best,
Wei


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-23  3:49             ` Wang, Wei W
@ 2017-01-23 10:14               ` Måns Rullgård
  0 siblings, 0 replies; 29+ messages in thread
From: Måns Rullgård @ 2017-01-23 10:14 UTC (permalink / raw)
  To: Wang, Wei W
  Cc: Jan Kiszka, Marc-André Lureau, qemu-devel, Jailhouse,
	Markus Armbruster

"Wang, Wei W" <wei.w.wang@intel.com> writes:

> On Saturday, January 21, 2017 12:38 AM, Jan Kiszka wrote:
>> On 2017-01-20 12:54, Wang, Wei W wrote:
>>> Having a theoretical analysis on the performance: The traditional
>>> shared memory mechanism, sharing an intermediate memory, requires 2
>>> copies to get the packet transmitted. It's not just one more copy
>>> compared to the 1-copy solution, I think some more things we may
>>> need to take into account:
>> 
>> 1-copy (+potential transfers to userspace, but that's the same for
>> everyone) is conceptually possible, definitely under stacks like
>> DPDK.  However, Linux skbs are currently not prepared for picking up
>> shmem-backed packets, we already looked into this. Likely
>> addressable, though.
>
> Not sure how difficult it would be to get that change upstream-ed to
> the kernel, but looking forward to seeing your solutions.

The problem is that the shared memory mapping doesn't have a struct page
as required by lots of networking code.
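
[Editorial illustration, not from the thread: a rough kernel-style sketch of the consequence. The zero-copy receive path attaches page-backed buffers to the skb (e.g. via skb_add_rx_frag(), which needs a struct page *); a BAR/memremap() mapping of the ivshmem region typically has no struct page, so a driver would have to copy the frame into a freshly allocated, page-backed skb. The function and variable names below are invented for illustration.]

  /* Hedged sketch (kernel-style, not buildable stand-alone): receive a
   * frame from a shared-memory region that lacks struct page backing by
   * copying it into a page-backed skb - the extra copy being discussed. */
  #include <linux/netdevice.h>
  #include <linux/skbuff.h>

  static struct sk_buff *shmem_rx_copy(struct napi_struct *napi,
                                       const void *src, unsigned int len)
  {
      struct sk_buff *skb = napi_alloc_skb(napi, len);  /* page-backed */

      if (!skb)
          return NULL;

      memcpy(skb_put(skb, len), src, len);
      return skb;
  }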

-- 
Måns Rullgård

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-16  8:36 [Qemu-devel] Towards an ivshmem 2.0? Jan Kiszka
  2017-01-16 12:41 ` Marc-André Lureau
  2017-01-16 14:18 ` Stefan Hajnoczi
@ 2017-01-23 14:19 ` Markus Armbruster
  2017-01-25  9:18   ` Jan Kiszka
  2 siblings, 1 reply; 29+ messages in thread
From: Markus Armbruster @ 2017-01-23 14:19 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: qemu-devel, Jailhouse

Jan Kiszka <jan.kiszka@siemens.com> writes:

> Hi,
>
> some of you may know that we are using a shared memory device similar to
> ivshmem in the partitioning hypervisor Jailhouse [1].
>
> We started as being compatible to the original ivshmem that QEMU
> implements, but we quickly deviated in some details, and in the recent
> months even more. Some of the deviations are related to making the
> implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is

Compare: hw/misc/ivshmem.c ~1000 SLOC, measured with sloccount.

> aiming at safety critical systems and, therefore, a small code base.
> Other changes address deficits in the original design, like missing
> life-cycle management.
>
> Now the question is if there is interest in defining a common new
> revision of this device and maybe also of some protocols used on top,
> such as virtual network links. Ideally, this would enable us to share
> Linux drivers. We will definitely go for upstreaming at least a network
> driver such as [2], a UIO driver and maybe also a serial port/console.
>
> I've attached a first draft of the specification of our new ivshmem
> device. A working implementation can be found in the wip/ivshmem2 branch
> of Jailhouse [3], the corresponding ivshmem-net driver in [4].
>
> Deviations from the original design:
>
> - Only two peers per link

Uh, define "link".

>   This simplifies the implementation and also the interfaces (think of
>   life-cycle management in a multi-peer environment). Moreover, we do
>   not have an urgent use case for multiple peers, thus also not
>   reference for a protocol that could be used in such setups. If someone
>   else happens to share such a protocol, it would be possible to discuss
>   potential extensions and their implications.
>
> - Side-band registers to discover and configure share memory regions
>
>   This was one of the first changes: We removed the memory regions from
>   the PCI BARs and gave them special configuration space registers. By
>   now, these registers are embedded in a PCI capability. The reasons are
>   that Jailhouse does not allow to relocate the regions in guest address
>   space (but other hypervisors may if they like to) and that we now have
>   up to three of them.

I'm afraid I don't quite understand the change, nor the rationale.  I
guess I could figure out the former by studying the specification.

> - Changed PCI base class code to 0xff (unspecified class)

Changed from 0x5 (memory controller).

>   This allows us to define our own sub classes and interfaces. That is
>   now exploited for specifying the shared memory protocol the two
>   connected peers should use. It also allows the Linux drivers to match
>   on that.
>
> - INTx interrupts support is back
>
>   This is needed on target platforms without MSI controllers, i.e.
>   without the required guest support. Namely some PCI-less ARM SoCs
>   required the reintroduction. While doing this, we also took care of
>   keeping the MMIO registers free of privileged controls so that a
>   guest OS can map them safely into a guest userspace application.

So you need interrupt capability.  Current upstream ivshmem requires a
server such as the one in contrib/ivshmem-server/.  What about yours?

The interrupt feature enables me to guess a definition of "link": A and
B are peers of the same link if they can interrupt each other.

Does your ivshmem2 support interrupt-less operation similar to
ivshmem-plain?

> And then there are some extensions of the original ivshmem:
>
> - Multiple shared memory regions, including unidirectional ones
>
>   It is now possible to expose up to three different shared memory
>   regions: The first one is read/writable for both sides. The second
>   region is read/writable for the local peer and read-only for the
>   remote peer (useful for output queues). And the third is read-only
>   locally but read/writable remotely (ie. for input queues).
>   Unidirectional regions prevent that the receiver of some data can
>   interfere with the sender while it is still building the message, a
>   property that is not only useful for safety critical communication,
>   we are sure.
>
> - Life-cycle management via local and remote state
>
>   Each device can now signal its own state in form of a value to the
>   remote side, which triggers an event there.

How are "events" related to interrupts?

>                                               Moreover, state changes
>   done by the hypervisor to one peer are signalled to the other side.
>   And we introduced a write-to-shared-memory mechanism for the
>   respective remote state so that guests do not have to issue an MMIO
>   access in order to check the state.
>
> So, this is our proposal. Would be great to hear some opinions if you
> see value in adding support for such an "ivshmem 2.0" device to QEMU as
> well and expand its ecosystem towards Linux upstream, maybe also DPDK
> again. If you see problems in the new design /wrt what QEMU provides so
> far with its ivshmem device, let's discuss how to resolve them. Looking
> forward to any feedback!

My general opinion on ivshmem is well-known, but I repeat it for the
record: merging it was a mistake, and using it is probably a mistake.  I
detailed my concerns in "Why I advise against using ivshmem"[*].

My philosophical concerns remain.  Perhaps you can assuage them.

Only some of my practical concerns have since been addressed.  In part
by myself, because having a flawed implementation of a bad idea is
strictly worse than the same with flaws corrected as far as practical.
But even today, docs/specs/ivshmem-spec.txt is a rather depressing read.

However, there's one thing that's still worse than a more or less flawed
implementation of a bad idea: two implementations of a bad idea.  Could
ivshmem2 be done in a way that permits *replacing* ivshmem?

> [1] https://github.com/siemens/jailhouse
> [2]
> http://git.kiszka.org/?p=linux.git;a=blob;f=drivers/net/ivshmem-net.c;h=0e770ca293a4aca14a55ac0e66871b09c82647af;hb=refs/heads/queues/jailhouse
> [3] https://github.com/siemens/jailhouse/commits/wip/ivshmem2
> [4]
> http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/jailhouse-ivshmem2

[*] http://lists.gnu.org/archive/html/qemu-devel/2014-06/msg02968.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-23 14:19 ` Markus Armbruster
@ 2017-01-25  9:18   ` Jan Kiszka
  2017-01-27 19:36     ` Markus Armbruster
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2017-01-25  9:18 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, Jailhouse

On 2017-01-23 15:19, Markus Armbruster wrote:
> Jan Kiszka <jan.kiszka@siemens.com> writes:
> 
>> Hi,
>>
>> some of you may know that we are using a shared memory device similar to
>> ivshmem in the partitioning hypervisor Jailhouse [1].
>>
>> We started as being compatible to the original ivshmem that QEMU
>> implements, but we quickly deviated in some details, and in the recent
>> months even more. Some of the deviations are related to making the
>> implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is
> 
> Compare: hw/misc/ivshmem.c ~1000 SLOC, measured with sloccount.

That difference comes from remote/migration support and general QEMU
integration - likely not very telling due to the different environments.

> 
>> aiming at safety critical systems and, therefore, a small code base.
>> Other changes address deficits in the original design, like missing
>> life-cycle management.
>>
>> Now the question is if there is interest in defining a common new
>> revision of this device and maybe also of some protocols used on top,
>> such as virtual network links. Ideally, this would enable us to share
>> Linux drivers. We will definitely go for upstreaming at least a network
>> driver such as [2], a UIO driver and maybe also a serial port/console.
>>
>> I've attached a first draft of the specification of our new ivshmem
>> device. A working implementation can be found in the wip/ivshmem2 branch
>> of Jailhouse [3], the corresponding ivshmem-net driver in [4].
>>
>> Deviations from the original design:
>>
>> - Only two peers per link
> 
> Uh, define "link".

VMs are linked via a common shared memory. Interrupt delivery follows
that route as well.

> 
>>   This simplifies the implementation and also the interfaces (think of
>>   life-cycle management in a multi-peer environment). Moreover, we do
>>   not have an urgent use case for multiple peers, thus also not
>>   reference for a protocol that could be used in such setups. If someone
>>   else happens to share such a protocol, it would be possible to discuss
>>   potential extensions and their implications.
>>
>> - Side-band registers to discover and configure share memory regions
>>
>>   This was one of the first changes: We removed the memory regions from
>>   the PCI BARs and gave them special configuration space registers. By
>>   now, these registers are embedded in a PCI capability. The reasons are
>>   that Jailhouse does not allow to relocate the regions in guest address
>>   space (but other hypervisors may if they like to) and that we now have
>>   up to three of them.
> 
> I'm afraid I don't quite understand the change, nor the rationale.  I
> guess I could figure out the former by studying the specification.

a) It's a Jailhouse thing (we disallow the guest to move the regions
   around in its address space)
b) With 3 regions + MSI-X + MMIO registers, we run out of BARs (or
   would have to downgrade them to 32 bit)

> 
>> - Changed PCI base class code to 0xff (unspecified class)
> 
> Changed from 0x5 (memory controller).

Right.

> 
>>   This allows us to define our own sub classes and interfaces. That is
>>   now exploited for specifying the shared memory protocol the two
>>   connected peers should use. It also allows the Linux drivers to match
>>   on that.
>>
>> - INTx interrupts support is back
>>
>>   This is needed on target platforms without MSI controllers, i.e.
>>   without the required guest support. Namely some PCI-less ARM SoCs
>>   required the reintroduction. While doing this, we also took care of
>>   keeping the MMIO registers free of privileged controls so that a
>>   guest OS can map them safely into a guest userspace application.
> 
> So you need interrupt capability.  Current upstream ivshmem requires a
> server such as the one in contrib/ivshmem-server/.  What about yours?

IIRC, the need for a server with QEMU/KVM is related to live migration.
Jailhouse is simpler: all guests are managed by the same hypervisor
instance, and there is no migration. That makes interrupt delivery much
simpler as well. However, the device spec should not exclude other
architectures.

> 
> The interrupt feature enables me to guess a definition of "link": A and
> B are peers of the same link if they can interrupt each other.
> 
> Does your ivshmem2 support interrupt-less operation similar to
> ivshmem-plain?

Each receiver of interrupts is free to enable that - or leave it off,
which is the default after reset. But currently the spec demands that
either MSI-X or INTx is reported as available to the guests. We could
extend it to permit reporting no interrupt support if there is a good
case for it.

I will have to look into the details of the client-server structure of
QEMU's ivshmem again to answer the question under which restrictions we
can make it both simpler and more robust. As Jailhouse has no live
migration support, requirements on ivshmem related to that may so far
only be addressed by chance.

> 
>> And then there are some extensions of the original ivshmem:
>>
>> - Multiple shared memory regions, including unidirectional ones
>>
>>   It is now possible to expose up to three different shared memory
>>   regions: The first one is read/writable for both sides. The second
>>   region is read/writable for the local peer and read-only for the
>>   remote peer (useful for output queues). And the third is read-only
>>   locally but read/writable remotely (ie. for input queues).
>>   Unidirectional regions prevent that the receiver of some data can
>>   interfere with the sender while it is still building the message, a
>>   property that is not only useful for safety critical communication,
>>   we are sure.
>>
>> - Life-cycle management via local and remote state
>>
>>   Each device can now signal its own state in form of a value to the
>>   remote side, which triggers an event there.
> 
> How are "events" related to interrupts?

Confusing term chosen here: an interrupt is triggered on the remote side
(if it has interrupts enabled).

> 
>>                                               Moreover, state changes
>>   done by the hypervisor to one peer are signalled to the other side.
>>   And we introduced a write-to-shared-memory mechanism for the
>>   respective remote state so that guests do not have to issue an MMIO
>>   access in order to check the state.
>>
>> So, this is our proposal. Would be great to hear some opinions if you
>> see value in adding support for such an "ivshmem 2.0" device to QEMU as
>> well and expand its ecosystem towards Linux upstream, maybe also DPDK
>> again. If you see problems in the new design /wrt what QEMU provides so
>> far with its ivshmem device, let's discuss how to resolve them. Looking
>> forward to any feedback!
> 
> My general opinion on ivshmem is well-known, but I repeat it for the
> record: merging it was a mistake, and using it is probably a mistake.  I
> detailed my concerns in "Why I advise against using ivshmem"[*].
> 
> My philosophical concerns remain.  Perhaps you can assuage them.
> 
> Only some of my practical concerns have since been addressed.  In part
> by myself, because having a flawed implementation of a bad idea is
> strictly worse than the same with flaws corrected as far as practical.
> But even today, docs/specs/ivshmem-spec.txt is a rather depressing read.

I agree.

> 
> However, there's one thing that's still worse than a more or less flawed
> implementation of a bad idea: two implementations of a bad idea.  Could
> ivshmem2 be done in a way that permits *replacing* ivshmem?

If people see the need for a common ivshmem2, it should of course be
designed to replace the original version in QEMU. I wouldn't like to
design it to be backward compatible, but the new version should provide
all useful and required features of the old one.

Of course, I'm careful about investing much time into expanding the
existing design - which is possibly sufficient for Jailhouse - if there
is no real interest in continuing the ivshmem support in QEMU, because
of vhost-pci or other reasons. But if that interest exists, it would be
beneficial for us to have QEMU support a compatible version and use the
same guest drivers. Then I would start looking into concrete patches
for it as well.

Jan

> 
>> [1] https://github.com/siemens/jailhouse
>> [2]
>> http://git.kiszka.org/?p=linux.git;a=blob;f=drivers/net/ivshmem-net.c;h=0e770ca293a4aca14a55ac0e66871b09c82647af;hb=refs/heads/queues/jailhouse
>> [3] https://github.com/siemens/jailhouse/commits/wip/ivshmem2
>> [4]
>> http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/jailhouse-ivshmem2
> 
> [*] http://lists.gnu.org/archive/html/qemu-devel/2014-06/msg02968.html
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-25  9:18   ` Jan Kiszka
@ 2017-01-27 19:36     ` Markus Armbruster
  2017-01-29  8:43       ` Jan Kiszka
  0 siblings, 1 reply; 29+ messages in thread
From: Markus Armbruster @ 2017-01-27 19:36 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Jailhouse, qemu-devel

Jan Kiszka <jan.kiszka@web.de> writes:

> On 2017-01-23 15:19, Markus Armbruster wrote:
>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> 
>>> Hi,
>>>
>>> some of you may know that we are using a shared memory device similar to
>>> ivshmem in the partitioning hypervisor Jailhouse [1].
>>>
>>> We started as being compatible to the original ivshmem that QEMU
>>> implements, but we quickly deviated in some details, and in the recent
>>> months even more. Some of the deviations are related to making the
>>> implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is
>> 
>> Compare: hw/misc/ivshmem.c ~1000 SLOC, measured with sloccount.
>
> That difference comes from remote/migration support and general QEMU
> integration - likely not very telling due to the different environments.

Plausible.

>>> aiming at safety critical systems and, therefore, a small code base.
>>> Other changes address deficits in the original design, like missing
>>> life-cycle management.
>>>
>>> Now the question is if there is interest in defining a common new
>>> revision of this device and maybe also of some protocols used on top,
>>> such as virtual network links. Ideally, this would enable us to share
>>> Linux drivers. We will definitely go for upstreaming at least a network
>>> driver such as [2], a UIO driver and maybe also a serial port/console.
>>>
>>> I've attached a first draft of the specification of our new ivshmem
>>> device. A working implementation can be found in the wip/ivshmem2 branch
>>> of Jailhouse [3], the corresponding ivshmem-net driver in [4].
>>>
>>> Deviations from the original design:
>>>
>>> - Only two peers per link
>> 
>> Uh, define "link".
>
> VMs are linked via a common shared memory. Interrupt delivery follows
> that route as well.
>
>> 
>>>   This simplifies the implementation and also the interfaces (think of
>>>   life-cycle management in a multi-peer environment). Moreover, we do
>>>   not have an urgent use case for multiple peers, thus also not
>>>   reference for a protocol that could be used in such setups. If someone
>>>   else happens to share such a protocol, it would be possible to discuss
>>>   potential extensions and their implications.
>>>
>>> - Side-band registers to discover and configure share memory regions
>>>
>>>   This was one of the first changes: We removed the memory regions from
>>>   the PCI BARs and gave them special configuration space registers. By
>>>   now, these registers are embedded in a PCI capability. The reasons are
>>>   that Jailhouse does not allow to relocate the regions in guest address
>>>   space (but other hypervisors may if they like to) and that we now have
>>>   up to three of them.
>> 
>> I'm afraid I don't quite understand the change, nor the rationale.  I
>> guess I could figure out the former by studying the specification.
>
> a) It's a Jailhouse thing (we disallow the guest to move the regions
>    around in its address space)
> b) With 3 regions + MSI-X + MMIO registers, we run out of BARs (or
>    would have to downgrade them to 32 bit)

Have you considered putting your three shared memory regions in memory
consecutively, so they can be covered by a single BAR?  Similar to how a
single BAR covers both MSI-X table and PBA.

>>> - Changed PCI base class code to 0xff (unspecified class)
>> 
>> Changed from 0x5 (memory controller).
>
> Right.
>
>> 
>>>   This allows us to define our own sub classes and interfaces. That is
>>>   now exploited for specifying the shared memory protocol the two
>>>   connected peers should use. It also allows the Linux drivers to match
>>>   on that.
>>>
>>> - INTx interrupts support is back
>>>
>>>   This is needed on target platforms without MSI controllers, i.e.
>>>   without the required guest support. Namely some PCI-less ARM SoCs
>>>   required the reintroduction. While doing this, we also took care of
>>>   keeping the MMIO registers free of privileged controls so that a
>>>   guest OS can map them safely into a guest userspace application.
>> 
>> So you need interrupt capability.  Current upstream ivshmem requires a
>> server such as the one in contrib/ivshmem-server/.  What about yours?
>
> IIRC, the need for a server with QEMU/KVM is related to live migration.
> Jailhouse is simpler, all guests are managed by the same hypervisor
> instance, and there is no migration. That makes interrupt delivery much
> simpler as well. However, the device spec should not exclude other
> architectures.

The server doesn't really help with live migration.  It's used to dole
out file descriptors for shared memory and interrupt signalling, and to
notify of peer connect/disconnect.
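
[Editorial illustration, not part of the original mail: "doling out file descriptors" works by sending them as SCM_RIGHTS ancillary data over the server's UNIX domain socket. A minimal, hedged sketch of the receiving side is below; the one-byte payload and single-fd message layout are assumptions, not the actual ivshmem-server protocol.]

  /* Hedged sketch: receive one file descriptor (e.g. the shared-memory
   * fd or an interrupt eventfd) over a connected UNIX domain socket. */
  #include <string.h>
  #include <sys/socket.h>
  #include <sys/uio.h>

  static int recv_one_fd(int sock)
  {
      char data;                        /* one byte of ordinary payload */
      struct iovec iov = { .iov_base = &data, .iov_len = 1 };
      char ctrl[CMSG_SPACE(sizeof(int))];
      struct msghdr msg = {
          .msg_iov = &iov, .msg_iovlen = 1,
          .msg_control = ctrl, .msg_controllen = sizeof(ctrl),
      };
      struct cmsghdr *cmsg;
      int fd = -1;

      if (recvmsg(sock, &msg, 0) <= 0)
          return -1;

      for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg))
          if (cmsg->cmsg_level == SOL_SOCKET &&
              cmsg->cmsg_type == SCM_RIGHTS) {
              memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
              break;
          }

      return fd;
  }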

>> The interrupt feature enables me to guess a definition of "link": A and
>> B are peers of the same link if they can interrupt each other.
>> 
>> Does your ivshmem2 support interrupt-less operation similar to
>> ivshmem-plain?
>
> Each receiver of interrupts is free to enable that - or leave it off as
> it is the default after reset. But currently the spec demands that
> either MSI-X or INTx is reported as available to the guests. We could
> extend it to permit reporting no interrupts support if there is a good
> case for it.

I think the case for interrupt-incapable ivshmem-plain is that
interrupt-capable ivshmem-doorbell requires a server, and is therefore a
bit more complex to set up, and has additional failure modes.

If that wasn't the case, a single device variant would make more sense.

Besides, contrib/ivshmem-server/ is not fit for production use.

> I will have to look into the details of the client-server structure of
> QEMU's ivshmem again to answer the question under with restriction we
> can make it both simpler and more robust. As Jailhouse has no live
> migration support, requirements on ivshmem related to that may only be
> addressed by chance so far.

Here's how live migration works with QEMU's ivshmem: exactly one peer
(the "master") migrates with its ivshmem device, all others need to hot
unplug ivshmem, migrate, hot plug it back after the master completed its
migration.  The master connects to the new server on the destination on
startup, then live migration copies over the shared memory.  The other
peers connect to the new server when they get their ivshmem hot plugged
again.
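
[Editorial illustration, not part of the original mail: for a non-master peer, the unplug/replug step could look roughly like this over QMP. The device id, chardev name and even the use of the ivshmem-doorbell variant are assumptions made for the sketch.]

  { "execute": "device_del", "arguments": { "id": "ivshm0" } }
  ... peer migrates, a new ivshmem server runs on the destination ...
  { "execute": "device_add",
    "arguments": { "driver": "ivshmem-doorbell",
                   "id": "ivshm0", "chardev": "ivshm-sock" } }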

>>> And then there are some extensions of the original ivshmem:
>>>
>>> - Multiple shared memory regions, including unidirectional ones
>>>
>>>   It is now possible to expose up to three different shared memory
>>>   regions: The first one is read/writable for both sides. The second
>>>   region is read/writable for the local peer and read-only for the
>>>   remote peer (useful for output queues). And the third is read-only
>>>   locally but read/writable remotely (ie. for input queues).
>>>   Unidirectional regions prevent that the receiver of some data can
>>>   interfere with the sender while it is still building the message, a
>>>   property that is not only useful for safety critical communication,
>>>   we are sure.
>>>
>>> - Life-cycle management via local and remote state
>>>
>>>   Each device can now signal its own state in form of a value to the
>>>   remote side, which triggers an event there.
>> 
>> How are "events" related to interrupts?
>
> Confusing term chosen here: an interrupt is triggered on the remote side
> (if it has interrupts enabled).

Got it.

>>>                                               Moreover, state changes
>>>   done by the hypervisor to one peer are signalled to the other side.
>>>   And we introduced a write-to-shared-memory mechanism for the
>>>   respective remote state so that guests do not have to issue an MMIO
>>>   access in order to check the state.
>>>
>>> So, this is our proposal. Would be great to hear some opinions if you
>>> see value in adding support for such an "ivshmem 2.0" device to QEMU as
>>> well and expand its ecosystem towards Linux upstream, maybe also DPDK
>>> again. If you see problems in the new design /wrt what QEMU provides so
>>> far with its ivshmem device, let's discuss how to resolve them. Looking
>>> forward to any feedback!
>> 
>> My general opinion on ivshmem is well-known, but I repeat it for the
>> record: merging it was a mistake, and using it is probably a mistake.  I
>> detailed my concerns in "Why I advise against using ivshmem"[*].
>> 
>> My philosophical concerns remain.  Perhaps you can assuage them.
>> 
>> Only some of my practical concerns have since been addressed.  In part
>> by myself, because having a flawed implementation of a bad idea is
>> strictly worse than the same with flaws corrected as far as practical.
>> But even today, docs/specs/ivshmem-spec.txt is a rather depressing read.
>
> I agree.
>
>> 
>> However, there's one thing that's still worse than a more or less flawed
>> implementation of a bad idea: two implementations of a bad idea.  Could
>> ivshmem2 be done in a way that permits *replacing* ivshmem?
>
> If people see the need for having a common ivshmem2, that should of
> course be designed to replace the original version of QEMU. I wouldn't
> like to design it being backward compatible, but the new version should
> provide all useful and required features of the old one.

Nobody likes to provide backward compatibility, but everybody likes to
take advantage of it :)

Seriously, I can't say whether feature parity would suffice, or whether
we need full backward compatibility.

> Of course, I'm careful with investing much time into expanding the
> existing, for Jailhouse possibly sufficient design if there no real
> interest in continuing the ivshmem support in QEMU - because of
> vhost-pci or other reasons. But if that interest exists, it would be
> beneficial for us to have QEMU supporting a compatible version and using
> the same guest drivers. Then I would start looking into concrete patches
> for it as well.

Interest is difficult for me to gauge, not least because alternatives
are still being worked on.

> Jan
>
>> 
>>> [1] https://github.com/siemens/jailhouse
>>> [2]
>>> http://git.kiszka.org/?p=linux.git;a=blob;f=drivers/net/ivshmem-net.c;h=0e770ca293a4aca14a55ac0e66871b09c82647af;hb=refs/heads/queues/jailhouse
>>> [3] https://github.com/siemens/jailhouse/commits/wip/ivshmem2
>>> [4]
>>> http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/jailhouse-ivshmem2
>> 
>> [*] http://lists.gnu.org/archive/html/qemu-devel/2014-06/msg02968.html
>> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-27 19:36     ` Markus Armbruster
@ 2017-01-29  8:43       ` Jan Kiszka
  2017-01-29 14:00         ` Marc-André Lureau
  2017-01-30  8:00         ` Markus Armbruster
  0 siblings, 2 replies; 29+ messages in thread
From: Jan Kiszka @ 2017-01-29  8:43 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Jailhouse, qemu-devel

On 2017-01-27 20:36, Markus Armbruster wrote:
> Jan Kiszka <jan.kiszka@web.de> writes:
> 
>> On 2017-01-23 15:19, Markus Armbruster wrote:
>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>
>>>> Hi,
>>>>
>>>> some of you may know that we are using a shared memory device similar to
>>>> ivshmem in the partitioning hypervisor Jailhouse [1].
>>>>
>>>> We started as being compatible to the original ivshmem that QEMU
>>>> implements, but we quickly deviated in some details, and in the recent
>>>> months even more. Some of the deviations are related to making the
>>>> implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is
>>>
>>> Compare: hw/misc/ivshmem.c ~1000 SLOC, measured with sloccount.
>>
>> That difference comes from remote/migration support and general QEMU
>> integration - likely not very telling due to the different environments.
> 
> Plausible.
> 
>>>> aiming at safety critical systems and, therefore, a small code base.
>>>> Other changes address deficits in the original design, like missing
>>>> life-cycle management.
>>>>
>>>> Now the question is if there is interest in defining a common new
>>>> revision of this device and maybe also of some protocols used on top,
>>>> such as virtual network links. Ideally, this would enable us to share
>>>> Linux drivers. We will definitely go for upstreaming at least a network
>>>> driver such as [2], a UIO driver and maybe also a serial port/console.
>>>>
>>>> I've attached a first draft of the specification of our new ivshmem
>>>> device. A working implementation can be found in the wip/ivshmem2 branch
>>>> of Jailhouse [3], the corresponding ivshmem-net driver in [4].
>>>>
>>>> Deviations from the original design:
>>>>
>>>> - Only two peers per link
>>>
>>> Uh, define "link".
>>
>> VMs are linked via a common shared memory. Interrupt delivery follows
>> that route as well.
>>
>>>
>>>>   This simplifies the implementation and also the interfaces (think of
>>>>   life-cycle management in a multi-peer environment). Moreover, we do
>>>>   not have an urgent use case for multiple peers, thus also not
>>>>   reference for a protocol that could be used in such setups. If someone
>>>>   else happens to share such a protocol, it would be possible to discuss
>>>>   potential extensions and their implications.
>>>>
>>>> - Side-band registers to discover and configure share memory regions
>>>>
>>>>   This was one of the first changes: We removed the memory regions from
>>>>   the PCI BARs and gave them special configuration space registers. By
>>>>   now, these registers are embedded in a PCI capability. The reasons are
>>>>   that Jailhouse does not allow to relocate the regions in guest address
>>>>   space (but other hypervisors may if they like to) and that we now have
>>>>   up to three of them.
>>>
>>> I'm afraid I don't quite understand the change, nor the rationale.  I
>>> guess I could figure out the former by studying the specification.
>>
>> a) It's a Jailhouse thing (we disallow the guest to move the regions
>>    around in its address space)
>> b) With 3 regions + MSI-X + MMIO registers, we run out of BARs (or
>>    would have to downgrade them to 32 bit)
> 
> Have you considered putting your three shared memory regions in memory
> consecutively, so they can be covered by a single BAR?  Similar to how a
> single BAR covers both MSI-X table and PBA.

That would still require passing size information three times (each
region can have a different size or be empty/non-existent). Moreover, a)
would then not be possible without ugly modifications to the guest,
because guests expect BAR-based regions to be relocatable.
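
[Editorial illustration only, not the actual draft spec (which is attached to the first mail and not reproduced here): the point of the side-band registers is that a vendor-specific PCI capability can carry per-region base/size information that fixed BARs cannot express. All names and the layout below are invented.]

  /* Purely hypothetical layout, NOT the ivshmem2 draft spec: a
   * vendor-specific capability (PCI cap ID 0x09) describing up to three
   * regions with individual base and size; size 0 = region absent. */
  #include <stdint.h>

  struct ivshmem2_region_desc {       /* invented for illustration */
      uint64_t base;                  /* guest-physical base address */
      uint64_t size;                  /* 0 = region not present */
  };

  struct ivshmem2_vendor_cap {
      uint8_t  cap_id;                /* 0x09 = vendor-specific */
      uint8_t  cap_next;
      uint8_t  cap_len;
      uint8_t  reserved;
      struct ivshmem2_region_desc region[3];  /* r/w, output, input */
  };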

> 
>>>> - Changed PCI base class code to 0xff (unspecified class)
>>>
>>> Changed from 0x5 (memory controller).
>>
>> Right.
>>
>>>
>>>>   This allows us to define our own sub classes and interfaces. That is
>>>>   now exploited for specifying the shared memory protocol the two
>>>>   connected peers should use. It also allows the Linux drivers to match
>>>>   on that.
>>>>
>>>> - INTx interrupts support is back
>>>>
>>>>   This is needed on target platforms without MSI controllers, i.e.
>>>>   without the required guest support. Namely some PCI-less ARM SoCs
>>>>   required the reintroduction. While doing this, we also took care of
>>>>   keeping the MMIO registers free of privileged controls so that a
>>>>   guest OS can map them safely into a guest userspace application.
>>>
>>> So you need interrupt capability.  Current upstream ivshmem requires a
>>> server such as the one in contrib/ivshmem-server/.  What about yours?
>>
>> IIRC, the need for a server with QEMU/KVM is related to live migration.
>> Jailhouse is simpler, all guests are managed by the same hypervisor
>> instance, and there is no migration. That makes interrupt delivery much
>> simpler as well. However, the device spec should not exclude other
>> architectures.
> 
> The server doesn't really help with live migration.  It's used to dole
> out file descriptors for shared memory and interrupt signalling, and to
> notify of peer connect/disconnect.

That should be solvable directly between two peers.

> 
>>> The interrupt feature enables me to guess a definition of "link": A and
>>> B are peers of the same link if they can interrupt each other.
>>>
>>> Does your ivshmem2 support interrupt-less operation similar to
>>> ivshmem-plain?
>>
>> Each receiver of interrupts is free to enable that - or leave it off as
>> it is the default after reset. But currently the spec demands that
>> either MSI-X or INTx is reported as available to the guests. We could
>> extend it to permit reporting no interrupts support if there is a good
>> case for it.
> 
> I think the case for interrupt-incapable ivshmem-plain is that
> interrupt-capable ivshmem-doorbell requires a server, and is therefore a
> bit more complex to set up, and has additional failure modes.
> 
> If that wasn't the case, a single device variant would make more sense.
> 
> Besides, contrib/ivshmem-server/ is not fit for production use.
> 
>> I will have to look into the details of the client-server structure of
>> QEMU's ivshmem again to answer the question under with restriction we
>> can make it both simpler and more robust. As Jailhouse has no live
>> migration support, requirements on ivshmem related to that may only be
>> addressed by chance so far.
> 
> Here's how live migration works with QEMU's ivshmem: exactly one peer
> (the "master") migrates with its ivshmem device, all others need to hot
> unplug ivshmem, migrate, hot plug it back after the master completed its
> migration.  The master connects to the new server on the destination on
> startup, then live migration copies over the shared memory.  The other
> peers connect to the new server when they get their ivshmem hot plugged
> again.

OK, hot-plug is a simple answer to this problem. It would be even
cleaner to support this from the guest POV with the new state signalling
mechanism of ivshmem2.

> 
>>>> And then there are some extensions of the original ivshmem:
>>>>
>>>> - Multiple shared memory regions, including unidirectional ones
>>>>
>>>>   It is now possible to expose up to three different shared memory
>>>>   regions: The first one is read/writable for both sides. The second
>>>>   region is read/writable for the local peer and read-only for the
>>>>   remote peer (useful for output queues). And the third is read-only
>>>>   locally but read/writable remotely (ie. for input queues).
>>>>   Unidirectional regions prevent that the receiver of some data can
>>>>   interfere with the sender while it is still building the message, a
>>>>   property that is not only useful for safety critical communication,
>>>>   we are sure.
>>>>
>>>> - Life-cycle management via local and remote state
>>>>
>>>>   Each device can now signal its own state in form of a value to the
>>>>   remote side, which triggers an event there.
>>>
>>> How are "events" related to interrupts?
>>
>> Confusing term chosen here: an interrupt is triggered on the remote side
>> (if it has interrupts enabled).
> 
> Got it.
> 
>>>>                                               Moreover, state changes
>>>>   done by the hypervisor to one peer are signalled to the other side.
>>>>   And we introduced a write-to-shared-memory mechanism for the
>>>>   respective remote state so that guests do not have to issue an MMIO
>>>>   access in order to check the state.
>>>>
>>>> So, this is our proposal. Would be great to hear some opinions if you
>>>> see value in adding support for such an "ivshmem 2.0" device to QEMU as
>>>> well and expand its ecosystem towards Linux upstream, maybe also DPDK
>>>> again. If you see problems in the new design /wrt what QEMU provides so
>>>> far with its ivshmem device, let's discuss how to resolve them. Looking
>>>> forward to any feedback!
>>>
>>> My general opinion on ivshmem is well-known, but I repeat it for the
>>> record: merging it was a mistake, and using it is probably a mistake.  I
>>> detailed my concerns in "Why I advise against using ivshmem"[*].
>>>
>>> My philosophical concerns remain.  Perhaps you can assuage them.
>>>
>>> Only some of my practical concerns have since been addressed.  In part
>>> by myself, because having a flawed implementation of a bad idea is
>>> strictly worse than the same with flaws corrected as far as practical.
>>> But even today, docs/specs/ivshmem-spec.txt is a rather depressing read.
>>
>> I agree.
>>
>>>
>>> However, there's one thing that's still worse than a more or less flawed
>>> implementation of a bad idea: two implementations of a bad idea.  Could
>>> ivshmem2 be done in a way that permits *replacing* ivshmem?
>>
>> If people see the need for having a common ivshmem2, that should of
>> course be designed to replace the original version of QEMU. I wouldn't
>> like to design it being backward compatible, but the new version should
>> provide all useful and required features of the old one.
> 
> Nobody likes to provide backward compability, but everybody likes to
> take advantage of it :)
> 
> Seriously, I can't say whether feature parity would suffice, or whether
> we need full backward compatibility.

Given the deficits of the current design and the lack of driver support
in Linux, people should be happy if the new interface becomes the
default while the old one can still be selected for a while. But a first
step will likely be a separate implementation of the interface.

> 
>> Of course, I'm careful with investing much time into expanding the
>> existing, for Jailhouse possibly sufficient design if there no real
>> interest in continuing the ivshmem support in QEMU - because of
>> vhost-pci or other reasons. But if that interest exists, it would be
>> beneficial for us to have QEMU supporting a compatible version and using
>> the same guest drivers. Then I would start looking into concrete patches
>> for it as well.
> 
> Interest is difficult for me to gauge, not least because alternatives
> are still being worked on.

I'm considering suggesting this as a GSoC project now.

Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-17  9:59     ` Stefan Hajnoczi
  2017-01-17 10:32       ` Jan Kiszka
@ 2017-01-29 11:56       ` msuchanek
  2017-01-30 11:25         ` Stefan Hajnoczi
  1 sibling, 1 reply; 29+ messages in thread
From: msuchanek @ 2017-01-29 11:56 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Jan Kiszka, Jailhouse, Wei Wang, Marc-André Lureau,
	qemu-devel, Markus Armbruster, Qemu-devel

On 2017-01-17 10:59, Stefan Hajnoczi wrote:
> On Mon, Jan 16, 2017 at 02:10:17PM +0100, Jan Kiszka wrote:
>> On 2017-01-16 13:41, Marc-André Lureau wrote:
>> > On Mon, Jan 16, 2017 at 12:37 PM Jan Kiszka <jan.kiszka@siemens.com
>> > <mailto:jan.kiszka@siemens.com>> wrote:
>> >     So, this is our proposal. Would be great to hear some opinions if you
>> >     see value in adding support for such an "ivshmem 2.0" device to QEMU as
>> >     well and expand its ecosystem towards Linux upstream, maybe also DPDK
>> >     again. If you see problems in the new design /wrt what QEMU provides so
>> >     far with its ivshmem device, let's discuss how to resolve them. Looking
>> >     forward to any feedback!
>> >
>> >
>> > My feeling is that ivshmem is not being actively developed in qemu, but
>> > rather virtio-based solutions (vhost-pci for vm2vm).
>> 
>> As pointed out, for us it's most important to keep the design simple -
>> even at the price of "reinventing" some drivers for upstream (at 
>> least,
>> we do not need two sets of drivers because our interface is fully
>> symmetric). I don't see yet how vhost-pci could achieve the same, but
>> I'm open to learn more!
> 
> The concept of symmetry is nice but only applies for communications
> channels like networking and serial.
> 
> It doesn't apply for I/O that is fundamentally asymmetric like disk 
> I/O.
> 
> I just wanted to point this out because the lack of symmetry has also bothered
> me about virtio but it's actually impossible to achieve it for all
> device types.
> 

What's asymmetric about storage? IIRC both SCSI and FireWire, which can be
used for storage, are symmetric. Any asymmetry comes only from usage
conventions or from less capable buses like IDE/SATA.

Thanks

Michal

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-29  8:43       ` Jan Kiszka
@ 2017-01-29 14:00         ` Marc-André Lureau
  2017-01-29 14:14           ` Jan Kiszka
  2017-01-30  8:00         ` Markus Armbruster
  1 sibling, 1 reply; 29+ messages in thread
From: Marc-André Lureau @ 2017-01-29 14:00 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Jailhouse, Markus Armbruster, qemu-devel, Wei Wang

Hi

On Sun, Jan 29, 2017 at 12:44 PM Jan Kiszka <jan.kiszka@web.de> wrote:

> >> Of course, I'm careful with investing much time into expanding the
> >> existing, for Jailhouse possibly sufficient design if there no real
> >> interest in continuing the ivshmem support in QEMU - because of
> >> vhost-pci or other reasons. But if that interest exists, it would be
> >> beneficial for us to have QEMU supporting a compatible version and using
> >> the same guest drivers. Then I would start looking into concrete patches
> >> for it as well.
> >
> > Interest is difficult for me to gauge, not least because alternatives
> > are still being worked on.
>
> I'm considering to suggest this as GSoC project now.
>

It's better for a student and for the community if the work gets accepted
in the end.

So, I think that could be an interesting GSoC project (implementing your
ivshmem 2 proposal). However, if the QEMU community isn't ready to accept a
new ivshmem and would rather have a vhost-pci-based solution, I would suggest
a different project (hopefully Wei Wang can help define it and mentor): work
on vhost-pci using dedicated shared PCI BARs (and kernel support to avoid an
extra copy - if I understand the extra-copy situation correctly).
-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-29 14:00         ` Marc-André Lureau
@ 2017-01-29 14:14           ` Jan Kiszka
  2017-01-30  8:02             ` Markus Armbruster
  2017-01-31  2:51             ` Wang, Wei W
  0 siblings, 2 replies; 29+ messages in thread
From: Jan Kiszka @ 2017-01-29 14:14 UTC (permalink / raw)
  To: Marc-André Lureau; +Cc: Jailhouse, Markus Armbruster, qemu-devel, Wei Wang

On 2017-01-29 15:00, Marc-André Lureau wrote:
> Hi
> 
> On Sun, Jan 29, 2017 at 12:44 PM Jan Kiszka <jan.kiszka@web.de
> <mailto:jan.kiszka@web.de>> wrote:
> 
>     >> Of course, I'm careful with investing much time into expanding the
>     >> existing, for Jailhouse possibly sufficient design if there no real
>     >> interest in continuing the ivshmem support in QEMU - because of
>     >> vhost-pci or other reasons. But if that interest exists, it would be
>     >> beneficial for us to have QEMU supporting a compatible version
>     and using
>     >> the same guest drivers. Then I would start looking into concrete
>     patches
>     >> for it as well.
>     >
>     > Interest is difficult for me to gauge, not least because alternatives
>     > are still being worked on.
> 
>     I'm considering to suggest this as GSoC project now.
> 
> 
> It's better for a student and for the community if the work get accepted
> in the end.
> 
> So, I think that could be an intersting GSoC (implementing your ivshmem
> 2 proposal). However, if the qemu community isn't ready to accept a new
> ivshmem, and would rather have vhost-pci based solution, I would suggest
> a different project (hopefully Wei Wang can help define it and mentor):
> work on a vhost-pci using dedicated shared PCI BARs (and kernel support
> to avoid extra copy - if I understand the extra copy situation correctly).

It's still open whether vhost-pci can replace ivshmem (not to speak of being
desirable for Jailhouse - I'm still studying it). In that light, having
both implementations available for real comparisons is valuable IMHO.

That said, we will put our cards on the table, explain the situation to the
student and let her/him decide knowingly.

Jan

PS: We have a mixed history /wrt actually merging student projects.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-29  8:43       ` Jan Kiszka
  2017-01-29 14:00         ` Marc-André Lureau
@ 2017-01-30  8:00         ` Markus Armbruster
  2017-01-30  8:14           ` Jan Kiszka
  1 sibling, 1 reply; 29+ messages in thread
From: Markus Armbruster @ 2017-01-30  8:00 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Jailhouse, qemu-devel

Jan Kiszka <jan.kiszka@web.de> writes:

> On 2017-01-27 20:36, Markus Armbruster wrote:
>> Jan Kiszka <jan.kiszka@web.de> writes:
>> 
>>> On 2017-01-23 15:19, Markus Armbruster wrote:
>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>
>>>>> Hi,
>>>>>
>>>>> some of you may know that we are using a shared memory device similar to
>>>>> ivshmem in the partitioning hypervisor Jailhouse [1].
>>>>>
>>>>> We started as being compatible to the original ivshmem that QEMU
>>>>> implements, but we quickly deviated in some details, and in the recent
>>>>> months even more. Some of the deviations are related to making the
>>>>> implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is
>>>>
>>>> Compare: hw/misc/ivshmem.c ~1000 SLOC, measured with sloccount.
>>>
>>> That difference comes from remote/migration support and general QEMU
>>> integration - likely not very telling due to the different environments.
>> 
>> Plausible.
>> 
>>>>> aiming at safety critical systems and, therefore, a small code base.
>>>>> Other changes address deficits in the original design, like missing
>>>>> life-cycle management.
>>>>>
>>>>> Now the question is if there is interest in defining a common new
>>>>> revision of this device and maybe also of some protocols used on top,
>>>>> such as virtual network links. Ideally, this would enable us to share
>>>>> Linux drivers. We will definitely go for upstreaming at least a network
>>>>> driver such as [2], a UIO driver and maybe also a serial port/console.
>>>>>
>>>>> I've attached a first draft of the specification of our new ivshmem
>>>>> device. A working implementation can be found in the wip/ivshmem2 branch
>>>>> of Jailhouse [3], the corresponding ivshmem-net driver in [4].
>>>>>
>>>>> Deviations from the original design:
>>>>>
>>>>> - Only two peers per link
>>>>
>>>> Uh, define "link".
>>>
>>> VMs are linked via a common shared memory. Interrupt delivery follows
>>> that route as well.
>>>
>>>>
>>>>>   This simplifies the implementation and also the interfaces (think of
>>>>>   life-cycle management in a multi-peer environment). Moreover, we do
>>>>>   not have an urgent use case for multiple peers, thus also not
>>>>>   reference for a protocol that could be used in such setups. If someone
>>>>>   else happens to share such a protocol, it would be possible to discuss
>>>>>   potential extensions and their implications.
>>>>>
>>>>> - Side-band registers to discover and configure share memory regions
>>>>>
>>>>>   This was one of the first changes: We removed the memory regions from
>>>>>   the PCI BARs and gave them special configuration space registers. By
>>>>>   now, these registers are embedded in a PCI capability. The reasons are
>>>>>   that Jailhouse does not allow to relocate the regions in guest address
>>>>>   space (but other hypervisors may if they like to) and that we now have
>>>>>   up to three of them.
>>>>
>>>> I'm afraid I don't quite understand the change, nor the rationale.  I
>>>> guess I could figure out the former by studying the specification.
>>>
>>> a) It's a Jailhouse thing (we disallow the guest to move the regions
>>>    around in its address space)
>>> b) With 3 regions + MSI-X + MMIO registers, we run out of BARs (or
>>>    would have to downgrade them to 32 bit)
>> 
>> Have you considered putting your three shared memory regions in memory
>> consecutively, so they can be covered by a single BAR?  Similar to how a
>> single BAR covers both MSI-X table and PBA.
>
> Would still require to pass three times some size information (each
> region can be different or empty/non-existent).

Yes.  Precedent: the locations of the MSI-X table and PBA are specified in
the MSI-X Capability Structure as offset and BIR.
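To make that precedent concrete, a per-region entry in such a capability
could mirror the MSI-X Table Offset/BIR encoding roughly like this (layout
invented purely for illustration, not taken from either spec):

  #include <stdint.h>

  /* One entry per shared memory region, up to three entries. */
  struct ivshmem2_region_desc {
          uint32_t bir_offset;  /* bits 2:0 = BAR index (BIR),
                                   bits 31:3 = 8-byte aligned offset */
          uint32_t size;        /* 0 marks a non-existent region */
  };

  static inline uint32_t region_bar(const struct ivshmem2_region_desc *d)
  {
          return d->bir_offset & 0x7;
  }

  static inline uint64_t region_offset(const struct ivshmem2_region_desc *d)
  {
          return d->bir_offset & ~(uint32_t)0x7;
  }
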

>                                                 Moreover, a) is not
> possible then without ugly modifications to the guest because they
> expect BAR-based regions to be relocatable.

Can you explain why not letting the guest map the shared memory into its
address space on its own just like any other piece of device memory is a
requirement?

>>>>> - Changed PCI base class code to 0xff (unspecified class)
>>>>
>>>> Changed from 0x5 (memory controller).
>>>
>>> Right.
>>>
>>>>
>>>>>   This allows us to define our own sub classes and interfaces. That is
>>>>>   now exploited for specifying the shared memory protocol the two
>>>>>   connected peers should use. It also allows the Linux drivers to match
>>>>>   on that.
>>>>>
>>>>> - INTx interrupts support is back
>>>>>
>>>>>   This is needed on target platforms without MSI controllers, i.e.
>>>>>   without the required guest support. Namely some PCI-less ARM SoCs
>>>>>   required the reintroduction. While doing this, we also took care of
>>>>>   keeping the MMIO registers free of privileged controls so that a
>>>>>   guest OS can map them safely into a guest userspace application.
>>>>
>>>> So you need interrupt capability.  Current upstream ivshmem requires a
>>>> server such as the one in contrib/ivshmem-server/.  What about yours?
>>>
>>> IIRC, the need for a server with QEMU/KVM is related to live migration.
>>> Jailhouse is simpler, all guests are managed by the same hypervisor
>>> instance, and there is no migration. That makes interrupt delivery much
>>> simpler as well. However, the device spec should not exclude other
>>> architectures.
>> 
>> The server doesn't really help with live migration.  It's used to dole
>> out file descriptors for shared memory and interrupt signalling, and to
>> notify of peer connect/disconnect.
>
> That should be solvable directly between two peers.

Even between multiple peers, but it might complicate the peers.

Note that the current ivshmem client-server protocol doesn't support
graceful recovery from a server crash.  The clients can hobble on with
reduced functionality, though (see ivshmem-spec.txt).  Live migration
could be a way to recover, if the application permits it.

>>>> The interrupt feature enables me to guess a definition of "link": A and
>>>> B are peers of the same link if they can interrupt each other.
>>>>
>>>> Does your ivshmem2 support interrupt-less operation similar to
>>>> ivshmem-plain?
>>>
>>> Each receiver of interrupts is free to enable that - or leave it off as
>>> it is the default after reset. But currently the spec demands that
>>> either MSI-X or INTx is reported as available to the guests. We could
>>> extend it to permit reporting no interrupts support if there is a good
>>> case for it.
>> 
>> I think the case for interrupt-incapable ivshmem-plain is that
>> interrupt-capable ivshmem-doorbell requires a server, and is therefore a
>> bit more complex to set up, and has additional failure modes.
>> 
>> If that wasn't the case, a single device variant would make more sense.
>> 
>> Besides, contrib/ivshmem-server/ is not fit for production use.
>> 
>>> I will have to look into the details of the client-server structure of
>>> QEMU's ivshmem again to answer the question under with restriction we
>>> can make it both simpler and more robust. As Jailhouse has no live
>>> migration support, requirements on ivshmem related to that may only be
>>> addressed by chance so far.
>> 
>> Here's how live migration works with QEMU's ivshmem: exactly one peer
>> (the "master") migrates with its ivshmem device, all others need to hot
>> unplug ivshmem, migrate, hot plug it back after the master completed its
>> migration.  The master connects to the new server on the destination on
>> startup, then live migration copies over the shared memory.  The other
>> peers connect to the new server when they get their ivshmem hot plugged
>> again.
>
> OK, hot-plug is a simple answer to this problem. It would be even
> cleaner to support from the guest POV with the new state signalling
> mechanism of ivshmem2.

Yes, proper state signalling should make this cleaner.  Without it,
every protocol built on top of ivshmem needs to come up with its own
state signalling.  The robustness problems should be obvious.

This is one aspect of my objection to the idea "just share some memory,
it's simple": it's not a protocol.  It's at best a building block for
protocols.

With ivshmem-doorbell, peers get notified of connects and disconnects.
However, the device can't notify guest software.  Fixable with
additional registers and an interrupt.

The design of ivshmem-plain has peers knowing nothing about their peers,
so a fix would require a redesign.

[...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-29 14:14           ` Jan Kiszka
@ 2017-01-30  8:02             ` Markus Armbruster
  2017-01-30  8:05               ` Jan Kiszka
  2017-01-31  2:51             ` Wang, Wei W
  1 sibling, 1 reply; 29+ messages in thread
From: Markus Armbruster @ 2017-01-30  8:02 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Marc-André Lureau, Jailhouse, Wei Wang, qemu-devel

Jan Kiszka <jan.kiszka@web.de> writes:

> On 2017-01-29 15:00, Marc-André Lureau wrote:
>> Hi
>> 
>> On Sun, Jan 29, 2017 at 12:44 PM Jan Kiszka <jan.kiszka@web.de
>> <mailto:jan.kiszka@web.de>> wrote:
>> 
>>     >> Of course, I'm careful with investing much time into expanding the
>>     >> existing, for Jailhouse possibly sufficient design if there no real
>>     >> interest in continuing the ivshmem support in QEMU - because of
>>     >> vhost-pci or other reasons. But if that interest exists, it would be
>>     >> beneficial for us to have QEMU supporting a compatible version
>>     and using
>>     >> the same guest drivers. Then I would start looking into concrete
>>     patches
>>     >> for it as well.
>>     >
>>     > Interest is difficult for me to gauge, not least because alternatives
>>     > are still being worked on.
>> 
>>     I'm considering to suggest this as GSoC project now.
>> 
>> 
>> It's better for a student and for the community if the work get accepted
>> in the end.

Yes.

>> So, I think that could be an intersting GSoC (implementing your ivshmem
>> 2 proposal). However, if the qemu community isn't ready to accept a new
>> ivshmem, and would rather have vhost-pci based solution, I would suggest
>> a different project (hopefully Wei Wang can help define it and mentor):
>> work on a vhost-pci using dedicated shared PCI BARs (and kernel support
>> to avoid extra copy - if I understand the extra copy situation correctly).
>
> It's still open if vhost-pci can replace ivshmem (not to speak of being
> desirable for Jailhouse - I'm still studying). In that light, having
> both implementations available to do real comparisons is valuable IMHO.

Yes, but is it appropriate for GSoC?

> That said, we will play with open cards, explain the student the
> situation and let her/him decide knowingly.

Both the student and the QEMU project need to consider the situation
carefully.

> Jan
>
> PS: We have a mixed history /wrt actually merging student projects.

Yes, but having screwed up is no license to screw up some more :)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-30  8:02             ` Markus Armbruster
@ 2017-01-30  8:05               ` Jan Kiszka
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Kiszka @ 2017-01-30  8:05 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Marc-André Lureau, Jailhouse, Wei Wang, qemu-devel

On 2017-01-30 09:02, Markus Armbruster wrote:
> Jan Kiszka <jan.kiszka@web.de> writes:
> 
>> On 2017-01-29 15:00, Marc-André Lureau wrote:
>>> Hi
>>>
>>> On Sun, Jan 29, 2017 at 12:44 PM Jan Kiszka <jan.kiszka@web.de
>>> <mailto:jan.kiszka@web.de>> wrote:
>>>
>>>     >> Of course, I'm careful with investing much time into expanding the
>>>     >> existing, for Jailhouse possibly sufficient design if there no real
>>>     >> interest in continuing the ivshmem support in QEMU - because of
>>>     >> vhost-pci or other reasons. But if that interest exists, it would be
>>>     >> beneficial for us to have QEMU supporting a compatible version
>>>     and using
>>>     >> the same guest drivers. Then I would start looking into concrete
>>>     patches
>>>     >> for it as well.
>>>     >
>>>     > Interest is difficult for me to gauge, not least because alternatives
>>>     > are still being worked on.
>>>
>>>     I'm considering to suggest this as GSoC project now.
>>>
>>>
>>> It's better for a student and for the community if the work get accepted
>>> in the end.
> 
> Yes.
> 
>>> So, I think that could be an intersting GSoC (implementing your ivshmem
>>> 2 proposal). However, if the qemu community isn't ready to accept a new
>>> ivshmem, and would rather have vhost-pci based solution, I would suggest
>>> a different project (hopefully Wei Wang can help define it and mentor):
>>> work on a vhost-pci using dedicated shared PCI BARs (and kernel support
>>> to avoid extra copy - if I understand the extra copy situation correctly).
>>
>> It's still open if vhost-pci can replace ivshmem (not to speak of being
>> desirable for Jailhouse - I'm still studying). In that light, having
>> both implementations available to do real comparisons is valuable IMHO.
> 
> Yes, but is it appropriate for GSoC?
> 
>> That said, we will play with open cards, explain the student the
>> situation and let her/him decide knowingly.
> 
> Both the student and the QEMU project need to consider the situation
> carefully.
> 
>> Jan
>>
>> PS: We have a mixed history /wrt actually merging student projects.
> 
> Yes, but having screwed up is no license to screw up some more :)
> 

After having received multiple pieces of feedback pointing in this
direction, I will drop that proposal from our list. So, don't worry. ;)

Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-30  8:00         ` Markus Armbruster
@ 2017-01-30  8:14           ` Jan Kiszka
  2017-01-30 12:19             ` Markus Armbruster
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2017-01-30  8:14 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Jailhouse, qemu-devel

On 2017-01-30 09:00, Markus Armbruster wrote:
> Jan Kiszka <jan.kiszka@web.de> writes:
> 
>> On 2017-01-27 20:36, Markus Armbruster wrote:
>>> Jan Kiszka <jan.kiszka@web.de> writes:
>>>
>>>> On 2017-01-23 15:19, Markus Armbruster wrote:
>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> some of you may know that we are using a shared memory device similar to
>>>>>> ivshmem in the partitioning hypervisor Jailhouse [1].
>>>>>>
>>>>>> We started as being compatible to the original ivshmem that QEMU
>>>>>> implements, but we quickly deviated in some details, and in the recent
>>>>>> months even more. Some of the deviations are related to making the
>>>>>> implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is
>>>>>
>>>>> Compare: hw/misc/ivshmem.c ~1000 SLOC, measured with sloccount.
>>>>
>>>> That difference comes from remote/migration support and general QEMU
>>>> integration - likely not very telling due to the different environments.
>>>
>>> Plausible.
>>>
>>>>>> aiming at safety critical systems and, therefore, a small code base.
>>>>>> Other changes address deficits in the original design, like missing
>>>>>> life-cycle management.
>>>>>>
>>>>>> Now the question is if there is interest in defining a common new
>>>>>> revision of this device and maybe also of some protocols used on top,
>>>>>> such as virtual network links. Ideally, this would enable us to share
>>>>>> Linux drivers. We will definitely go for upstreaming at least a network
>>>>>> driver such as [2], a UIO driver and maybe also a serial port/console.
>>>>>>
>>>>>> I've attached a first draft of the specification of our new ivshmem
>>>>>> device. A working implementation can be found in the wip/ivshmem2 branch
>>>>>> of Jailhouse [3], the corresponding ivshmem-net driver in [4].
>>>>>>
>>>>>> Deviations from the original design:
>>>>>>
>>>>>> - Only two peers per link
>>>>>
>>>>> Uh, define "link".
>>>>
>>>> VMs are linked via a common shared memory. Interrupt delivery follows
>>>> that route as well.
>>>>
>>>>>
>>>>>>   This simplifies the implementation and also the interfaces (think of
>>>>>>   life-cycle management in a multi-peer environment). Moreover, we do
>>>>>>   not have an urgent use case for multiple peers, thus also not
>>>>>>   reference for a protocol that could be used in such setups. If someone
>>>>>>   else happens to share such a protocol, it would be possible to discuss
>>>>>>   potential extensions and their implications.
>>>>>>
>>>>>> - Side-band registers to discover and configure share memory regions
>>>>>>
>>>>>>   This was one of the first changes: We removed the memory regions from
>>>>>>   the PCI BARs and gave them special configuration space registers. By
>>>>>>   now, these registers are embedded in a PCI capability. The reasons are
>>>>>>   that Jailhouse does not allow to relocate the regions in guest address
>>>>>>   space (but other hypervisors may if they like to) and that we now have
>>>>>>   up to three of them.
>>>>>
>>>>> I'm afraid I don't quite understand the change, nor the rationale.  I
>>>>> guess I could figure out the former by studying the specification.
>>>>
>>>> a) It's a Jailhouse thing (we disallow the guest to move the regions
>>>>    around in its address space)
>>>> b) With 3 regions + MSI-X + MMIO registers, we run out of BARs (or
>>>>    would have to downgrade them to 32 bit)
>>>
>>> Have you considered putting your three shared memory regions in memory
>>> consecutively, so they can be covered by a single BAR?  Similar to how a
>>> single BAR covers both MSI-X table and PBA.
>>
>> Would still require to pass three times some size information (each
>> region can be different or empty/non-existent).
> 
> Yes.  Precedence: location of MSI-X table and PBA are specified in the
> MSI-X Capability Structure as offset and BIR.
> 
>>                                                 Moreover, a) is not
>> possible then without ugly modifications to the guest because they
>> expect BAR-based regions to be relocatable.
> 
> Can you explain why not letting the guest map the shared memory into its
> address space on its own just like any other piece of device memory is a
> requirement?

It requires reconfiguration of the sensitive second-level page tables at
guest runtime. So far we avoid the necessary checking and synchronization
measures, which reduces code complexity further.

BTW, PCI has a similar concept of static assignment (PCI EA, Enhanced
Allocation), but that is unfortunately incompatible with our needs [1].

> 
>>>>>> - Changed PCI base class code to 0xff (unspecified class)
>>>>>
>>>>> Changed from 0x5 (memory controller).
>>>>
>>>> Right.
>>>>
>>>>>
>>>>>>   This allows us to define our own sub classes and interfaces. That is
>>>>>>   now exploited for specifying the shared memory protocol the two
>>>>>>   connected peers should use. It also allows the Linux drivers to match
>>>>>>   on that.
>>>>>>
>>>>>> - INTx interrupts support is back
>>>>>>
>>>>>>   This is needed on target platforms without MSI controllers, i.e.
>>>>>>   without the required guest support. Namely some PCI-less ARM SoCs
>>>>>>   required the reintroduction. While doing this, we also took care of
>>>>>>   keeping the MMIO registers free of privileged controls so that a
>>>>>>   guest OS can map them safely into a guest userspace application.
>>>>>
>>>>> So you need interrupt capability.  Current upstream ivshmem requires a
>>>>> server such as the one in contrib/ivshmem-server/.  What about yours?
>>>>
>>>> IIRC, the need for a server with QEMU/KVM is related to live migration.
>>>> Jailhouse is simpler, all guests are managed by the same hypervisor
>>>> instance, and there is no migration. That makes interrupt delivery much
>>>> simpler as well. However, the device spec should not exclude other
>>>> architectures.
>>>
>>> The server doesn't really help with live migration.  It's used to dole
>>> out file descriptors for shared memory and interrupt signalling, and to
>>> notify of peer connect/disconnect.
>>
>> That should be solvable directly between two peers.
> 
> Even between multiple peers, but it might complicate the peers.
> 
> Note that the current ivshmem client-server protocol doesn't support
> graceful recovery from a server crash.  The clients can hobble on with
> reduced functionality, though (see ivshmem-spec.txt).  Live migration
> could be a way to recover, if the application permits it.
> 
>>>>> The interrupt feature enables me to guess a definition of "link": A and
>>>>> B are peers of the same link if they can interrupt each other.
>>>>>
>>>>> Does your ivshmem2 support interrupt-less operation similar to
>>>>> ivshmem-plain?
>>>>
>>>> Each receiver of interrupts is free to enable that - or leave it off as
>>>> it is the default after reset. But currently the spec demands that
>>>> either MSI-X or INTx is reported as available to the guests. We could
>>>> extend it to permit reporting no interrupts support if there is a good
>>>> case for it.
>>>
>>> I think the case for interrupt-incapable ivshmem-plain is that
>>> interrupt-capable ivshmem-doorbell requires a server, and is therefore a
>>> bit more complex to set up, and has additional failure modes.
>>>
>>> If that wasn't the case, a single device variant would make more sense.
>>>
>>> Besides, contrib/ivshmem-server/ is not fit for production use.
>>>
>>>> I will have to look into the details of the client-server structure of
>>>> QEMU's ivshmem again to answer the question under with restriction we
>>>> can make it both simpler and more robust. As Jailhouse has no live
>>>> migration support, requirements on ivshmem related to that may only be
>>>> addressed by chance so far.
>>>
>>> Here's how live migration works with QEMU's ivshmem: exactly one peer
>>> (the "master") migrates with its ivshmem device, all others need to hot
>>> unplug ivshmem, migrate, hot plug it back after the master completed its
>>> migration.  The master connects to the new server on the destination on
>>> startup, then live migration copies over the shared memory.  The other
>>> peers connect to the new server when they get their ivshmem hot plugged
>>> again.
>>
>> OK, hot-plug is a simple answer to this problem. It would be even
>> cleaner to support from the guest POV with the new state signalling
>> mechanism of ivshmem2.
> 
> Yes, proper state signalling should make this cleaner.  Without it,
> every protocol built on top of ivshmem needs to come up with its own
> state signalling.  The robustness problems should be obvious.
> 
> This is one aspect of my objection to the idea "just share some memory,
> it's simple": it's not a protocol.  It's at best a building block for
> protocols.

True, but that is exactly the advantage we see for our case: The
hypervisor needs no knowledge about the protocol run over the link. That
was one reason to avoid virtio so far.

> 
> With ivshmem-doorbell, peers get notified of connects and disconnects.
> However, the device can't notify guest software.  Fixable with
> additional registers and an interrupt.
> 
> The design of ivshmem-plain has peers knowing nothing about their peers,
> so a fix would require a redesign.
> 
> [...]
> 

Jan

[1] https://groups.google.com/forum/#!topic/jailhouse-dev/H62ahr0_bRk

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-29 11:56       ` msuchanek
@ 2017-01-30 11:25         ` Stefan Hajnoczi
  0 siblings, 0 replies; 29+ messages in thread
From: Stefan Hajnoczi @ 2017-01-30 11:25 UTC (permalink / raw)
  To: msuchanek
  Cc: Jan Kiszka, Jailhouse, Wei Wang, Marc-André Lureau,
	qemu-devel, Markus Armbruster, Qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2998 bytes --]

On Sun, Jan 29, 2017 at 12:56:23PM +0100, msuchanek wrote:
> On 2017-01-17 10:59, Stefan Hajnoczi wrote:
> > On Mon, Jan 16, 2017 at 02:10:17PM +0100, Jan Kiszka wrote:
> > > On 2017-01-16 13:41, Marc-André Lureau wrote:
> > > > On Mon, Jan 16, 2017 at 12:37 PM Jan Kiszka <jan.kiszka@siemens.com
> > > > <mailto:jan.kiszka@siemens.com>> wrote:
> > > >     So, this is our proposal. Would be great to hear some opinions if you
> > > >     see value in adding support for such an "ivshmem 2.0" device to QEMU as
> > > >     well and expand its ecosystem towards Linux upstream, maybe also DPDK
> > > >     again. If you see problems in the new design /wrt what QEMU provides so
> > > >     far with its ivshmem device, let's discuss how to resolve them. Looking
> > > >     forward to any feedback!
> > > >
> > > >
> > > > My feeling is that ivshmem is not being actively developed in qemu, but
> > > > rather virtio-based solutions (vhost-pci for vm2vm).
> > > 
> > > As pointed out, for us it's most important to keep the design simple -
> > > even at the price of "reinventing" some drivers for upstream (at
> > > least,
> > > we do not need two sets of drivers because our interface is fully
> > > symmetric). I don't see yet how vhost-pci could achieve the same, but
> > > I'm open to learn more!
> > 
> > The concept of symmetry is nice but only applies for communications
> > channels like networking and serial.
> > 
> > It doesn't apply for I/O that is fundamentally asymmetric like disk I/O.
> > 
> > I just wanted to point this out because the lack of symmetry has also bothered
> > me about virtio but it's actually impossible to achieve it for all
> > device types.
> > 
> 
> What's asymetric about storage? IIRC both SCSI and Firewire which can be
> used for storage are symmetric. All asymmetry only comes from usage
> convention or less capable buses like IDE/SATA.

I'll also add Intel NVMe and virtio-blk to the list of interfaces that
are not symmetric.

Even for SCSI, separate roles for initiator and target are central to
the SCSI Architecture Model.  The consequence is that hardware
interfaces and software stacks are not symmetric.  For example, the
Linux SCSI target only supports a small set of FC HBAs with explicit
target mode support rather than all SCSI HBAs.

Intuitively this makes sense - if the I/O has clear "client" and
"server" roles then why should both sides implement both roles?  It adds
cost and complexity for no benefit.

The same goes for other device types like graphics cards.  They are
inherently asymmetric.  Only one side has the actual hardware to perform
the I/O so it doesn't make sense to be symmetric.

You can pretend they are symmetric by restricting the hardware interface
and driver to just message passing.  Then another layer of software
handles the asymmetric behavior.  But then you may as well use iSCSI,
VNC, etc and not have a hardware interface for disk and graphics.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-30  8:14           ` Jan Kiszka
@ 2017-01-30 12:19             ` Markus Armbruster
  2017-01-30 15:57               ` Jan Kiszka
  0 siblings, 1 reply; 29+ messages in thread
From: Markus Armbruster @ 2017-01-30 12:19 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Jailhouse, qemu-devel

Jan Kiszka <jan.kiszka@web.de> writes:

> On 2017-01-30 09:00, Markus Armbruster wrote:
>> Jan Kiszka <jan.kiszka@web.de> writes:
>> 
>>> On 2017-01-27 20:36, Markus Armbruster wrote:
>>>> Jan Kiszka <jan.kiszka@web.de> writes:
>>>>
>>>>> On 2017-01-23 15:19, Markus Armbruster wrote:
>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> some of you may know that we are using a shared memory device similar to
>>>>>>> ivshmem in the partitioning hypervisor Jailhouse [1].
>>>>>>>
>>>>>>> We started as being compatible to the original ivshmem that QEMU
>>>>>>> implements, but we quickly deviated in some details, and in the recent
>>>>>>> months even more. Some of the deviations are related to making the
>>>>>>> implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is
>>>>>>
>>>>>> Compare: hw/misc/ivshmem.c ~1000 SLOC, measured with sloccount.
>>>>>
>>>>> That difference comes from remote/migration support and general QEMU
>>>>> integration - likely not very telling due to the different environments.
>>>>
>>>> Plausible.
>>>>
>>>>>>> aiming at safety critical systems and, therefore, a small code base.
>>>>>>> Other changes address deficits in the original design, like missing
>>>>>>> life-cycle management.
>>>>>>>
>>>>>>> Now the question is if there is interest in defining a common new
>>>>>>> revision of this device and maybe also of some protocols used on top,
>>>>>>> such as virtual network links. Ideally, this would enable us to share
>>>>>>> Linux drivers. We will definitely go for upstreaming at least a network
>>>>>>> driver such as [2], a UIO driver and maybe also a serial port/console.
>>>>>>>
>>>>>>> I've attached a first draft of the specification of our new ivshmem
>>>>>>> device. A working implementation can be found in the wip/ivshmem2 branch
>>>>>>> of Jailhouse [3], the corresponding ivshmem-net driver in [4].
>>>>>>>
>>>>>>> Deviations from the original design:
>>>>>>>
>>>>>>> - Only two peers per link
>>>>>>
>>>>>> Uh, define "link".
>>>>>
>>>>> VMs are linked via a common shared memory. Interrupt delivery follows
>>>>> that route as well.
>>>>>
>>>>>>
>>>>>>>   This simplifies the implementation and also the interfaces (think of
>>>>>>>   life-cycle management in a multi-peer environment). Moreover, we do
>>>>>>>   not have an urgent use case for multiple peers, thus also not
>>>>>>>   reference for a protocol that could be used in such setups. If someone
>>>>>>>   else happens to share such a protocol, it would be possible to discuss
>>>>>>>   potential extensions and their implications.
>>>>>>>
>>>>>>> - Side-band registers to discover and configure share memory regions
>>>>>>>
>>>>>>>   This was one of the first changes: We removed the memory regions from
>>>>>>>   the PCI BARs and gave them special configuration space registers. By
>>>>>>>   now, these registers are embedded in a PCI capability. The reasons are
>>>>>>>   that Jailhouse does not allow to relocate the regions in guest address
>>>>>>>   space (but other hypervisors may if they like to) and that we now have
>>>>>>>   up to three of them.
>>>>>>
>>>>>> I'm afraid I don't quite understand the change, nor the rationale.  I
>>>>>> guess I could figure out the former by studying the specification.
>>>>>
>>>>> a) It's a Jailhouse thing (we disallow the guest to move the regions
>>>>>    around in its address space)
>>>>> b) With 3 regions + MSI-X + MMIO registers, we run out of BARs (or
>>>>>    would have to downgrade them to 32 bit)
>>>>
>>>> Have you considered putting your three shared memory regions in memory
>>>> consecutively, so they can be covered by a single BAR?  Similar to how a
>>>> single BAR covers both MSI-X table and PBA.
>>>
>>> Would still require to pass three times some size information (each
>>> region can be different or empty/non-existent).
>> 
>> Yes.  Precedence: location of MSI-X table and PBA are specified in the
>> MSI-X Capability Structure as offset and BIR.
>> 
>>>                                                 Moreover, a) is not
>>> possible then without ugly modifications to the guest because they
>>> expect BAR-based regions to be relocatable.
>> 
>> Can you explain why not letting the guest map the shared memory into its
>> address space on its own just like any other piece of device memory is a
>> requirement?
>
> It requires reconfiguration of the sensitive 2nd level page tables
> during runtime of the guest. We are avoiding the neccessery checking and
> synchronization measures so far which reduces code complexity further.

You mean the hypervisor needs to act when the guest maps BARs, and that
gives the guest an attack vector?

Don't you have to deal with that anyway, for other PCI devices?

This is just out of curiosity, feel free to ignore me :)

> BTW, PCI has a similar concept of static assignment (PCI EA), but that
> is unfortunately incompatible to our needs [1].

Interesting.

>> 
>>>>>>> - Changed PCI base class code to 0xff (unspecified class)
>>>>>>
>>>>>> Changed from 0x5 (memory controller).
>>>>>
>>>>> Right.
>>>>>
>>>>>>
>>>>>>>   This allows us to define our own sub classes and interfaces. That is
>>>>>>>   now exploited for specifying the shared memory protocol the two
>>>>>>>   connected peers should use. It also allows the Linux drivers to match
>>>>>>>   on that.
>>>>>>>
>>>>>>> - INTx interrupts support is back
>>>>>>>
>>>>>>>   This is needed on target platforms without MSI controllers, i.e.
>>>>>>>   without the required guest support. Namely some PCI-less ARM SoCs
>>>>>>>   required the reintroduction. While doing this, we also took care of
>>>>>>>   keeping the MMIO registers free of privileged controls so that a
>>>>>>>   guest OS can map them safely into a guest userspace application.
>>>>>>
>>>>>> So you need interrupt capability.  Current upstream ivshmem requires a
>>>>>> server such as the one in contrib/ivshmem-server/.  What about yours?
>>>>>
>>>>> IIRC, the need for a server with QEMU/KVM is related to live migration.
>>>>> Jailhouse is simpler, all guests are managed by the same hypervisor
>>>>> instance, and there is no migration. That makes interrupt delivery much
>>>>> simpler as well. However, the device spec should not exclude other
>>>>> architectures.
>>>>
>>>> The server doesn't really help with live migration.  It's used to dole
>>>> out file descriptors for shared memory and interrupt signalling, and to
>>>> notify of peer connect/disconnect.
>>>
>>> That should be solvable directly between two peers.
>> 
>> Even between multiple peers, but it might complicate the peers.
>> 
>> Note that the current ivshmem client-server protocol doesn't support
>> graceful recovery from a server crash.  The clients can hobble on with
>> reduced functionality, though (see ivshmem-spec.txt).  Live migration
>> could be a way to recover, if the application permits it.
>> 
>>>>>> The interrupt feature enables me to guess a definition of "link": A and
>>>>>> B are peers of the same link if they can interrupt each other.
>>>>>>
>>>>>> Does your ivshmem2 support interrupt-less operation similar to
>>>>>> ivshmem-plain?
>>>>>
>>>>> Each receiver of interrupts is free to enable that - or leave it off as
>>>>> it is the default after reset. But currently the spec demands that
>>>>> either MSI-X or INTx is reported as available to the guests. We could
>>>>> extend it to permit reporting no interrupts support if there is a good
>>>>> case for it.
>>>>
>>>> I think the case for interrupt-incapable ivshmem-plain is that
>>>> interrupt-capable ivshmem-doorbell requires a server, and is therefore a
>>>> bit more complex to set up, and has additional failure modes.
>>>>
>>>> If that wasn't the case, a single device variant would make more sense.
>>>>
>>>> Besides, contrib/ivshmem-server/ is not fit for production use.
>>>>
>>>>> I will have to look into the details of the client-server structure of
>>>>> QEMU's ivshmem again to answer the question under with restriction we
>>>>> can make it both simpler and more robust. As Jailhouse has no live
>>>>> migration support, requirements on ivshmem related to that may only be
>>>>> addressed by chance so far.
>>>>
>>>> Here's how live migration works with QEMU's ivshmem: exactly one peer
>>>> (the "master") migrates with its ivshmem device, all others need to hot
>>>> unplug ivshmem, migrate, hot plug it back after the master completed its
>>>> migration.  The master connects to the new server on the destination on
>>>> startup, then live migration copies over the shared memory.  The other
>>>> peers connect to the new server when they get their ivshmem hot plugged
>>>> again.
>>>
>>> OK, hot-plug is a simple answer to this problem. It would be even
>>> cleaner to support from the guest POV with the new state signalling
>>> mechanism of ivshmem2.
>> 
>> Yes, proper state signalling should make this cleaner.  Without it,
>> every protocol built on top of ivshmem needs to come up with its own
>> state signalling.  The robustness problems should be obvious.
>> 
>> This is one aspect of my objection to the idea "just share some memory,
>> it's simple": it's not a protocol.  It's at best a building block for
>> protocols.
>
> True, but that is exactly the advantage we see for our case: The
> hypervisor needs no knowledge about the protocol run over the link. That
> was one reason to avoid virtio so far.

I understand where you're coming from.  I think the correct answer is to
layer protocols, and choose carefully how much of the stack to keep in
the hypervisor.

I feel you take (at least) two steps towards providing a (low-level)
protocol.  One, you provide for an ID of the next higher protocol level
(see "Changed PCI base class code" above).  Two, you include generic
state signalling.
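
As a rough illustration of that first step, a Linux driver could bind on
the (base class, protocol) pair instead of vendor/device IDs - only the
0xff base class comes from the proposal, the sub-class value and names
below are made up:

  #include <linux/module.h>
  #include <linux/pci.h>

  #define IVSHMEM2_PROTO_NET  0x01  /* hypothetical "network" sub class */

  static const struct pci_device_id ivshmem_net_ids[] = {
          /* base class 0xff, sub class = protocol, any prog. interface */
          { PCI_DEVICE_CLASS((0xff << 16) | (IVSHMEM2_PROTO_NET << 8),
                             0xffff00) },
          { 0 }
  };
  MODULE_DEVICE_TABLE(pci, ivshmem_net_ids);

  static int ivshmem_net_probe(struct pci_dev *pdev,
                               const struct pci_device_id *id)
  {
          return pcim_enable_device(pdev);  /* real setup omitted */
  }

  static struct pci_driver ivshmem_net_driver = {
          .name     = "ivshmem-net-sketch",
          .id_table = ivshmem_net_ids,
          .probe    = ivshmem_net_probe,
  };
  module_pci_driver(ivshmem_net_driver);
  MODULE_LICENSE("GPL");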

>> With ivshmem-doorbell, peers get notified of connects and disconnects.
>> However, the device can't notify guest software.  Fixable with
>> additional registers and an interrupt.
>> 
>> The design of ivshmem-plain has peers knowing nothing about their peers,
>> so a fix would require a redesign.
>> 
>> [...]
>> 
>
> Jan
>
> [1] https://groups.google.com/forum/#!topic/jailhouse-dev/H62ahr0_bRk

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-30 12:19             ` Markus Armbruster
@ 2017-01-30 15:57               ` Jan Kiszka
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Kiszka @ 2017-01-30 15:57 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Jailhouse, qemu-devel

On 2017-01-30 13:19, Markus Armbruster wrote:
>>> Can you explain why not letting the guest map the shared memory into its
>>> address space on its own just like any other piece of device memory is a
>>> requirement?
>>
>> It requires reconfiguration of the sensitive 2nd level page tables
>> during runtime of the guest. We are avoiding the neccessery checking and
>> synchronization measures so far which reduces code complexity further.
> 
> You mean the hypervisor needs to act when the guest maps BARs, and that
> gives the guest an attack vector?

Possibly; at least correctness issues would arise. We would need to add TLB
flushes, for example, something that is not needed right now with the
mappings remaining static while a guest is running.

> 
> Don't you have to deal with that anyway, for other PCI devices?

Physical devices are presented to the guest with their BARs already
programmed (as if the firmware had done that), and Jailhouse denies
reprogramming them (BAR writes are accepted only for the purpose of size
discovery). Linux is fine with that, and RTOSes ported to Jailhouse only
become simpler.
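
For reference, the size discovery that still has to work in this model is
just the standard BAR sizing handshake; a minimal sketch, with the config
space accessors as placeholders for whatever the guest uses:

  #include <stdint.h>

  extern uint32_t pci_cfg_read32(uint16_t bdf, uint8_t off);
  extern void pci_cfg_write32(uint16_t bdf, uint8_t off, uint32_t val);

  /* Size a 32-bit memory BAR: write all-ones, read back the mask and
     restore the original value.  The hypervisor emulates this exchange
     while keeping the BAR itself fixed. */
  static uint64_t bar_size(uint16_t bdf, uint8_t bar_off)
  {
          uint32_t saved = pci_cfg_read32(bdf, bar_off);
          uint32_t mask;

          pci_cfg_write32(bdf, bar_off, 0xffffffff);
          mask = pci_cfg_read32(bdf, bar_off) & ~0xfu;  /* strip flag bits */
          pci_cfg_write32(bdf, bar_off, saved);

          return mask ? (uint64_t)~mask + 1 : 0;
  }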

Virtualized regions are trapped and emulated anyway, so there is no need to
reprogram the mappings.

Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Towards an ivshmem 2.0?
  2017-01-29 14:14           ` Jan Kiszka
  2017-01-30  8:02             ` Markus Armbruster
@ 2017-01-31  2:51             ` Wang, Wei W
  1 sibling, 0 replies; 29+ messages in thread
From: Wang, Wei W @ 2017-01-31  2:51 UTC (permalink / raw)
  To: Jan Kiszka, Marc-André Lureau
  Cc: Jailhouse, Markus Armbruster, qemu-devel

On Sunday, January 29, 2017 10:14 PM, Jan Kiszka wrote:
> On 2017-01-29 15:00, Marc-André Lureau wrote:
> > Hi
> >
> > On Sun, Jan 29, 2017 at 12:44 PM Jan Kiszka <jan.kiszka@web.de
> > <mailto:jan.kiszka@web.de>> wrote:
> >
> >     >> Of course, I'm careful with investing much time into expanding the
> >     >> existing, for Jailhouse possibly sufficient design if there no real
> >     >> interest in continuing the ivshmem support in QEMU - because of
> >     >> vhost-pci or other reasons. But if that interest exists, it would be
> >     >> beneficial for us to have QEMU supporting a compatible version
> >     and using
> >     >> the same guest drivers. Then I would start looking into concrete
> >     patches
> >     >> for it as well.
> >     >
> >     > Interest is difficult for me to gauge, not least because alternatives
> >     > are still being worked on.
> >
> >     I'm considering to suggest this as GSoC project now.
> >
> >
> > It's better for a student and for the community if the work get
> > accepted in the end.
> >
> > So, I think that could be an intersting GSoC (implementing your
> > ivshmem
> > 2 proposal). However, if the qemu community isn't ready to accept a
> > new ivshmem, and would rather have vhost-pci based solution, I would
> > suggest a different project (hopefully Wei Wang can help define it and mentor):
> > work on a vhost-pci using dedicated shared PCI BARs (and kernel
> > support to avoid extra copy - if I understand the extra copy situation correctly).

Thanks for the suggestion. I’m glad to help with it.

For that sort of usage (static configuration extension [1]), I think it’s possible to build symmetric vhost-pci-net communication, as opposed to “vhost-pci-net <-> virtio-net”.

> It's still open if vhost-pci can replace ivshmem (not to speak of being desirable
> for Jailhouse - I'm still studying). In that light, having both implementations
> available to do real comparisons is valuable IMHO.
> 
> That said, we will play with open cards, explain the student the situation and let
> her/him decide knowingly.
 
I think the static configuration of vhost-pci would be quite similar to your ivshmem-based proposal - it could be thought of as moving your proposal into the virtio device structure. Do you see any other big differences? Or is there any fundamental reason why it would not be good to do that based on virtio? Thanks.

Best,
Wei

[1] static configuration extension: set the vhost-pci device via the QEMU command line (rather than hotplugging via vhost-user protocol), and share a piece of memory between two VMs (rather than the whole VM's memory)


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2017-01-31  2:51 UTC | newest]

Thread overview: 29+ messages
2017-01-16  8:36 [Qemu-devel] Towards an ivshmem 2.0? Jan Kiszka
2017-01-16 12:41 ` Marc-André Lureau
2017-01-16 13:10   ` Jan Kiszka
2017-01-17  9:13     ` Wang, Wei W
2017-01-17  9:46       ` Jan Kiszka
2017-01-20 11:54         ` Wang, Wei W
2017-01-20 16:37           ` Jan Kiszka
2017-01-23  3:49             ` Wang, Wei W
2017-01-23 10:14               ` Måns Rullgård
2017-01-17  9:59     ` Stefan Hajnoczi
2017-01-17 10:32       ` Jan Kiszka
2017-01-29 11:56       ` msuchanek
2017-01-30 11:25         ` Stefan Hajnoczi
2017-01-16 14:18 ` Stefan Hajnoczi
2017-01-16 14:34   ` Jan Kiszka
2017-01-17 10:00     ` Stefan Hajnoczi
2017-01-23 14:19 ` Markus Armbruster
2017-01-25  9:18   ` Jan Kiszka
2017-01-27 19:36     ` Markus Armbruster
2017-01-29  8:43       ` Jan Kiszka
2017-01-29 14:00         ` Marc-André Lureau
2017-01-29 14:14           ` Jan Kiszka
2017-01-30  8:02             ` Markus Armbruster
2017-01-30  8:05               ` Jan Kiszka
2017-01-31  2:51             ` Wang, Wei W
2017-01-30  8:00         ` Markus Armbruster
2017-01-30  8:14           ` Jan Kiszka
2017-01-30 12:19             ` Markus Armbruster
2017-01-30 15:57               ` Jan Kiszka
