* [Qemu-devel] Vhost-pci RFC2.0
@ 2017-04-19  6:38 Wang, Wei W
  2017-04-19  7:31 ` Marc-André Lureau
                   ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Wang, Wei W @ 2017-04-19  6:38 UTC (permalink / raw)
  To: Marc-André Lureau, Michael S. Tsirkin, Stefan Hajnoczi,
	pbonzini, qemu-devel, virtio-dev

Hi,

We made some design changes to the original vhost-pci design, and want to open
a discussion about the latest design (labelled 2.0) and its extension (2.1).
2.0 design: One VM shares the entire memory of another VM
2.1 design: One VM uses an intermediate memory shared with another VM for
                     packet transmission.

For the convenience of discussion, I have some pictures presented at this link:
https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf

Fig. 1 shows the common driver frame that we want to use to build the 2.0 and 2.1
designs. A TX/RX engine consists of a local ring and an exotic ring.
Local ring:
1) allocated by the driver itself;
2) registered with the device (i.e. virtio_add_queue())
Exotic ring:
1) the ring memory comes from outside the driver, and is exposed to the driver
     via a BAR MMIO;
2) not registered with the device, so no ioeventfd/irqfd or configuration
registers are allocated in the device
(a rough sketch of the two ring types is given right below)
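To make the distinction concrete, here is a minimal sketch, not taken from the
draft code; vpnet_handle_rx, VPNET_EXOTIC_BAR and the queue size are
illustrative placeholders. The local ring is registered on the QEMU device
side, while the exotic ring is just memory the guest driver maps from the BAR:

/* QEMU device model side (e.g. in the device's realize()): the local ring
 * is a normal virtqueue, so it gets the usual ioeventfd/irqfd and
 * configuration registers. */
vpnet->rx_local_vq = virtio_add_queue(vdev, 256, vpnet_handle_rx);

/* Guest driver side (e.g. in the PCI probe()): the exotic ring is only a
 * memory region exposed through a BAR; nothing is registered with the
 * device for it. */
void *exotic_ring = memremap(pci_resource_start(pdev, VPNET_EXOTIC_BAR),
                             pci_resource_len(pdev, VPNET_EXOTIC_BAR),
                             MEMREMAP_WB);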

Fig. 2 shows how the driver frame is used to build the 2.0 design.
1) Asymmetric: vhost-pci-net <-> virtio-net
2) VM1 shares the entire memory of VM2, and the exotic rings are the rings
    from VM2.
3) Performance (in terms of copies between VMs):
    TX: 0-copy (packets are put into VM2's RX ring directly)
    RX: 1-copy (the green arrow line in the VM1's RX engine)

Fig. 3 shows how the driver frame is used to build the 2.1 design.
1) Symmetric: vhost-pci-net <-> vhost-pci-net
2) The two VMs share an intermediate memory, allocated by VM1's vhost-pci
device, for data exchange, and the exotic rings are built on the shared memory
3) Performance:
    TX: 1-copy
    RX: 1-copy

Fig. 4 shows the inter-VM notification path for 2.0 (2.1 is similar).
The four eventfds are allocated by virtio-net, and shared with vhost-pci-net:
virtio-net's TX/RX kickfd is used as vhost-pci-net's RX/TX callfd
virtio-net's TX/RX callfd is used as vhost-pci-net's RX/TX kickfd
Example of how it works:
After packets are put into vhost-pci-net's TX, the driver kicks TX, which
causes an interrupt associated with fd3 to be injected into virtio-net.
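As an illustration of the cross-wiring, here is a small user-space sketch
with plain eventfds; the struct and variable names are mine, not from the
draft code, and in the real setup the fds are created for virtio-net's
queues and then handed to the vhost-pci device:

#include <sys/eventfd.h>
#include <stdint.h>
#include <unistd.h>

struct vq_fds { int kickfd, callfd; };

int main(void)
{
    /* the four eventfds allocated on the virtio-net side */
    struct vq_fds vnet_tx = { eventfd(0, 0), eventfd(0, 0) };
    struct vq_fds vnet_rx = { eventfd(0, 0), eventfd(0, 0) };

    /* vhost-pci-net reuses the same fds with the roles swapped:
     * virtio-net TX/RX kickfd -> vhost-pci-net RX/TX callfd
     * virtio-net TX/RX callfd -> vhost-pci-net RX/TX kickfd */
    struct vq_fds vpnet_rx = { .kickfd = vnet_tx.callfd, .callfd = vnet_tx.kickfd };
    struct vq_fds vpnet_tx = { .kickfd = vnet_rx.callfd, .callfd = vnet_rx.kickfd };

    /* the example above: kicking vhost-pci-net's TX writes the eventfd that
     * is also virtio-net's RX callfd (fd3 in Fig. 4), so the irqfd bound to
     * it injects an interrupt into the virtio-net guest */
    uint64_t one = 1;
    if (write(vpnet_tx.kickfd, &one, sizeof(one)) < 0)
        return 1;
    (void)vpnet_rx;
    return 0;
}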

The draft code of the 2.0 design is ready, and can be found here:
Qemu: https://github.com/wei-w-wang/vhost-pci-device
Guest driver: https://github.com/wei-w-wang/vhost-pci-driver

We tested the 2.0 implementation using the Spirent packet
generator to transmit 64B packets. The results show that the
throughput of vhost-pci reaches around 1.8 Mpps, which is around
two times that of the legacy OVS+DPDK setup. Vhost-pci also shows
better scalability than OVS+DPDK.


Best,
Wei


* Re: [Qemu-devel] Vhost-pci RFC2.0
  2017-04-19  6:38 [Qemu-devel] Vhost-pci RFC2.0 Wang, Wei W
@ 2017-04-19  7:31 ` Marc-André Lureau
  2017-04-19  8:33   ` Wei Wang
  2017-04-19  7:35 ` Jan Kiszka
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 27+ messages in thread
From: Marc-André Lureau @ 2017-04-19  7:31 UTC (permalink / raw)
  To: Wang, Wei W, Michael S. Tsirkin, Stefan Hajnoczi, pbonzini,
	qemu-devel, virtio-dev

Hi

On Wed, Apr 19, 2017 at 10:38 AM Wang, Wei W <wei.w.wang@intel.com> wrote:

> Hi,
>
> We made some design changes to the original vhost-pci design, and want to
> open
> a discussion about the latest design (labelled 2.0) and its extension
> (2.1).
> 2.0 design: One VM shares the entire memory of another VM
> 2.1 design: One VM uses an intermediate memory shared with another VM for
>                      packet transmission.
>
> For the convenience of discussion, I have some pictures presented at this
> link:
>
> https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf
>
> Fig. 1 shows the common driver frame that we want use to build the 2.0 and
> 2.1
> design. A TX/RX engine consists of a local ring and an exotic ring.
>

Isn't "external" (or "remote") more appropriate than "exotic" ?

> Local ring:
> 1) allocated by the driver itself;
> 2) registered with the device (i.e. virtio_add_queue())
> Exotic ring:
> 1) ring memory comes from the outside (of the driver), and exposed to the
> driver
>      via a BAR MMIO;
> 2) does not have a registration in the device, so no ioeventfd/irqfd,
> configuration
> registers allocated in the device
>
> Fig. 2 shows how the driver frame is used to build the 2.0 design.
> 1) Asymmetric: vhost-pci-net <-> virtio-net
> 2) VM1 shares the entire memory of VM2, and the exotic rings are the rings
>     from VM2.
> 3) Performance (in terms of copies between VMs):
>     TX: 0-copy (packets are put to VM2’s RX ring directly)
>     RX: 1-copy (the green arrow line in the VM1’s RX engine)
>

Why is the copy necessary?


> Fig. 3 shows how the driver frame is used to build the 2.1 design.
> 1) Symmetric: vhost-pci-net <-> vhost-pci-net
> 2) Share an intermediate memory, allocated by VM1’s vhost-pci device,
> for data exchange, and the exotic rings are built on the shared memory
> 3) Performance:
>     TX: 1-copy
> RX: 1-copy
>
> Fig. 4 shows the inter-VM notification path for 2.0 (2.1 is similar).
> The four eventfds are allocated by virtio-net, and shared with
> vhost-pci-net:
> Uses virtio-net’s TX/RX kickfd as the vhost-pci-net’s RX/TX callfd
> Uses virtio-net’s TX/RX callfd as the vhost-pci-net’s RX/TX kickfd
> Example of how it works:
> After packets are put into vhost-pci-net’s TX, the driver kicks TX, which
> causes the an interrupt associated with fd3 to be injected to virtio-net
>
> The draft code of the 2.0 design is ready, and can be found here:
> Qemu: https://github.com/wei-w-wang/vhost-pci-device
>

The repository contains a single big commit (
https://github.com/wei-w-wang/vhost-pci-device/commit/fa01ec5e41de176197dae505c05b659f5483187f).
Please try to provide a separate patch or series of patches based on an
upstream commit/release point.


> Guest driver: https://github.com/wei-w-wang/vhost-pci-driver
>
> We tested the 2.0 implementation using the Spirent packet
> generator to transmit 64B packets, the results show that the
> throughput of vhost-pci reaches around 1.8Mpps, which is around
> two times larger than the legacy OVS+DPDK. Also, vhost-pci shows
> better scalability than OVS+DPDK.
>
>
> Best,
> Wei
>
-- 
Marc-André Lureau


* Re: [Qemu-devel] Vhost-pci RFC2.0
  2017-04-19  6:38 [Qemu-devel] Vhost-pci RFC2.0 Wang, Wei W
  2017-04-19  7:31 ` Marc-André Lureau
@ 2017-04-19  7:35 ` Jan Kiszka
  2017-04-19  8:42   ` Wei Wang
  2017-04-19  9:57 ` [Qemu-devel] [virtio-dev] " Stefan Hajnoczi
  2017-05-05  4:05 ` Jason Wang
  3 siblings, 1 reply; 27+ messages in thread
From: Jan Kiszka @ 2017-04-19  7:35 UTC (permalink / raw)
  To: Wang, Wei W, Marc-André Lureau, Michael S. Tsirkin,
	Stefan Hajnoczi, pbonzini, qemu-devel, virtio-dev
  Cc: Jailhouse

On 2017-04-19 08:38, Wang, Wei W wrote:
> Hi,
>  
> We made some design changes to the original vhost-pci design, and want
> to open
> a discussion about the latest design (labelled 2.0) and its extension (2.1).
> 2.0 design: One VM shares the entire memory of another VM
> 2.1 design: One VM uses an intermediate memory shared with another VM for
>                      packet transmission.
>  
> For the convenience of discussion, I have some pictures presented at
> this link:
> _https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf_
>  
> Fig. 1 shows the common driver frame that we want use to build the 2.0
> and 2.1
> design. A TX/RX engine consists of a local ring and an exotic ring.
> Local ring:
> 1) allocated by the driver itself;
> 2) registered with the device (i.e. virtio_add_queue())
> Exotic ring:
> 1) ring memory comes from the outside (of the driver), and exposed to
> the driver
>      via a BAR MMIO;

Small additional requirement: In order to make this usable with
Jailhouse as well, we need [also] a side-channel configuration for the
regions, i.e. likely via a PCI capability. There are too few BARs, and
they suggest relocatability, which is not available under Jailhouse for
simplicity reasons (IOW, the shared regions are statically mapped by the
hypervisor into the affected guest address spaces).

> 2) does not have a registration in the device, so no ioeventfd/irqfd,
> configuration
> registers allocated in the device
>  
> Fig. 2 shows how the driver frame is used to build the 2.0 design.
> 1) Asymmetric: vhost-pci-net <-> virtio-net
> 2) VM1 shares the entire memory of VM2, and the exotic rings are the rings
>     from VM2.
> 3) Performance (in terms of copies between VMs):
>     TX: 0-copy (packets are put to VM2’s RX ring directly)
>     RX: 1-copy (the green arrow line in the VM1’s RX engine)
>  
> Fig. 3 shows how the driver frame is used to build the 2.1 design.
> 1) Symmetric: vhost-pci-net <-> vhost-pci-net

This is interesting!

> 2) Share an intermediate memory, allocated by VM1’s vhost-pci device,
> for data exchange, and the exotic rings are built on the shared memory
> 3) Performance:
>     TX: 1-copy
> RX: 1-copy

I'm not yet sure I got this right: there are two different MMIO regions
involved, right? One is used for VM1's RX / VM2's TX, and the other for
the reverse path? That would allow us to meet our requirement of having
those regions mapped with asymmetric permissions (RX read-only, TX
read/write).

>  
> Fig. 4 shows the inter-VM notification path for 2.0 (2.1 is similar).
> The four eventfds are allocated by virtio-net, and shared with
> vhost-pci-net:
> Uses virtio-net’s TX/RX kickfd as the vhost-pci-net’s RX/TX callfd
> Uses virtio-net’s TX/RX callfd as the vhost-pci-net’s RX/TX kickfd
> Example of how it works:
> After packets are put into vhost-pci-net’s TX, the driver kicks TX, which
> causes the an interrupt associated with fd3 to be injected to virtio-net
>  
> The draft code of the 2.0 design is ready, and can be found here:
> Qemu: _https://github.com/wei-w-wang/vhost-pci-device_
> Guest driver: _https://github.com/wei-w-wang/vhost-pci-driver_
>  
> We tested the 2.0 implementation using the Spirent packet
> generator to transmit 64B packets, the results show that the
> throughput of vhost-pci reaches around 1.8Mpps, which is around
> two times larger than the legacy OVS+DPDK. Also, vhost-pci shows
> better scalability than OVS+DPDK.
>  

Do you have numbers for the symmetric 2.1 case as well? Or is the driver
not ready for that yet? Otherwise, I could try to make it work over
a simplistic vhost-pci 2.1 version in Jailhouse as well. That would give
a better picture of how much additional complexity this would mean
compared to our ivshmem 2.0.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux


* Re: [Qemu-devel] Vhost-pci RFC2.0
  2017-04-19  7:31 ` Marc-André Lureau
@ 2017-04-19  8:33   ` Wei Wang
  0 siblings, 0 replies; 27+ messages in thread
From: Wei Wang @ 2017-04-19  8:33 UTC (permalink / raw)
  To: Marc-André Lureau, Michael S. Tsirkin, Stefan Hajnoczi,
	pbonzini, qemu-devel, virtio-dev

On 04/19/2017 03:31 PM, Marc-André Lureau wrote:
> Hi
>
> On Wed, Apr 19, 2017 at 10:38 AM Wang, Wei W <wei.w.wang@intel.com> wrote:
>
>     Hi,
>     We made some design changes to the original vhost-pci design, and
>     want to open
>     a discussion about the latest design (labelled 2.0) and its
>     extension (2.1).
>     2.0 design: One VM shares the entire memory of another VM
>     2.1 design: One VM uses an intermediate memory shared with another
>     VM for
>                          packet transmission.
>     For the convenience of discussion, I have some pictures presented
>     at this link:
>     _https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf_
>     Fig. 1 shows the common driver frame that we want use to build the
>     2.0 and 2.1
>     design. A TX/RX engine consists of a local ring and an exotic ring.
>
>
> Isn't "external" (or "remote") more appropriate than "exotic" ?

OK, probably we can use "remote" here.

>
>     Local ring:
>     1) allocated by the driver itself;
>     2) registered with the device (i.e. virtio_add_queue())
>     Exotic ring:
>     1) ring memory comes from the outside (of the driver), and exposed
>     to the driver
>          via a BAR MMIO;
>     2) does not have a registration in the device, so no
>     ioeventfd/irqfd, configuration
>     registers allocated in the device
>     Fig. 2 shows how the driver frame is used to build the 2.0 design.
>     1) Asymmetric: vhost-pci-net <-> virtio-net
>     2) VM1 shares the entire memory of VM2, and the exotic rings are
>     the rings
>         from VM2.
>     3) Performance (in terms of copies between VMs):
>         TX: 0-copy (packets are put to VM2’s RX ring directly)
>         RX: 1-copy (the green arrow line in the VM1’s RX engine)
>
>
> Why is the copy necessary?

Because the packet from the remote ring can't be delivered to the
network stack directly. To be more precise:
1) The buffer from the remote ring is not allocated by the guest driver.
    If that buffer were delivered to the network stack directly, the
    network stack would end up freeing a buffer that the guest did not
    allocate;
2) Thinking about the vring operation: after getting a buffer from the
    avail ring, we need to put the used buffer back onto the used ring
    to tell the other end that the buffer has been used. The network
    stack won't do this operation.

So, based on these two points, I think we need to use a local ring and
copy the packet into a buffer from the local ring (i.e. buffer memory
allocated by the guest driver), and the driver will do the "give back
used buffer" operation as explained in 2).
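A rough sketch of that 1-copy RX path, assuming the remote (exotic) ring is
already mapped from the BAR and laid out as a legacy struct vring;
remote_gpa_to_va(), rx_buf_alloc() and netif_deliver() are placeholders, and
endianness handling and memory barriers are omitted:

#include <linux/virtio_ring.h>
#include <linux/string.h>
#include <linux/types.h>

/* placeholders for this sketch */
void *remote_gpa_to_va(u64 gpa);        /* translate a remote guest-physical addr */
void *rx_buf_alloc(u32 len);            /* buffer taken from the local RX ring */
void netif_deliver(void *buf, u32 len); /* hand the packet to the network stack */

static void vpnet_rx_poll(struct vring *remote, u16 *last_avail)
{
	while (*last_avail != remote->avail->idx) {
		u16 head = remote->avail->ring[*last_avail % remote->num];
		struct vring_desc *d = &remote->desc[head];

		void *dst = rx_buf_alloc(d->len);
		memcpy(dst, remote_gpa_to_va(d->addr), d->len); /* the single copy */
		netif_deliver(dst, d->len);

		/* return the buffer on the remote used ring ("give back used buffer") */
		remote->used->ring[remote->used->idx % remote->num].id = head;
		remote->used->ring[remote->used->idx % remote->num].len = d->len;
		remote->used->idx++;
		(*last_avail)++;
	}
}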


>     Fig. 3 shows how the driver frame is used to build the 2.1 design.
>     1) Symmetric: vhost-pci-net <-> vhost-pci-net
>     2) Share an intermediate memory, allocated by VM1’s vhost-pci device,
>     for data exchange, and the exotic rings are built on the shared memory
>     3) Performance:
>         TX: 1-copy
>     RX: 1-copy
>     Fig. 4 shows the inter-VM notification path for 2.0 (2.1 is similar).
>     The four eventfds are allocated by virtio-net, and shared with
>     vhost-pci-net:
>     Uses virtio-net’s TX/RX kickfd as the vhost-pci-net’s RX/TX callfd
>     Uses virtio-net’s TX/RX callfd as the vhost-pci-net’s RX/TX kickfd
>     Example of how it works:
>     After packets are put into vhost-pci-net’s TX, the driver kicks
>     TX, which
>     causes the an interrupt associated with fd3 to be injected to
>     virtio-net
>     The draft code of the 2.0 design is ready, and can be found here:
>     Qemu: _https://github.com/wei-w-wang/vhost-pci-device_
>
>
> The repository contains a single big commit 
> (https://github.com/wei-w-wang/vhost-pci-device/commit/fa01ec5e41de176197dae505c05b659f5483187f). 
> Please try to provide a seperate patch or series of patch from an 
> upstream commit/release point.

It's the testable version of the 2.0 design. I will separate it.
If possible, I hope we can review the design first, especially the common
driver frame. Then I can make the related changes based on the
discussion, and post the patch series.

Best,
Wei


* Re: [Qemu-devel] Vhost-pci RFC2.0
  2017-04-19  7:35 ` Jan Kiszka
@ 2017-04-19  8:42   ` Wei Wang
  2017-04-19  8:49     ` [Qemu-devel] [virtio-dev] " Jan Kiszka
  0 siblings, 1 reply; 27+ messages in thread
From: Wei Wang @ 2017-04-19  8:42 UTC (permalink / raw)
  To: Jan Kiszka, Marc-André Lureau, Michael S. Tsirkin,
	Stefan Hajnoczi, pbonzini, qemu-devel, virtio-dev
  Cc: Jailhouse

On 04/19/2017 03:35 PM, Jan Kiszka wrote:
> On 2017-04-19 08:38, Wang, Wei W wrote:
>> Hi,
>>   
>> We made some design changes to the original vhost-pci design, and want
>> to open
>> a discussion about the latest design (labelled 2.0) and its extension (2.1).
>> 2.0 design: One VM shares the entire memory of another VM
>> 2.1 design: One VM uses an intermediate memory shared with another VM for
>>                       packet transmission.
>>   
>> For the convenience of discussion, I have some pictures presented at
>> this link:
>> _https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf_
>>   
>> Fig. 1 shows the common driver frame that we want use to build the 2.0
>> and 2.1
>> design. A TX/RX engine consists of a local ring and an exotic ring.
>> Local ring:
>> 1) allocated by the driver itself;
>> 2) registered with the device (i.e. virtio_add_queue())
>> Exotic ring:
>> 1) ring memory comes from the outside (of the driver), and exposed to
>> the driver
>>       via a BAR MMIO;
> Small additional requirement: In order to make this usable with
> Jailhouse as well, we need [also] a side-channel configuration for the
> regions, i.e. likely via a PCI capability. There are too few BARs, and
> they suggest relocatablity, which is not available under Jailhouse for
> simplicity reasons (IOW, the shared regions are statically mapped by the
> hypervisor into the affected guest address spaces).
What kind of configuration would you need for the regions?
I think adding a PCI capability should be easy.

>> 2) does not have a registration in the device, so no ioeventfd/irqfd,
>> configuration
>> registers allocated in the device
>>   
>> Fig. 2 shows how the driver frame is used to build the 2.0 design.
>> 1) Asymmetric: vhost-pci-net <-> virtio-net
>> 2) VM1 shares the entire memory of VM2, and the exotic rings are the rings
>>      from VM2.
>> 3) Performance (in terms of copies between VMs):
>>      TX: 0-copy (packets are put to VM2’s RX ring directly)
>>      RX: 1-copy (the green arrow line in the VM1’s RX engine)
>>   
>> Fig. 3 shows how the driver frame is used to build the 2.1 design.
>> 1) Symmetric: vhost-pci-net <-> vhost-pci-net
> This is interesting!
>
>> 2) Share an intermediate memory, allocated by VM1’s vhost-pci device,
>> for data exchange, and the exotic rings are built on the shared memory
>> 3) Performance:
>>      TX: 1-copy
>> RX: 1-copy
> I'm not yet sure I to this right: there are two different MMIO regions
> involved, right? One is used for VM1's RX / VM2's TX, and the other for
> the reverse path? Would allow our requirement to have those regions
> mapped with asymmetric permissions (RX read-only, TX read/write).
The design presented here intends to use only one BAR to expose
both TX and RX. The two VMs share an intermediate memory here,
so why couldn't we give the same permissions to TX and RX?


>>   
>> Fig. 4 shows the inter-VM notification path for 2.0 (2.1 is similar).
>> The four eventfds are allocated by virtio-net, and shared with
>> vhost-pci-net:
>> Uses virtio-net’s TX/RX kickfd as the vhost-pci-net’s RX/TX callfd
>> Uses virtio-net’s TX/RX callfd as the vhost-pci-net’s RX/TX kickfd
>> Example of how it works:
>> After packets are put into vhost-pci-net’s TX, the driver kicks TX, which
>> causes the an interrupt associated with fd3 to be injected to virtio-net
>>   
>> The draft code of the 2.0 design is ready, and can be found here:
>> Qemu: _https://github.com/wei-w-wang/vhost-pci-device_
>> Guest driver: _https://github.com/wei-w-wang/vhost-pci-driver_
>>   
>> We tested the 2.0 implementation using the Spirent packet
>> generator to transmit 64B packets, the results show that the
>> throughput of vhost-pci reaches around 1.8Mpps, which is around
>> two times larger than the legacy OVS+DPDK. Also, vhost-pci shows
>> better scalability than OVS+DPDK.
>>   
> Do you have numbers for the symmetric 2.1 case as well? Or is the driver
> not yet ready for that yet? Otherwise, I could try to make it work over
> a simplistic vhost-pci 2.1 version in Jailhouse as well. That would give
> a better picture of how much additional complexity this would mean
> compared to our ivshmem 2.0.
>

Implementation of 2.1 is not ready yet. We can extend it to 2.1 after
the common driver frame is reviewed.


Best,
Wei


* Re: [Qemu-devel] [virtio-dev] Re: Vhost-pci RFC2.0
  2017-04-19  8:42   ` Wei Wang
@ 2017-04-19  8:49     ` Jan Kiszka
  2017-04-19  9:09       ` Wei Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Kiszka @ 2017-04-19  8:49 UTC (permalink / raw)
  To: Wei Wang, Marc-André Lureau, Michael S. Tsirkin,
	Stefan Hajnoczi, pbonzini, qemu-devel, virtio-dev
  Cc: Jailhouse

On 2017-04-19 10:42, Wei Wang wrote:
> On 04/19/2017 03:35 PM, Jan Kiszka wrote:
>> On 2017-04-19 08:38, Wang, Wei W wrote:
>>> Hi,
>>>   We made some design changes to the original vhost-pci design, and want
>>> to open
>>> a discussion about the latest design (labelled 2.0) and its extension
>>> (2.1).
>>> 2.0 design: One VM shares the entire memory of another VM
>>> 2.1 design: One VM uses an intermediate memory shared with another VM
>>> for
>>>                       packet transmission.
>>>   For the convenience of discussion, I have some pictures presented at
>>> this link:
>>> _https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf_
>>>
>>>   Fig. 1 shows the common driver frame that we want use to build the 2.0
>>> and 2.1
>>> design. A TX/RX engine consists of a local ring and an exotic ring.
>>> Local ring:
>>> 1) allocated by the driver itself;
>>> 2) registered with the device (i.e. virtio_add_queue())
>>> Exotic ring:
>>> 1) ring memory comes from the outside (of the driver), and exposed to
>>> the driver
>>>       via a BAR MMIO;
>> Small additional requirement: In order to make this usable with
>> Jailhouse as well, we need [also] a side-channel configuration for the
>> regions, i.e. likely via a PCI capability. There are too few BARs, and
>> they suggest relocatablity, which is not available under Jailhouse for
>> simplicity reasons (IOW, the shared regions are statically mapped by the
>> hypervisor into the affected guest address spaces).
> What kind of configuration would you need for the regions?
> I think adding a PCI capability should be easy.

Basically address and size, see
https://github.com/siemens/jailhouse/blob/wip/ivshmem2/Documentation/ivshmem-v2-specification.md#vendor-specific-capability-id-09h
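For illustration, such a capability could boil down to something like the
sketch below; the field names and layout are made up here, not copied from
the ivshmem v2 spec, and only the idea of carrying a fixed address and size
in config space matters:

#include <stdint.h>

/* hypothetical vendor-specific capability describing one statically
 * mapped shared region */
struct vhost_pci_region_cap {
	uint8_t  cap_vndr;      /* PCI_CAP_ID_VNDR (0x09) */
	uint8_t  cap_next;      /* pointer to the next capability */
	uint8_t  cap_len;       /* length of this capability */
	uint8_t  region_id;     /* which shared region this entry describes */
	uint64_t region_addr;   /* guest-physical base address of the region */
	uint64_t region_size;   /* size of the region in bytes */
} __attribute__((packed));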

> 
>>> 2) does not have a registration in the device, so no ioeventfd/irqfd,
>>> configuration
>>> registers allocated in the device
>>>   Fig. 2 shows how the driver frame is used to build the 2.0 design.
>>> 1) Asymmetric: vhost-pci-net <-> virtio-net
>>> 2) VM1 shares the entire memory of VM2, and the exotic rings are the
>>> rings
>>>      from VM2.
>>> 3) Performance (in terms of copies between VMs):
>>>      TX: 0-copy (packets are put to VM2’s RX ring directly)
>>>      RX: 1-copy (the green arrow line in the VM1’s RX engine)
>>>   Fig. 3 shows how the driver frame is used to build the 2.1 design.
>>> 1) Symmetric: vhost-pci-net <-> vhost-pci-net
>> This is interesting!
>>
>>> 2) Share an intermediate memory, allocated by VM1’s vhost-pci device,
>>> for data exchange, and the exotic rings are built on the shared memory
>>> 3) Performance:
>>>      TX: 1-copy
>>> RX: 1-copy
>> I'm not yet sure I to this right: there are two different MMIO regions
>> involved, right? One is used for VM1's RX / VM2's TX, and the other for
>> the reverse path? Would allow our requirement to have those regions
>> mapped with asymmetric permissions (RX read-only, TX read/write).
> The design presented here intends to use only one BAR to expose
> both TX and RX. The two VMs share an intermediate memory
> here, why couldn't we give the same permission to TX and RX?
> 

For security and/or safety reasons: the TX side can then safely prepare
and sign a message in-place because the RX side cannot mess around with
it while it is not yet signed (or check-summed). This saves one copy from
a secure place into the shared memory.

> 
>>>   Fig. 4 shows the inter-VM notification path for 2.0 (2.1 is similar).
>>> The four eventfds are allocated by virtio-net, and shared with
>>> vhost-pci-net:
>>> Uses virtio-net’s TX/RX kickfd as the vhost-pci-net’s RX/TX callfd
>>> Uses virtio-net’s TX/RX callfd as the vhost-pci-net’s RX/TX kickfd
>>> Example of how it works:
>>> After packets are put into vhost-pci-net’s TX, the driver kicks TX,
>>> which
>>> causes the an interrupt associated with fd3 to be injected to virtio-net
>>>   The draft code of the 2.0 design is ready, and can be found here:
>>> Qemu: _https://github.com/wei-w-wang/vhost-pci-device_
>>> Guest driver: _https://github.com/wei-w-wang/vhost-pci-driver_
>>>   We tested the 2.0 implementation using the Spirent packet
>>> generator to transmit 64B packets, the results show that the
>>> throughput of vhost-pci reaches around 1.8Mpps, which is around
>>> two times larger than the legacy OVS+DPDK. Also, vhost-pci shows
>>> better scalability than OVS+DPDK.
>>>   
>> Do you have numbers for the symmetric 2.1 case as well? Or is the driver
>> not yet ready for that yet? Otherwise, I could try to make it work over
>> a simplistic vhost-pci 2.1 version in Jailhouse as well. That would give
>> a better picture of how much additional complexity this would mean
>> compared to our ivshmem 2.0.
>>
> 
> Implementation of 2.1 is not ready yet. We can extend it to 2.1 after
> the common driver frame is reviewed.

Can you assess the needed effort?

For us, this is a critical feature, because we need to decide if
vhost-pci can be an option at all. In fact, the "exotic ring" will be
the only way to provide secure inter-partition communication on Jailhouse.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux


* Re: [Qemu-devel] [virtio-dev] Re: Vhost-pci RFC2.0
  2017-04-19  8:49     ` [Qemu-devel] [virtio-dev] " Jan Kiszka
@ 2017-04-19  9:09       ` Wei Wang
  2017-04-19  9:31         ` Jan Kiszka
  0 siblings, 1 reply; 27+ messages in thread
From: Wei Wang @ 2017-04-19  9:09 UTC (permalink / raw)
  To: Jan Kiszka, Marc-André Lureau, Michael S. Tsirkin,
	Stefan Hajnoczi, pbonzini, qemu-devel, virtio-dev
  Cc: Jailhouse

On 04/19/2017 04:49 PM, Jan Kiszka wrote:
> On 2017-04-19 10:42, Wei Wang wrote:
>> On 04/19/2017 03:35 PM, Jan Kiszka wrote:
>>> On 2017-04-19 08:38, Wang, Wei W wrote:
>>>> Hi,
>>>>    We made some design changes to the original vhost-pci design, and want
>>>> to open
>>>> a discussion about the latest design (labelled 2.0) and its extension
>>>> (2.1).
>>>> 2.0 design: One VM shares the entire memory of another VM
>>>> 2.1 design: One VM uses an intermediate memory shared with another VM
>>>> for
>>>>                        packet transmission.
>>>>    For the convenience of discussion, I have some pictures presented at
>>>> this link:
>>>> _https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf_
>>>>
>>>>    Fig. 1 shows the common driver frame that we want use to build the 2.0
>>>> and 2.1
>>>> design. A TX/RX engine consists of a local ring and an exotic ring.
>>>> Local ring:
>>>> 1) allocated by the driver itself;
>>>> 2) registered with the device (i.e. virtio_add_queue())
>>>> Exotic ring:
>>>> 1) ring memory comes from the outside (of the driver), and exposed to
>>>> the driver
>>>>        via a BAR MMIO;
>>> Small additional requirement: In order to make this usable with
>>> Jailhouse as well, we need [also] a side-channel configuration for the
>>> regions, i.e. likely via a PCI capability. There are too few BARs, and
>>> they suggest relocatablity, which is not available under Jailhouse for
>>> simplicity reasons (IOW, the shared regions are statically mapped by the
>>> hypervisor into the affected guest address spaces).
>> What kind of configuration would you need for the regions?
>> I think adding a PCI capability should be easy.
> Basically address and size, see
> https://github.com/siemens/jailhouse/blob/wip/ivshmem2/Documentation/ivshmem-v2-specification.md#vendor-specific-capability-id-09h
Got it, thanks. That should be easy to add to 2.1.

>>>> 2) does not have a registration in the device, so no ioeventfd/irqfd,
>>>> configuration
>>>> registers allocated in the device
>>>>    Fig. 2 shows how the driver frame is used to build the 2.0 design.
>>>> 1) Asymmetric: vhost-pci-net <-> virtio-net
>>>> 2) VM1 shares the entire memory of VM2, and the exotic rings are the
>>>> rings
>>>>       from VM2.
>>>> 3) Performance (in terms of copies between VMs):
>>>>       TX: 0-copy (packets are put to VM2’s RX ring directly)
>>>>       RX: 1-copy (the green arrow line in the VM1’s RX engine)
>>>>    Fig. 3 shows how the driver frame is used to build the 2.1 design.
>>>> 1) Symmetric: vhost-pci-net <-> vhost-pci-net
>>> This is interesting!
>>>
>>>> 2) Share an intermediate memory, allocated by VM1’s vhost-pci device,
>>>> for data exchange, and the exotic rings are built on the shared memory
>>>> 3) Performance:
>>>>       TX: 1-copy
>>>> RX: 1-copy
>>> I'm not yet sure I to this right: there are two different MMIO regions
>>> involved, right? One is used for VM1's RX / VM2's TX, and the other for
>>> the reverse path? Would allow our requirement to have those regions
>>> mapped with asymmetric permissions (RX read-only, TX read/write).
>> The design presented here intends to use only one BAR to expose
>> both TX and RX. The two VMs share an intermediate memory
>> here, why couldn't we give the same permission to TX and RX?
>>
> For security and/or safety reasons: the TX side can then safely prepare
> and sign a message in-place because the RX side cannot mess around with
> it while not yet being signed (or check-summed). Saves one copy from a
> secure place into the shared memory.

If we allow guest1 to write to RX, what safety issue would it cause to
guest2?


>>>>    Fig. 4 shows the inter-VM notification path for 2.0 (2.1 is similar).
>>>> The four eventfds are allocated by virtio-net, and shared with
>>>> vhost-pci-net:
>>>> Uses virtio-net’s TX/RX kickfd as the vhost-pci-net’s RX/TX callfd
>>>> Uses virtio-net’s TX/RX callfd as the vhost-pci-net’s RX/TX kickfd
>>>> Example of how it works:
>>>> After packets are put into vhost-pci-net’s TX, the driver kicks TX,
>>>> which
>>>> causes the an interrupt associated with fd3 to be injected to virtio-net
>>>>    The draft code of the 2.0 design is ready, and can be found here:
>>>> Qemu: _https://github.com/wei-w-wang/vhost-pci-device_
>>>> Guest driver: _https://github.com/wei-w-wang/vhost-pci-driver_
>>>>    We tested the 2.0 implementation using the Spirent packet
>>>> generator to transmit 64B packets, the results show that the
>>>> throughput of vhost-pci reaches around 1.8Mpps, which is around
>>>> two times larger than the legacy OVS+DPDK. Also, vhost-pci shows
>>>> better scalability than OVS+DPDK.
>>>>    
>>> Do you have numbers for the symmetric 2.1 case as well? Or is the driver
>>> not yet ready for that yet? Otherwise, I could try to make it work over
>>> a simplistic vhost-pci 2.1 version in Jailhouse as well. That would give
>>> a better picture of how much additional complexity this would mean
>>> compared to our ivshmem 2.0.
>>>
>> Implementation of 2.1 is not ready yet. We can extend it to 2.1 after
>> the common driver frame is reviewed.
> Can you you assess the needed effort?
>
> For us, this is a critical feature, because we need to decide if
> vhost-pci can be an option at all. In fact, the "exotic ring" will be
> the only way to provide secure inter-partition communication on Jailhouse.
>
If what is here for 2.0 is suitable to be upstreamed, I think it will be
easy to extend it to 2.1 (probably within 1 month).

Best,
Wei


* Re: [Qemu-devel] [virtio-dev] Re: Vhost-pci RFC2.0
  2017-04-19  9:09       ` Wei Wang
@ 2017-04-19  9:31         ` Jan Kiszka
  2017-04-19 10:02           ` Wei Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Kiszka @ 2017-04-19  9:31 UTC (permalink / raw)
  To: Wei Wang, Marc-André Lureau, Michael S. Tsirkin,
	Stefan Hajnoczi, pbonzini, qemu-devel, virtio-dev
  Cc: Jailhouse

On 2017-04-19 11:09, Wei Wang wrote:
> On 04/19/2017 04:49 PM, Jan Kiszka wrote:
>> On 2017-04-19 10:42, Wei Wang wrote:
>>> On 04/19/2017 03:35 PM, Jan Kiszka wrote:
>>>> On 2017-04-19 08:38, Wang, Wei W wrote:
>>>>> Hi,
>>>>>    We made some design changes to the original vhost-pci design,
>>>>> and want
>>>>> to open
>>>>> a discussion about the latest design (labelled 2.0) and its extension
>>>>> (2.1).
>>>>> 2.0 design: One VM shares the entire memory of another VM
>>>>> 2.1 design: One VM uses an intermediate memory shared with another VM
>>>>> for
>>>>>                        packet transmission.
>>>>>    For the convenience of discussion, I have some pictures
>>>>> presented at
>>>>> this link:
>>>>> _https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf_
>>>>>
>>>>>
>>>>>    Fig. 1 shows the common driver frame that we want use to build
>>>>> the 2.0
>>>>> and 2.1
>>>>> design. A TX/RX engine consists of a local ring and an exotic ring.
>>>>> Local ring:
>>>>> 1) allocated by the driver itself;
>>>>> 2) registered with the device (i.e. virtio_add_queue())
>>>>> Exotic ring:
>>>>> 1) ring memory comes from the outside (of the driver), and exposed to
>>>>> the driver
>>>>>        via a BAR MMIO;
>>>> Small additional requirement: In order to make this usable with
>>>> Jailhouse as well, we need [also] a side-channel configuration for the
>>>> regions, i.e. likely via a PCI capability. There are too few BARs, and
>>>> they suggest relocatablity, which is not available under Jailhouse for
>>>> simplicity reasons (IOW, the shared regions are statically mapped by
>>>> the
>>>> hypervisor into the affected guest address spaces).
>>> What kind of configuration would you need for the regions?
>>> I think adding a PCI capability should be easy.
>> Basically address and size, see
>> https://github.com/siemens/jailhouse/blob/wip/ivshmem2/Documentation/ivshmem-v2-specification.md#vendor-specific-capability-id-09h
>>
> Got it, thanks. That should be easy to add to 2.1.
> 
>>>>> 2) does not have a registration in the device, so no ioeventfd/irqfd,
>>>>> configuration
>>>>> registers allocated in the device
>>>>>    Fig. 2 shows how the driver frame is used to build the 2.0 design.
>>>>> 1) Asymmetric: vhost-pci-net <-> virtio-net
>>>>> 2) VM1 shares the entire memory of VM2, and the exotic rings are the
>>>>> rings
>>>>>       from VM2.
>>>>> 3) Performance (in terms of copies between VMs):
>>>>>       TX: 0-copy (packets are put to VM2’s RX ring directly)
>>>>>       RX: 1-copy (the green arrow line in the VM1’s RX engine)
>>>>>    Fig. 3 shows how the driver frame is used to build the 2.1 design.
>>>>> 1) Symmetric: vhost-pci-net <-> vhost-pci-net
>>>> This is interesting!
>>>>
>>>>> 2) Share an intermediate memory, allocated by VM1’s vhost-pci device,
>>>>> for data exchange, and the exotic rings are built on the shared memory
>>>>> 3) Performance:
>>>>>       TX: 1-copy
>>>>> RX: 1-copy
>>>> I'm not yet sure I to this right: there are two different MMIO regions
>>>> involved, right? One is used for VM1's RX / VM2's TX, and the other for
>>>> the reverse path? Would allow our requirement to have those regions
>>>> mapped with asymmetric permissions (RX read-only, TX read/write).
>>> The design presented here intends to use only one BAR to expose
>>> both TX and RX. The two VMs share an intermediate memory
>>> here, why couldn't we give the same permission to TX and RX?
>>>
>> For security and/or safety reasons: the TX side can then safely prepare
>> and sign a message in-place because the RX side cannot mess around with
>> it while not yet being signed (or check-summed). Saves one copy from a
>> secure place into the shared memory.
> 
> If we allow guest1 to write to RX, what safety issue would it cause to
> guest2?

This way, guest1 could trick guest2, in a race condition, into signing a
modified message instead of the original one.

> 
> 
>>>>>    Fig. 4 shows the inter-VM notification path for 2.0 (2.1 is
>>>>> similar).
>>>>> The four eventfds are allocated by virtio-net, and shared with
>>>>> vhost-pci-net:
>>>>> Uses virtio-net’s TX/RX kickfd as the vhost-pci-net’s RX/TX callfd
>>>>> Uses virtio-net’s TX/RX callfd as the vhost-pci-net’s RX/TX kickfd
>>>>> Example of how it works:
>>>>> After packets are put into vhost-pci-net’s TX, the driver kicks TX,
>>>>> which
>>>>> causes the an interrupt associated with fd3 to be injected to
>>>>> virtio-net
>>>>>    The draft code of the 2.0 design is ready, and can be found here:
>>>>> Qemu: _https://github.com/wei-w-wang/vhost-pci-device_
>>>>> Guest driver: _https://github.com/wei-w-wang/vhost-pci-driver_
>>>>>    We tested the 2.0 implementation using the Spirent packet
>>>>> generator to transmit 64B packets, the results show that the
>>>>> throughput of vhost-pci reaches around 1.8Mpps, which is around
>>>>> two times larger than the legacy OVS+DPDK. Also, vhost-pci shows
>>>>> better scalability than OVS+DPDK.
>>>>>    
>>>> Do you have numbers for the symmetric 2.1 case as well? Or is the
>>>> driver
>>>> not yet ready for that yet? Otherwise, I could try to make it work over
>>>> a simplistic vhost-pci 2.1 version in Jailhouse as well. That would
>>>> give
>>>> a better picture of how much additional complexity this would mean
>>>> compared to our ivshmem 2.0.
>>>>
>>> Implementation of 2.1 is not ready yet. We can extend it to 2.1 after
>>> the common driver frame is reviewed.
>> Can you you assess the needed effort?
>>
>> For us, this is a critical feature, because we need to decide if
>> vhost-pci can be an option at all. In fact, the "exotic ring" will be
>> the only way to provide secure inter-partition communication on
>> Jailhouse.
>>
> If what is here for 2.0 is suitable to be upstream-ed, I think it will
> be easy
> to extend it to 2.1 (probably within 1 month).

Unfortunate ordering here, though, specifically if we need to modify
existing things instead of just adding something. We will need 2.1 prior
to committing to 2.0 being the right thing.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux


* Re: [Qemu-devel] [virtio-dev] Vhost-pci RFC2.0
  2017-04-19  6:38 [Qemu-devel] Vhost-pci RFC2.0 Wang, Wei W
  2017-04-19  7:31 ` Marc-André Lureau
  2017-04-19  7:35 ` Jan Kiszka
@ 2017-04-19  9:57 ` Stefan Hajnoczi
  2017-04-19 10:42   ` Wei Wang
  2017-05-05  4:05 ` Jason Wang
  3 siblings, 1 reply; 27+ messages in thread
From: Stefan Hajnoczi @ 2017-04-19  9:57 UTC (permalink / raw)
  To: Wang, Wei W
  Cc: Marc-André Lureau, Michael S. Tsirkin, Stefan Hajnoczi,
	pbonzini, qemu-devel, virtio-dev


On Wed, Apr 19, 2017 at 06:38:11AM +0000, Wang, Wei W wrote:
> We made some design changes to the original vhost-pci design, and want to open
> a discussion about the latest design (labelled 2.0) and its extension (2.1).
> 2.0 design: One VM shares the entire memory of another VM
> 2.1 design: One VM uses an intermediate memory shared with another VM for
>                      packet transmission.

Hi,
Can you talk a bit about the motivation for the 2.x design and major
changes compared to 1.x?

What is the relationship between 2.0 and 2.1?  Do you plan to upstream
both?

> For the convenience of discussion, I have some pictures presented at this link:
> https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf
> 
> Fig. 1 shows the common driver frame that we want use to build the 2.0 and 2.1

What does "frame" mean in this context?  "Framework" or "design".



* Re: [Qemu-devel] [virtio-dev] Re: Vhost-pci RFC2.0
  2017-04-19  9:31         ` Jan Kiszka
@ 2017-04-19 10:02           ` Wei Wang
  2017-04-19 10:36             ` Jan Kiszka
  0 siblings, 1 reply; 27+ messages in thread
From: Wei Wang @ 2017-04-19 10:02 UTC (permalink / raw)
  To: Jan Kiszka, Marc-André Lureau, Michael S. Tsirkin,
	Stefan Hajnoczi, pbonzini, qemu-devel, virtio-dev
  Cc: Jailhouse

On 04/19/2017 05:31 PM, Jan Kiszka wrote:
> On 2017-04-19 11:09, Wei Wang wrote:
>> On 04/19/2017 04:49 PM, Jan Kiszka wrote:
>>> On 2017-04-19 10:42, Wei Wang wrote:
>>>> On 04/19/2017 03:35 PM, Jan Kiszka wrote:
>>>>> On 2017-04-19 08:38, Wang, Wei W wrote:
>>>>>> Hi,
>>>>>>     We made some design changes to the original vhost-pci design,
>>>>>> and want
>>>>>> to open
>>>>>> a discussion about the latest design (labelled 2.0) and its extension
>>>>>> (2.1).
>>>>>> 2.0 design: One VM shares the entire memory of another VM
>>>>>> 2.1 design: One VM uses an intermediate memory shared with another VM
>>>>>> for
>>>>>>                         packet transmission.
>>>>>>     For the convenience of discussion, I have some pictures
>>>>>> presented at
>>>>>> this link:
>>>>>> _https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf_
>>>>>>
>>>>>>
>>>>>>     Fig. 1 shows the common driver frame that we want use to build
>>>>>> the 2.0
>>>>>> and 2.1
>>>>>> design. A TX/RX engine consists of a local ring and an exotic ring.
>>>>>> Local ring:
>>>>>> 1) allocated by the driver itself;
>>>>>> 2) registered with the device (i.e. virtio_add_queue())
>>>>>> Exotic ring:
>>>>>> 1) ring memory comes from the outside (of the driver), and exposed to
>>>>>> the driver
>>>>>>         via a BAR MMIO;
>>>>> Small additional requirement: In order to make this usable with
>>>>> Jailhouse as well, we need [also] a side-channel configuration for the
>>>>> regions, i.e. likely via a PCI capability. There are too few BARs, and
>>>>> they suggest relocatablity, which is not available under Jailhouse for
>>>>> simplicity reasons (IOW, the shared regions are statically mapped by
>>>>> the
>>>>> hypervisor into the affected guest address spaces).
>>>> What kind of configuration would you need for the regions?
>>>> I think adding a PCI capability should be easy.
>>> Basically address and size, see
>>> https://github.com/siemens/jailhouse/blob/wip/ivshmem2/Documentation/ivshmem-v2-specification.md#vendor-specific-capability-id-09h
>>>
>> Got it, thanks. That should be easy to add to 2.1.
>>
>>>>>> 2) does not have a registration in the device, so no ioeventfd/irqfd,
>>>>>> configuration
>>>>>> registers allocated in the device
>>>>>>     Fig. 2 shows how the driver frame is used to build the 2.0 design.
>>>>>> 1) Asymmetric: vhost-pci-net <-> virtio-net
>>>>>> 2) VM1 shares the entire memory of VM2, and the exotic rings are the
>>>>>> rings
>>>>>>        from VM2.
>>>>>> 3) Performance (in terms of copies between VMs):
>>>>>>        TX: 0-copy (packets are put to VM2’s RX ring directly)
>>>>>>        RX: 1-copy (the green arrow line in the VM1’s RX engine)
>>>>>>     Fig. 3 shows how the driver frame is used to build the 2.1 design.
>>>>>> 1) Symmetric: vhost-pci-net <-> vhost-pci-net
>>>>> This is interesting!
>>>>>
>>>>>> 2) Share an intermediate memory, allocated by VM1’s vhost-pci device,
>>>>>> for data exchange, and the exotic rings are built on the shared memory
>>>>>> 3) Performance:
>>>>>>        TX: 1-copy
>>>>>> RX: 1-copy
>>>>> I'm not yet sure I to this right: there are two different MMIO regions
>>>>> involved, right? One is used for VM1's RX / VM2's TX, and the other for
>>>>> the reverse path? Would allow our requirement to have those regions
>>>>> mapped with asymmetric permissions (RX read-only, TX read/write).
>>>> The design presented here intends to use only one BAR to expose
>>>> both TX and RX. The two VMs share an intermediate memory
>>>> here, why couldn't we give the same permission to TX and RX?
>>>>
>>> For security and/or safety reasons: the TX side can then safely prepare
>>> and sign a message in-place because the RX side cannot mess around with
>>> it while not yet being signed (or check-summed). Saves one copy from a
>>> secure place into the shared memory.
>> If we allow guest1 to write to RX, what safety issue would it cause to
>> guest2?
> This way, guest1 could trick guest2, in a race condition, to sign a
> modified message instead of the original one.
>
Just to align on the context that we are talking about: RX is the
intermediate shared ring that guest1 uses to receive packets and guest2
uses to send packets.

It seems the issue is that guest1 will receive a hacked message from RX
(modified by itself). How would that affect guest2?

>>
>>>>>>     Fig. 4 shows the inter-VM notification path for 2.0 (2.1 is
>>>>>> similar).
>>>>>> The four eventfds are allocated by virtio-net, and shared with
>>>>>> vhost-pci-net:
>>>>>> Uses virtio-net’s TX/RX kickfd as the vhost-pci-net’s RX/TX callfd
>>>>>> Uses virtio-net’s TX/RX callfd as the vhost-pci-net’s RX/TX kickfd
>>>>>> Example of how it works:
>>>>>> After packets are put into vhost-pci-net’s TX, the driver kicks TX,
>>>>>> which
>>>>>> causes the an interrupt associated with fd3 to be injected to
>>>>>> virtio-net
>>>>>>     The draft code of the 2.0 design is ready, and can be found here:
>>>>>> Qemu: _https://github.com/wei-w-wang/vhost-pci-device_
>>>>>> Guest driver: _https://github.com/wei-w-wang/vhost-pci-driver_
>>>>>>     We tested the 2.0 implementation using the Spirent packet
>>>>>> generator to transmit 64B packets, the results show that the
>>>>>> throughput of vhost-pci reaches around 1.8Mpps, which is around
>>>>>> two times larger than the legacy OVS+DPDK. Also, vhost-pci shows
>>>>>> better scalability than OVS+DPDK.
>>>>>>     
>>>>> Do you have numbers for the symmetric 2.1 case as well? Or is the
>>>>> driver
>>>>> not yet ready for that yet? Otherwise, I could try to make it work over
>>>>> a simplistic vhost-pci 2.1 version in Jailhouse as well. That would
>>>>> give
>>>>> a better picture of how much additional complexity this would mean
>>>>> compared to our ivshmem 2.0.
>>>>>
>>>> Implementation of 2.1 is not ready yet. We can extend it to 2.1 after
>>>> the common driver frame is reviewed.
>>> Can you you assess the needed effort?
>>>
>>> For us, this is a critical feature, because we need to decide if
>>> vhost-pci can be an option at all. In fact, the "exotic ring" will be
>>> the only way to provide secure inter-partition communication on
>>> Jailhouse.
>>>
>> If what is here for 2.0 is suitable to be upstream-ed, I think it will
>> be easy
>> to extend it to 2.1 (probably within 1 month).
> Unfortunate ordering here, though. Specifically if we need to modify
> existing things instead of just adding something. We will need 2.1 prior
> to committing to 2.0 being the right thing.
>

If you want, we can get the common part of the design ready first,
then we can start to build on the common part at the same time.
The draft code of 2.0 is ready. I can clean it up, making it easier for
us to continue and change.

Best,
Wei


* Re: [Qemu-devel] [virtio-dev] Re: Vhost-pci RFC2.0
  2017-04-19 10:02           ` Wei Wang
@ 2017-04-19 10:36             ` Jan Kiszka
  2017-04-19 11:11               ` Wei Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Kiszka @ 2017-04-19 10:36 UTC (permalink / raw)
  To: Wei Wang, Marc-André Lureau, Michael S. Tsirkin,
	Stefan Hajnoczi, pbonzini, qemu-devel, virtio-dev
  Cc: Jailhouse

On 2017-04-19 12:02, Wei Wang wrote:
>>>>> The design presented here intends to use only one BAR to expose
>>>>> both TX and RX. The two VMs share an intermediate memory
>>>>> here, why couldn't we give the same permission to TX and RX?
>>>>>
>>>> For security and/or safety reasons: the TX side can then safely prepare
>>>> and sign a message in-place because the RX side cannot mess around with
>>>> it while not yet being signed (or check-summed). Saves one copy from a
>>>> secure place into the shared memory.
>>> If we allow guest1 to write to RX, what safety issue would it cause to
>>> guest2?
>> This way, guest1 could trick guest2, in a race condition, to sign a
>> modified message instead of the original one.
>>
> Just align the context that we are talking about: RX is the intermediate
> shared ring that guest1 uses to receive packets and guest2 uses to send
> packet.
> 
> Seems the issue is that guest1 will receive a hacked message from RX
> (modified by itself). How would it affect guest2?

Retry: guest2 wants to send a signed/hashed message to guest1. For that
purpose, it starts to build that message inside the shared memory that
guest1 can at least read, then guest2 signs that message, also in-place.
If guest1 can modify the message inside the ring while guest2 has not
yet signed it, the result is invalid.

Now, if guest2 is the final receiver of the message, nothing is lost,
guest2 just shot itself in the foot. However, if guest2 is just a
router (gray channel) and the message travels further, guest2 now has
corrupted that channel without allowing the final receiver to detect
that. That's the scenario.

>>>>>>>     Fig. 4 shows the inter-VM notification path for 2.0 (2.1 is
>>>>>>> similar).
>>>>>>> The four eventfds are allocated by virtio-net, and shared with
>>>>>>> vhost-pci-net:
>>>>>>> Uses virtio-net’s TX/RX kickfd as the vhost-pci-net’s RX/TX callfd
>>>>>>> Uses virtio-net’s TX/RX callfd as the vhost-pci-net’s RX/TX kickfd
>>>>>>> Example of how it works:
>>>>>>> After packets are put into vhost-pci-net’s TX, the driver kicks TX,
>>>>>>> which
>>>>>>> causes the an interrupt associated with fd3 to be injected to
>>>>>>> virtio-net
>>>>>>>     The draft code of the 2.0 design is ready, and can be found
>>>>>>> here:
>>>>>>> Qemu: _https://github.com/wei-w-wang/vhost-pci-device_
>>>>>>> Guest driver: _https://github.com/wei-w-wang/vhost-pci-driver_
>>>>>>>     We tested the 2.0 implementation using the Spirent packet
>>>>>>> generator to transmit 64B packets, the results show that the
>>>>>>> throughput of vhost-pci reaches around 1.8Mpps, which is around
>>>>>>> two times larger than the legacy OVS+DPDK. Also, vhost-pci shows
>>>>>>> better scalability than OVS+DPDK.
>>>>>>>     
>>>>>> Do you have numbers for the symmetric 2.1 case as well? Or is the
>>>>>> driver
>>>>>> not yet ready for that yet? Otherwise, I could try to make it work
>>>>>> over
>>>>>> a simplistic vhost-pci 2.1 version in Jailhouse as well. That would
>>>>>> give
>>>>>> a better picture of how much additional complexity this would mean
>>>>>> compared to our ivshmem 2.0.
>>>>>>
>>>>> Implementation of 2.1 is not ready yet. We can extend it to 2.1 after
>>>>> the common driver frame is reviewed.
>>>> Can you you assess the needed effort?
>>>>
>>>> For us, this is a critical feature, because we need to decide if
>>>> vhost-pci can be an option at all. In fact, the "exotic ring" will be
>>>> the only way to provide secure inter-partition communication on
>>>> Jailhouse.
>>>>
>>> If what is here for 2.0 is suitable to be upstream-ed, I think it will
>>> be easy
>>> to extend it to 2.1 (probably within 1 month).
>> Unfortunate ordering here, though. Specifically if we need to modify
>> existing things instead of just adding something. We will need 2.1 prior
>> to committing to 2.0 being the right thing.
>>
> 
> If you want, we can get the common part of design ready first,
> then we can start to build on the common part at the same time.
> The draft code of 2.0 is ready. I can clean it up, making it easier for
> us to continue and change.

Without going into details yet, a meta requirement for us will be to
have advanced features optional, negotiable. Basically, we would like to
minimize the interface to an equivalent of what the ivshmem 2.0 is about
(there is no need for more in a safe/secure partitioning scenario). At
the same time, the complexity for a guest should remain low as well.

From past experience, the only way to ensure that is having a working
prototype. So I will have to look into enabling that.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux


* Re: [Qemu-devel] [virtio-dev] Vhost-pci RFC2.0
  2017-04-19  9:57 ` [Qemu-devel] [virtio-dev] " Stefan Hajnoczi
@ 2017-04-19 10:42   ` Wei Wang
  2017-04-19 15:24     ` Stefan Hajnoczi
  0 siblings, 1 reply; 27+ messages in thread
From: Wei Wang @ 2017-04-19 10:42 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Marc-André Lureau, Michael S. Tsirkin, Stefan Hajnoczi,
	pbonzini, qemu-devel, virtio-dev

On 04/19/2017 05:57 PM, Stefan Hajnoczi wrote:
> On Wed, Apr 19, 2017 at 06:38:11AM +0000, Wang, Wei W wrote:
>> We made some design changes to the original vhost-pci design, and want to open
>> a discussion about the latest design (labelled 2.0) and its extension (2.1).
>> 2.0 design: One VM shares the entire memory of another VM
>> 2.1 design: One VM uses an intermediate memory shared with another VM for
>>                       packet transmission.
> Hi,
> Can you talk a bit about the motivation for the 2.x design and major
> changes compared to 1.x?

1.x refers to the design we presented at the KVM Forum before. The major
changes include:
1) inter-VM notification support;
2) the TX engine and RX engine, which are the structures built in the driver.
From the device's point of view, the local rings of the engines need to be
registered.

The motivation is to build a common design for 2.0 and 2.1.

>
> What is the relationship between 2.0 and 2.1?  Do you plan to upstream
> both?
2.0 and 2.1 use different ways to share memory.

2.0: VM1 shares the entire memory of VM2, which achieves 0 copy
between VMs while being less secure.
2.1: VM1 and VM2 use an intermediate shared memory to transmit
packets, which results in 1 copy between VMs while being more secure.

Yes, we plan to upstream both. Since the difference is the way memory is
shared, I don't think it would take too many patches to upstream 2.1 once
2.0 is ready (or in the reverse order if needed).

>> For the convenience of discussion, I have some pictures presented at this link:
>> https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf
>>
>> Fig. 1 shows the common driver frame that we want use to build the 2.0 and 2.1
> What does "frame" mean in this context?  "Framework" or "design".
"design" is more accurate here. We want to have a fundamental common 
design,
(please have a check Fig. 1. ), and build 2.0 and 2.1 on it.

Best,
Wei


* Re: [Qemu-devel] [virtio-dev] Re: Vhost-pci RFC2.0
  2017-04-19 10:36             ` Jan Kiszka
@ 2017-04-19 11:11               ` Wei Wang
  2017-04-19 11:21                 ` Jan Kiszka
  0 siblings, 1 reply; 27+ messages in thread
From: Wei Wang @ 2017-04-19 11:11 UTC (permalink / raw)
  To: Jan Kiszka, Marc-André Lureau, Michael S. Tsirkin,
	Stefan Hajnoczi, pbonzini, qemu-devel, virtio-dev
  Cc: Jailhouse

On 04/19/2017 06:36 PM, Jan Kiszka wrote:
> On 2017-04-19 12:02, Wei Wang wrote:
>>>>>> The design presented here intends to use only one BAR to expose
>>>>>> both TX and RX. The two VMs share an intermediate memory
>>>>>> here, why couldn't we give the same permission to TX and RX?
>>>>>>
>>>>> For security and/or safety reasons: the TX side can then safely prepare
>>>>> and sign a message in-place because the RX side cannot mess around with
>>>>> it while not yet being signed (or check-summed). Saves one copy from a
>>>>> secure place into the shared memory.
>>>> If we allow guest1 to write to RX, what safety issue would it cause to
>>>> guest2?
>>> This way, guest1 could trick guest2, in a race condition, to sign a
>>> modified message instead of the original one.
>>>
>> Just align the context that we are talking about: RX is the intermediate
>> shared ring that guest1 uses to receive packets and guest2 uses to send
>> packet.
>>
>> Seems the issue is that guest1 will receive a hacked message from RX
>> (modified by itself). How would it affect guest2?
> Retry: guest2 wants to send a signed/hashed message to guest1. For that
> purpose, it starts to build that message inside the shared memory that
> guest1 can at least read, then guest2 signs that message, also in-place.
> If guest1 can modify the message inside the ring while guest2 has not
> yet signed it, the result is invalid.
>
> Now, if guest2 is the final receiver of the message, nothing is lost,
> guest2 just shot itself into the foot. However, if guest2 is just a
> router (gray channel) and the message travels further, guest2 now has
> corrupted that channel without allowing the final receive to detect
> that. That's the scenario.

If guest2 is a malicious guest, I think it wouldn't make a
difference whether we protect the shared RX or not. As a router,
guest2 can play tricks on the messages after reading them and then
send the modified messages to a third party, right?


>>>>>>>>      Fig. 4 shows the inter-VM notification path for 2.0 (2.1 is
>>>>>>>> similar).
>>>>>>>> The four eventfds are allocated by virtio-net, and shared with
>>>>>>>> vhost-pci-net:
>>>>>>>> Uses virtio-net’s TX/RX kickfd as the vhost-pci-net’s RX/TX callfd
>>>>>>>> Uses virtio-net’s TX/RX callfd as the vhost-pci-net’s RX/TX kickfd
>>>>>>>> Example of how it works:
>>>>>>>> After packets are put into vhost-pci-net’s TX, the driver kicks TX,
>>>>>>>> which
>>>>>>>> causes the an interrupt associated with fd3 to be injected to
>>>>>>>> virtio-net
>>>>>>>>      The draft code of the 2.0 design is ready, and can be found
>>>>>>>> here:
>>>>>>>> Qemu: _https://github.com/wei-w-wang/vhost-pci-device_
>>>>>>>> Guest driver: _https://github.com/wei-w-wang/vhost-pci-driver_
>>>>>>>>      We tested the 2.0 implementation using the Spirent packet
>>>>>>>> generator to transmit 64B packets, the results show that the
>>>>>>>> throughput of vhost-pci reaches around 1.8Mpps, which is around
>>>>>>>> two times larger than the legacy OVS+DPDK. Also, vhost-pci shows
>>>>>>>> better scalability than OVS+DPDK.
>>>>>>>>      
>>>>>>> Do you have numbers for the symmetric 2.1 case as well? Or is the
>>>>>>> driver
>>>>>>> not yet ready for that yet? Otherwise, I could try to make it work
>>>>>>> over
>>>>>>> a simplistic vhost-pci 2.1 version in Jailhouse as well. That would
>>>>>>> give
>>>>>>> a better picture of how much additional complexity this would mean
>>>>>>> compared to our ivshmem 2.0.
>>>>>>>
>>>>>> Implementation of 2.1 is not ready yet. We can extend it to 2.1 after
>>>>>> the common driver frame is reviewed.
>>>>> Can you you assess the needed effort?
>>>>>
>>>>> For us, this is a critical feature, because we need to decide if
>>>>> vhost-pci can be an option at all. In fact, the "exotic ring" will be
>>>>> the only way to provide secure inter-partition communication on
>>>>> Jailhouse.
>>>>>
>>>> If what is here for 2.0 is suitable to be upstream-ed, I think it will
>>>> be easy
>>>> to extend it to 2.1 (probably within 1 month).
>>> Unfortunate ordering here, though. Specifically if we need to modify
>>> existing things instead of just adding something. We will need 2.1 prior
>>> to committing to 2.0 being the right thing.
>>>
>> If you want, we can get the common part of design ready first,
>> then we can start to build on the common part at the same time.
>> The draft code of 2.0 is ready. I can clean it up, making it easier for
>> us to continue and change.
> Without going into details yet, a meta requirement for us will be to
> have advanced features optional, negotiable. Basically, we would like to
> minimize the interface to an equivalent of what the ivshmem 2.0 is about
> (there is no need for more in a safe/secure partitioning scenario). At
> the same time, the complexity for a guest should remain low as well.
>
>  From past experience, the only way to ensure that is having a working
> prototype. So I will have to look into enabling that.
>

OK. Looks like the ordering needs to be changed. This doesn't appear
to be a problem to me.

If the final design doesn't deviate a lot from what's presented here,
I think it should be easy to get 2.1 implemented quickly.
Let's first get the design ready, then assess the effort for
implementation.


Best,
Wei

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: Vhost-pci RFC2.0
  2017-04-19 11:11               ` Wei Wang
@ 2017-04-19 11:21                 ` Jan Kiszka
  2017-04-19 14:33                   ` Wang, Wei W
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Kiszka @ 2017-04-19 11:21 UTC (permalink / raw)
  To: Wei Wang, Marc-André Lureau, Michael S. Tsirkin,
	Stefan Hajnoczi, pbonzini, qemu-devel, virtio-dev
  Cc: Jailhouse

On 2017-04-19 13:11, Wei Wang wrote:
> On 04/19/2017 06:36 PM, Jan Kiszka wrote:
>> On 2017-04-19 12:02, Wei Wang wrote:
>>>>>>> The design presented here intends to use only one BAR to expose
>>>>>>> both TX and RX. The two VMs share an intermediate memory
>>>>>>> here, why couldn't we give the same permission to TX and RX?
>>>>>>>
>>>>>> For security and/or safety reasons: the TX side can then safely
>>>>>> prepare
>>>>>> and sign a message in-place because the RX side cannot mess around
>>>>>> with
>>>>>> it while not yet being signed (or check-summed). Saves one copy
>>>>>> from a
>>>>>> secure place into the shared memory.
>>>>> If we allow guest1 to write to RX, what safety issue would it cause to
>>>>> guest2?
>>>> This way, guest1 could trick guest2, in a race condition, to sign a
>>>> modified message instead of the original one.
>>>>
>>> Just align the context that we are talking about: RX is the intermediate
>>> shared ring that guest1 uses to receive packets and guest2 uses to send
>>> packet.
>>>
>>> Seems the issue is that guest1 will receive a hacked message from RX
>>> (modified by itself). How would it affect guest2?
>> Retry: guest2 wants to send a signed/hashed message to guest1. For that
>> purpose, it starts to build that message inside the shared memory that
>> guest1 can at least read, then guest2 signs that message, also in-place.
>> If guest1 can modify the message inside the ring while guest2 has not
>> yet signed it, the result is invalid.
>>
>> Now, if guest2 is the final receiver of the message, nothing is lost,
>> guest2 just shot itself into the foot. However, if guest2 is just a
>> router (gray channel) and the message travels further, guest2 now has
>> corrupted that channel without allowing the final receive to detect
>> that. That's the scenario.
> 
> If guest2 has been a malicious guest, I think it wouldn't make a
> difference whether we protect the shared RX or not. As a router,
> guest2 can play tricks on the messages after read it and then
> send the modified message to a third man, right?

It can swallow it, "steal" it (redirect), but it can't manipulate the
signed content without being caught, that's the idea. It's particularly
relevant for safety-critical traffic from one safe application to
another over unreliable channels, but it may also be relevant for the
integrity of messages in a secure setup.

> 
>>>>>>>>>      Fig. 4 shows the inter-VM notification path for 2.0 (2.1 is
>>>>>>>>> similar).
>>>>>>>>> The four eventfds are allocated by virtio-net, and shared with
>>>>>>>>> vhost-pci-net:
>>>>>>>>> Uses virtio-net’s TX/RX kickfd as the vhost-pci-net’s RX/TX callfd
>>>>>>>>> Uses virtio-net’s TX/RX callfd as the vhost-pci-net’s RX/TX kickfd
>>>>>>>>> Example of how it works:
>>>>>>>>> After packets are put into vhost-pci-net’s TX, the driver kicks
>>>>>>>>> TX,
>>>>>>>>> which
>>>>>>>>> causes the an interrupt associated with fd3 to be injected to
>>>>>>>>> virtio-net
>>>>>>>>>      The draft code of the 2.0 design is ready, and can be found
>>>>>>>>> here:
>>>>>>>>> Qemu: _https://github.com/wei-w-wang/vhost-pci-device_
>>>>>>>>> Guest driver: _https://github.com/wei-w-wang/vhost-pci-driver_
>>>>>>>>>      We tested the 2.0 implementation using the Spirent packet
>>>>>>>>> generator to transmit 64B packets, the results show that the
>>>>>>>>> throughput of vhost-pci reaches around 1.8Mpps, which is around
>>>>>>>>> two times larger than the legacy OVS+DPDK. Also, vhost-pci shows
>>>>>>>>> better scalability than OVS+DPDK.
>>>>>>>>>      
>>>>>>>> Do you have numbers for the symmetric 2.1 case as well? Or is the
>>>>>>>> driver
>>>>>>>> not yet ready for that yet? Otherwise, I could try to make it work
>>>>>>>> over
>>>>>>>> a simplistic vhost-pci 2.1 version in Jailhouse as well. That would
>>>>>>>> give
>>>>>>>> a better picture of how much additional complexity this would mean
>>>>>>>> compared to our ivshmem 2.0.
>>>>>>>>
>>>>>>> Implementation of 2.1 is not ready yet. We can extend it to 2.1
>>>>>>> after
>>>>>>> the common driver frame is reviewed.
>>>>>> Can you you assess the needed effort?
>>>>>>
>>>>>> For us, this is a critical feature, because we need to decide if
>>>>>> vhost-pci can be an option at all. In fact, the "exotic ring" will be
>>>>>> the only way to provide secure inter-partition communication on
>>>>>> Jailhouse.
>>>>>>
>>>>> If what is here for 2.0 is suitable to be upstream-ed, I think it will
>>>>> be easy
>>>>> to extend it to 2.1 (probably within 1 month).
>>>> Unfortunate ordering here, though. Specifically if we need to modify
>>>> existing things instead of just adding something. We will need 2.1
>>>> prior
>>>> to committing to 2.0 being the right thing.
>>>>
>>> If you want, we can get the common part of design ready first,
>>> then we can start to build on the common part at the same time.
>>> The draft code of 2.0 is ready. I can clean it up, making it easier for
>>> us to continue and change.
>> Without going into details yet, a meta requirement for us will be to
>> have advanced features optional, negotiable. Basically, we would like to
>> minimize the interface to an equivalent of what the ivshmem 2.0 is about
>> (there is no need for more in a safe/secure partitioning scenario). At
>> the same time, the complexity for a guest should remain low as well.
>>
>>  From past experience, the only way to ensure that is having a working
>> prototype. So I will have to look into enabling that.
>>
> 
> OK. Looks like the ordering needs to be changed. This doesn't appear
> to be a problem to me.
> 
> If the final design doesn't deviate a lot from what's presented here,
> I think it should be easy to get 2.1 implemented quickly.
> Let's first get the design ready, then assess the effort for
> implementation.
> 

OK, thanks.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: Vhost-pci RFC2.0
  2017-04-19 11:21                 ` Jan Kiszka
@ 2017-04-19 14:33                   ` Wang, Wei W
  2017-04-19 14:52                     ` Jan Kiszka
  0 siblings, 1 reply; 27+ messages in thread
From: Wang, Wei W @ 2017-04-19 14:33 UTC (permalink / raw)
  To: Jan Kiszka, Marc-André Lureau, Michael S. Tsirkin,
	Stefan Hajnoczi, pbonzini, qemu-devel, virtio-dev
  Cc: Jailhouse

On 04/19/2017 07:21 PM, Jan Kiszka wrote:
> On 2017-04-19 13:11, Wei Wang wrote:
> > On 04/19/2017 06:36 PM, Jan Kiszka wrote:
> >> On 2017-04-19 12:02, Wei Wang wrote:
> >>>>>>> The design presented here intends to use only one BAR to expose
> >>>>>>> both TX and RX. The two VMs share an intermediate memory here,
> >>>>>>> why couldn't we give the same permission to TX and RX?
> >>>>>>>
> >>>>>> For security and/or safety reasons: the TX side can then safely
> >>>>>> prepare and sign a message in-place because the RX side cannot
> >>>>>> mess around with it while not yet being signed (or check-summed).
> >>>>>> Saves one copy from a secure place into the shared memory.
> >>>>> If we allow guest1 to write to RX, what safety issue would it
> >>>>> cause to guest2?
> >>>> This way, guest1 could trick guest2, in a race condition, to sign a
> >>>> modified message instead of the original one.
> >>>>
> >>> Just align the context that we are talking about: RX is the
> >>> intermediate shared ring that guest1 uses to receive packets and
> >>> guest2 uses to send packet.
> >>>
> >>> Seems the issue is that guest1 will receive a hacked message from RX
> >>> (modified by itself). How would it affect guest2?
> >> Retry: guest2 wants to send a signed/hashed message to guest1. For
> >> that purpose, it starts to build that message inside the shared
> >> memory that
> >> guest1 can at least read, then guest2 signs that message, also in-place.
> >> If guest1 can modify the message inside the ring while guest2 has not
> >> yet signed it, the result is invalid.
> >>
> >> Now, if guest2 is the final receiver of the message, nothing is lost,
> >> guest2 just shot itself into the foot. However, if guest2 is just a
> >> router (gray channel) and the message travels further, guest2 now has
> >> corrupted that channel without allowing the final receive to detect
> >> that. That's the scenario.
> >
> > If guest2 has been a malicious guest, I think it wouldn't make a
> > difference whether we protect the shared RX or not. As a router,
> > guest2 can play tricks on the messages after read it and then send the
> > modified message to a third man, right?
> 
> It can swallow it, "steal" it (redirect), but it can't manipulate the signed content
> without being caught, that's the idea. It's particularly relevant for safety-critical
> traffic from one safe application to another over unreliable channels, but it may
> also be relevant for the integrity of messages in a secure setup.
> 

OK, I see most of your story, thanks. To get to the bottom of it, is it
possible to sign the packet before putting it onto the unreliable channel
(e.g. the shared RX), instead of signing in-place? If that's doable, we can
have a simpler shared channel.


Best,
Wei

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: Vhost-pci RFC2.0
  2017-04-19 14:33                   ` Wang, Wei W
@ 2017-04-19 14:52                     ` Jan Kiszka
  2017-04-20  6:51                       ` Wei Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Kiszka @ 2017-04-19 14:52 UTC (permalink / raw)
  To: Wang, Wei W, Marc-André Lureau, Michael S. Tsirkin,
	Stefan Hajnoczi, pbonzini, qemu-devel, virtio-dev
  Cc: Jailhouse

On 2017-04-19 16:33, Wang, Wei W wrote:
> On 04/19/2017 07:21 PM, Jan Kiszka wrote:
>> On 2017-04-19 13:11, Wei Wang wrote:
>>> On 04/19/2017 06:36 PM, Jan Kiszka wrote:
>>>> On 2017-04-19 12:02, Wei Wang wrote:
>>>>>>>>> The design presented here intends to use only one BAR to expose
>>>>>>>>> both TX and RX. The two VMs share an intermediate memory here,
>>>>>>>>> why couldn't we give the same permission to TX and RX?
>>>>>>>>>
>>>>>>>> For security and/or safety reasons: the TX side can then safely
>>>>>>>> prepare and sign a message in-place because the RX side cannot
>>>>>>>> mess around with it while not yet being signed (or check-summed).
>>>>>>>> Saves one copy from a secure place into the shared memory.
>>>>>>> If we allow guest1 to write to RX, what safety issue would it
>>>>>>> cause to guest2?
>>>>>> This way, guest1 could trick guest2, in a race condition, to sign a
>>>>>> modified message instead of the original one.
>>>>>>
>>>>> Just align the context that we are talking about: RX is the
>>>>> intermediate shared ring that guest1 uses to receive packets and
>>>>> guest2 uses to send packet.
>>>>>
>>>>> Seems the issue is that guest1 will receive a hacked message from RX
>>>>> (modified by itself). How would it affect guest2?
>>>> Retry: guest2 wants to send a signed/hashed message to guest1. For
>>>> that purpose, it starts to build that message inside the shared
>>>> memory that
>>>> guest1 can at least read, then guest2 signs that message, also in-place.
>>>> If guest1 can modify the message inside the ring while guest2 has not
>>>> yet signed it, the result is invalid.
>>>>
>>>> Now, if guest2 is the final receiver of the message, nothing is lost,
>>>> guest2 just shot itself into the foot. However, if guest2 is just a
>>>> router (gray channel) and the message travels further, guest2 now has
>>>> corrupted that channel without allowing the final receive to detect
>>>> that. That's the scenario.
>>>
>>> If guest2 has been a malicious guest, I think it wouldn't make a
>>> difference whether we protect the shared RX or not. As a router,
>>> guest2 can play tricks on the messages after read it and then send the
>>> modified message to a third man, right?
>>
>> It can swallow it, "steal" it (redirect), but it can't manipulate the signed content
>> without being caught, that's the idea. It's particularly relevant for safety-critical
>> traffic from one safe application to another over unreliable channels, but it may
>> also be relevant for the integrity of messages in a secure setup.
>>
> 
> OK, I see most of your story, thanks. To get to the bottom of it, is it possible to
> Sign the packet before put it onto the unreliable channel (e.g. the shared RX),
> Instead of signing in-place? If that's doable, we can have a simpler shared channel.  

Of course, you can always add another copy... But as it was trivial to
add unidirectional shared memory support to ivshmem [1], I see no reason
this shouldn't be possible for vhost-pci as well.

Jan

[1] https://github.com/siemens/jailhouse/commit/cfbd0b96d9cdb1ab7246c64bc446be39deb3f087, hypervisor part:

 hypervisor/include/jailhouse/cell-config.h |  4 ++--
 hypervisor/include/jailhouse/ivshmem.h     |  2 +-
 hypervisor/ivshmem.c                       | 52 +++++++++++++++++++++++++++++++++++-----------------
 3 files changed, 38 insertions(+), 20 deletions(-)

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Vhost-pci RFC2.0
  2017-04-19 10:42   ` Wei Wang
@ 2017-04-19 15:24     ` Stefan Hajnoczi
  2017-04-20  5:51       ` Wei Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Stefan Hajnoczi @ 2017-04-19 15:24 UTC (permalink / raw)
  To: Wei Wang
  Cc: Stefan Hajnoczi, Marc-André Lureau, Michael S. Tsirkin,
	pbonzini, qemu-devel, virtio-dev

On Wed, Apr 19, 2017 at 11:42 AM, Wei Wang <wei.w.wang@intel.com> wrote:
> On 04/19/2017 05:57 PM, Stefan Hajnoczi wrote:
>> On Wed, Apr 19, 2017 at 06:38:11AM +0000, Wang, Wei W wrote:
>>>
>>> We made some design changes to the original vhost-pci design, and want to
>>> open
>>> a discussion about the latest design (labelled 2.0) and its extension
>>> (2.1).
>>> 2.0 design: One VM shares the entire memory of another VM
>>> 2.1 design: One VM uses an intermediate memory shared with another VM for
>>>                       packet transmission.
>>
>> Hi,
>> Can you talk a bit about the motivation for the 2.x design and major
>> changes compared to 1.x?
>
>
> 1.x refers to the design we presented at KVM Form before. The major
> change includes:
> 1) inter-VM notification support
> 2) TX engine and RX engine, which is the structure built in the driver. From
> the device point of view, the local rings of the engines need to be
> registered.

It would be great to support any virtio device type.

The use case I'm thinking of is networking and storage appliances in
cloud environments (e.g. OpenStack).  vhost-user doesn't fit nicely
because users may not be allowed to run host userspace processes.  VMs
are first-class objects in compute clouds.  It would be natural to
deploy networking and storage appliances as VMs using vhost-pci.

In order to achieve this vhost-pci needs to be a virtio transport and
not a virtio-net-specific PCI device.  It would extend the VIRTIO 1.x
spec alongside virtio-pci, virtio-mmio, and virtio-ccw.

When you say TX and RX I'm not sure if the design only supports
virtio-net devices?

> The motivation is to build a common design for 2.0 and 2.1.
>
>>
>> What is the relationship between 2.0 and 2.1?  Do you plan to upstream
>> both?
>
> 2.0 and 2.1 use different ways to share memory.
>
> 2.0: VM1 shares the entire memory of VM2, which achieves 0 copy
> between VMs while being less secure.
> 2.1: VM1 and VM2 use an intermediate shared memory to transmit
> packets, which results in 1 copy between VMs while being more secure.
>
> Yes, plan to upstream both. Since the difference is the way to share memory,
> I think it wouldn't have too many patches to upstream 2.1 if 2.0 is ready
> (or
> changing the order if needed).

Okay.  "Asymmetric" (vhost-pci <-> virtio-pci) and "symmetric"
(vhost-pci <-> vhost-pci) mode might be a clearer way to distinguish
between the two.  Or even "compatibility" mode and "native" mode since
existing guests only work in vhost-pci <-> virtio-pci mode.  Using
version numbers to describe two different modes of operation could be
confusing.

Stefan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Vhost-pci RFC2.0
  2017-04-19 15:24     ` Stefan Hajnoczi
@ 2017-04-20  5:51       ` Wei Wang
  2017-05-02 12:48         ` Stefan Hajnoczi
  0 siblings, 1 reply; 27+ messages in thread
From: Wei Wang @ 2017-04-20  5:51 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Stefan Hajnoczi, Marc-André Lureau, Michael S. Tsirkin,
	pbonzini, qemu-devel, virtio-dev

On 04/19/2017 11:24 PM, Stefan Hajnoczi wrote:
> On Wed, Apr 19, 2017 at 11:42 AM, Wei Wang <wei.w.wang@intel.com> wrote:
>> On 04/19/2017 05:57 PM, Stefan Hajnoczi wrote:
>>> On Wed, Apr 19, 2017 at 06:38:11AM +0000, Wang, Wei W wrote:
>>>> We made some design changes to the original vhost-pci design, and want to
>>>> open
>>>> a discussion about the latest design (labelled 2.0) and its extension
>>>> (2.1).
>>>> 2.0 design: One VM shares the entire memory of another VM
>>>> 2.1 design: One VM uses an intermediate memory shared with another VM for
>>>>                        packet transmission.
>>> Hi,
>>> Can you talk a bit about the motivation for the 2.x design and major
>>> changes compared to 1.x?
>>
>> 1.x refers to the design we presented at KVM Form before. The major
>> change includes:
>> 1) inter-VM notification support
>> 2) TX engine and RX engine, which is the structure built in the driver. From
>> the device point of view, the local rings of the engines need to be
>> registered.
> It would be great to support any virtio device type.

Yes, the current design already supports the creation of devices of
different types.
The support is added to the vhost-user protocol and the vhost-user slave.
Once the slave handler receives the request to create the device (with
the specified device type), the remaining process (e.g. device realize)
is device specific; a rough sketch of that dispatch is shown below.
This part remains the same as presented before
(i.e. Page 12 @
http://www.linux-kvm.org/images/5/55/02x07A-Wei_Wang-Design_of-Vhost-pci.pdf).
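To make that concrete, here is a purely illustrative sketch of the dispatch;
the function names and the shape of the handler are made up for this mail
(only the virtio device IDs come from the spec), so the draft code may look
quite different:

#include <stdint.h>

#define VIRTIO_ID_NET  1            /* virtio device IDs from the spec  */
#define VIRTIO_ID_BLK  2

/* hypothetical device-specific realize functions */
static int vhost_pci_net_realize(void) { /* set up TX/RX, BAR, ... */ return 0; }
static int vhost_pci_blk_realize(void) { /* set up request queue   */ return 0; }

/* hypothetical slave handler, invoked when the "create device" request
 * arrives over the vhost-user socket with the device type attached */
static int slave_create_device(uint16_t virtio_device_type)
{
    switch (virtio_device_type) {
    case VIRTIO_ID_NET:
        return vhost_pci_net_realize();
    case VIRTIO_ID_BLK:
        return vhost_pci_blk_realize();
    default:
        return -1;                  /* device type not supported yet    */
    }
}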
>
> The use case I'm thinking of is networking and storage appliances in
> cloud environments (e.g. OpenStack).  vhost-user doesn't fit nicely
> because users may not be allowed to run host userspace processes.  VMs
> are first-class objects in compute clouds.  It would be natural to
> deploy networking and storage appliances as VMs using vhost-pci.
>
> In order to achieve this vhost-pci needs to be a virtio transport and
> not a virtio-net-specific PCI device.  It would extend the VIRTIO 1.x
> spec alongside virtio-pci, virtio-mmio, and virtio-ccw.

Actually it is designed as a device under the virtio-pci transport. I'm
not sure about the value of having a new transport.

> When you say TX and RX I'm not sure if the design only supports
> virtio-net devices?

The current design focuses on the vhost-pci-net device. That's the
reason we have TX/RX here. As mentioned above, when the
slave invokes the device creation function, execution
goes to the device-specific code.

The TX/RX design comes after device creation, so it is specific
to vhost-pci-net. A future vhost-pci-blk device can
have its own request queue.


>> The motivation is to build a common design for 2.0 and 2.1.
>>
>>> What is the relationship between 2.0 and 2.1?  Do you plan to upstream
>>> both?
>> 2.0 and 2.1 use different ways to share memory.
>>
>> 2.0: VM1 shares the entire memory of VM2, which achieves 0 copy
>> between VMs while being less secure.
>> 2.1: VM1 and VM2 use an intermediate shared memory to transmit
>> packets, which results in 1 copy between VMs while being more secure.
>>
>> Yes, plan to upstream both. Since the difference is the way to share memory,
>> I think it wouldn't have too many patches to upstream 2.1 if 2.0 is ready
>> (or
>> changing the order if needed).
> Okay.  "Asymmetric" (vhost-pci <-> virtio-pci) and "symmetric"
> (vhost-pci <-> vhost-pci) mode might be a clearer way to distinguish
> between the two.  Or even "compatibility" mode and "native" mode since
> existing guests only work in vhost-pci <-> virtio-pci mode.  Using
> version numbers to describe two different modes of operation could be
> confusing.
>
OK. I'll take your suggestion to use "asymmetric" and
"symmetric". Thanks.


Best,
Wei

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: Vhost-pci RFC2.0
  2017-04-19 14:52                     ` Jan Kiszka
@ 2017-04-20  6:51                       ` Wei Wang
  2017-04-20  7:05                         ` Jan Kiszka
  0 siblings, 1 reply; 27+ messages in thread
From: Wei Wang @ 2017-04-20  6:51 UTC (permalink / raw)
  To: Jan Kiszka, Marc-André Lureau, Michael S. Tsirkin,
	Stefan Hajnoczi, pbonzini, qemu-devel, virtio-dev
  Cc: Jailhouse

On 04/19/2017 10:52 PM, Jan Kiszka wrote:
> On 2017-04-19 16:33, Wang, Wei W wrote:
>> On 04/19/2017 07:21 PM, Jan Kiszka wrote:
>>> On 2017-04-19 13:11, Wei Wang wrote:
>>>> On 04/19/2017 06:36 PM, Jan Kiszka wrote:
>>>>> On 2017-04-19 12:02, Wei Wang wrote:
>>>>>>>>>> The design presented here intends to use only one BAR to expose
>>>>>>>>>> both TX and RX. The two VMs share an intermediate memory here,
>>>>>>>>>> why couldn't we give the same permission to TX and RX?
>>>>>>>>>>
>>>>>>>>> For security and/or safety reasons: the TX side can then safely
>>>>>>>>> prepare and sign a message in-place because the RX side cannot
>>>>>>>>> mess around with it while not yet being signed (or check-summed).
>>>>>>>>> Saves one copy from a secure place into the shared memory.
>>>>>>>> If we allow guest1 to write to RX, what safety issue would it
>>>>>>>> cause to guest2?
>>>>>>> This way, guest1 could trick guest2, in a race condition, to sign a
>>>>>>> modified message instead of the original one.
>>>>>>>
>>>>>> Just align the context that we are talking about: RX is the
>>>>>> intermediate shared ring that guest1 uses to receive packets and
>>>>>> guest2 uses to send packet.
>>>>>>
>>>>>> Seems the issue is that guest1 will receive a hacked message from RX
>>>>>> (modified by itself). How would it affect guest2?
>>>>> Retry: guest2 wants to send a signed/hashed message to guest1. For
>>>>> that purpose, it starts to build that message inside the shared
>>>>> memory that
>>>>> guest1 can at least read, then guest2 signs that message, also in-place.
>>>>> If guest1 can modify the message inside the ring while guest2 has not
>>>>> yet signed it, the result is invalid.
>>>>>
>>>>> Now, if guest2 is the final receiver of the message, nothing is lost,
>>>>> guest2 just shot itself into the foot. However, if guest2 is just a
>>>>> router (gray channel) and the message travels further, guest2 now has
>>>>> corrupted that channel without allowing the final receive to detect
>>>>> that. That's the scenario.
>>>> If guest2 has been a malicious guest, I think it wouldn't make a
>>>> difference whether we protect the shared RX or not. As a router,
>>>> guest2 can play tricks on the messages after read it and then send the
>>>> modified message to a third man, right?
>>> It can swallow it, "steal" it (redirect), but it can't manipulate the signed content
>>> without being caught, that's the idea. It's particularly relevant for safety-critical
>>> traffic from one safe application to another over unreliable channels, but it may
>>> also be relevant for the integrity of messages in a secure setup.
>>>
>> OK, I see most of your story, thanks. To get to the bottom of it, is it possible to
>> Sign the packet before put it onto the unreliable channel (e.g. the shared RX),
>> Instead of signing in-place? If that's doable, we can have a simpler shared channel.
> Of course, you can always add another copy... But as it was trivial to
> add unidirectional shared memory support to ivshmem [1], I see no reason
> this shouldn't be possible for vhost-pci as well.
>

IIUC, this requires the ring and its head/tail to be put into different
regions. It would be hard to fit the existing virtqueue into the shared
channel, since the vring and its pointers (e.g. idx) and flags are on the
same page. So, we would probably need to use another ring type.


Best,
Wei

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: Vhost-pci RFC2.0
  2017-04-20  6:51                       ` Wei Wang
@ 2017-04-20  7:05                         ` Jan Kiszka
  2017-04-20  8:58                           ` Wei Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Kiszka @ 2017-04-20  7:05 UTC (permalink / raw)
  To: Wei Wang, Marc-André Lureau, Michael S. Tsirkin,
	Stefan Hajnoczi, pbonzini, qemu-devel, virtio-dev
  Cc: Jailhouse

On 2017-04-20 08:51, Wei Wang wrote:
> On 04/19/2017 10:52 PM, Jan Kiszka wrote:
>> On 2017-04-19 16:33, Wang, Wei W wrote:
>>> On 04/19/2017 07:21 PM, Jan Kiszka wrote:
>>>> On 2017-04-19 13:11, Wei Wang wrote:
>>>>> On 04/19/2017 06:36 PM, Jan Kiszka wrote:
>>>>>> On 2017-04-19 12:02, Wei Wang wrote:
>>>>>>>>>>> The design presented here intends to use only one BAR to expose
>>>>>>>>>>> both TX and RX. The two VMs share an intermediate memory here,
>>>>>>>>>>> why couldn't we give the same permission to TX and RX?
>>>>>>>>>>>
>>>>>>>>>> For security and/or safety reasons: the TX side can then safely
>>>>>>>>>> prepare and sign a message in-place because the RX side cannot
>>>>>>>>>> mess around with it while not yet being signed (or check-summed).
>>>>>>>>>> Saves one copy from a secure place into the shared memory.
>>>>>>>>> If we allow guest1 to write to RX, what safety issue would it
>>>>>>>>> cause to guest2?
>>>>>>>> This way, guest1 could trick guest2, in a race condition, to sign a
>>>>>>>> modified message instead of the original one.
>>>>>>>>
>>>>>>> Just align the context that we are talking about: RX is the
>>>>>>> intermediate shared ring that guest1 uses to receive packets and
>>>>>>> guest2 uses to send packet.
>>>>>>>
>>>>>>> Seems the issue is that guest1 will receive a hacked message from RX
>>>>>>> (modified by itself). How would it affect guest2?
>>>>>> Retry: guest2 wants to send a signed/hashed message to guest1. For
>>>>>> that purpose, it starts to build that message inside the shared
>>>>>> memory that
>>>>>> guest1 can at least read, then guest2 signs that message, also
>>>>>> in-place.
>>>>>> If guest1 can modify the message inside the ring while guest2 has not
>>>>>> yet signed it, the result is invalid.
>>>>>>
>>>>>> Now, if guest2 is the final receiver of the message, nothing is lost,
>>>>>> guest2 just shot itself into the foot. However, if guest2 is just a
>>>>>> router (gray channel) and the message travels further, guest2 now has
>>>>>> corrupted that channel without allowing the final receive to detect
>>>>>> that. That's the scenario.
>>>>> If guest2 has been a malicious guest, I think it wouldn't make a
>>>>> difference whether we protect the shared RX or not. As a router,
>>>>> guest2 can play tricks on the messages after read it and then send the
>>>>> modified message to a third man, right?
>>>> It can swallow it, "steal" it (redirect), but it can't manipulate
>>>> the signed content
>>>> without being caught, that's the idea. It's particularly relevant
>>>> for safety-critical
>>>> traffic from one safe application to another over unreliable
>>>> channels, but it may
>>>> also be relevant for the integrity of messages in a secure setup.
>>>>
>>> OK, I see most of your story, thanks. To get to the bottom of it, is
>>> it possible to
>>> Sign the packet before put it onto the unreliable channel (e.g. the
>>> shared RX),
>>> Instead of signing in-place? If that's doable, we can have a simpler
>>> shared channel.
>> Of course, you can always add another copy... But as it was trivial to
>> add unidirectional shared memory support to ivshmem [1], I see no reason
>> this shouldn't be possible for vhost-pci as well.
>>
> 
> IIUC, this requires the ring and it's head/tail to be put into different
> regions, it would
> be hard to fit the existing virtqueue into the shared the channel, since
> the vring and
> its pointers (e.g. idx) and flags are on the same page.
> So, probably will need to use another ring type.

The current virtio spec already allows this split - though it may not be
exercised by existing implementations (we do, though, in ivshmem-net).
A future virtio spec (1.1 IIRC) will require a third region that holds the
metadata and has to be writable by both sides. But it will remain
possible to keep outgoing and incoming payload in separate pages, thus
with different access permissions.
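
For reference, VIRTIO 1.0 already lets the driver program the descriptor
table, the driver (avail) ring and the device (used) ring at independent
addresses through the common configuration structure. The sketch below is a
trimmed-down stand-in for virtio_pci_common_cfg (only the queue address
fields), and place_queue() plus the addresses it is given are made up for
illustration:

#include <stdint.h>

/* trimmed-down stand-in for the queue address fields of
 * struct virtio_pci_common_cfg (VIRTIO 1.0, modern virtio-pci) */
struct common_cfg_queue_addrs {
    uint32_t queue_desc_lo,  queue_desc_hi;   /* descriptor table    */
    uint32_t queue_avail_lo, queue_avail_hi;  /* driver (avail) ring */
    uint32_t queue_used_lo,  queue_used_hi;   /* device (used) ring  */
};

/* illustrative helper: the three parts can point to different regions */
static void place_queue(struct common_cfg_queue_addrs *cfg,
                        uint64_t desc, uint64_t avail, uint64_t used)
{
    cfg->queue_desc_lo  = (uint32_t)desc;
    cfg->queue_desc_hi  = (uint32_t)(desc >> 32);
    cfg->queue_avail_lo = (uint32_t)avail;
    cfg->queue_avail_hi = (uint32_t)(avail >> 32);
    cfg->queue_used_lo  = (uint32_t)used;
    cfg->queue_used_hi  = (uint32_t)(used >> 32);
}

Since the three parts have their own addresses, the pages that hold each
side's payload can be mapped separately, which is what makes the different
access permissions above possible.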

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: Vhost-pci RFC2.0
  2017-04-20  7:05                         ` Jan Kiszka
@ 2017-04-20  8:58                           ` Wei Wang
  0 siblings, 0 replies; 27+ messages in thread
From: Wei Wang @ 2017-04-20  8:58 UTC (permalink / raw)
  To: Jan Kiszka, Marc-André Lureau, Michael S. Tsirkin,
	Stefan Hajnoczi, pbonzini, qemu-devel, virtio-dev
  Cc: Jailhouse

On 04/20/2017 03:05 PM, Jan Kiszka wrote:
> On 2017-04-20 08:51, Wei Wang wrote:
>> On 04/19/2017 10:52 PM, Jan Kiszka wrote:
>>> On 2017-04-19 16:33, Wang, Wei W wrote:
>>>> On 04/19/2017 07:21 PM, Jan Kiszka wrote:
>>>>> On 2017-04-19 13:11, Wei Wang wrote:
>>>>>> On 04/19/2017 06:36 PM, Jan Kiszka wrote:
>>>>>>> On 2017-04-19 12:02, Wei Wang wrote:
>>>>>>>>>>>> The design presented here intends to use only one BAR to expose
>>>>>>>>>>>> both TX and RX. The two VMs share an intermediate memory here,
>>>>>>>>>>>> why couldn't we give the same permission to TX and RX?
>>>>>>>>>>>>
>>>>>>>>>>> For security and/or safety reasons: the TX side can then safely
>>>>>>>>>>> prepare and sign a message in-place because the RX side cannot
>>>>>>>>>>> mess around with it while not yet being signed (or check-summed).
>>>>>>>>>>> Saves one copy from a secure place into the shared memory.
>>>>>>>>>> If we allow guest1 to write to RX, what safety issue would it
>>>>>>>>>> cause to guest2?
>>>>>>>>> This way, guest1 could trick guest2, in a race condition, to sign a
>>>>>>>>> modified message instead of the original one.
>>>>>>>>>
>>>>>>>> Just align the context that we are talking about: RX is the
>>>>>>>> intermediate shared ring that guest1 uses to receive packets and
>>>>>>>> guest2 uses to send packet.
>>>>>>>>
>>>>>>>> Seems the issue is that guest1 will receive a hacked message from RX
>>>>>>>> (modified by itself). How would it affect guest2?
>>>>>>> Retry: guest2 wants to send a signed/hashed message to guest1. For
>>>>>>> that purpose, it starts to build that message inside the shared
>>>>>>> memory that
>>>>>>> guest1 can at least read, then guest2 signs that message, also
>>>>>>> in-place.
>>>>>>> If guest1 can modify the message inside the ring while guest2 has not
>>>>>>> yet signed it, the result is invalid.
>>>>>>>
>>>>>>> Now, if guest2 is the final receiver of the message, nothing is lost,
>>>>>>> guest2 just shot itself into the foot. However, if guest2 is just a
>>>>>>> router (gray channel) and the message travels further, guest2 now has
>>>>>>> corrupted that channel without allowing the final receive to detect
>>>>>>> that. That's the scenario.
>>>>>> If guest2 has been a malicious guest, I think it wouldn't make a
>>>>>> difference whether we protect the shared RX or not. As a router,
>>>>>> guest2 can play tricks on the messages after read it and then send the
>>>>>> modified message to a third man, right?
>>>>> It can swallow it, "steal" it (redirect), but it can't manipulate
>>>>> the signed content
>>>>> without being caught, that's the idea. It's particularly relevant
>>>>> for safety-critical
>>>>> traffic from one safe application to another over unreliable
>>>>> channels, but it may
>>>>> also be relevant for the integrity of messages in a secure setup.
>>>>>
>>>> OK, I see most of your story, thanks. To get to the bottom of it, is
>>>> it possible to
>>>> Sign the packet before put it onto the unreliable channel (e.g. the
>>>> shared RX),
>>>> Instead of signing in-place? If that's doable, we can have a simpler
>>>> shared channel.
>>> Of course, you can always add another copy... But as it was trivial to
>>> add unidirectional shared memory support to ivshmem [1], I see no reason
>>> this shouldn't be possible for vhost-pci as well.
>>>
>> IIUC, this requires the ring and it's head/tail to be put into different
>> regions, it would
>> be hard to fit the existing virtqueue into the shared the channel, since
>> the vring and
>> its pointers (e.g. idx) and flags are on the same page.
>> So, probably will need to use another ring type.
> The current virtio spec allows this split already - though it may not be
> lived by existing implementations (we do, though, in ivshmem-net).
> Future virtio spec (1.1 IIRC) will require a third region that holds the
> meta data and has to be writable by both sides. But it will remain
> possible to keep outgoing and incoming payload in separate pages, thus
> with different access permissions.
>
Yes, we will need to change some of the implementation. I will give it a try later.


Best,
Wei

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Vhost-pci RFC2.0
  2017-04-20  5:51       ` Wei Wang
@ 2017-05-02 12:48         ` Stefan Hajnoczi
  2017-05-03  6:02           ` Wei Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Stefan Hajnoczi @ 2017-05-02 12:48 UTC (permalink / raw)
  To: Wei Wang
  Cc: Stefan Hajnoczi, Marc-André Lureau, Michael S. Tsirkin,
	pbonzini, qemu-devel, virtio-dev


On Thu, Apr 20, 2017 at 01:51:24PM +0800, Wei Wang wrote:
> On 04/19/2017 11:24 PM, Stefan Hajnoczi wrote:
> > On Wed, Apr 19, 2017 at 11:42 AM, Wei Wang <wei.w.wang@intel.com> wrote:
> > > On 04/19/2017 05:57 PM, Stefan Hajnoczi wrote:
> > > > On Wed, Apr 19, 2017 at 06:38:11AM +0000, Wang, Wei W wrote:
> > > > > We made some design changes to the original vhost-pci design, and want to
> > > > > open
> > > > > a discussion about the latest design (labelled 2.0) and its extension
> > > > > (2.1).
> > > > > 2.0 design: One VM shares the entire memory of another VM
> > > > > 2.1 design: One VM uses an intermediate memory shared with another VM for
> > > > >                        packet transmission.
> > > > Hi,
> > > > Can you talk a bit about the motivation for the 2.x design and major
> > > > changes compared to 1.x?
> > > 
> > > 1.x refers to the design we presented at KVM Form before. The major
> > > change includes:
> > > 1) inter-VM notification support
> > > 2) TX engine and RX engine, which is the structure built in the driver. From
> > > the device point of view, the local rings of the engines need to be
> > > registered.
> > It would be great to support any virtio device type.
> 
> Yes, the current design already supports the creation of devices of
> different types.
> The support is added to the vhost-user protocol and the vhost-user slave.
> Once the slave handler receives the request to create the device (with
> the specified device type), the remaining process (e.g. device realize)
> is device specific.
> This part remains the same as presented before
> (i.e.Page 12 @ http://www.linux-kvm.org/images/5/55/02x07A-Wei_Wang-Design_of-Vhost-pci.pdf).
> > 
> > The use case I'm thinking of is networking and storage appliances in
> > cloud environments (e.g. OpenStack).  vhost-user doesn't fit nicely
> > because users may not be allowed to run host userspace processes.  VMs
> > are first-class objects in compute clouds.  It would be natural to
> > deploy networking and storage appliances as VMs using vhost-pci.
> > 
> > In order to achieve this vhost-pci needs to be a virtio transport and
> > not a virtio-net-specific PCI device.  It would extend the VIRTIO 1.x
> > spec alongside virtio-pci, virtio-mmio, and virtio-ccw.
> 
> Actually it is designed as a device under virtio-pci transport. I'm
> not sure about the value of having a new transport.
> 
> > When you say TX and RX I'm not sure if the design only supports
> > virtio-net devices?
> 
> Current design focuses on the vhost-pci-net device. That's the
> reason that we have TX/RX here. As mention above, when the
> slave invokes the device creation function, the execution
> goes to each device specific code.
> 
> The TX/RX is the design after the device creation, so it is specific
> to vhost-pci-net. For the future vhost-pci-blk, that design can
> have its own request queue.

Here is my understanding based on your vhost-pci GitHub repo:

VM1 sees a normal virtio-net-pci device.  VM1 QEMU is invoked with a
vhost-user netdev.

VM2 sees a hotplugged vhost-pci-net virtio-pci device once VM1
initializes the device and a message is sent over vhost-user.

There is no integration with Linux drivers/vhost/ code for VM2.  Instead
you are writing a 3rd virtio-net driver specifically for vhost-pci.  Not
sure if it's possible to reuse drivers/vhost/ cleanly but that would be
nicer than implementing virtio-net again.

Is the VM1 vhost-user netdev a normal vhost-user device or does it know
about vhost-pci?

It's hard to study code changes in your vhost-pci repo because
everything (QEMU + Linux + your changes) was committed in a single
commit.  Please keep your changes in separate commits so it's easy to
find them.

Stefan


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Vhost-pci RFC2.0
  2017-05-02 12:48         ` Stefan Hajnoczi
@ 2017-05-03  6:02           ` Wei Wang
  0 siblings, 0 replies; 27+ messages in thread
From: Wei Wang @ 2017-05-03  6:02 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Stefan Hajnoczi, Marc-André Lureau, Michael S. Tsirkin,
	pbonzini, qemu-devel, virtio-dev

On 05/02/2017 08:48 PM, Stefan Hajnoczi wrote:
> On Thu, Apr 20, 2017 at 01:51:24PM +0800, Wei Wang wrote:
>> On 04/19/2017 11:24 PM, Stefan Hajnoczi wrote:
>>> On Wed, Apr 19, 2017 at 11:42 AM, Wei Wang <wei.w.wang@intel.com> wrote:
>>>> On 04/19/2017 05:57 PM, Stefan Hajnoczi wrote:
>>>>> On Wed, Apr 19, 2017 at 06:38:11AM +0000, Wang, Wei W wrote:
>>>>>> We made some design changes to the original vhost-pci design, and want to
>>>>>> open
>>>>>> a discussion about the latest design (labelled 2.0) and its extension
>>>>>> (2.1).
>>>>>> 2.0 design: One VM shares the entire memory of another VM
>>>>>> 2.1 design: One VM uses an intermediate memory shared with another VM for
>>>>>>                         packet transmission.
>>>>> Hi,
>>>>> Can you talk a bit about the motivation for the 2.x design and major
>>>>> changes compared to 1.x?
>>>> 1.x refers to the design we presented at KVM Form before. The major
>>>> change includes:
>>>> 1) inter-VM notification support
>>>> 2) TX engine and RX engine, which is the structure built in the driver. From
>>>> the device point of view, the local rings of the engines need to be
>>>> registered.
>>> It would be great to support any virtio device type.
>> Yes, the current design already supports the creation of devices of
>> different types.
>> The support is added to the vhost-user protocol and the vhost-user slave.
>> Once the slave handler receives the request to create the device (with
>> the specified device type), the remaining process (e.g. device realize)
>> is device specific.
>> This part remains the same as presented before
>> (i.e.Page 12 @ http://www.linux-kvm.org/images/5/55/02x07A-Wei_Wang-Design_of-Vhost-pci.pdf).
>>> The use case I'm thinking of is networking and storage appliances in
>>> cloud environments (e.g. OpenStack).  vhost-user doesn't fit nicely
>>> because users may not be allowed to run host userspace processes.  VMs
>>> are first-class objects in compute clouds.  It would be natural to
>>> deploy networking and storage appliances as VMs using vhost-pci.
>>>
>>> In order to achieve this vhost-pci needs to be a virtio transport and
>>> not a virtio-net-specific PCI device.  It would extend the VIRTIO 1.x
>>> spec alongside virtio-pci, virtio-mmio, and virtio-ccw.
>> Actually it is designed as a device under virtio-pci transport. I'm
>> not sure about the value of having a new transport.
>>
>>> When you say TX and RX I'm not sure if the design only supports
>>> virtio-net devices?
>> Current design focuses on the vhost-pci-net device. That's the
>> reason that we have TX/RX here. As mention above, when the
>> slave invokes the device creation function, the execution
>> goes to each device specific code.
>>
>> The TX/RX is the design after the device creation, so it is specific
>> to vhost-pci-net. For the future vhost-pci-blk, that design can
>> have its own request queue.
> Here is my understanding based on your vhost-pci GitHub repo:
>
> VM1 sees a normal virtio-net-pci device.  VM1 QEMU is invoked with a
> vhost-user netdev.
>
> VM2 sees a hotplugged vhost-pci-net virtio-pci device once VM1
> initializes the device and a message is sent over vhost-user.

Right.

>
> There is no integration with Linux drivers/vhost/ code for VM2.  Instead
> you are writing a 3rd virtio-net driver specifically for vhost-pci.  Not
> sure if it's possible to reuse drivers/vhost/ cleanly but that would be
> nicer than implementing virtio-net again.

vhost-pci-net is a standalone network device with its own unique
device id, and the device itself is different from virtio-net (e.g.
different virtqueues), so I think it would be more reasonable to
let vhost-pci-net have its own driver.

There are indeed some functions in vhost-pci-net that look
similar to those in virtio-net (e.g. try_fill_recv()). I haven't thought
of a good way to reuse them yet, because the interfaces are not
completely the same, for example, vpnet_info and virtnet_info,
which need to be passed to the functions, are different.

>
> Is the VM1 vhost-user netdev a normal vhost-user device or does it know
> about vhost-pci?

Let me share the QEMU boot commands, which may be helpful:
VM1(vhost-pci-net):
-chardev socket,id=slave1,server,wait=off,path=${PATH_SLAVE1} \
-vhost-pci-slave socket,chardev=slave1

VM2(virtio-net):
-chardev socket,id=sock2,path=${PATH_SLAVE1} \
-netdev type=vhost-user,id=net2,chardev=sock2,vhostforce \
-device virtio-net-pci,mac=52:54:00:00:00:02,netdev=net2

The netdev doesn't know about vhost-pci, but the vhost_dev knows
about it via
vhost_dev->protocol_features &
     (1ULL << VHOST_USER_PROTOCOL_F_VHOST_PCI).

The vhost-pci specific messages need to be sent in the vhost-pci
case. For example, at the end of vhost_net_start(), if it detects that the
slave is vhost-pci, it will send a
VHOST_USER_SET_VHOST_PCI_START message to the slave (VM1).
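
As a rough, self-contained sketch of that check: the two names
VHOST_USER_PROTOCOL_F_VHOST_PCI and VHOST_USER_SET_VHOST_PCI_START are the
ones mentioned above, but the bit number, the request code value, the
minimal vhost_dev stand-in and the send helper are all made up here, so the
real draft code will differ:

#include <stdint.h>
#include <stdio.h>

#define VHOST_USER_PROTOCOL_F_VHOST_PCI  30   /* hypothetical bit number   */
#define VHOST_USER_SET_VHOST_PCI_START   40   /* hypothetical request code */

struct vhost_dev {                 /* minimal stand-in for the real struct */
    uint64_t protocol_features;
};

static int vhost_user_send_req(struct vhost_dev *dev, uint32_t req)
{
    /* stand-in for sending a vhost-user request over the slave socket */
    (void)dev;
    printf("send vhost-user request %u\n", req);
    return 0;
}

/* called at the end of vhost_net_start(): if the slave negotiated the
 * vhost-pci protocol feature, tell it (VM1) to start */
static int maybe_start_vhost_pci(struct vhost_dev *dev)
{
    if (!(dev->protocol_features &
          (1ULL << VHOST_USER_PROTOCOL_F_VHOST_PCI)))
        return 0;                  /* ordinary vhost-user slave */
    return vhost_user_send_req(dev, VHOST_USER_SET_VHOST_PCI_START);
}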

>
> It's hard to study code changes in your vhost-pci repo because
> everything (QEMU + Linux + your changes) was committed in a single
> commit.  Please keep your changes in separate commits so it's easy to
> find them.
>
Thanks a lot for reading the draft code. I'm working on cleaning it
up and splitting it into patches. I will post the QEMU-side
patches soon.


Best,
Wei

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Vhost-pci RFC2.0
  2017-04-19  6:38 [Qemu-devel] Vhost-pci RFC2.0 Wang, Wei W
                   ` (2 preceding siblings ...)
  2017-04-19  9:57 ` [Qemu-devel] [virtio-dev] " Stefan Hajnoczi
@ 2017-05-05  4:05 ` Jason Wang
  2017-05-05  6:18   ` Wei Wang
  3 siblings, 1 reply; 27+ messages in thread
From: Jason Wang @ 2017-05-05  4:05 UTC (permalink / raw)
  To: Wang, Wei W, Marc-André Lureau, Michael S. Tsirkin,
	Stefan Hajnoczi, pbonzini, qemu-devel, virtio-dev



On 2017-04-19 14:38, Wang, Wei W wrote:
> Hi,
> We made some design changes to the original vhost-pci design, and want 
> to open
> a discussion about the latest design (labelled 2.0) and its extension 
> (2.1).
> 2.0 design: One VM shares the entire memory of another VM
> 2.1 design: One VM uses an intermediate memory shared with another VM for
>                      packet transmission.
> For the convenience of discussion, I have some pictures presented at 
> this link:
> _https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf_

Hi, is there any doc or pointer that describes the design in detail? 
E.g. patch 4 in v1 
https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg05163.html.

Thanks

> Fig. 1 shows the common driver frame that we want use to build the 2.0 
> and 2.1
> design. A TX/RX engine consists of a local ring and an exotic ring.
> Local ring:
> 1) allocated by the driver itself;
> 2) registered with the device (i.e. virtio_add_queue())
> Exotic ring:
> 1) ring memory comes from the outside (of the driver), and exposed to 
> the driver
>      via a BAR MMIO;
> 2) does not have a registration in the device, so no ioeventfd/irqfd, 
> configuration
> registers allocated in the device
> Fig. 2 shows how the driver frame is used to build the 2.0 design.
> 1) Asymmetric: vhost-pci-net <-> virtio-net
> 2) VM1 shares the entire memory of VM2, and the exotic rings are the rings
>     from VM2.
> 3) Performance (in terms of copies between VMs):
>     TX: 0-copy (packets are put to VM2’s RX ring directly)
>     RX: 1-copy (the green arrow line in the VM1’s RX engine)
> Fig. 3 shows how the driver frame is used to build the 2.1 design.
> 1) Symmetric: vhost-pci-net <-> vhost-pci-net
> 2) Share an intermediate memory, allocated by VM1’s vhost-pci device,
> for data exchange, and the exotic rings are built on the shared memory
> 3) Performance:
>     TX: 1-copy
> RX: 1-copy
> Fig. 4 shows the inter-VM notification path for 2.0 (2.1 is similar).
> The four eventfds are allocated by virtio-net, and shared with 
> vhost-pci-net:
> Uses virtio-net’s TX/RX kickfd as the vhost-pci-net’s RX/TX callfd
> Uses virtio-net’s TX/RX callfd as the vhost-pci-net’s RX/TX kickfd
> Example of how it works:
> After packets are put into vhost-pci-net’s TX, the driver kicks TX, which
> causes the an interrupt associated with fd3 to be injected to virtio-net
> The draft code of the 2.0 design is ready, and can be found here:
> Qemu: _https://github.com/wei-w-wang/vhost-pci-device_
> Guest driver: _https://github.com/wei-w-wang/vhost-pci-driver_
> We tested the 2.0 implementation using the Spirent packet
> generator to transmit 64B packets, the results show that the
> throughput of vhost-pci reaches around 1.8Mpps, which is around
> two times larger than the legacy OVS+DPDK.

Does this mean OVS+DPDK can only have ~0.9Mpps? I'm a little surprised 
that the number looks rather low (I can get a similar result if I use 
a kernel bridge).

Thanks

> Also, vhost-pci shows
> better scalability than OVS+DPDK.
> Best,
> Wei

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Vhost-pci RFC2.0
  2017-05-05  4:05 ` Jason Wang
@ 2017-05-05  6:18   ` Wei Wang
  2017-05-05  9:18     ` Jason Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Wei Wang @ 2017-05-05  6:18 UTC (permalink / raw)
  To: Jason Wang, Marc-André Lureau, Michael S. Tsirkin,
	Stefan Hajnoczi, pbonzini, qemu-devel, virtio-dev

On 05/05/2017 12:05 PM, Jason Wang wrote:
>
>
> On 2017年04月19日 14:38, Wang, Wei W wrote:
>> Hi,
>> We made some design changes to the original vhost-pci design, and 
>> want to open
>> a discussion about the latest design (labelled 2.0) and its extension 
>> (2.1).
>> 2.0 design: One VM shares the entire memory of another VM
>> 2.1 design: One VM uses an intermediate memory shared with another VM 
>> for
>>                      packet transmission.
>> For the convenience of discussion, I have some pictures presented at 
>> this link:
>> _https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf_ 
>>
>
> Hi, is there any doc or pointer that describes the the design in 
> detail? E.g patch 4 in v1 
> https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg05163.html.
>
> Thanks

That link is kind of obsolete.

We currently only have a high-level introduction of the design:

For the device part design, please check slide 12:
http://www.linux-kvm.org/images/5/55/02x07A-Wei_Wang-Design_of-Vhost-pci.pdf
The vhost-pci protocol is changed to be an extension of the vhost-user protocol.

For the driver part design, please check Fig. 2:

https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf


>> Fig. 1 shows the common driver frame that we want use to build the 
>> 2.0 and 2.1
>> design. A TX/RX engine consists of a local ring and an exotic ring.
>> Local ring:
>> 1) allocated by the driver itself;
>> 2) registered with the device (i.e. virtio_add_queue())
>> Exotic ring:
>> 1) ring memory comes from the outside (of the driver), and exposed to 
>> the driver
>>      via a BAR MMIO;
>> 2) does not have a registration in the device, so no ioeventfd/irqfd, 
>> configuration
>> registers allocated in the device
>> Fig. 2 shows how the driver frame is used to build the 2.0 design.
>> 1) Asymmetric: vhost-pci-net <-> virtio-net
>> 2) VM1 shares the entire memory of VM2, and the exotic rings are the 
>> rings
>>     from VM2.
>> 3) Performance (in terms of copies between VMs):
>>     TX: 0-copy (packets are put to VM2’s RX ring directly)
>>     RX: 1-copy (the green arrow line in the VM1’s RX engine)
>> Fig. 3 shows how the driver frame is used to build the 2.1 design.
>> 1) Symmetric: vhost-pci-net <-> vhost-pci-net
>> 2) Share an intermediate memory, allocated by VM1’s vhost-pci device,
>> for data exchange, and the exotic rings are built on the shared memory
>> 3) Performance:
>>     TX: 1-copy
>> RX: 1-copy
>> Fig. 4 shows the inter-VM notification path for 2.0 (2.1 is similar).
>> The four eventfds are allocated by virtio-net, and shared with 
>> vhost-pci-net:
>> Uses virtio-net’s TX/RX kickfd as the vhost-pci-net’s RX/TX callfd
>> Uses virtio-net’s TX/RX callfd as the vhost-pci-net’s RX/TX kickfd
>> Example of how it works:
>> After packets are put into vhost-pci-net’s TX, the driver kicks TX, 
>> which
>> causes the an interrupt associated with fd3 to be injected to virtio-net
>> The draft code of the 2.0 design is ready, and can be found here:
>> Qemu: _https://github.com/wei-w-wang/vhost-pci-device_
>> Guest driver: _https://github.com/wei-w-wang/vhost-pci-driver_
>> We tested the 2.0 implementation using the Spirent packet
>> generator to transmit 64B packets, the results show that the
>> throughput of vhost-pci reaches around 1.8Mpps, which is around
>> two times larger than the legacy OVS+DPDK.
>
> Does this mean OVS+DPDK can only have ~0.9Mpps? A little bit surprise 
> that the number looks rather low (I can get similar result if I use 
> kernel bridge).
>

Yes, that's what we got on our machine (E5-2699 @2.2G). Do you have 
numbers for OVS+DPDK?

Best,
Wei

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Vhost-pci RFC2.0
  2017-05-05  6:18   ` Wei Wang
@ 2017-05-05  9:18     ` Jason Wang
  2017-05-08  1:39       ` Wei Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Jason Wang @ 2017-05-05  9:18 UTC (permalink / raw)
  To: Wei Wang, Marc-André Lureau, Michael S. Tsirkin,
	Stefan Hajnoczi, pbonzini, qemu-devel, virtio-dev



On 2017-05-05 14:18, Wei Wang wrote:
> On 05/05/2017 12:05 PM, Jason Wang wrote:
>>
>>
>> On 2017年04月19日 14:38, Wang, Wei W wrote:
>>> Hi,
>>> We made some design changes to the original vhost-pci design, and 
>>> want to open
>>> a discussion about the latest design (labelled 2.0) and its 
>>> extension (2.1).
>>> 2.0 design: One VM shares the entire memory of another VM
>>> 2.1 design: One VM uses an intermediate memory shared with another 
>>> VM for
>>>                      packet transmission.
>>> For the convenience of discussion, I have some pictures presented at 
>>> this link:
>>> _https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf_ 
>>>
>>
>> Hi, is there any doc or pointer that describes the design in 
>> detail? E.g. patch 4 in v1 
>> https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg05163.html.
>>
>> Thanks
>
> That link is kind of obsolete.
>
> We currently only have a high-level introduction of the design:
>
> For the device part design, please check slide 12:
> http://www.linux-kvm.org/images/5/55/02x07A-Wei_Wang-Design_of-Vhost-pci.pdf 
>
> The vhost-pci protocol is changed to be an extension of the vhost-user 
> protocol.
>
> For the driver part design, please check Fig. 2:
>
> https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf 
>

Thanks for the pointers. It would be nice to have a doc like patch 4 in 
v1; that would make the review easier, otherwise reviewers have to guess 
and ask for the details.

>
>
>>> Fig. 1 shows the common driver frame that we want to use to build the 
>>> 2.0 and 2.1
>>> design. A TX/RX engine consists of a local ring and an exotic ring.
>>> Local ring:
>>> 1) allocated by the driver itself;
>>> 2) registered with the device (i.e. virtio_add_queue())
>>> Exotic ring:
>>> 1) the ring memory comes from outside the driver and is exposed to
>>> the driver via a BAR MMIO;
>>> 2) not registered with the device, so no ioeventfd/irqfd or
>>> configuration registers are allocated in the device
>>> Fig. 2 shows how the driver frame is used to build the 2.0 design.
>>> 1) Asymmetric: vhost-pci-net <-> virtio-net
>>> 2) VM1 shares the entire memory of VM2, and the exotic rings are the 
>>> rings
>>>     from VM2.
>>> 3) Performance (in terms of copies between VMs):
>>>     TX: 0-copy (packets are put to VM2’s RX ring directly)
>>>     RX: 1-copy (the green arrow line in the VM1’s RX engine)
>>> Fig. 3 shows how the driver frame is used to build the 2.1 design.
>>> 1) Symmetric: vhost-pci-net <-> vhost-pci-net
>>> 2) Share an intermediate memory, allocated by VM1’s vhost-pci device,
>>> for data exchange, and the exotic rings are built on the shared memory
>>> 3) Performance:
>>>     TX: 1-copy
>>> RX: 1-copy
>>> Fig. 4 shows the inter-VM notification path for 2.0 (2.1 is similar).
>>> The four eventfds are allocated by virtio-net, and shared with 
>>> vhost-pci-net:
>>> Uses virtio-net’s TX/RX kickfd as the vhost-pci-net’s RX/TX callfd
>>> Uses virtio-net’s TX/RX callfd as the vhost-pci-net’s RX/TX kickfd
>>> Example of how it works:
>>> After packets are put into vhost-pci-net’s TX, the driver kicks TX, 
>>> which
>>> causes an interrupt associated with fd3 to be injected to 
>>> virtio-net
>>> The draft code of the 2.0 design is ready, and can be found here:
>>> Qemu: https://github.com/wei-w-wang/vhost-pci-device
>>> Guest driver: https://github.com/wei-w-wang/vhost-pci-driver
>>> We tested the 2.0 implementation using the Spirent packet
>>> generator to transmit 64B packets; the results show that the
>>> throughput of vhost-pci reaches around 1.8Mpps, which is about
>>> twice that of the legacy OVS+DPDK setup.
>>
>> Does this mean OVS+DPDK can only reach ~0.9Mpps? I'm a bit surprised 
>> that the number looks rather low (I can get a similar result if I use 
>> a kernel bridge).
>>
>
> Yes, that's what we got on our machine (E5-2699 @2.2GHz). Do you have 
> numbers for OVS+DPDK?
>
> Best,
> Wei
>
>

I don't; I only have kernel data path numbers at the moment. I was just 
curious about the numbers.

Thanks

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Vhost-pci RFC2.0
  2017-05-05  9:18     ` Jason Wang
@ 2017-05-08  1:39       ` Wei Wang
  0 siblings, 0 replies; 27+ messages in thread
From: Wei Wang @ 2017-05-08  1:39 UTC (permalink / raw)
  To: Jason Wang, Marc-André Lureau, Michael S. Tsirkin,
	Stefan Hajnoczi, pbonzini, qemu-devel, virtio-dev

On 05/05/2017 05:18 PM, Jason Wang wrote:
>
>
> On 2017-05-05 14:18, Wei Wang wrote:
>> On 05/05/2017 12:05 PM, Jason Wang wrote:
>>>
>>>
>>> On 2017-04-19 14:38, Wang, Wei W wrote:
>>>> Hi,
>>>> We made some design changes to the original vhost-pci design, and 
>>>> want to open
>>>> a discussion about the latest design (labelled 2.0) and its 
>>>> extension (2.1).
>>>> 2.0 design: One VM shares the entire memory of another VM
>>>> 2.1 design: One VM uses an intermediate memory shared with another 
>>>> VM for
>>>>                      packet transmission.
>>>> For the convenience of discussion, I have some pictures presented 
>>>> at this link:
>>>> https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf
>>>>
>>>
>>> Hi, is there any doc or pointer that describes the design in 
>>> detail? E.g. patch 4 in v1 
>>> https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg05163.html.
>>>
>>> Thanks
>>
>> That link is kind of obsolete.
>>
>> We currently only have a high-level introduction of the design:
>>
>> For the device part design, please check slide 12:
>> http://www.linux-kvm.org/images/5/55/02x07A-Wei_Wang-Design_of-Vhost-pci.pdf 
>>
>> The vhost-pci protocol is changed to be an extension of the vhost-user 
>> protocol.
>>
>> For the driver part design, please check Fig. 2:
>>
>> https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf 
>>
>
> Thanks for the pointers. It would be nice to have a doc like patch 4 
> in v1; that would make the review easier, otherwise reviewers have to 
> guess and ask for the details.
>

Thanks for joining the review. I'm preparing the v2 QEMU patches and will 
send out the code first, which should help with understanding how the 
design works.
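
In the meantime, to give a rough idea of the memory sharing side: the
vhost-pci device gets the peer VM's memory region fds over the
vhost-user socket (as in the standard vhost-user flow), maps them, and
exposes the mapping to its guest through a BAR. A purely illustrative
sketch, not the actual patch code:

#include <stdint.h>
#include <sys/mman.h>

/* Map one memory region fd received via VHOST_USER_SET_MEM_TABLE.
 * The mapped area would then back the region that the vhost-pci
 * device exposes to its guest as a BAR. */
static void *map_peer_region(int region_fd, uint64_t size, uint64_t offset)
{
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, region_fd, offset);
        return p == MAP_FAILED ? NULL : p;
}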

Best,
Wei

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2017-05-08  1:38 UTC | newest]

Thread overview: 27+ messages
2017-04-19  6:38 [Qemu-devel] Vhost-pci RFC2.0 Wang, Wei W
2017-04-19  7:31 ` Marc-André Lureau
2017-04-19  8:33   ` Wei Wang
2017-04-19  7:35 ` Jan Kiszka
2017-04-19  8:42   ` Wei Wang
2017-04-19  8:49     ` [Qemu-devel] [virtio-dev] " Jan Kiszka
2017-04-19  9:09       ` Wei Wang
2017-04-19  9:31         ` Jan Kiszka
2017-04-19 10:02           ` Wei Wang
2017-04-19 10:36             ` Jan Kiszka
2017-04-19 11:11               ` Wei Wang
2017-04-19 11:21                 ` Jan Kiszka
2017-04-19 14:33                   ` Wang, Wei W
2017-04-19 14:52                     ` Jan Kiszka
2017-04-20  6:51                       ` Wei Wang
2017-04-20  7:05                         ` Jan Kiszka
2017-04-20  8:58                           ` Wei Wang
2017-04-19  9:57 ` [Qemu-devel] [virtio-dev] " Stefan Hajnoczi
2017-04-19 10:42   ` Wei Wang
2017-04-19 15:24     ` Stefan Hajnoczi
2017-04-20  5:51       ` Wei Wang
2017-05-02 12:48         ` Stefan Hajnoczi
2017-05-03  6:02           ` Wei Wang
2017-05-05  4:05 ` Jason Wang
2017-05-05  6:18   ` Wei Wang
2017-05-05  9:18     ` Jason Wang
2017-05-08  1:39       ` Wei Wang
