* [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)
@ 2023-04-23 11:29 zhenwei pi
  2023-04-24  3:40   ` Jason Wang
  0 siblings, 1 reply; 29+ messages in thread
From: zhenwei pi @ 2023-04-23 11:29 UTC (permalink / raw)
  To: Michael S . Tsirkin, Cornelia Huck, parav
  Cc: virtio-dev, virtio-comment, helei.sig11, houp

Hi,

In the past years, virtio has supported many device specifications over
PCI/MMIO/CCW. These devices work well in virtualization environments,
and now we have a chance to support the virtio device family in the
container/host scenario.

- Theory
"Virtio Over Fabrics" aims at "reuse virtio device specifications", and 
provides network defined peripheral devices.
And this protocol also could be used in virtualization environment, 
typically hypervisor(or vhost-user process) handles request from virtio 
PCI/MMIO/CCW, remaps request and forwards to target by fabrics.

- Protocol
The detailed protocol definition is available at:
https://github.com/pizhenwei/linux/blob/virtio-of-github/include/uapi/linux/virtio_of.h
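
To make the examples below easier to follow, here is a rough C sketch of
the structures they imply. The field names mirror the diagrams, the field
widths are only assumptions for illustration, and the header linked above
remains the authoritative definition:

#include <stdint.h>

/* Illustrative only: widths are guesses, see virtio_of.h for the real layout. */
struct virtio_of_command_sketch {
        uint16_t opcode;        /* e.g. virtio_of_op_vring */
        uint16_t command_id;    /* "cmd id" in the diagrams below */
        uint32_t length;        /* bytes of inline data after the descriptors (TCP) */
        uint16_t ndesc;         /* number of descriptors that follow */
        uint16_t reserved;
};

struct virtio_of_vring_desc_sketch {
        uint64_t addr;          /* TCP: offset into the inline data; RDMA: remote address */
        uint32_t length;
        uint16_t id;
        uint16_t flags;         /* VRING_DESC_F_NEXT / VRING_DESC_F_WRITE */
};
/* The RDMA descriptors in the examples additionally carry a 32-bit key
 * (remote key) after the flags field. */

struct virtio_of_completion_sketch {
        uint16_t status;        /* e.g. VIRTIO_OF_SUCCESS */
        uint16_t command_id;
        uint16_t ndesc;
        uint16_t reserved;
        union {
                uint32_t u32;   /* "value" in the diagrams */
        } value;
};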

Examples of virtio-blk read/write over TCP/RDMA:
1. Virtio Over TCP
1.1 An example of virtio-blk write(8K) command:
The initiator side sends a stream buffer (command + 4 * desc + 8208 bytes):
  COMMAND            +------+
                     |opcode|  ->  virtio_of_op_vring
                     +------+
                     |cmd id|  ->  10
                     +------+
                     |length|  ->  8208
                     +------+
                     |ndesc |  ->  4
                     +------+
                     |rsvd  |
                     +------+

  DESC0              +------+
               +-----|addr  |  -> 0
               |     +------+
               |     |length|  -> 16 (virtio blk write command)
               |     +------+
               |     |id    |  -> 10
               |     +------+
               |     |flags |  -> VRING_DESC_F_NEXT
               |     +------+
               |
  DESC1        |     +------+
               | +---|addr  |  -> 16
               | |   +------+
               | |   |length|  -> 4096
               | |   +------+
               | |   |id    |  -> 11
               | |   +------+
               | |   |flags |  -> VRING_DESC_F_NEXT
               | |   +------+
               | |
  DESC2        | |   +------+
               | |   |addr  |  -> 4112
               | |   +------+
               | | +-|length|  -> 4096
               | | | +------+
               | | | |id    |  -> 12
               | | | +------+
               | | | |flags |  -> VRING_DESC_F_NEXT
               | | | +------+
               | | |
  DESC3        | | | +------+
               | | | |addr  |  -> 0
               | | | +------+
               | | | |length|  -> 1
               | | | +------+
               | | | |id    |  -> 13
               | | | +------+
               | | | |flags |  -> VRING_DESC_F_WRITE
               | | | +------+
               | | |
  DATA         +-+-+>+------+  -> 0
                 | | |......|
                 +-+>+------+  -> 16
                   | |......|
                   +>+------+  -> 4112
                     |......|
                     +------+  -> 8208

The target side sends a stream buffer (completion + 1 * desc + 1 byte):
  COMPLETION         +------+
                     |status|  ->  VIRTIO_OF_SUCCESS
                     +------+
                     |cmd id|  ->  10
                     +------+
                     |ndesc |  ->  1
                     +------+
                     |rsvd  |
                     +------+
                     |value |  -> 1 (value.u32)
                     +------+

  DESC0              +------+
                   +-|addr  |  -> 0
                   | +------+
                   | |length|  -> 1
                   | +------+
                   | |id    |  -> 13
                   | +------+
                   | |flags |  -> VRING_DESC_F_WRITE
                   | +------+
                   |
  DATA             |>+------+  -> 0
                     |......|
                     +------+  -> 1
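
A minimal sketch of putting the write request above on the wire, using the
illustrative structs from the beginning of this mail and plain POSIX
writev() (hypothetical helper; a real implementation must loop on short
writes and take care of byte order):

#include <stddef.h>
#include <sys/uio.h>

/* Illustrative only: one TCP stream buffer = command + ndesc descriptors
 * + inline data, exactly as laid out in the diagram above. */
static int send_vring_request_sketch(int sock,
                const struct virtio_of_command_sketch *cmd,
                const struct virtio_of_vring_desc_sketch *desc, int ndesc,
                const void *inline_data, size_t inline_len)
{
        struct iovec iov[1 + 16 + 1];
        int n = 0;

        if (ndesc > 16)
                return -1;
        iov[n++] = (struct iovec){ .iov_base = (void *)cmd,
                                   .iov_len = sizeof(*cmd) };
        for (int i = 0; i < ndesc; i++)
                iov[n++] = (struct iovec){ .iov_base = (void *)&desc[i],
                                           .iov_len = sizeof(desc[i]) };
        /* inline data: 8208 bytes in the write example above */
        iov[n++] = (struct iovec){ .iov_base = (void *)inline_data,
                                   .iov_len = inline_len };

        return writev(sock, iov, n) < 0 ? -1 : 0;
}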

1.2 An example of virtio-blk read(8K) command:
The initiator side sends a stream buffer (command + 4 * desc + 16 bytes):
  COMMAND            +------+
                     |opcode|  ->  virtio_of_op_vring
                     +------+
                     |cmd id|  ->  14
                     +------+
                     |length|  ->  16 (virtio blk read command)
                     +------+
                     |ndesc |  ->  4
                     +------+
                     |rsvd  |
                     +------+

  DESC0              +------+
                   +-|addr  |  -> 0
                   | +------+
                   | |length|  -> 16
                   | +------+
                   | |id    |  -> 14
                   | +------+
                   | |flags |  -> VRING_DESC_F_NEXT
                   | +------+
                   |
  DESC1            | +------+
                   | |addr  |  -> 16
                   | +------+
                   | |length|  -> 4096
                   | +------+
                   | |id    |  -> 15
                   | +------+
                   | |flags |  -> VRING_DESC_F_NEXT | VRING_DESC_F_WRITE
                   | +------+
                   |
  DESC2            | +------+
                   | |addr  |  -> 4112
                   | +------+
                   | |length|  -> 4096
                   | +------+
                   | |id    |  -> 16
                   | +------+
                   | |flags |  -> VRING_DESC_F_NEXT | VRING_DESC_F_WRITE
                   | +------+
                   |
  DESC3            | +------+
                   | |addr  |  -> 0
                   | +------+
                   | |length|  -> 1
                   | +------+
                   | |id    |  -> 17
                   | +------+
                   | |flags |  -> VRING_DESC_F_WRITE
                   | +------+
                   |
  DATA             +>+------+  -> 0
                     |......|
                     +------+  -> 16

The target side sends a stream buffer (completion + 3 * desc + 8193 bytes):
  COMPLETION         +------+
                     |status|  ->  VIRTIO_OF_SUCCESS
                     +------+
                     |cmd id|  ->  14
                     +------+
                     |ndesc |  ->  3
                     +------+
                     |rsvd  |
                     +------+
                     |value |  -> 8193 (value.u32)
                     +------+

  DESC0              +------+
               +-----|addr  |  -> 0
               |     +------+
               |     |length|  -> 4096
               |     +------+
               |     |id    |  -> 15
               |     +------+
               |     |flags |  -> VRING_DESC_F_NEXT | VRING_DESC_F_WRITE
               |     +------+
               |
  DESC1        |     +------+
               | +---|addr  |  -> 4096
               | |   +------+
               | |   |length|  -> 4096
               | |   +------+
               | |   |id    |  -> 16
               | |   +------+
               | |   |flags |  -> VRING_DESC_F_NEXT | VRING_DESC_F_WRITE
               | |   +------+
               | |
  DESC2        | |   +------+
               | |   |addr  |  -> 8192
               | |   +------+
               | | +-|length|  -> 1
               | | | +------+
               | | | |id    |  -> 17
               | | | +------+
               | | | |flags |  -> VRING_DESC_F_WRITE
               | | | +------+
               | | |
  DATA         +-+-+>+------+  -> 0
                 | | |......|
                 +-+>+------+  -> 4096
                   | |......|
                   +>+------+  -> 8192
                     |......|
                     +------+  -> 8193
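
On the initiator side, consuming such a completion from the stream could
look roughly like the sketch below (illustrative helpers only, blocking
I/O, byte order and buffer lookup by descriptor id omitted). In these
examples the completion "value" equals the number of trailing data bytes:

#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Illustrative only: read exactly 'len' bytes from a stream socket. */
static int recv_full_sketch(int sock, void *buf, size_t len)
{
        while (len) {
                ssize_t n = recv(sock, buf, len, 0);

                if (n <= 0)
                        return -1;
                buf = (char *)buf + n;
                len -= (size_t)n;
        }
        return 0;
}

/* Read completion header, its descriptors, then the trailing data
 * (8193 bytes in the read example: 2 x 4096 of data + 1 status byte). */
static int recv_vring_completion_sketch(int sock,
                struct virtio_of_completion_sketch *comp,
                void *data, size_t data_cap)
{
        struct virtio_of_vring_desc_sketch desc[16];

        if (recv_full_sketch(sock, comp, sizeof(*comp)))
                return -1;
        if (comp->ndesc > 16)
                return -1;
        for (int i = 0; i < comp->ndesc; i++)
                if (recv_full_sketch(sock, &desc[i], sizeof(desc[i])))
                        return -1;
        if (comp->value.u32 > data_cap)
                return -1;
        /* Each desc's addr/length addresses a range of this trailing data,
         * to be copied to the initiator buffer identified by desc.id. */
        return recv_full_sketch(sock, data, comp->value.u32);
}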

2. Virtio Over RDMA
2.1 An example of virtio-blk write(8K) command:
The initiator side sends a message (command + 4 * desc) via RDMA SEND:
  COMMAND            +------+
                     |opcode|  ->  virtio_of_op_vring
                     +------+
                     |cmd id|  ->  10
                     +------+
                     |length|  ->  0
                     +------+
                     |ndesc |  ->  4
                     +------+
                     |rsvd  |
                     +------+

  DESC0              +------+
                     |addr  |  -> 0xffff012345670000
                     +------+
                     |length|  -> 16 (virtio blk write command)
                     +------+
                     |id    |  -> 10
                     +------+
                     |flags |  -> VRING_DESC_F_NEXT
                     +------+
                     |key   |  -> 0x1234
                     +------+

  DESC1              +------+
                     |addr  |  -> 0xffff012345671000
                     +------+
                     |length|  -> 4096
                     +------+
                     |id    |  -> 11
                     +------+
                     |flags |  -> VRING_DESC_F_NEXT
                     +------+
                     |key   |  -> 0x1236
                     +------+

  DESC2              +------+
                     |addr  |  -> 0xffff012345673000
                     +------+
                     |length|  -> 4096
                     +------+
                     |id    |  -> 12
                     +------+
                     |flags |  -> VRING_DESC_F_NEXT
                     +------+
                     |key   |  -> 0x1238
                     +------+

  DESC3              +------+
                     |addr  |  -> 0xffff012345677000
                     +------+
                     |length|  -> 1
                     +------+
                     |id    |  -> 13
                     +------+
                     |flags |  -> VRING_DESC_F_WRITE
                     +------+
                     |key   |  -> 0x1239
                     +------+

The target side reads the remote addresses of DESC0/DESC1/DESC2 via
RDMA READ, writes the remote address of DESC3 via RDMA WRITE, and sends
a completion via RDMA SEND:
  COMPLETION         +------+
                     |status|  ->  VIRTIO_OF_SUCCESS
                     +------+
                     |cmd id|  ->  10
                     +------+
                     |ndesc |  ->  0
                     +------+
                     |rsvd  |
                     +------+
                     |value |  -> 1 (value.u32)
                     +------+

2.2 An example of virtio-blk read(8K) command:
This is quite similar to 2.1, except for the flags in DESC1/DESC2: the
target side reads the remote address of DESC0 via RDMA READ, writes the
remote addresses of DESC1/DESC2/DESC3 via RDMA WRITE, and sends a
completion via RDMA SEND.
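
For the fabric operations themselves, standard RDMA verbs are enough.
Below is a minimal sketch (libibverbs, error handling and completion
polling omitted, helper name hypothetical) of the target posting an RDMA
READ for one descriptor using the addr/length/key carried in the command;
the DESC3 write-back works the same way with IBV_WR_RDMA_WRITE:

#include <stdint.h>
#include <infiniband/verbs.h>

/* Illustrative only: 'qp' is an established RC queue pair and 'local_mr'
 * a registered local memory region backing 'local_buf'. */
static int rdma_read_desc_sketch(struct ibv_qp *qp, struct ibv_mr *local_mr,
                                 void *local_buf, uint64_t remote_addr,
                                 uint32_t length, uint32_t rkey,
                                 uint64_t wr_id)
{
        struct ibv_sge sge = {
                .addr   = (uintptr_t)local_buf,
                .length = length,
                .lkey   = local_mr->lkey,
        };
        struct ibv_send_wr wr = {
                .wr_id      = wr_id,
                .sg_list    = &sge,
                .num_sge    = 1,
                .opcode     = IBV_WR_RDMA_READ,
                .send_flags = IBV_SEND_SIGNALED,
        };
        struct ibv_send_wr *bad = NULL;

        wr.wr.rdma.remote_addr = remote_addr;   /* e.g. 0xffff012345670000 */
        wr.wr.rdma.rkey        = rkey;          /* e.g. 0x1234 */
        /* the completion arrives later on the send CQ via ibv_poll_cq() */
        return ibv_post_send(qp, &wr, &bad);
}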

- Example
I have developed a kernel initiator (unstable, WIP, currently TCP and
RDMA are supported):
https://github.com/pizhenwei/linux/tree/virtio-of-github

And a target (unstable, WIP, currently blk/crypto/rng are supported):
https://github.com/pizhenwei/virtio-target/tree/WIP

Run the target first:
  ~# ./vtgt vtgt.conf
Then load the kernel modules on the initiator side:
  ~# insmod ./virtio_fabrics.ko
  ~# insmod ./virtio_tcp.ko
  ~# insmod ./virtio_rdma.ko

Create a virtio-blk device over TCP with the following command:
  ~# echo command=create,transport=tcp,taddr=192.168.122.1,tport=15771,tvqn=virtio-target/block/block0.service,iaddr=192.168.122.1,iport=0,ivqn=vqn.uuid:42761df9-4c3f-4b27-843d-c88d1dcdce32 > /dev/virtio-fabrics

Or create a virtio-crypto device over RDMA with the following command:
  ~# echo command=create,transport=rdma,taddr=192.168.122.1,tport=15771,tvqn=virtio-target/crypto/crypto0.service,iaddr=192.168.122.1,iport=0,ivqn=vqn.uuid:42761df9-4c3f-4b27-843d-c88d1dcdce32 > /dev/virtio-fabrics

Or destroy a virtio-of device with the following command:
  ~# echo command=destroy,transport=tcp,taddr=192.168.122.1,tport=15771,tvqn=vqn.uuid:2d5130d8-36d5-4fe8-ae55-48ea51e0391a,iaddr=192.168.122.1,ivqn=vqn.uuid:42761df9-4c3f-4b27-843d-c88d1dcdce32 > /dev/virtio-fabrics
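
The echo commands above are plain writes of a key=value control string to
/dev/virtio-fabrics; the equivalent from C could look like this
illustrative sketch:

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Illustrative only: write one control string to the control device. */
static int virtio_fabrics_ctrl_sketch(const char *ctrl)
{
        int fd = open("/dev/virtio-fabrics", O_WRONLY);
        ssize_t n;

        if (fd < 0)
                return -1;
        n = write(fd, ctrl, strlen(ctrl));
        close(fd);
        return n < 0 ? -1 : 0;
}

/* e.g.
 * virtio_fabrics_ctrl_sketch("command=create,transport=tcp,"
 *         "taddr=192.168.122.1,tport=15771,"
 *         "tvqn=virtio-target/block/block0.service,"
 *         "iaddr=192.168.122.1,iport=0,"
 *         "ivqn=vqn.uuid:42761df9-4c3f-4b27-843d-c88d1dcdce32");
 */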

-- 
zhenwei pi

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/



* [virtio-dev] Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)
  2023-04-23 11:29 [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA) zhenwei pi
@ 2023-04-24  3:40   ` Jason Wang
  0 siblings, 0 replies; 29+ messages in thread
From: Jason Wang @ 2023-04-24  3:40 UTC (permalink / raw)
  To: zhenwei pi
  Cc: Michael S . Tsirkin, Cornelia Huck, parav, virtio-dev,
	virtio-comment, helei.sig11, houp

On Sun, Apr 23, 2023 at 7:31 PM zhenwei pi <pizhenwei@bytedance.com> wrote:
>
> Hi,
>
> In the past years, virtio has supported many device specifications over
> PCI/MMIO/CCW. These devices work well in virtualization environments,
> and now we have a chance to support the virtio device family in the
> container/host scenario.

PCI can work for containers for sure (or does it hit any issue like
scalability?). It's better to describe what problems you have met and
why you chose this way to solve them.

It's better to compare this with

1) hiding the fabrics details via DPU
2) vDPA

>
> - Theory
> "Virtio Over Fabrics" aims at "reuse virtio device specifications", and
> provides network defined peripheral devices.
> And this protocol also could be used in virtualization environment,
> typically hypervisor(or vhost-user process) handles request from virtio
> PCI/MMIO/CCW, remaps request and forwards to target by fabrics.

This requires mediation in the datapath, doesn't it?

>
> - Protocol
> The detailed protocol definition is available at:
> https://github.com/pizhenwei/linux/blob/virtio-of-github/include/uapi/linux/virtio_of.h

I'd say an RFC patch for the virtio spec is more suitable than the code.

[...]

> - Example
> I have developed a kernel initiator (unstable, WIP, currently TCP and
> RDMA are supported):
> https://github.com/pizhenwei/linux/tree/virtio-of-github

A quick glance at the code told me it's a mediation layer that converts
descriptors in the vring to fabric-specific packets. This is the
vDPA way.

If we agree virtio over fabrics is useful, we need to invent facilities
that allow building packets directly without bothering the virtqueue
(the API is layout independent anyhow).

Thanks


* Re: Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)
  2023-04-24  3:40   ` Jason Wang
  (?)
@ 2023-04-24 13:38   ` zhenwei pi
  2023-04-25  5:03       ` Parav Pandit
  2023-04-25  6:36       ` Jason Wang
  -1 siblings, 2 replies; 29+ messages in thread
From: zhenwei pi @ 2023-04-24 13:38 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S . Tsirkin, Cornelia Huck, parav, virtio-dev,
	virtio-comment, helei.sig11, houp



On 4/24/23 11:40, Jason Wang wrote:
> On Sun, Apr 23, 2023 at 7:31 PM zhenwei pi <pizhenwei@bytedance.com> wrote:
>>
>> Hi,
>>
>> In the past years, virtio has supported many device specifications over
>> PCI/MMIO/CCW. These devices work well in virtualization environments,
>> and now we have a chance to support the virtio device family in the
>> container/host scenario.
> 
> PCI can work for containers for sure (or does it hit any issue like
> scalability?). It's better to describe what problems you have met and
> why you chose this way to solve them.
> 
> It's better to compare this with
> 
> 1) hiding the fabrics details via DPU
> 2) vDPA
> 
Hi,

Sorry, I missed this part. "Network-defined peripheral devices of the
virtio family" is the main purpose of this proposal; it allows us to use
many types of remote resources provided by a virtio target.

From my point of view, there are 3 cases:
1, Host/container scenario. For example, the host kernel connects to a
virtio target block service and maps it as a vdx (virtio-blk) device
(used by a Map-Reduce service which needs a fast/large disk). The host
kernel also connects to a virtio target crypto service and maps it as a
virtio crypto device (used by nginx to accelerate HTTPS). And so on.

         +----------+    +----------+       +----------+
         |Map-Reduce|    |   nginx  |  ...  | processes|
         +----------+    +----------+       +----------+
------------------------------------------------------------
Host         |               |                  |
Kernel   +-------+       +-------+          +-------+
          | ext4  |       | LKCF  |          | HWRNG |
          +-------+       +-------+          +-------+
              |               |                  |
          +-------+       +-------+          +-------+
          |  vdx  |       |vCrypto|          | vRNG  |
          +-------+       +-------+          +-------+
              |               |                  |
              |           +--------+             |
              +---------->|TCP/RDMA|<------------+
                          +--------+
                              |
                          +------+
                          |NIC/IB|
                          +------+
                              |                      +-------------+
                              +--------------------->|virtio target|
                                                     +-------------+

2, Typical virtualization environment. The workloads run in a guest, and
QEMU handles virtio-pci (or MMIO) and forwards requests to the target.
         +----------+    +----------+       +----------+
         |Map-Reduce|    |   nginx  |  ...  | processes|
         +----------+    +----------+       +----------+
------------------------------------------------------------
Guest        |               |                  |
Kernel   +-------+       +-------+          +-------+
          | ext4  |       | LKCF  |          | HWRNG |
          +-------+       +-------+          +-------+
              |               |                  |
          +-------+       +-------+          +-------+
          |  vdx  |       |vCrypto|          | vRNG  |
          +-------+       +-------+          +-------+
              |               |                  |
PCI --------------------------------------------------------
                              |
QEMU                 +--------------+
                      |virtio backend|
                      +--------------+
                              |
                          +------+
                          |NIC/IB|
                          +------+
                              |                      +-------------+
                              +--------------------->|virtio target|
                                                     +-------------+

3, SmartNIC/DPU/vDPA environment. It's possible to convert a virtio-pci
request to a virtio-of request in hardware and forward it to the virtio
target directly.
         +----------+    +----------+       +----------+
         |Map-Reduce|    |   nginx  |  ...  | processes|
         +----------+    +----------+       +----------+
------------------------------------------------------------
Host         |               |                  |
Kernel   +-------+       +-------+          +-------+
          | ext4  |       | LKCF  |          | HWRNG |
          +-------+       +-------+          +-------+
              |               |                  |
          +-------+       +-------+          +-------+
          |  vdx  |       |vCrypto|          | vRNG  |
          +-------+       +-------+          +-------+
              |               |                  |
PCI --------------------------------------------------------
                              |
SmartNIC             +---------------+
                      |virtio HW queue|
                      +---------------+
                              |
                          +------+
                          |NIC/IB|
                          +------+
                              |                      +-------------+
                              +--------------------->|virtio target|
                                                     +-------------+

>>
>> - Theory
>> "Virtio Over Fabrics" aims at "reuse virtio device specifications", and
>> provides network defined peripheral devices.
>> And this protocol also could be used in virtualization environment,
>> typically hypervisor(or vhost-user process) handles request from virtio
>> PCI/MMIO/CCW, remaps request and forwards to target by fabrics.
> 
> This requires mediation in the datapath, doesn't it?
> 
>>
>> - Protocol
>> The detailed protocol definition is available at:
>> https://github.com/pizhenwei/linux/blob/virtio-of-github/include/uapi/linux/virtio_of.h
> 
> I'd say an RFC patch for the virtio spec is more suitable than the code.
> 

OK. I'll send an RFC patch for the virtio spec later if this proposal is
acceptable.

[...]

> 
> A quick glance at the code told me it's a mediation layer that converts
> descriptors in the vring to fabric-specific packets. This is the
> vDPA way.
> 
> If we agree virtio over fabrics is useful, we need to invent facilities
> that allow building packets directly without bothering the virtqueue
> (the API is layout independent anyhow).
> 
> Thanks
> 

This code implements case 1 [Host/container scenario] and proves that
this case works.
The virtio fabrics module creates a virtqueue and also emulates a
"virtqueue backend": when the upper layer kicks the vring, the "backend"
gets notified and builds a packet for TCP/RDMA.

[...]

-- 
zhenwei pi


* [virtio-dev] Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)
  2023-04-24 13:38   ` zhenwei pi
@ 2023-04-25  5:03       ` Parav Pandit
  2023-04-25  6:36       ` Jason Wang
  1 sibling, 0 replies; 29+ messages in thread
From: Parav Pandit @ 2023-04-25  5:03 UTC (permalink / raw)
  To: zhenwei pi, Jason Wang
  Cc: Michael S . Tsirkin, Cornelia Huck, virtio-dev, virtio-comment,
	helei.sig11, houp



On 4/24/2023 9:38 AM, zhenwei pi wrote:
> 
> 
> From my point of view, there are 3 cases:
> 1, Host/container scenario. For example, the host kernel connects to a
> virtio target block service and maps it as a vdx (virtio-blk) device
> (used by a Map-Reduce service which needs a fast/large disk). The host
> kernel also connects to a virtio target crypto service and maps it as a
> virtio crypto device (used by nginx to accelerate HTTPS). And so on.
> 
>          +----------+    +----------+       +----------+
>          |Map-Reduce|    |   nginx  |  ...  | processes|
>          +----------+    +----------+       +----------+
> ------------------------------------------------------------
> Host         |               |                  |
> Kernel   +-------+       +-------+          +-------+
>           | ext4  |       | LKCF  |          | HWRNG |
>           +-------+       +-------+          +-------+
>               |               |                  |
>           +-------+       +-------+          +-------+
>           |  vdx  |       |vCrypto|          | vRNG  |
>           +-------+       +-------+          +-------+
>               |               |                  |
>               |           +--------+             |
>               +---------->|TCP/RDMA|<------------+
>                           +--------+
>                               |
>                           +------+
>                           |NIC/IB|
>                           +------+
>                               |                      +-------------+
>                               +--------------------->|virtio target|
>                                                      +-------------+
> 
> 2, Typical virtualization environment. The workloads run in a guest, and
> QEMU handles virtio-pci (or MMIO) and forwards requests to the target.
>          +----------+    +----------+       +----------+
>          |Map-Reduce|    |   nginx  |  ...  | processes|
>          +----------+    +----------+       +----------+
> ------------------------------------------------------------
> Guest        |               |                  |
> Kernel   +-------+       +-------+          +-------+
>           | ext4  |       | LKCF  |          | HWRNG |
>           +-------+       +-------+          +-------+
>               |               |                  |
>           +-------+       +-------+          +-------+
>           |  vdx  |       |vCrypto|          | vRNG  |
>           +-------+       +-------+          +-------+
>               |               |                  |
> PCI --------------------------------------------------------
>                               |
> QEMU                 +--------------+
>                       |virtio backend|
>                       +--------------+
>                               |
>                           +------+
>                           |NIC/IB|
>                           +------+
>                               |                      +-------------+
>                               +--------------------->|virtio target|
>                                                      +-------------+
>
Example #2 enables implementing a virtio backend over a fabrics initiator
in user space, which is also a good use case. It can also be done with a
non-native virtio backend.
More below.

> 3, SmartNIC/DPU/vDPA environment. It's possible to convert a virtio-pci
> request to a virtio-of request in hardware and forward it to the virtio
> target directly.
>          +----------+    +----------+       +----------+
>          |Map-Reduce|    |   nginx  |  ...  | processes|
>          +----------+    +----------+       +----------+
> ------------------------------------------------------------
> Host         |               |                  |
> Kernel   +-------+       +-------+          +-------+
>           | ext4  |       | LKCF  |          | HWRNG |
>           +-------+       +-------+          +-------+
>               |               |                  |
>           +-------+       +-------+          +-------+
>           |  vdx  |       |vCrypto|          | vRNG  |
>           +-------+       +-------+          +-------+
>               |               |                  |
> PCI --------------------------------------------------------
>                               |
> SmartNIC             +---------------+
>                       |virtio HW queue|
>                       +---------------+
>                               |
>                           +------+
>                           |NIC/IB|
>                           +------+
>                               |                      +-------------+
>                               +--------------------->|virtio target|
>                                                      +-------------+
> 
All 3 seem valid use cases.

Use cases 1 and 2 can be achieved directly without involving any
mediation layer or any other translation layer (for example virtio to
NFS). Many block and file protocols outside of virtio already exist
which achieve this. I don't see virtio being any different in supporting
this in a native manner, mainly for the blk, fs and crypto devices.

Use case #3 brings additional benefits and at the same time different
complexity, but sure, #3 is also a valid and common use case in our
experience.

In my experience working with FC, iSCSI, FCoE, NVMe over RDMA fabrics and
iSER, a virtio fabrics needs a lot of work to reach the scale, resiliency
and lastly the security. (step by step...)

My humble suggestion is: pick one transport instead of all at once; RDMA,
being the most performant, is probably the first candidate to see the
perf gain for use cases #1 and #2 from a remote system.

I briefly looked at your RDMA command descriptor example, which is not
aligned to 16B. Perf-wise it will be poorer than NVMe RDMA fabrics.
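
For reference, the per-descriptor fields shown in the RDMA example add up
to 8 + 4 + 2 + 2 + 4 = 20 bytes; a hypothetical padded variant at a 16B
multiple, purely for illustration, could look like:

#include <stdint.h>

/* Illustrative only: mirrors the addr/length/id/flags/key fields from the
 * RDMA example; names and padding are hypothetical. */
struct virtio_of_rdma_desc_20b_sketch {
        uint64_t addr;
        uint32_t length;
        uint16_t id;
        uint16_t flags;
        uint32_t key;
} __attribute__((packed));

struct virtio_of_rdma_desc_32b_sketch {
        uint64_t addr;
        uint32_t length;
        uint16_t id;
        uint16_t flags;
        uint32_t key;
        uint8_t  reserved[12];          /* pad to 32 bytes, a 16B multiple */
} __attribute__((packed));

_Static_assert(sizeof(struct virtio_of_rdma_desc_20b_sketch) == 20,
               "as in the example");
_Static_assert(sizeof(struct virtio_of_rdma_desc_32b_sketch) == 32,
               "padded to a 16B multiple");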

For the PCI transport for net, we intend to start work to improve the
descriptors and the transport binding for the net device. From our
research I see that the abstract virtio descriptors are great today, but
if you want to get the best out of the system (sw, hw, cpu), such
abstraction is not the best. Carrying the "id" all the way to the target
and back is an example of such inefficiency in your example.

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)
  2023-04-25  5:03       ` Parav Pandit
@ 2023-04-25  6:31         ` Jason Wang
  -1 siblings, 0 replies; 29+ messages in thread
From: Jason Wang @ 2023-04-25  6:31 UTC (permalink / raw)
  To: Parav Pandit, zhenwei pi
  Cc: Michael S . Tsirkin, Cornelia Huck, virtio-dev, virtio-comment,
	helei.sig11, houp


On 2023/4/25 13:03, Parav Pandit wrote:
>
>
> On 4/24/2023 9:38 AM, zhenwei pi wrote:
>>
>>
>>  From the point of my view, there are 3 cases:
>> 1, Host/container scenario. For example, host kernel connects a 
>> virtio target block service, maps it as a vdx(virtio-blk) device(used 
>> by Map-Reduce service which needs a fast/large size disk). The host 
>> kernel also connects a virtio target crypto service, maps it as 
>> virtio crypto device(used by nginx to accelarate HTTPS). And so on.
>>
>>          +----------+    +----------+       +----------+
>>          |Map-Reduce|    |   nginx  |  ...  | processes|
>>          +----------+    +----------+       +----------+
>> ------------------------------------------------------------
>> Host         |               |                  |
>> Kernel   +-------+       +-------+          +-------+
>>           | ext4  |       | LKCF  |          | HWRNG |
>>           +-------+       +-------+          +-------+
>>               |               |                  |
>>           +-------+       +-------+          +-------+
>>           |  vdx  |       |vCrypto|          | vRNG  |
>>           +-------+       +-------+          +-------+
>>               |               |                  |
>>               |           +--------+             |
>>               +---------->|TCP/RDMA|<------------+
>>                           +--------+
>>                               |
>>                           +------+
>>                           |NIC/IB|
>>                           +------+
>>                               |                      +-------------+
>>                               +--------------------->|virtio target|
>>                                                      +-------------+
>>
>> 2, Typical virtualization environment. The workloads run in a guest, 
>> and QEMU handles virtio-pci(or MMIO), and forwards requests to target.
>>          +----------+    +----------+       +----------+
>>          |Map-Reduce|    |   nginx  |  ...  | processes|
>>          +----------+    +----------+       +----------+
>> ------------------------------------------------------------
>> Guest        |               |                  |
>> Kernel   +-------+       +-------+          +-------+
>>           | ext4  |       | LKCF  |          | HWRNG |
>>           +-------+       +-------+          +-------+
>>               |               |                  |
>>           +-------+       +-------+          +-------+
>>           |  vdx  |       |vCrypto|          | vRNG  |
>>           +-------+       +-------+          +-------+
>>               |               |                  |
>> PCI --------------------------------------------------------
>>                               |
>> QEMU                 +--------------+
>>                       |virtio backend|
>>                       +--------------+
>>                               |
>>                           +------+
>>                           |NIC/IB|
>>                           +------+
>>                               |                      +-------------+
>>                               +--------------------->|virtio target|
>>                                                      +-------------+
>>
> Example #3 enables to implement virtio backend over fabrics initiator 
> in the user space, which is also a good use case.
> It can be also be done in non native virtio backend.
> More below.
>
>> 3, SmartNIC/DPU/vDPA environment. It's possible to convert virtio-pci 
>> request to virtio-of request by hardware, and forward request to 
>> virtio target directly.
>>          +----------+    +----------+       +----------+
>>          |Map-Reduce|    |   nginx  |  ...  | processes|
>>          +----------+    +----------+       +----------+
>> ------------------------------------------------------------
>> Host         |               |                  |
>> Kernel   +-------+       +-------+          +-------+
>>           | ext4  |       | LKCF  |          | HWRNG |
>>           +-------+       +-------+          +-------+
>>               |               |                  |
>>           +-------+       +-------+          +-------+
>>           |  vdx  |       |vCrypto|          | vRNG  |
>>           +-------+       +-------+          +-------+
>>               |               |                  |
>> PCI --------------------------------------------------------
>>                               |
>> SmartNIC             +---------------+
>>                       |virtio HW queue|
>>                       +---------------+
>>                               |
>>                           +------+
>>                           |NIC/IB|
>>                           +------+
>>                               |                      +-------------+
>>                               +--------------------->|virtio target|
>>                                                      +-------------+
>>
> All 3 seems a valid use cases.
>
> Use case 1 and 2 can be achieved directly without involving any 
> mediation layer or any other translation layer (for example virtio to 
> nfs).


Not for use case 2, at least? It says it has a virtio backend in QEMU. Or
is the only possible way to have virtio-of in the guest?

Thanks


> Many blk and file protocols outside of the virtio already exists which 
> achieve this. I don't see virtio being any different to support this 
> in native manner, mainly the blk, fs, crypto device.
>
> use case #3 brings additional a benefits at the same time different 
> complexity but sure #3 is also a valid and common use case in our 
> experiences.
>
> In my experience working with FC, iSCSI, FCoE, NVMe RDMA fabrics, iSER,
> A virito fabrics needs a lot of work to reach the scale, resiliency 
> and lastly the security. (step by step...)
>
> My humble suggestion is : pick one transport instead of all at once, 
> rdma being most performant probably the first candidate to see the 
> perf gain for use case #1 and #2 from remote system.
>
> I briefly see your rdma command descriptor example, which is not 
> aligned to 16B. Perf wise it will be poor than nvme rdma fabrics.
>
> For PCI transport for net, we intent to start the work to improve 
> descriptors, the transport binding for net device. From our research I 
> see that some abstract virtio descriptors are great today, but if you 
> want to get best out of the system (sw, hw, cpu), such abstraction is 
> not the best. Sharing of "id" all the way to target and bring back is 
> an example of such inefficiency in your example.
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [virtio-dev] Re: Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)
  2023-04-24 13:38   ` zhenwei pi
@ 2023-04-25  6:36       ` Jason Wang
  2023-04-25  6:36       ` Jason Wang
  1 sibling, 0 replies; 29+ messages in thread
From: Jason Wang @ 2023-04-25  6:36 UTC (permalink / raw)
  To: zhenwei pi
  Cc: Michael S . Tsirkin, Cornelia Huck, parav, virtio-dev,
	virtio-comment, helei.sig11, houp

On Mon, Apr 24, 2023 at 9:38 PM zhenwei pi <pizhenwei@bytedance.com> wrote:
>
>
>
> On 4/24/23 11:40, Jason Wang wrote:
> > On Sun, Apr 23, 2023 at 7:31 PM zhenwei pi <pizhenwei@bytedance.com> wrote:
> >>
> >> Hi,
> >>
> >> In the past years, virtio supports lots of device specifications by
> >> PCI/MMIO/CCW. These devices work fine in the virtualization environment,
> >> and we have a chance to support virtio device family for the
> >> container/host scenario.
> >
> > PCI can work for containers for sure (or does it meet any issue like
> > scalability?). It's better to describe what problems you met and why
> > you choose this way to solve it.
> >
> > It's better to compare this with
> >
> > 1) hiding the fabrics details via DPU
> > 2) vDPA
> >
> Hi,
>
> Sorry, I missed this part. "Network defined peripheral devices of virtio
> family" is the main purpose of this proposal,

This can be achieved by either a DPU or vDPA. I think the advantage is
that, if we standardize this in the spec, it avoids vendor-specific
protocols.

> this allows us to use many
> types of remote resources which are provided by virtio target.
>
>  From the point of my view, there are 3 cases:
> 1, Host/container scenario. For example, host kernel connects a virtio
> target block service, maps it as a vdx(virtio-blk) device(used by
> Map-Reduce service which needs a fast/large size disk). The host kernel
> also connects a virtio target crypto service, maps it as virtio crypto
> device(used by nginx to accelarate HTTPS). And so on.
>
>          +----------+    +----------+       +----------+
>          |Map-Reduce|    |   nginx  |  ...  | processes|
>          +----------+    +----------+       +----------+
> ------------------------------------------------------------
> Host         |               |                  |
> Kernel   +-------+       +-------+          +-------+
>           | ext4  |       | LKCF  |          | HWRNG |
>           +-------+       +-------+          +-------+
>               |               |                  |
>           +-------+       +-------+          +-------+
>           |  vdx  |       |vCrypto|          | vRNG  |
>           +-------+       +-------+          +-------+
>               |               |                  |
>               |           +--------+             |
>               +---------->|TCP/RDMA|<------------+
>                           +--------+
>                               |
>                           +------+
>                           |NIC/IB|
>                           +------+
>                               |                      +-------------+
>                               +--------------------->|virtio target|
>                                                      +-------------+
>
> 2, Typical virtualization environment. The workloads run in a guest, and
> QEMU handles virtio-pci(or MMIO), and forwards requests to target.
>          +----------+    +----------+       +----------+
>          |Map-Reduce|    |   nginx  |  ...  | processes|
>          +----------+    +----------+       +----------+
> ------------------------------------------------------------
> Guest        |               |                  |
> Kernel   +-------+       +-------+          +-------+
>           | ext4  |       | LKCF  |          | HWRNG |
>           +-------+       +-------+          +-------+
>               |               |                  |
>           +-------+       +-------+          +-------+
>           |  vdx  |       |vCrypto|          | vRNG  |
>           +-------+       +-------+          +-------+
>               |               |                  |
> PCI --------------------------------------------------------
>                               |
> QEMU                 +--------------+
>                       |virtio backend|
>                       +--------------+
>                               |
>                           +------+
>                           |NIC/IB|
>                           +------+
>                               |                      +-------------+
>                               +--------------------->|virtio target|
>                                                      +-------------+
>

So it's the job of QEMU to do the translation from virtqueue to packet here?

> 3, SmartNIC/DPU/vDPA environment. It's possible to convert virtio-pci
> request to virtio-of request by hardware, and forward request to virtio
> target directly.
>          +----------+    +----------+       +----------+
>          |Map-Reduce|    |   nginx  |  ...  | processes|
>          +----------+    +----------+       +----------+
> ------------------------------------------------------------
> Host         |               |                  |
> Kernel   +-------+       +-------+          +-------+
>           | ext4  |       | LKCF  |          | HWRNG |
>           +-------+       +-------+          +-------+
>               |               |                  |
>           +-------+       +-------+          +-------+
>           |  vdx  |       |vCrypto|          | vRNG  |
>           +-------+       +-------+          +-------+
>               |               |                  |
> PCI --------------------------------------------------------
>                               |
> SmartNIC             +---------------+
>                       |virtio HW queue|
>                       +---------------+
>                               |
>                           +------+
>                           |NIC/IB|
>                           +------+
>                               |                      +-------------+
>                               +--------------------->|virtio target|
>                                                      +-------------+
>
> >>
> >> - Theory
> >> "Virtio Over Fabrics" aims at "reuse virtio device specifications", and
> >> provides network defined peripheral devices.
> >> And this protocol also could be used in virtualization environment,
> >> typically hypervisor(or vhost-user process) handles request from virtio
> >> PCI/MMIO/CCW, remaps request and forwards to target by fabrics.
> >
> > This requires meditation in the datapath, isn't it?
> >
> >>
> >> - Protocol
> >> The detail protocol definition see:
> >> https://github.com/pizhenwei/linux/blob/virtio-of-github/include/uapi/linux/virtio_of.h
> >
> > I'd say a RFC patch for virtio spec is more suitable than the codes.
> >
>
> OK. I'll send a RFC patch for virtio spec later if this proposal is
> acceptable.

Well, I think we need to have an RFC first to know if it is acceptable or not.

>
> [...]
>
> >
> > A quick glance at the code told me it's a mediation layer that convert
> > descriptors in the vring to the fabric specific packet. This is the
> > vDPA way.
> >
> > If we agree virtio of fabic is useful, we need invent facilities to
> > allow building packet directly without bothering the virtqueue (the
> > API is layout independent anyhow).
> >
> > Thanks
> >
>
> This code describes the case 1[Host/container scenario], also proves
> this case works.
> Create a virtqueue in the virtio fabric module, also emulate a
> "virtqueue backend" here, when uplayer kicks vring, the "backend" gets
> notified and builds packet to TCP/RDMA.

In this case, it won't perform well, since it still uses a virtqueue,
which is unnecessary in the datapath for fabrics.
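
To make the contrast concrete, a fabrics-native submission interface could
look roughly like the sketch below; every name here is hypothetical and
only meant to show the "no vring in the datapath" shape, not a real API:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct vof_sg {
    void     *buf;
    uint32_t  len;
    bool      device_writes;   /* "in" buffer, filled by the target */
};

struct vof_queue;              /* transport connection/queue handle */

/* The upper layer hands its scatter list straight to the transport, which
 * encodes fabric descriptors directly; no vring descriptor is allocated
 * only to be translated again on the way out. */
int vof_queue_request(struct vof_queue *q,
                      const struct vof_sg *sg, size_t nsg,
                      void (*done)(void *ctx, int status), void *ctx);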

Thanks

>
> [...]
>
> --
> zhenwei pi
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)
  2023-04-25  5:03       ` Parav Pandit
  (?)
  (?)
@ 2023-04-25  7:02       ` zhenwei pi
  -1 siblings, 0 replies; 29+ messages in thread
From: zhenwei pi @ 2023-04-25  7:02 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S . Tsirkin, Cornelia Huck, virtio-dev, virtio-comment,
	helei.sig11, houp

On 4/25/23 13:03, Parav Pandit wrote:
[...]
> Example #3 enables to implement virtio backend over fabrics initiator in 
> the user space, which is also a good use case.
Yes.

> It can be also be done in non native virtio backend.
> More below.
> 
>> 3, SmartNIC/DPU/vDPA environment. It's possible to convert virtio-pci 
>> request to virtio-of request by hardware, and forward request to 
>> virtio target directly.
>>          +----------+    +----------+       +----------+
>>          |Map-Reduce|    |   nginx  |  ...  | processes|
>>          +----------+    +----------+       +----------+
>> ------------------------------------------------------------
>> Host         |               |                  |
>> Kernel   +-------+       +-------+          +-------+
>>           | ext4  |       | LKCF  |          | HWRNG |
>>           +-------+       +-------+          +-------+
>>               |               |                  |
>>           +-------+       +-------+          +-------+
>>           |  vdx  |       |vCrypto|          | vRNG  |
>>           +-------+       +-------+          +-------+
>>               |               |                  |
>> PCI --------------------------------------------------------
>>                               |
>> SmartNIC             +---------------+
>>                       |virtio HW queue|
>>                       +---------------+
>>                               |
>>                           +------+
>>                           |NIC/IB|
>>                           +------+
>>                               |                      +-------------+
>>                               +--------------------->|virtio target|
>>                                                      +-------------+
>>
> All 3 seems a valid use cases.
> 
> Use case 1 and 2 can be achieved directly without involving any 
> mediation layer or any other translation layer (for example virtio to nfs).
> Many blk and file protocols outside of the virtio already exists which 
> achieve this. I don't see virtio being any different to support this in 
> native manner, mainly the blk, fs, crypto device.
> 

I have been working on iSCSI/iSER/NVMe-oF for several years and, frankly,
I also think there is no essential difference between virtio-blk over
fabrics and NVMe-oF. As far as I know, storage-specific protocols are
quite common and popular, but the other device types seem to be lacking.
For example, I don't know of any protocol that supports a remote crypto
device (please correct me if there is already one). I also notice that
virtio-camera is reserved; a remote camera protocol may be developed to
support camera redirection in the future. virtio-of works at the virtio
transport layer, which allows us to support almost the full virtio device
family.
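
To illustrate that point (all names below are hypothetical, not from the
proposal), the fabric layer can treat the device-class request header as
opaque payload, so one submit path can serve virtio-blk, virtio-crypto and
any other device type:

#include <stddef.h>
#include <stdint.h>

struct vof_buf {
    const void *data;
    uint32_t    len;
};

/* One device-agnostic submit path (declaration only, for illustration). */
int vof_submit(int queue_fd, uint16_t cmd_id,
               const struct vof_buf *bufs, size_t nbufs);

static int submit_blk_write(int qfd, const void *blk_hdr, size_t hdr_len,
                            const void *payload, size_t payload_len)
{
    struct vof_buf bufs[] = {
        { blk_hdr, (uint32_t)hdr_len },     /* device-specific header */
        { payload, (uint32_t)payload_len }, /* data to be written     */
    };

    return vof_submit(qfd, 10 /* some command id */, bufs, 2);
}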

> use case #3 brings additional a benefits at the same time different 
> complexity but sure #3 is also a valid and common use case in our 
> experiences.
> 
> In my experience working with FC, iSCSI, FCoE, NVMe RDMA fabrics, iSER,
> A virito fabrics needs a lot of work to reach the scale, resiliency and 
> lastly the security. (step by step...)
> 
> My humble suggestion is : pick one transport instead of all at once, 
> rdma being most performant probably the first candidate to see the perf 
> gain for use case #1 and #2 from remote system.
> 
> I briefly see your rdma command descriptor example, which is not aligned 
> to 16B. Perf wise it will be poor than nvme rdma fabrics.
> 
Thanks, I'll confirm this with an RDMA expert; maybe I'll add a padding
field to the command.

> For PCI transport for net, we intent to start the work to improve 
> descriptors, the transport binding for net device. From our research I 
> see that some abstract virtio descriptors are great today, but if you 
> want to get best out of the system (sw, hw, cpu), such abstraction is 
> not the best. Sharing of "id" all the way to target and bring back is an 
> example of such inefficiency in your example.

Could you please give me more of a hint about this?
Requests from the virtqueue always have a unique id, so the in-flight
virtio-of requests use the id to distinguish themselves; otherwise, it has
no other meaning.
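
For what it's worth, a minimal sketch of that bookkeeping on the initiator
side (names and sizes here are made up for illustration) could be:

#include <stddef.h>
#include <stdint.h>

#define VOF_MAX_INFLIGHT 256

struct vof_inflight {
    void (*done)(void *ctx, int status);   /* completion callback */
    void *ctx;
    uint8_t in_use;
};

static struct vof_inflight inflight[VOF_MAX_INFLIGHT];

/* Remember the request context under its command id before sending. */
static int vof_track(uint16_t cmd_id, void (*done)(void *, int), void *ctx)
{
    struct vof_inflight *slot = &inflight[cmd_id % VOF_MAX_INFLIGHT];

    if (slot->in_use)
        return -1;              /* id still in flight, caller must wait */
    *slot = (struct vof_inflight){ .done = done, .ctx = ctx, .in_use = 1 };
    return 0;
}

/* A completion carrying the same id finds the original request again. */
static void vof_complete(uint16_t cmd_id, int status)
{
    struct vof_inflight *slot = &inflight[cmd_id % VOF_MAX_INFLIGHT];

    if (slot->in_use) {
        slot->in_use = 0;
        slot->done(slot->ctx, status);
    }
}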

-- 
zhenwei pi

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Re: Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)
  2023-04-25  6:36       ` Jason Wang
  (?)
@ 2023-04-25  7:21       ` zhenwei pi
  -1 siblings, 0 replies; 29+ messages in thread
From: zhenwei pi @ 2023-04-25  7:21 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S . Tsirkin, Cornelia Huck, parav, virtio-dev,
	virtio-comment, helei.sig11, houp



On 4/25/23 14:36, Jason Wang wrote:
> On Mon, Apr 24, 2023 at 9:38 PM zhenwei pi <pizhenwei@bytedance.com> wrote:
>>
>>
>>
>> On 4/24/23 11:40, Jason Wang wrote:
>>> On Sun, Apr 23, 2023 at 7:31 PM zhenwei pi <pizhenwei@bytedance.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> In the past years, virtio supports lots of device specifications by
>>>> PCI/MMIO/CCW. These devices work fine in the virtualization environment,
>>>> and we have a chance to support virtio device family for the
>>>> container/host scenario.
>>>
>>> PCI can work for containers for sure (or does it meet any issue like
>>> scalability?). It's better to describe what problems you met and why
>>> you choose this way to solve it.
>>>
>>> It's better to compare this with
>>>
>>> 1) hiding the fabrics details via DPU
>>> 2) vDPA
>>>
>> Hi,
>>
>> Sorry, I missed this part. "Network defined peripheral devices of virtio
>> family" is the main purpose of this proposal,
> 
> This can be achieved by either DPU or vDPA. I think the advantages is,
> if we standardize this in the spec, it avoids vendor specific
> protocol.
> 

Agreed on "standardize this in the spec to avoid vendor specific
protocol"; this also avoids device-specific protocols. For example, we
don't need "block over fabrics", "crypto over fabrics", "camera over
fabrics" ... to support the virtio device family; instead we only need a
single virtio-of.

Regarding "This can be achieved by either DPU or vDPA", do you mean case 3?

>> this allows us to use many
>> types of remote resources which are provided by virtio target.
>>
>>   From the point of my view, there are 3 cases:
>> 1, Host/container scenario. For example, host kernel connects a virtio
>> target block service, maps it as a vdx(virtio-blk) device(used by
>> Map-Reduce service which needs a fast/large size disk). The host kernel
>> also connects a virtio target crypto service, maps it as virtio crypto
>> device(used by nginx to accelarate HTTPS). And so on.
>>
>>           +----------+    +----------+       +----------+
>>           |Map-Reduce|    |   nginx  |  ...  | processes|
>>           +----------+    +----------+       +----------+
>> ------------------------------------------------------------
>> Host         |               |                  |
>> Kernel   +-------+       +-------+          +-------+
>>            | ext4  |       | LKCF  |          | HWRNG |
>>            +-------+       +-------+          +-------+
>>                |               |                  |
>>            +-------+       +-------+          +-------+
>>            |  vdx  |       |vCrypto|          | vRNG  |
>>            +-------+       +-------+          +-------+
>>                |               |                  |
>>                |           +--------+             |
>>                +---------->|TCP/RDMA|<------------+
>>                            +--------+
>>                                |
>>                            +------+
>>                            |NIC/IB|
>>                            +------+
>>                                |                      +-------------+
>>                                +--------------------->|virtio target|
>>                                                       +-------------+
>>
>> 2, Typical virtualization environment. The workloads run in a guest, and
>> QEMU handles virtio-pci(or MMIO), and forwards requests to target.
>>           +----------+    +----------+       +----------+
>>           |Map-Reduce|    |   nginx  |  ...  | processes|
>>           +----------+    +----------+       +----------+
>> ------------------------------------------------------------
>> Guest        |               |                  |
>> Kernel   +-------+       +-------+          +-------+
>>            | ext4  |       | LKCF  |          | HWRNG |
>>            +-------+       +-------+          +-------+
>>                |               |                  |
>>            +-------+       +-------+          +-------+
>>            |  vdx  |       |vCrypto|          | vRNG  |
>>            +-------+       +-------+          +-------+
>>                |               |                  |
>> PCI --------------------------------------------------------
>>                                |
>> QEMU                 +--------------+
>>                        |virtio backend|
>>                        +--------------+
>>                                |
>>                            +------+
>>                            |NIC/IB|
>>                            +------+
>>                                |                      +-------------+
>>                                +--------------------->|virtio target|
>>                                                       +-------------+
>>
> 
> So it's the job of the Qemu to do the translation from virtqueue to packet here?
> 

Yes. QEMU pops a request from the virtqueue backend, translates it into
virtio-of, and forwards it to the target side. It then handles the
response from the target side and interrupts the guest.
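
A very rough sketch of just that translation step (this is not QEMU code;
the structure layouts and the opcode value are assumptions):

#include <stdint.h>
#include <sys/uio.h>

struct vof_hdr  { uint16_t opcode, cmd_id; uint32_t length; uint16_t ndesc, rsvd[3]; };
struct vof_desc { uint64_t addr; uint32_t length; uint16_t id, flags; };

/* Serialize the out-buffers of one popped element as
 * "header + descriptor table + data" on the TCP socket. */
static ssize_t vof_forward(int sock, uint16_t cmd_id,
                           const struct iovec *out, int nout)
{
    struct vof_desc desc[nout];
    struct iovec iov[2 + nout];
    uint64_t off = 0;

    for (int i = 0; i < nout; i++) {
        desc[i] = (struct vof_desc){ .addr = off,
                                     .length = (uint32_t)out[i].iov_len,
                                     .id = cmd_id, .flags = 0 };
        off += out[i].iov_len;
        iov[2 + i] = out[i];
    }

    struct vof_hdr hdr = { .opcode = 1 /* assumed vring opcode */,
                           .cmd_id = cmd_id,
                           .length = (uint32_t)off,
                           .ndesc  = (uint16_t)nout };
    iov[0] = (struct iovec){ &hdr, sizeof(hdr) };
    iov[1] = (struct iovec){ desc, sizeof(desc) };

    /* The completion is read back from the same socket and matched on
     * cmd_id before the guest is notified. */
    return writev(sock, iov, 2 + nout);
}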

>> 3, SmartNIC/DPU/vDPA environment. It's possible to convert virtio-pci
>> request to virtio-of request by hardware, and forward request to virtio
>> target directly.
>>           +----------+    +----------+       +----------+
>>           |Map-Reduce|    |   nginx  |  ...  | processes|
>>           +----------+    +----------+       +----------+
>> ------------------------------------------------------------
>> Host         |               |                  |
>> Kernel   +-------+       +-------+          +-------+
>>            | ext4  |       | LKCF  |          | HWRNG |
>>            +-------+       +-------+          +-------+
>>                |               |                  |
>>            +-------+       +-------+          +-------+
>>            |  vdx  |       |vCrypto|          | vRNG  |
>>            +-------+       +-------+          +-------+
>>                |               |                  |
>> PCI --------------------------------------------------------
>>                                |
>> SmartNIC             +---------------+
>>                        |virtio HW queue|
>>                        +---------------+
>>                                |
>>                            +------+
>>                            |NIC/IB|
>>                            +------+
>>                                |                      +-------------+
>>                                +--------------------->|virtio target|
>>                                                       +-------------+
>>
>>>>
>>>> - Theory
>>>> "Virtio Over Fabrics" aims at "reuse virtio device specifications", and
>>>> provides network defined peripheral devices.
>>>> And this protocol also could be used in virtualization environment,
>>>> typically hypervisor(or vhost-user process) handles request from virtio
>>>> PCI/MMIO/CCW, remaps request and forwards to target by fabrics.
>>>
>>> This requires meditation in the datapath, isn't it?
>>>
>>>>
>>>> - Protocol
>>>> The detail protocol definition see:
>>>> https://github.com/pizhenwei/linux/blob/virtio-of-github/include/uapi/linux/virtio_of.h
>>>
>>> I'd say a RFC patch for virtio spec is more suitable than the codes.
>>>
>>
>> OK. I'll send a RFC patch for virtio spec later if this proposal is
>> acceptable.
> 
> Well, I think we need to have an RFC first to know if it is acceptable or not.
> 

Sure, I'll send a draft.

>>
>> [...]
>>
>>>
>>> A quick glance at the code told me it's a mediation layer that convert
>>> descriptors in the vring to the fabric specific packet. This is the
>>> vDPA way.
>>>
>>> If we agree virtio of fabic is useful, we need invent facilities to
>>> allow building packet directly without bothering the virtqueue (the
>>> API is layout independent anyhow).
>>>
>>> Thanks
>>>
>>
>> This code describes the case 1[Host/container scenario], also proves
>> this case works.
>> Create a virtqueue in the virtio fabric module, also emulate a
>> "virtqueue backend" here, when uplayer kicks vring, the "backend" gets
>> notified and builds packet to TCP/RDMA.
> 
> In this case, it won't perform good. Since it still use virtqueue
> which is unnecessary in the datapath for fabric.
>
> Thanks
> 
>>
>> [...]
>>
>> --
>> zhenwei pi
>>
> 

-- 
zhenwei pi

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)
  2023-04-25  6:31         ` Jason Wang
@ 2023-04-25 13:27           ` Parav Pandit
  -1 siblings, 0 replies; 29+ messages in thread
From: Parav Pandit @ 2023-04-25 13:27 UTC (permalink / raw)
  To: Jason Wang, zhenwei pi
  Cc: Michael S . Tsirkin, Cornelia Huck, virtio-dev, virtio-comment,
	helei.sig11, houp


> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, April 25, 2023 2:31 AM

> >> 2, Typical virtualization environment. The workloads run in a guest,
> >> and QEMU handles virtio-pci(or MMIO), and forwards requests to target.
> >>          +----------+    +----------+       +----------+
> >>          |Map-Reduce|    |   nginx  |  ...  | processes|
> >>          +----------+    +----------+       +----------+
> >> ------------------------------------------------------------
> >> Guest        |               |                  |
> >> Kernel   +-------+       +-------+          +-------+
> >>           | ext4  |       | LKCF  |          | HWRNG |
> >>           +-------+       +-------+          +-------+
> >>               |               |                  |
> >>           +-------+       +-------+          +-------+
> >>           |  vdx  |       |vCrypto|          | vRNG  |
> >>           +-------+       +-------+          +-------+
> >>               |               |                  |
> >> PCI --------------------------------------------------------
> >>                               |
> >> QEMU                 +--------------+
> >>                       |virtio backend|
> >>                       +--------------+
> >>                               |
> >>                           +------+
> >>                           |NIC/IB|
> >>                           +------+
> >>                               |                      +-------------+
> >>                               +--------------------->|virtio target|
> >>                                                      +-------------+
> >>

> > Use case 1 and 2 can be achieved directly without involving any
> > mediation layer or any other translation layer (for example virtio to
> > nfs).
> 
> 
> Not for at least use case 2? It said it has a virtio backend in Qemu. Or the only
> possible way is to have virtio of in the guest.
>
The front end and back end are both virtio, so there is some layer of
mediation/translation from PCI to fabric commands, but not as radical as
virtio-blk to NFS or virtio-blk to NVMe.
 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)
  2023-04-24  3:40   ` Jason Wang
@ 2023-04-25 13:55     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 29+ messages in thread
From: Stefan Hajnoczi @ 2023-04-25 13:55 UTC (permalink / raw)
  To: Jason Wang
  Cc: zhenwei pi, Michael S . Tsirkin, Cornelia Huck, parav,
	virtio-dev, virtio-comment, helei.sig11, houp


On Mon, Apr 24, 2023 at 11:40:02AM +0800, Jason Wang wrote:
> On Sun, Apr 23, 2023 at 7:31 PM zhenwei pi <pizhenwei@bytedance.com> wrote:
> > "Virtio Over Fabrics" aims at "reuse virtio device specifications", and
> > provides network defined peripheral devices.
> > And this protocol also could be used in virtualization environment,
> > typically hypervisor(or vhost-user process) handles request from virtio
> > PCI/MMIO/CCW, remaps request and forwards to target by fabrics.
> 
> This requires mediation in the datapath, doesn't it?
> 
> >
> > - Protocol
> > The detail protocol definition see:
> > https://github.com/pizhenwei/linux/blob/virtio-of-github/include/uapi/linux/virtio_of.h
> 
> I'd say a RFC patch for virtio spec is more suitable than the codes.

VIRTIO over TCP has long been anticipated, but so far no one has posted an
implementation. There are probably mentions of it from 10+ years ago.
I'm excited to see this!

Both the VIRTIO spec and the Linux drivers provide an abstraction that
allows fabrics (e.g. TCP) to fit in as a VIRTIO Transport. vrings are
not the only way to implement virtqueues.

Many VIRTIO devices will work fine over a message passing transport like
TCP. A few devices like the balloon device may not make sense. Shared
Memory Regions won't work.

Please define VIRTIO over Fabrics as a Transport in the VIRTIO spec so
that the integration with the VIRTIO device model is seamless. I look
forward to discussing spec patches.

Stefan


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)
  2023-04-24  3:40   ` Jason Wang
@ 2023-04-25 14:09     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 29+ messages in thread
From: Stefan Hajnoczi @ 2023-04-25 14:09 UTC (permalink / raw)
  To: Jason Wang
  Cc: zhenwei pi, Michael S . Tsirkin, Cornelia Huck, parav,
	virtio-dev, virtio-comment, helei.sig11, houp


On Mon, Apr 24, 2023 at 11:40:02AM +0800, Jason Wang wrote:
> On Sun, Apr 23, 2023 at 7:31 PM zhenwei pi <pizhenwei@bytedance.com> wrote:
> > I developed a kernel initiator (unstable, WIP version; currently TCP/RDMA
> > supported):
> > https://github.com/pizhenwei/linux/tree/virtio-of-github
> 
> A quick glance at the code told me it's a mediation layer that converts
> descriptors in the vring to fabric-specific packets. This is the
> vDPA way.
>
> If we agree virtio over fabrics is useful, we need to invent facilities to
> allow building packets directly without bothering with the virtqueue (the
> API is layout-independent anyhow).

I agree. vrings make sense for RDMA, but I think virtio_fabrics.c
should not depend on vrings.

Linux struct virtqueue is independent of vrings but the implementation
currently lives in virtio_ring.c because there has never been a
non-vring transport before.

It would be nice to implement virtqueue_add_sgs() specifically for
virtio_tcp.c without the use of vrings. Is a new struct
virtqueue_ops needed, with .add_sgs() and related callbacks?

Luckily the <linux/virtio.h> API already supports this abstraction and
changes to existing device drivers should be unnecessary or minimal.
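
For illustration only, one possible shape of such an ops table (all names below are hypothetical, not existing <linux/virtio.h> API; a per-transport table like this would let a fabric transport supply its own implementation while virtio_ring.c keeps the vring one):

#include <linux/virtio.h>
#include <linux/scatterlist.h>

/* Hypothetical sketch only. Each transport fills in callbacks that map the
 * generic virtqueue API onto its own mechanism: virtio_ring.c keeps building
 * vring descriptors, while a virtio_tcp.c could build fabric packets directly
 * from the scatterlists.
 */
struct virtqueue_ops {
	int  (*add_sgs)(struct virtqueue *vq, struct scatterlist *sgs[],
			unsigned int out_sgs, unsigned int in_sgs,
			void *data, gfp_t gfp);
	bool (*kick)(struct virtqueue *vq);
	void *(*get_buf)(struct virtqueue *vq, unsigned int *len);
};

virtqueue_add_sgs() and friends would then simply dispatch through such a
table instead of calling into virtio_ring.c directly.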

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [virtio-dev] Re: Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)
  2023-04-25 13:55     ` Stefan Hajnoczi
@ 2023-04-26  1:08       ` zhenwei pi
  -1 siblings, 0 replies; 29+ messages in thread
From: zhenwei pi @ 2023-04-26  1:08 UTC (permalink / raw)
  To: Stefan Hajnoczi, Jason Wang
  Cc: Michael S . Tsirkin, Cornelia Huck, parav, virtio-dev,
	virtio-comment, helei.sig11, houp



On 4/25/23 21:55, Stefan Hajnoczi wrote:
> On Mon, Apr 24, 2023 at 11:40:02AM +0800, Jason Wang wrote:
>> On Sun, Apr 23, 2023 at 7:31 PM zhenwei pi <pizhenwei@bytedance.com> wrote:
>>> "Virtio Over Fabrics" aims at "reuse virtio device specifications", and
>>> provides network defined peripheral devices.
>>> And this protocol also could be used in virtualization environment,
>>> typically hypervisor(or vhost-user process) handles request from virtio
>>> PCI/MMIO/CCW, remaps request and forwards to target by fabrics.
>>
>> This requires mediation in the datapath, doesn't it?
>>
>>>
>>> - Protocol
>>> The detail protocol definition see:
>>> https://github.com/pizhenwei/linux/blob/virtio-of-github/include/uapi/linux/virtio_of.h
>>
>> I'd say a RFC patch for virtio spec is more suitable than the codes.
> 
> VIRTIO over TCP has long been anticipated, but so far no one has posted an
> implementation. There are probably mentions of it from 10+ years ago.
> I'm excited to see this!
> 
> Both the VIRTIO spec and the Linux drivers provide an abstraction that
> allows fabrics (e.g. TCP) to fit in as a VIRTIO Transport. vrings are
> not the only way to implement virtqueues.
> 
> Many VIRTIO devices will work fine over a message passing transport like
> TCP. A few devices like the balloon device may not make sense. Shared
> Memory Regions won't work.
> 

Fully agree.

> Please define VIRTIO over Fabrics as a Transport in the VIRTIO spec so
> that the integration with the VIRTIO device model is seamless. I look
> forward to discussing spec patches.
> 
> Stefan

Thanks, I'm working on it.

-- 
zhenwei pi



^ permalink raw reply	[flat|nested] 29+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)
  2023-04-25 14:09     ` Stefan Hajnoczi
@ 2023-04-26  3:03       ` Jason Wang
  -1 siblings, 0 replies; 29+ messages in thread
From: Jason Wang @ 2023-04-26  3:03 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: zhenwei pi, Michael S . Tsirkin, Cornelia Huck, parav,
	virtio-dev, virtio-comment, helei.sig11, houp

On Tue, Apr 25, 2023 at 10:09 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Mon, Apr 24, 2023 at 11:40:02AM +0800, Jason Wang wrote:
> > On Sun, Apr 23, 2023 at 7:31 PM zhenwei pi <pizhenwei@bytedance.com> wrote:
> > > I developed a kernel initiator (unstable, WIP version; currently TCP/RDMA
> > > supported):
> > > https://github.com/pizhenwei/linux/tree/virtio-of-github
> >
> > A quick glance at the code told me it's a mediation layer that converts
> > descriptors in the vring to fabric-specific packets. This is the
> > vDPA way.
> >
> > If we agree virtio over fabrics is useful, we need to invent facilities to
> > allow building packets directly without bothering with the virtqueue (the
> > API is layout-independent anyhow).
>
> I agree. vrings make sense for RDMA, but I think virtio_fabrics.c
> should not depend on vrings.
>
> Linux struct virtqueue is independent of vrings but the implementation
> currently lives in virtio_ring.c because there has never been a
> non-vring transport before.
>
> It would be nice to implement virtqueue_add_sgs() specifically for
> virtio_tcp.c without the use of vrings. Is a new struct
> virtqueue_ops needed, with .add_sgs() and related callbacks?
>
> Luckily the <linux/virtio.h> API already supports this abstraction and
> changes to existing device drivers should be unnecessary or minimal.

That's my understanding.

Btw, I'm also ok if we start with vring and optimize on top.

Thanks

>
> Stefan




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [virtio-dev] Re: Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)
  2023-04-25  6:36       ` Jason Wang
@ 2023-04-26  9:29         ` Xuan Zhuo
  -1 siblings, 0 replies; 29+ messages in thread
From: Xuan Zhuo @ 2023-04-26  9:29 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S . Tsirkin, Cornelia Huck, parav, virtio-dev,
	virtio-comment, helei.sig11, houp, zhenwei pi

On Tue, 25 Apr 2023 14:36:04 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Apr 24, 2023 at 9:38 PM zhenwei pi <pizhenwei@bytedance.com> wrote:
> >
> >
> >
> > On 4/24/23 11:40, Jason Wang wrote:
> > > On Sun, Apr 23, 2023 at 7:31 PM zhenwei pi <pizhenwei@bytedance.com> wrote:
> > >>
> > >> Hi,
> > >>
> > >> In the past years, virtio supports lots of device specifications by
> > >> PCI/MMIO/CCW. These devices work fine in the virtualization environment,
> > >> and we have a chance to support virtio device family for the
> > >> container/host scenario.
> > >
> > > PCI can work for containers for sure (or does it meet any issue like
> > > scalability?). It's better to describe what problems you met and why
> > > you choose this way to solve it.
> > >
> > > It's better to compare this with
> > >
> > > 1) hiding the fabrics details via DPU
> > > 2) vDPA
> > >
> > Hi,
> >
> > Sorry, I missed this part. "Network defined peripheral devices of virtio
> > family" is the main purpose of this proposal,
>
> This can be achieved by either DPU or vDPA.

I agree with this.

So I don't understand the point of this implementation. I am also very excited
about this idea, since it broadens the possibilities of virtio. But I still
really want to know what the benefit of this idea is: better performance? Or
does it enable scenarios that we cannot achieve today?

> I think the advantage is,
> if we standardize this in the spec, it avoids vendor-specific
> protocols.


Sorry, I don't get this.

Thanks.

>
> > this allows us to use many
> > types of remote resources which are provided by virtio target.
> >
> >  From the point of my view, there are 3 cases:
> > 1, Host/container scenario. For example, the host kernel connects a virtio
> > target block service and maps it as a vdx (virtio-blk) device (used by a
> > Map-Reduce service which needs a fast/large disk). The host kernel
> > also connects a virtio target crypto service and maps it as a virtio crypto
> > device (used by nginx to accelerate HTTPS). And so on.
> >
> >          +----------+    +----------+       +----------+
> >          |Map-Reduce|    |   nginx  |  ...  | processes|
> >          +----------+    +----------+       +----------+
> > ------------------------------------------------------------
> > Host         |               |                  |
> > Kernel   +-------+       +-------+          +-------+
> >           | ext4  |       | LKCF  |          | HWRNG |
> >           +-------+       +-------+          +-------+
> >               |               |                  |
> >           +-------+       +-------+          +-------+
> >           |  vdx  |       |vCrypto|          | vRNG  |
> >           +-------+       +-------+          +-------+
> >               |               |                  |
> >               |           +--------+             |
> >               +---------->|TCP/RDMA|<------------+
> >                           +--------+
> >                               |
> >                           +------+
> >                           |NIC/IB|
> >                           +------+
> >                               |                      +-------------+
> >                               +--------------------->|virtio target|
> >                                                      +-------------+
> >
> > 2, Typical virtualization environment. The workloads run in a guest, and
> > QEMU handles virtio-pci(or MMIO), and forwards requests to target.
> >          +----------+    +----------+       +----------+
> >          |Map-Reduce|    |   nginx  |  ...  | processes|
> >          +----------+    +----------+       +----------+
> > ------------------------------------------------------------
> > Guest        |               |                  |
> > Kernel   +-------+       +-------+          +-------+
> >           | ext4  |       | LKCF  |          | HWRNG |
> >           +-------+       +-------+          +-------+
> >               |               |                  |
> >           +-------+       +-------+          +-------+
> >           |  vdx  |       |vCrypto|          | vRNG  |
> >           +-------+       +-------+          +-------+
> >               |               |                  |
> > PCI --------------------------------------------------------
> >                               |
> > QEMU                 +--------------+
> >                       |virtio backend|
> >                       +--------------+
> >                               |
> >                           +------+
> >                           |NIC/IB|
> >                           +------+
> >                               |                      +-------------+
> >                               +--------------------->|virtio target|
> >                                                      +-------------+
> >
>
> So it's the job of QEMU to do the translation from virtqueue to packet here?
>
> > 3, SmartNIC/DPU/vDPA environment. It's possible to convert virtio-pci
> > request to virtio-of request by hardware, and forward request to virtio
> > target directly.
> >          +----------+    +----------+       +----------+
> >          |Map-Reduce|    |   nginx  |  ...  | processes|
> >          +----------+    +----------+       +----------+
> > ------------------------------------------------------------
> > Host         |               |                  |
> > Kernel   +-------+       +-------+          +-------+
> >           | ext4  |       | LKCF  |          | HWRNG |
> >           +-------+       +-------+          +-------+
> >               |               |                  |
> >           +-------+       +-------+          +-------+
> >           |  vdx  |       |vCrypto|          | vRNG  |
> >           +-------+       +-------+          +-------+
> >               |               |                  |
> > PCI --------------------------------------------------------
> >                               |
> > SmartNIC             +---------------+
> >                       |virtio HW queue|
> >                       +---------------+
> >                               |
> >                           +------+
> >                           |NIC/IB|
> >                           +------+
> >                               |                      +-------------+
> >                               +--------------------->|virtio target|
> >                                                      +-------------+
> >
> > >>
> > >> - Theory
> > >> "Virtio Over Fabrics" aims at "reuse virtio device specifications", and
> > >> provides network defined peripheral devices.
> > >> And this protocol also could be used in virtualization environment,
> > >> typically hypervisor(or vhost-user process) handles request from virtio
> > >> PCI/MMIO/CCW, remaps request and forwards to target by fabrics.
> > >
> > > This requires mediation in the datapath, doesn't it?
> > >
> > >>
> > >> - Protocol
> > >> The detail protocol definition see:
> > >> https://github.com/pizhenwei/linux/blob/virtio-of-github/include/uapi/linux/virtio_of.h
> > >
> > > I'd say a RFC patch for virtio spec is more suitable than the codes.
> > >
> >
> > OK. I'll send a RFC patch for virtio spec later if this proposal is
> > acceptable.
>
> Well, I think we need to have an RFC first to know if it is acceptable or not.
>
> >
> > [...]
> >
> > >
> > > A quick glance at the code told me it's a mediation layer that converts
> > > descriptors in the vring to fabric-specific packets. This is the
> > > vDPA way.
> > >
> > > If we agree virtio over fabrics is useful, we need to invent facilities to
> > > allow building packets directly without bothering with the virtqueue (the
> > > API is layout-independent anyhow).
> > >
> > > Thanks
> > >
> >
> > This code describes case 1 [Host/container scenario] and also proves
> > this case works.
> > It creates a virtqueue in the virtio fabric module and also emulates a
> > "virtqueue backend" there; when the upper layer kicks the vring, the
> > "backend" gets notified and builds a packet for TCP/RDMA.
>
> In this case, it won't perform well, since it still uses the virtqueue,
> which is unnecessary in the datapath for fabrics.
>
> Thanks
>
> >
> > [...]
> >
> > --
> > zhenwei pi
> >
>
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [virtio-dev] Re: Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)
  2023-04-25  5:03       ` Parav Pandit
@ 2023-04-27  8:20         ` zhenwei pi
  -1 siblings, 0 replies; 29+ messages in thread
From: zhenwei pi @ 2023-04-27  8:20 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Michael S . Tsirkin, Cornelia Huck, virtio-dev, virtio-comment,
	helei.sig11, houp, Jason Wang, jiangzhuo.cs

On 4/25/23 13:03, Parav Pandit wrote:
> 
> 
[...]
> 
> I briefly saw your RDMA command descriptor example, which is not aligned
> to 16B. Perf-wise it will be poorer than NVMe RDMA fabrics.
> 

Hi,
I'm confused here; could you please give me more hints?
1, The size of the command descriptor (as I defined in the example) is larger
than the command size of NVMe RDMA, and the extra overhead makes performance
worse than NVMe over RDMA.

2, A command size not aligned to 16B leads to a performance issue on the RDMA
SEND operation. My colleague Zhuo helped me test the performance of
sending 16/24/32 bytes:
taskset -c 30 ib_send_bw -d mlx5_2 -i 1 -x 3 -s 16 -t 1 xx.xx.xx.xx
taskset -c 30 ib_send_bw -d mlx5_2 -i 1 -x 3 -s 24 -t 1 xx.xx.xx.xx
taskset -c 30 ib_send_bw -d mlx5_2 -i 1 -x 3 -s 32 -t 1 xx.xx.xx.xx
The QPS seems almost the same.

> For the PCI transport for net, we intend to start the work to improve the
> descriptors and the transport binding for the net device. From our research I
> see that the abstract virtio descriptors are great today, but if you
> want to get the best out of the system (sw, hw, cpu), such abstraction is
> not the best. Carrying the "id" all the way to the target and back is an
> example of such inefficiency in your example.

-- 
zhenwei pi



^ permalink raw reply	[flat|nested] 29+ messages in thread

* [virtio-dev] RE: Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)
  2023-04-27  8:20         ` zhenwei pi
@ 2023-04-27 20:31           ` Parav Pandit
  -1 siblings, 0 replies; 29+ messages in thread
From: Parav Pandit @ 2023-04-27 20:31 UTC (permalink / raw)
  To: zhenwei pi
  Cc: Michael S . Tsirkin, Cornelia Huck, virtio-dev, virtio-comment,
	helei.sig11, houp, Jason Wang, jiangzhuo.cs



> From: zhenwei pi <pizhenwei@bytedance.com>
> Sent: Thursday, April 27, 2023 4:21 AM
> 
> On 4/25/23 13:03, Parav Pandit wrote:
> >
> >
> [...]
> >
> > I briefly saw your RDMA command descriptor example, which is not
> > aligned to 16B. Perf-wise it will be poorer than NVMe RDMA fabrics.
> >
> 
> Hi,
> I'm confused here; could you please give me more hints?
> 1, The size of the command descriptor (as I defined in the example) is larger
> than the command size of NVMe RDMA, and the extra overhead makes performance
> worse than NVMe over RDMA.
> 
Which structure?

I am guessing from the header file that you have,

virtio_of_command_vring
followed by
virtio_of_vring_desc[cmd.ndesc] where cmd.opcode = virtio_of_op_vring

if so, it seems fine to me.
However, the lack of the actual command in the virtio_of_command_vring struct is not so good.
Such indirection overhead only reduces perf, as the data coming in for the blk storage target side is not of constant size.
And even if it arrives somehow, it requires two levels of protocol parsers.
This can be simplified, since you are not starting with any history here; the abstraction point could possibly be virtio commands rather than the vring.

I don't see a need for the desc to have id and flags the way it's drafted over the RDMA fabrics.
What I had in mind is:
struct virtio_of_descriptor {
	le64 addr;
	le32 len;
	union {
		le32 rdma_key;		/* RDMA: remote key */
		le32 id_reserved;	/* id plus reserved bits */
		le32 tcp_desc_id;	/* TCP: descriptor id */
	};
};

We can possibly define appropriate virtio fabric descriptors; at that point, the abstraction point is not literally taking the vring across the fabric.

Depending on the use case, maybe starting with just one of TCP or RDMA makes sense, instead of cooking everything at once.
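
As a rough illustration of the alignment concern, a layout sketch (the struct names and fields here are assumptions mirroring the example in the proposal, not the actual virtio_of.h definitions): if the command header that precedes the descriptor array is padded to a 16-byte multiple, every following 16-byte descriptor starts at a 16-byte-aligned offset in the SEND payload.

#include <stdint.h>
#include <assert.h>

struct vof_cmd_hdr {		/* hypothetical header, mirroring the example */
	uint16_t opcode;
	uint16_t cmd_id;
	uint32_t length;
	uint16_t ndesc;
	uint16_t rsvd[3];	/* pad the header to 16 bytes */
};

struct vof_desc {		/* 16 bytes, as proposed above */
	uint64_t addr;
	uint32_t len;
	uint32_t key_or_id;
};

static_assert(sizeof(struct vof_cmd_hdr) % 16 == 0,
	      "descriptor array stays 16B aligned");
static_assert(sizeof(struct vof_desc) == 16,
	      "each descriptor is exactly 16B");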

> 2, A command size not aligned to 16B leads to a performance issue on the RDMA
> SEND operation. My colleague Zhuo helped me test the performance of sending
> 16/24/32 bytes:
> taskset -c 30 ib_send_bw -d mlx5_2 -i 1 -x 3 -s 16 -t 1 xx.xx.xx.xx
> taskset -c 30 ib_send_bw -d mlx5_2 -i 1 -x 3 -s 24 -t 1 xx.xx.xx.xx
> taskset -c 30 ib_send_bw -d mlx5_2 -i 1 -x 3 -s 32 -t 1 xx.xx.xx.xx
> The QPS seems almost the same.
> 
Structure [1] places the subsequent vring_desc[] descriptors at unaligned 8B addresses, which results in partial writes of the desc.

It is hard to say from the ib_send_bw test what is being done.
I remember mlx5 has cache-aligned accesses, NOP WQE segments and more.

I also don't see the 'id' field coming back in the response command_status.
So why transmit something over the fabric that is not used?
Did I miss the id on the completion side?

[1] https://github.com/pizhenwei/linux/blob/7a13b310d1338c462f8e0b13d39a571645bc4698/include/uapi/linux/virtio_of.h#L129

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: RE: Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)
  2023-04-27 20:31           ` Parav Pandit
  (?)
@ 2023-04-28  7:53           ` zhenwei pi
  -1 siblings, 0 replies; 29+ messages in thread
From: zhenwei pi @ 2023-04-28  7:53 UTC (permalink / raw)
  To: virtio-comment



On 4/28/23 04:31, Parav Pandit wrote:
> 
> 
>> From: zhenwei pi <pizhenwei@bytedance.com>
>> Sent: Thursday, April 27, 2023 4:21 AM
>>
>> On 4/25/23 13:03, Parav Pandit wrote:
>>>
>>>
>> [...]
>>>
>>> I briefly saw your RDMA command descriptor example, which is not
>>> aligned to 16B. Perf-wise it will be poorer than NVMe RDMA fabrics.
>>>
>>
>> Hi,
>> I'm confused here; could you please give me more hints?
>> 1, The size of the command descriptor (as I defined in the example) is larger
>> than the command size of NVMe RDMA, and the extra overhead makes performance
>> worse than NVMe over RDMA.
>>
> Which structure?
> 
> I am guessing from the header file that you have,
> 
> virtio_of_command_vring
> followed by
> virtio_of_vring_desc[cmd.ndesc] where cmd.opcode = virtio_of_op_vring
> 
> if so, it seems fine to me.
> However, the lack of the actual command in the virtio_of_command_vring struct is not so good.
> Such indirection overhead only reduces perf, as the data coming in for the blk storage target side is not of constant size.
> And even if it arrives somehow, it requires two levels of protocol parsers.
> This can be simplified, since you are not starting with any history here; the abstraction point could possibly be virtio commands rather than the vring.
> 
> I don't see a need for the desc to have id and flags the way it's drafted over the RDMA fabrics.
> What I had in mind is:
> struct virtio_of_descriptor {
> 	le64 addr;
> 	le32 len;
> 	union {
> 		le32 rdma_key;		/* RDMA: remote key */
> 		le32 id_reserved;	/* id plus reserved bits */
> 		le32 tcp_desc_id;	/* TCP: descriptor id */
> 	};
> };
> 
> We can possibly define appropriate virtio fabric descriptors; at that point, the abstraction point is not literally taking the vring across the fabric.
> 
> Depending on the use case, maybe starting with just one of TCP or RDMA makes sense, instead of cooking everything at once.
> 
>> 2, A command size not aligned to 16B leads to a performance issue on the RDMA
>> SEND operation. My colleague Zhuo helped me test the performance of sending
>> 16/24/32 bytes:
>> taskset -c 30 ib_send_bw -d mlx5_2 -i 1 -x 3 -s 16 -t 1 xx.xx.xx.xx
>> taskset -c 30 ib_send_bw -d mlx5_2 -i 1 -x 3 -s 24 -t 1 xx.xx.xx.xx
>> taskset -c 30 ib_send_bw -d mlx5_2 -i 1 -x 3 -s 32 -t 1 xx.xx.xx.xx
>> The QPS seems almost the same.
>>
> Structure [1] places the subsequent vring_desc[] descriptors at unaligned 8B addresses, which results in partial writes of the desc.
> 
> It is hard to say from the ib_send_bw test what is being done.
> I remember mlx5 has cache-aligned accesses, NOP WQE segments and more.
> 
> I also don't see the 'id' field coming back in the response command_status.
> So why transmit something over the fabric that is not used?
> Did I miss the id on the completion side?
> 
> [1] https://github.com/pizhenwei/linux/blob/7a13b310d1338c462f8e0b13d39a571645bc4698/include/uapi/linux/virtio_of.h#L129

I got your point clearly. Thanks a lot! I'm working on the spec and I'll
send it soon. Let's continue to discuss the details in that series.

-- 
zhenwei pi



^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2023-04-28  7:55 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-23 11:29 [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA) zhenwei pi
2023-04-24  3:40 ` [virtio-dev] " Jason Wang
2023-04-24  3:40   ` Jason Wang
2023-04-24 13:38   ` zhenwei pi
2023-04-25  5:03     ` [virtio-dev] " Parav Pandit
2023-04-25  5:03       ` Parav Pandit
2023-04-25  6:31       ` [virtio-dev] " Jason Wang
2023-04-25  6:31         ` Jason Wang
2023-04-25 13:27         ` [virtio-dev] " Parav Pandit
2023-04-25 13:27           ` Parav Pandit
2023-04-25  7:02       ` zhenwei pi
2023-04-27  8:20       ` [virtio-dev] " zhenwei pi
2023-04-27  8:20         ` zhenwei pi
2023-04-27 20:31         ` [virtio-dev] " Parav Pandit
2023-04-27 20:31           ` Parav Pandit
2023-04-28  7:53           ` zhenwei pi
2023-04-25  6:36     ` [virtio-dev] " Jason Wang
2023-04-25  6:36       ` Jason Wang
2023-04-25  7:21       ` zhenwei pi
2023-04-26  9:29       ` [virtio-dev] " Xuan Zhuo
2023-04-26  9:29         ` [virtio-comment] " Xuan Zhuo
2023-04-25 13:55   ` [virtio-dev] " Stefan Hajnoczi
2023-04-25 13:55     ` Stefan Hajnoczi
2023-04-26  1:08     ` [virtio-dev] Re: " zhenwei pi
2023-04-26  1:08       ` zhenwei pi
2023-04-25 14:09   ` [virtio-dev] " Stefan Hajnoczi
2023-04-25 14:09     ` Stefan Hajnoczi
2023-04-26  3:03     ` [virtio-dev] " Jason Wang
2023-04-26  3:03       ` Jason Wang
