* VSOCK benchmark and optimizations
@ 2019-04-01 16:32 ` Stefano Garzarella
  0 siblings, 0 replies; 11+ messages in thread
From: Stefano Garzarella @ 2019-04-01 16:32 UTC (permalink / raw)
  To: alex.bennee; +Cc: qemu devel list, netdev

Hi Alex,
I'm sending you some benchmarks and information about VSOCK, CCing qemu-devel
and linux-netdev (this info may be useful for others :))

One of the VSOCK advantages is the simple configuration: you don't need to set
up IP addresses for guest/host, and it can be used with the standard POSIX
socket API. [1]
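
As a quick illustration, here is a minimal sketch of a vsock client written
against the standard POSIX socket API; the port number is just an assumption
for this example and error handling is trimmed:

#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

int main(void)
{
    /* AF_VSOCK sockets look like any other SOCK_STREAM socket. */
    int fd = socket(AF_VSOCK, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    struct sockaddr_vm addr = {
        .svm_family = AF_VSOCK,
        .svm_cid    = VMADDR_CID_HOST, /* CID 2: guest connecting to the host */
        .svm_port   = 1234,            /* assumed port, pick your own */
    };

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        close(fd);
        return 1;
    }

    const char msg[] = "hello over vsock\n";
    write(fd, msg, sizeof(msg) - 1);
    close(fd);
    return 0;
}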

I'm currently working on it, so the "optimized" values are still a work in
progress; I'll send the patches upstream (Linux) as soon as possible
(hopefully in 1 or 2 weeks).

Optimizations:
+ reduce the number of credit update packets (see the sketch after this list)
  - previously, the RX side sent an empty packet for every packet received,
    only to inform the TX side about the available space in the RX buffer
+ increase the RX buffer size to 64 KB (from 4 KB)
+ merge packets to fill RX buffers
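
The credit-update change can be summarized with the following sketch; the
names and the threshold are hypothetical, just to show the idea of sending
an explicit update only after enough buffer space has been freed:

#include <stdbool.h>
#include <stdint.h>

struct rx_credit_state {
    uint32_t buf_alloc;    /* total RX buffer space advertised to the peer */
    uint32_t fwd_cnt;      /* bytes consumed by the receiver so far */
    uint32_t last_fwd_cnt; /* fwd_cnt at the last credit update we sent */
};

/* Instead of sending an empty credit-update packet after every received
 * packet, send one only when a meaningful amount of space was freed. */
static bool need_credit_update(const struct rx_credit_state *s)
{
    uint32_t freed_since_last_update = s->fwd_cnt - s->last_fwd_cnt;

    return freed_since_last_update >= s->buf_alloc / 2; /* assumed threshold */
}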

As a benchmark tool, I used iperf3 [2] modified with VSOCK support:

             host -> guest [Gbps]      guest -> host [Gbps]
pkt_size    before opt.  optimized    before opt.  optimized
  1K            0.5         1.6           1.4         1.4
  2K            1.1         3.1           2.3         2.5
  4K            2.0         5.6           4.2         4.4
  8K            3.2        10.2           7.2         7.5
  16K           6.4        14.2           9.4        11.3
  32K           9.8        18.9           9.2        17.8
  64K          13.8        22.9           8.8        25.0
  128K         17.6        24.5           7.7        25.7
  256K         19.0        24.8           8.1        25.6
  512K         20.8        25.1           8.1        25.4


How to reproduce:

host$ modprobe vhost_vsock
host$ qemu-system-x86_64 ... -device vhost-vsock-pci,guest-cid=3
      # Note: the guest CID should be >= 3
      # (0 and 1 are reserved, and 2 identifies the host)

guest$ iperf3 --vsock -s

host$ iperf3 --vsock -c 3 -l ${pkt_size}      # host -> guest
host$ iperf3 --vsock -c 3 -l ${pkt_size} -R   # guest -> host
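
Regarding the CID note above: inside the guest, a program can query its own
CID through /dev/vsock. A minimal sketch (assuming the vsock driver is loaded
and /dev/vsock exists):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/vm_sockets.h>

int main(void)
{
    unsigned int cid;
    int fd = open("/dev/vsock", O_RDONLY);

    if (fd < 0) {
        perror("open /dev/vsock");
        return 1;
    }
    /* Returns the CID assigned to this guest (e.g. 3 in the setup above). */
    if (ioctl(fd, IOCTL_VM_SOCKETS_GET_LOCAL_CID, &cid) < 0) {
        perror("IOCTL_VM_SOCKETS_GET_LOCAL_CID");
        close(fd);
        return 1;
    }
    printf("local CID: %u\n", cid);
    close(fd);
    return 0;
}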


If you want, I can run a similar benchmark (with iperf3) over a physical
network card (do you have a specific configuration in mind?).

Let me know if you need more details!

Thanks,
Stefano

[1] https://wiki.qemu.org/Features/VirtioVsock
[2] https://github.com/stefano-garzarella/iperf/

* Re: VSOCK benchmark and optimizations
  2019-04-01 16:32 ` [Qemu-devel] " Stefano Garzarella
@ 2019-04-02  4:19   ` Alex Bennée
  -1 siblings, 0 replies; 11+ messages in thread
From: Alex Bennée @ 2019-04-02  4:19 UTC (permalink / raw)
  To: Stefano Garzarella; +Cc: qemu devel list, netdev


Stefano Garzarella <sgarzare@redhat.com> writes:

> Hi Alex,
> I'm sending you some benchmarks and information about VSOCK CCing qemu-devel
> and linux-netdev (maybe this info could be useful for others :))
>
> One of the VSOCK advantages is the simple configuration: you don't need to set
> up IP addresses for guest/host, and it can be used with the standard POSIX
> socket API. [1]
>
> I'm currently working on it, so the "optimized" values are still work in
> progress and I'll send the patches upstream (Linux) as soon as possible.
> (I hope in 1 or 2 weeks)
>
> Optimizations:
> + reducing the number of credit update packets
>   - RX side sent, on every packet received, an empty packet only to inform the
>     TX side about the space in the RX buffer.
> + increase RX buffers size to 64 KB (from 4 KB)
> + merge packets to fill RX buffers
>
> As benchmark tool I used iperf3 [2] modified with VSOCK support:
>
>              host -> guest [Gbps]      guest -> host [Gbps]
> pkt_size    before opt.  optimized    before opt.  optimized
>   1K            0.5         1.6           1.4         1.4
>   2K            1.1         3.1           2.3         2.5
>   4K            2.0         5.6           4.2         4.4
>   8K            3.2        10.2           7.2         7.5
>   16K           6.4        14.2           9.4        11.3
>   32K           9.8        18.9           9.2        17.8
>   64K          13.8        22.9           8.8        25.0
>   128K         17.6        24.5           7.7        25.7
>   256K         19.0        24.8           8.1        25.6
>   512K         20.8        25.1           8.1        25.4
>
>
> How to reproduce:
>
> host$ modprobe vhost_vsock
> host$ qemu-system-x86_64 ... -device vhost-vsock-pci,guest-cid=3
>       # Note: Guest CID should be >= 3
>       # (0, 1 are reserved and 2 identify the host)
>
> guest$ iperf3 --vsock -s
>
> host$ iperf3 --vsock -c 3 -l ${pkt_size}      # host -> guest
> host$ iperf3 --vsock -c 3 -l ${pkt_size} -R   # guest -> host
>
>
> If you want, I can do a similar benchmark (with iperf3) using a networking
> card (do you have a specific configuration?).

My main interest is how it stacks up against:

  --device virtio-net-pci and I guess the vhost equivalent

AIUI one of the motivators was being able to run something like NFS for a
guest FS over vsock, avoiding the overhead of UDP and the additional
complication of needing a working network setup.

>
> Let me know if you need more details!
>
> Thanks,
> Stefano
>
> [1] https://wiki.qemu.org/Features/VirtioVsock
> [2] https://github.com/stefano-garzarella/iperf/


-- 
Alex Bennée

* Re: VSOCK benchmark and optimizations
  2019-04-02  4:19   ` [Qemu-devel] " Alex Bennée
@ 2019-04-02  7:37     ` Stefano Garzarella
  -1 siblings, 0 replies; 11+ messages in thread
From: Stefano Garzarella @ 2019-04-02  7:37 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu devel list, netdev, Stefan Hajnoczi

On Tue, Apr 02, 2019 at 04:19:25AM +0000, Alex Bennée wrote:
> 
> Stefano Garzarella <sgarzare@redhat.com> writes:
> 
> > Hi Alex,
> > I'm sending you some benchmarks and information about VSOCK CCing qemu-devel
> > and linux-netdev (maybe this info could be useful for others :))
> >
> > One of the VSOCK advantages is the simple configuration: you don't need to set
> > up IP addresses for guest/host, and it can be used with the standard POSIX
> > socket API. [1]
> >
> > I'm currently working on it, so the "optimized" values are still work in
> > progress and I'll send the patches upstream (Linux) as soon as possible.
> > (I hope in 1 or 2 weeks)
> >
> > Optimizations:
> > + reducing the number of credit update packets
> >   - RX side sent, on every packet received, an empty packet only to inform the
> >     TX side about the space in the RX buffer.
> > + increase RX buffers size to 64 KB (from 4 KB)
> > + merge packets to fill RX buffers
> >
> > As benchmark tool I used iperf3 [2] modified with VSOCK support:
> >
> >              host -> guest [Gbps]      guest -> host [Gbps]
> > pkt_size    before opt.  optimized    before opt.  optimized
> >   1K            0.5         1.6           1.4         1.4
> >   2K            1.1         3.1           2.3         2.5
> >   4K            2.0         5.6           4.2         4.4
> >   8K            3.2        10.2           7.2         7.5
> >   16K           6.4        14.2           9.4        11.3
> >   32K           9.8        18.9           9.2        17.8
> >   64K          13.8        22.9           8.8        25.0
> >   128K         17.6        24.5           7.7        25.7
> >   256K         19.0        24.8           8.1        25.6
> >   512K         20.8        25.1           8.1        25.4
> >
> >
> > How to reproduce:
> >
> > host$ modprobe vhost_vsock
> > host$ qemu-system-x86_64 ... -device vhost-vsock-pci,guest-cid=3
> >       # Note: Guest CID should be >= 3
> >       # (0, 1 are reserved and 2 identify the host)
> >
> > guest$ iperf3 --vsock -s
> >
> > host$ iperf3 --vsock -c 3 -l ${pkt_size}      # host -> guest
> > host$ iperf3 --vsock -c 3 -l ${pkt_size} -R   # guest -> host
> >
> >
> > If you want, I can do a similar benchmark (with iperf3) using a networking
> > card (do you have a specific configuration?).
> 
> My main interest is how it stacks up against:
> 
>   --device virtio-net-pci and I guess the vhost equivalent

I'll do some tests with virtio-net and vhost.

> 
> AIUI one of the motivators was being able to run something like NFS for
> a guest FS over vsock instead of the overhead from UDP and having to
> deal with the additional complication of having a working network setup.
> 

CCing Stefan.

I know he is working on virtio-fs, which may suit your use case better.
He also worked on VSOCK support for NFS, but I don't think it has been merged
upstream.

Thanks,
Stefano

* Re: [Qemu-devel] VSOCK benchmark and optimizations
  2019-04-01 16:32 ` [Qemu-devel] " Stefano Garzarella
@ 2019-04-03 12:34 ` Stefan Hajnoczi
  2019-04-03 15:10   ` Stefano Garzarella
  -1 siblings, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2019-04-03 12:34 UTC (permalink / raw)
  To: Stefano Garzarella; +Cc: alex.bennee, netdev, qemu devel list

On Mon, Apr 01, 2019 at 06:32:40PM +0200, Stefano Garzarella wrote:
> Hi Alex,
> I'm sending you some benchmarks and information about VSOCK CCing qemu-devel
> and linux-netdev (maybe this info could be useful for others :))
> 
> One of the VSOCK advantages is the simple configuration: you don't need to set
> up IP addresses for guest/host, and it can be used with the standard POSIX
> socket API. [1]
> 
> I'm currently working on it, so the "optimized" values are still work in
> progress and I'll send the patches upstream (Linux) as soon as possible.
> (I hope in 1 or 2 weeks)
> 
> Optimizations:
> + reducing the number of credit update packets
>   - RX side sent, on every packet received, an empty packet only to inform the
>     TX side about the space in the RX buffer.
> + increase RX buffers size to 64 KB (from 4 KB)
> + merge packets to fill RX buffers
> 
> As benchmark tool I used iperf3 [2] modified with VSOCK support:
> 
>              host -> guest [Gbps]      guest -> host [Gbps]
> pkt_size    before opt.  optimized    before opt.  optimized
>   1K            0.5         1.6           1.4         1.4

This is a "large" small package size.  I think 64 bytes is a common
"small" packet size and is worth benchmarking too.

>   2K            1.1         3.1           2.3         2.5
>   4K            2.0         5.6           4.2         4.4
>   8K            3.2        10.2           7.2         7.5
>   16K           6.4        14.2           9.4        11.3
>   32K           9.8        18.9           9.2        17.8
>   64K          13.8        22.9           8.8        25.0
>   128K         17.6        24.5           7.7        25.7
>   256K         19.0        24.8           8.1        25.6
>   512K         20.8        25.1           8.1        25.4

Nice improvements!

Stefan

* Re: [Qemu-devel] VSOCK benchmark and optimizations
  2019-04-02  7:37     ` [Qemu-devel] " Stefano Garzarella
@ 2019-04-03 12:36     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2019-04-03 12:36 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Alex Bennée, netdev, qemu devel list, Stefan Hajnoczi

On Tue, Apr 02, 2019 at 09:37:06AM +0200, Stefano Garzarella wrote:
> On Tue, Apr 02, 2019 at 04:19:25AM +0000, Alex Bennée wrote:
> > 
> > Stefano Garzarella <sgarzare@redhat.com> writes:
> > 
> > > Hi Alex,
> > > I'm sending you some benchmarks and information about VSOCK CCing qemu-devel
> > > and linux-netdev (maybe this info could be useful for others :))
> > >
> > > One of the VSOCK advantages is the simple configuration: you don't need to set
> > > up IP addresses for guest/host, and it can be used with the standard POSIX
> > > socket API. [1]
> > >
> > > I'm currently working on it, so the "optimized" values are still work in
> > > progress and I'll send the patches upstream (Linux) as soon as possible.
> > > (I hope in 1 or 2 weeks)
> > >
> > > Optimizations:
> > > + reducing the number of credit update packets
> > >   - RX side sent, on every packet received, an empty packet only to inform the
> > >     TX side about the space in the RX buffer.
> > > + increase RX buffers size to 64 KB (from 4 KB)
> > > + merge packets to fill RX buffers
> > >
> > > As benchmark tool I used iperf3 [2] modified with VSOCK support:
> > >
> > >              host -> guest [Gbps]      guest -> host [Gbps]
> > > pkt_size    before opt.  optimized    before opt.  optimized
> > >   1K            0.5         1.6           1.4         1.4
> > >   2K            1.1         3.1           2.3         2.5
> > >   4K            2.0         5.6           4.2         4.4
> > >   8K            3.2        10.2           7.2         7.5
> > >   16K           6.4        14.2           9.4        11.3
> > >   32K           9.8        18.9           9.2        17.8
> > >   64K          13.8        22.9           8.8        25.0
> > >   128K         17.6        24.5           7.7        25.7
> > >   256K         19.0        24.8           8.1        25.6
> > >   512K         20.8        25.1           8.1        25.4
> > >
> > >
> > > How to reproduce:
> > >
> > > host$ modprobe vhost_vsock
> > > host$ qemu-system-x86_64 ... -device vhost-vsock-pci,guest-cid=3
> > >       # Note: Guest CID should be >= 3
> > >       # (0, 1 are reserved and 2 identify the host)
> > >
> > > guest$ iperf3 --vsock -s
> > >
> > > host$ iperf3 --vsock -c 3 -l ${pkt_size}      # host -> guest
> > > host$ iperf3 --vsock -c 3 -l ${pkt_size} -R   # guest -> host
> > >
> > >
> > > If you want, I can do a similar benchmark (with iperf3) using a networking
> > > card (do you have a specific configuration?).
> > 
> > My main interest is how it stacks up against:
> > 
> >   --device virtio-net-pci and I guess the vhost equivalent
> 
> I'll do some tests with virtio-net and vhost.
> 
> > 
> > AIUI one of the motivators was being able to run something like NFS for
> > a guest FS over vsock instead of the overhead from UDP and having to
> > deal with the additional complication of having a working network setup.
> > 
> 
> CCing Stefan.
> 
> I know he is working on virtio-fs that maybe suite better with your use cases.
> He also worked on VSOCK support for NFS, but I think it is not merged upstream.

Hi Alex,
David Gilbert, Vivek Goyal, Miklos Szeredi, and I are working on
virtio-fs for host<->guest file sharing.  It performs better than
virtio-9p and we're currently working on getting it upstream (first the
VIRTIO device spec, then Linux and QEMU patches).

You can read about it and try it here:

https://virtio-fs.gitlab.io/

Stefan

* Re: [Qemu-devel] VSOCK benchmark and optimizations
  2019-04-03 12:34 ` Stefan Hajnoczi
@ 2019-04-03 15:10   ` Stefano Garzarella
  0 siblings, 0 replies; 11+ messages in thread
From: Stefano Garzarella @ 2019-04-03 15:10 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: alex.bennee, netdev, qemu devel list

On Wed, Apr 03, 2019 at 01:34:38PM +0100, Stefan Hajnoczi wrote:
> On Mon, Apr 01, 2019 at 06:32:40PM +0200, Stefano Garzarella wrote:
> > Hi Alex,
> > I'm sending you some benchmarks and information about VSOCK CCing qemu-devel
> > and linux-netdev (maybe this info could be useful for others :))
> > 
> > One of the VSOCK advantages is the simple configuration: you don't need to set
> > up IP addresses for guest/host, and it can be used with the standard POSIX
> > socket API. [1]
> > 
> > I'm currently working on it, so the "optimized" values are still work in
> > progress and I'll send the patches upstream (Linux) as soon as possible.
> > (I hope in 1 or 2 weeks)
> > 
> > Optimizations:
> > + reducing the number of credit update packets
> >   - RX side sent, on every packet received, an empty packet only to inform the
> >     TX side about the space in the RX buffer.
> > + increase RX buffers size to 64 KB (from 4 KB)
> > + merge packets to fill RX buffers
> > 
> > As benchmark tool I used iperf3 [2] modified with VSOCK support:
> > 
> >              host -> guest [Gbps]      guest -> host [Gbps]
> > pkt_size    before opt.  optimized    before opt.  optimized
> >   1K            0.5         1.6           1.4         1.4
> 
> This is a "large" small package size.  I think 64 bytes is a common
> "small" packet size and is worth benchmarking too.
> 

Okay, I'll add more small packet sizes for the benchmark.

> >   2K            1.1         3.1           2.3         2.5
> >   4K            2.0         5.6           4.2         4.4
> >   8K            3.2        10.2           7.2         7.5
> >   16K           6.4        14.2           9.4        11.3
> >   32K           9.8        18.9           9.2        17.8
> >   64K          13.8        22.9           8.8        25.0
> >   128K         17.6        24.5           7.7        25.7
> >   256K         19.0        24.8           8.1        25.6
> >   512K         20.8        25.1           8.1        25.4
> 
> Nice improvements!

Thanks :)

I'm cleaning up the patches and running step-by-step benchmarks; I hope to
send the series upstream in the next few days.

Stefano

* Re: VSOCK benchmark and optimizations
  2019-04-02  4:19   ` [Qemu-devel] " Alex Bennée
@ 2019-04-04 10:47     ` Stefano Garzarella
  -1 siblings, 0 replies; 11+ messages in thread
From: Stefano Garzarella @ 2019-04-04 10:47 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu devel list, netdev, Stefan Hajnoczi

On Tue, Apr 02, 2019 at 04:19:25AM +0000, Alex Bennée wrote:
> 
> My main interest is how it stacks up against:
> 
>   --device virtio-net-pci and I guess the vhost equivalent
> 

Hi Alex,
I added TCP tests on virtio-net, and I also ran a test with TCP_NODELAY,
just to be fair, because VSOCK doesn't implement anything like it (it could
be an improvement to add for maximizing throughput). I set the MTU to the
maximum allowed (65520).
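
For reference, this is roughly what iperf3's -N flag does on the TCP side
(a minimal sketch; VSOCK currently has no equivalent socket option):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

static int set_nodelay(int fd)
{
    int one = 1;

    /* Disable Nagle's algorithm so small writes go out immediately. */
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}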

I also redid the VSOCK tests. There are some differences from the previous
numbers because I'm now using tuned to reduce fluctuations, and I removed
batching from the VSOCK optimizations because it is not ready to be published.

                   VSOCK               TCP + virtio-net + vhost
             host -> guest [Gbps]         host -> guest [Gbps]
pkt_size    before opt.  optimized      TCP_NODELAY
  64            0.060       0.096           0.16        0.15
  256           0.22        0.36            0.32        0.57
  512           0.42        0.74            1.2         1.2
  1K            0.7         1.5             2.1         2.1
  2K            1.5         2.9             3.5         3.4
  4K            2.5         5.3             5.5         5.3
  8K            3.9         8.8             8.0         7.9
  16K           6.6        12.8             9.8        10.2
  32K           9.9        18.1            11.8        10.7
  64K          13.5        21.4            11.4        11.3
  128K         17.9        23.6            11.2        11.0
  256K         18.0        24.4            11.1        11.0
  512K         18.4        25.3            10.1        10.7

Note: maybe I have something misconfigured, because TCP on virtio-net
doesn't exceed 11 Gbps.

                   VSOCK               TCP + virtio-net + vhost
             guest -> host [Gbps]         guest -> host [Gbps]
pkt_size    before opt.  optimized      TCP_NODELAY
  64            0.088       0.101           0.24        0.24
  256           0.35        0.41            0.36        1.03
  512           0.70        0.73            0.69        1.6
  1K            1.1         1.3             1.1         3.0
  2K            2.4         2.6             2.1         5.5
  4K            4.3         4.5             3.8         8.8
  8K            7.3         7.6             6.6        20.0
  16K           9.2        11.1            12.3        29.4
  32K           8.3        18.1            19.3        28.2
  64K           8.3        25.4            20.6        28.7
  128K          7.2        26.7            23.1        27.9
  256K          7.7        24.9            28.5        29.4
  512K          7.7        25.0            28.3        29.3

virtio-net is better optimized than VSOCK, but we are getting close :). Maybe
we will use virtio-net as a transport for VSOCK, in order to avoid duplicating
optimizations.

How to reproduce TCP tests:

host$ ip link set dev br0 mtu 65520
host$ ip link set dev tap0 mtu 65520
host$ qemu-system-x86_64 ... \
      -netdev tap,id=net0,vhost=on,ifname=tap0,script=no,downscript=no \
      -device virtio-net-pci,netdev=net0

guest$ ip link set dev eth0 mtu 65520
guest$ iperf3 -s

host$ iperf3 -c ${VM_IP} -N -l ${pkt_size}      # host -> guest
host$ iperf3 -c ${VM_IP} -N -l ${pkt_size} -R   # guest -> host


Cheers,
Stefano
