* [PATCH net-next v5 0/5] vsock/virtio: optimizations to increase the throughput
@ 2019-07-30 15:43 Stefano Garzarella
From: Stefano Garzarella @ 2019-07-30 15:43 UTC (permalink / raw)
  To: netdev
  Cc: kvm, Michael S. Tsirkin, linux-kernel, virtualization,
	Stefan Hajnoczi, David S. Miller

This series tries to increase the throughput of virtio-vsock with slight
changes.
While I was testing the v2 of this series I discovered a huge use of memory,
so I added patch 1 to mitigate this issue. I put it in this series in order
to better track the performance trends.

v5:
- rebased all patches on net-next
- added Stefan's R-b and Michael's A-b

v4: https://patchwork.kernel.org/cover/11047717
v3: https://patchwork.kernel.org/cover/10970145
v2: https://patchwork.kernel.org/cover/10938743
v1: https://patchwork.kernel.org/cover/10885431

Below are the benchmarks, step by step. I used iperf3 [1] modified with VSOCK
support. As Michael suggested on v1, I booted both host and guest with 'nosmap'.

A brief description of the patches:
- Patch 1:     limit the memory usage with an extra copy for small packets
- Patches 2+3: reduce the number of credit update messages sent to the
               transmitter (the idea is sketched just below)
- Patches 4+5: allow the host to split packets across multiple buffers and use
               VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
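
A rough sketch of the idea behind patches 2+3 may help to picture it: instead
of sending a credit update to the transmitter after every dequeue, the receiver
notifies it only once the space freed since the last update crosses a
threshold. Everything in the snippet (names, types, and the threshold value)
is illustrative, not the code of the patches:

    #include <stdbool.h>
    #include <stdint.h>

    #define CREDIT_UPDATE_THRESHOLD (64 * 1024)    /* illustrative value */

    struct rx_credit {
            uint32_t fwd_cnt;      /* bytes consumed by the receiver so far */
            uint32_t last_fwd_cnt; /* value last advertised to the TX side  */
    };

    /* Called after 'bytes' have been dequeued from the receive buffer.
     * Returns true when it is worth sending a credit update message. */
    static bool credit_update_needed(struct rx_credit *c, uint32_t bytes)
    {
            c->fwd_cnt += bytes;
            if (c->fwd_cnt - c->last_fwd_cnt < CREDIT_UPDATE_THRESHOLD)
                    return false;
            c->last_fwd_cnt = c->fwd_cnt;
            return true;
    }

The trade-off is some slack in the credit accounting in exchange for far fewer
update messages sent back to the transmitter.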

                    host -> guest [Gbps]
pkt_size before opt   p 1     p 2+3    p 4+5

32         0.032     0.030    0.048    0.051
64         0.061     0.059    0.108    0.117
128        0.122     0.112    0.227    0.234
256        0.244     0.241    0.418    0.415
512        0.459     0.466    0.847    0.865
1K         0.927     0.919    1.657    1.641
2K         1.884     1.813    3.262    3.269
4K         3.378     3.326    6.044    6.195
8K         5.637     5.676   10.141   11.287
16K        8.250     8.402   15.976   16.736
32K       13.327    13.204   19.013   20.515
64K       21.241    21.341   20.973   21.879
128K      21.851    22.354   21.816   23.203
256K      21.408    21.693   21.846   24.088
512K      21.600    21.899   21.921   24.106

                    guest -> host [Gbps]
pkt_size before opt   p 1     p 2+3    p 4+5

32         0.045     0.046    0.057    0.057
64         0.089     0.091    0.103    0.104
128        0.170     0.179    0.192    0.200
256        0.364     0.351    0.361    0.379
512        0.709     0.699    0.731    0.790
1K         1.399     1.407    1.395    1.427
2K         2.670     2.684    2.745    2.835
4K         5.171     5.199    5.305    5.451
8K         8.442     8.500   10.083    9.941
16K       12.305    12.259   13.519   15.385
32K       11.418    11.150   11.988   24.680
64K       10.778    10.659   11.589   35.273
128K      10.421    10.339   10.939   40.338
256K      10.300     9.719   10.508   36.562
512K       9.833     9.808   10.612   35.979

As Stefan suggested on v1, I also measured the efficiency in this way:
    efficiency = Mbps / (%CPU_Host + %CPU_Guest)

The '%CPU_Guest' is measured inside the VM. I know this is not the best way,
but iperf3 provides it for free and it gives at least a rough indication.
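
As a concrete example of how each entry below is computed (a throwaway helper,
not part of the series; the Gbps-to-Mbps conversion and the sample numbers are
only illustrative):

    #include <stdio.h>

    /* efficiency = Mbps / (%CPU_Host + %CPU_Guest) */
    static double efficiency(double gbps, double cpu_host_pct,
                             double cpu_guest_pct)
    {
            return (gbps * 1000.0) / (cpu_host_pct + cpu_guest_pct);
    }

    int main(void)
    {
            /* e.g. ~22 Gbps with host + guest CPU summing to 50% */
            printf("%.2f Mbps per %%CPU\n", efficiency(21.9, 30.0, 20.0));
            return 0;
    }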

        host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
pkt_size before opt   p 1     p 2+3    p 4+5

32         0.35      0.45     0.79     1.02
64         0.56      0.80     1.41     1.54
128        1.11      1.52     3.03     3.12
256        2.20      2.16     5.44     5.58
512        4.17      4.18    10.96    11.46
1K         8.30      8.26    20.99    20.89
2K        16.82     16.31    39.76    39.73
4K        30.89     30.79    74.07    75.73
8K        53.74     54.49   124.24   148.91
16K       80.68     83.63   200.21   232.79
32K      132.27    132.52   260.81   357.07
64K      229.82    230.40   300.19   444.18
128K     332.60    329.78   331.51   492.28
256K     331.06    337.22   339.59   511.59
512K     335.58    328.50   331.56   504.56

        guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
pkt_size before opt   p 1     p 2+3    p 4+5

32         0.43      0.43     0.53     0.56
64         0.85      0.86     1.04     1.10
128        1.63      1.71     2.07     2.13
256        3.48      3.35     4.02     4.22
512        6.80      6.67     7.97     8.63
1K        13.32     13.31    15.72    15.94
2K        25.79     25.92    30.84    30.98
4K        50.37     50.48    58.79    59.69
8K        95.90     96.15   107.04   110.33
16K      145.80    145.43   143.97   174.70
32K      147.06    144.74   146.02   282.48
64K      145.25    143.99   141.62   406.40
128K     149.34    146.96   147.49   489.34
256K     156.35    149.81   152.21   536.37
512K     151.65    150.74   151.52   519.93

[1] https://github.com/stefano-garzarella/iperf/

Stefano Garzarella (5):
  vsock/virtio: limit the memory used per-socket
  vsock/virtio: reduce credit update messages
  vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
  vhost/vsock: split packets to send using multiple buffers
  vsock/virtio: change the maximum packet size allowed

 drivers/vhost/vsock.c                   | 68 ++++++++++++-----
 include/linux/virtio_vsock.h            |  4 +-
 net/vmw_vsock/virtio_transport.c        |  1 +
 net/vmw_vsock/virtio_transport_common.c | 99 ++++++++++++++++++++-----
 4 files changed, 134 insertions(+), 38 deletions(-)

-- 
2.20.1


* Re: [PATCH net-next v5 0/5] vsock/virtio: optimizations to increase the throughput
@ 2019-07-30 16:02     ` Stefano Garzarella
From: Stefano Garzarella @ 2019-07-30 16:02 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, virtualization, linux-kernel, Jason Wang, kvm,
	David S. Miller, Stefan Hajnoczi

On Tue, Jul 30, 2019 at 11:55:09AM -0400, Michael S. Tsirkin wrote:
> On Tue, Jul 30, 2019 at 11:54:53AM -0400, Michael S. Tsirkin wrote:
> > On Tue, Jul 30, 2019 at 05:43:29PM +0200, Stefano Garzarella wrote:
> > > This series tries to increase the throughput of virtio-vsock with slight
> > > changes.
> > > While I was testing the v2 of this series I discovered a huge use of memory,
> > > so I added patch 1 to mitigate this issue. I put it in this series in order
> > > to better track the performance trends.
> > > 
> > > v5:
> > > - rebased all patches on net-next
> > > - added Stefan's R-b and Michael's A-b
> > 
> > This doesn't solve all issues around allocation - as I mentioned I think
> > we will need to improve accounting for that,
> > and maybe add pre-allocation.

Yes, I'll work on it following your suggestions.

> > But it's a great series of steps in the right direction!
> > 

Thank you very much :)
Stefano

> 
> 
> So
> 
> Acked-by: Michael S. Tsirkin <mst@redhat.com>

* Re: [PATCH net-next v5 0/5] vsock/virtio: optimizations to increase the throughput
@ 2019-07-30 15:55   ` Michael S. Tsirkin
From: Michael S. Tsirkin @ 2019-07-30 15:55 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, virtualization, linux-kernel, Jason Wang, kvm,
	David S. Miller, Stefan Hajnoczi

On Tue, Jul 30, 2019 at 11:54:53AM -0400, Michael S. Tsirkin wrote:
> On Tue, Jul 30, 2019 at 05:43:29PM +0200, Stefano Garzarella wrote:
> > This series tries to increase the throughput of virtio-vsock with slight
> > changes.
> > While I was testing the v2 of this series I discovered a huge use of memory,
> > so I added patch 1 to mitigate this issue. I put it in this series in order
> > to better track the performance trends.
> > 
> > v5:
> > - rebased all patches on net-next
> > - added Stefan's R-b and Michael's A-b
> 
> This doesn't solve all issues around allocation - as I mentioned I think
> we will need to improve accounting for that,
> and maybe add pre-allocation.
> But it's a great series of steps in the right direction!
> 


So

Acked-by: Michael S. Tsirkin <mst@redhat.com>


* Re: [PATCH net-next v5 0/5] vsock/virtio: optimizations to increase the throughput
@ 2019-07-30 15:54 ` Michael S. Tsirkin
From: Michael S. Tsirkin @ 2019-07-30 15:54 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, virtualization, linux-kernel, Jason Wang, kvm,
	David S. Miller, Stefan Hajnoczi

On Tue, Jul 30, 2019 at 05:43:29PM +0200, Stefano Garzarella wrote:
> This series tries to increase the throughput of virtio-vsock with slight
> changes.
> While I was testing the v2 of this series I discovered a huge use of memory,
> so I added patch 1 to mitigate this issue. I put it in this series in order
> to better track the performance trends.
> 
> v5:
> - rebased all patches on net-next
> - added Stefan's R-b and Michael's A-b

This doesn't solve all issues around allocation - as I mentioned I think
we will need to improve accounting for that,
and maybe add pre-allocation.
But it's a great series of steps in the right direction!


