* [PATCH v2 0/5] VSOCK: support mergeable rx buffer in vhost-vsock
@ 2018-12-12 9:25 jiangyiwen
0 siblings, 0 replies; 14+ messages in thread
From: jiangyiwen @ 2018-12-12 9:25 UTC (permalink / raw)
To: Stefan Hajnoczi, Michael S. Tsirkin, Jason Wang
Cc: netdev, kvm, virtualization
Currently vsock only supports sending/receiving small packets, so it can't
achieve high performance. As previously discussed with Jason Wang, I revisited
the mergeable rx buffer idea from vhost-net and implemented mergeable rx
buffers in vhost-vsock. This allows a big packet to be scattered across
several buffers and improves performance noticeably.
This patch series mainly does three things:
- implement mergeable rx buffers
- increase the max send pkt size
- add used descriptors and signal the guest in a batch (a rough sketch of
  the batching idea follows this list)
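
For illustration only, the following is a rough sketch of what "add used and
signal guest in a batch" means on the vhost side; it is not the patch itself,
and VSOCK_USED_BATCH and copy_pkt_to_guest_bufs() are made-up placeholders:

#define VSOCK_USED_BATCH 16	/* hypothetical batch size */

static void vhost_vsock_send_to_guest(struct vhost_dev *dev,
				      struct vhost_virtqueue *vq)
{
	struct vring_used_elem heads[VSOCK_USED_BATCH];
	unsigned int out, in;
	int head, nheads = 0;

	for (;;) {
		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
					 &out, &in, NULL, NULL);
		if (head < 0 || head == vq->num)
			break;	/* error, or no more guest rx buffers */

		/* placeholder: copy one packet into the guest buffers,
		 * returning the number of bytes written */
		heads[nheads].id = cpu_to_vhost32(vq, head);
		heads[nheads].len = cpu_to_vhost32(vq,
				copy_pkt_to_guest_bufs(vq, out, in));

		if (++nheads == VSOCK_USED_BATCH) {
			/* one used-ring update and one interrupt per batch */
			vhost_add_used_and_signal_n(dev, vq, heads, nheads);
			nheads = 0;
		}
	}
	if (nheads)
		vhost_add_used_and_signal_n(dev, vq, heads, nheads);
}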
I wrote a tool to test vhost-vsock performance, mainly sending big packets
(64K) in both the Guest->Host and Host->Guest directions. I tested each change
independently; the results are as follows:
Performance before this series:
                Single socket    Multiple sockets (Max Bandwidth)
Guest->Host     ~400MB/s         ~480MB/s
Host->Guest     ~1450MB/s        ~1600MB/s
Performance with only the mergeable rx buffer implemented:
                Single socket    Multiple sockets (Max Bandwidth)
Guest->Host     ~400MB/s         ~480MB/s
Host->Guest     ~1280MB/s        ~1350MB/s
In this case, the max send pkt size is still limited to 4K, so Host->Guest
performance is worse than before.
Performance after also increasing the max send pkt size to 64K:
                Single socket    Multiple sockets (Max Bandwidth)
Guest->Host     ~1700MB/s        ~2900MB/s
Host->Guest     ~1500MB/s        ~2440MB/s
Performance with all patches applied:
                Single socket    Multiple sockets (Max Bandwidth)
Guest->Host     ~1700MB/s        ~2900MB/s
Host->Guest     ~1700MB/s        ~2900MB/s
From the test results, performance improves significantly and guest memory
is not wasted.
In addition, in order to support mergeable rx buffers in virtio-vsock, we
also need a QEMU patch so the device can negotiate and parse the new feature.
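
The QEMU change is essentially feature negotiation. A minimal, hypothetical
sketch follows (VIRTIO_VSOCK_F_MRG_RXBUF is an invented placeholder bit, and
real code would also have to intersect it with what the vhost backend
reports):

#define VIRTIO_VSOCK_F_MRG_RXBUF  0	/* placeholder bit number */

static uint64_t vhost_vsock_get_features(VirtIODevice *vdev,
                                         uint64_t requested_features,
                                         Error **errp)
{
    /* offer mergeable rx buffers to the guest */
    virtio_add_feature(&requested_features, VIRTIO_VSOCK_F_MRG_RXBUF);
    return requested_features;
}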
---
v1 -> v2:
* Addressed comments from Jason Wang.
* Add performance test results for each change independently.
* Use skb_page_frag_refill(), which can use high-order pages and reduce
  pressure on the page allocator (see the sketch after this changelog).
* Still use a fixed size (PAGE_SIZE) to fill rx buffers, because a smaller
  size cannot hold one full packet and we only have 128 vq entries for now.
* Replace buf with an iovec in struct virtio_vsock_pkt, keeping tx and rx
  consistent.
* Add a virtio_transport op to get the max pkt len, in order to stay
  compatible with older versions.
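
For illustration, here is a hypothetical sketch of how the guest might refill
the mergeable rx virtqueue with PAGE_SIZE buffers via skb_page_frag_refill();
the rx_frag field and the function name are assumptions, not the actual patch
code:

static int virtio_vsock_fill_mrg_rx(struct virtio_vsock *vsock)
{
	struct virtqueue *vq = vsock->vqs[VSOCK_VQ_RX];
	struct page_frag *pfrag = &vsock->rx_frag;	/* assumed per-device frag cache */
	struct scatterlist sg;
	void *buf;
	int err;

	do {
		/* may hand out part of a high-order page, easing pressure
		 * on the page allocator */
		if (!skb_page_frag_refill(PAGE_SIZE, pfrag, GFP_KERNEL))
			return -ENOMEM;

		buf = page_address(pfrag->page) + pfrag->offset;
		get_page(pfrag->page);
		pfrag->offset += PAGE_SIZE;

		sg_init_one(&sg, buf, PAGE_SIZE);
		err = virtqueue_add_inbuf(vq, &sg, 1, buf, GFP_KERNEL);
		if (err < 0) {
			put_page(virt_to_head_page(buf));
			break;
		}
	} while (vq->num_free);

	virtqueue_kick(vq);
	return 0;
}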
---
Yiwen Jiang (5):
VSOCK: support fill mergeable rx buffer in guest
VSOCK: support fill data to mergeable rx buffer in host
VSOCK: support receive mergeable rx buffer in guest
VSOCK: increase send pkt len in mergeable mode to improve performance
VSOCK: batch sending rx buffer to increase bandwidth
drivers/vhost/vsock.c | 183 ++++++++++++++++++++-----
include/linux/virtio_vsock.h | 13 +-
include/uapi/linux/virtio_vsock.h | 5 +
net/vmw_vsock/virtio_transport.c | 229 +++++++++++++++++++++++++++-----
net/vmw_vsock/virtio_transport_common.c | 66 ++++++---
5 files changed, 411 insertions(+), 85 deletions(-)
--
1.8.3.1
* Re: [PATCH v2 0/5] VSOCK: support mergeable rx buffer in vhost-vsock
2018-12-14 10:24 ` jiangyiwen
@ 2018-12-14 13:22 ` Michael S. Tsirkin
2018-12-14 13:22 ` Michael S. Tsirkin
1 sibling, 0 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2018-12-14 13:22 UTC (permalink / raw)
To: jiangyiwen; +Cc: Stefan Hajnoczi, Jason Wang, netdev, kvm, virtualization
On Fri, Dec 14, 2018 at 06:24:40PM +0800, jiangyiwen wrote:
> On 2018/12/12 23:09, Michael S. Tsirkin wrote:
> > On Wed, Dec 12, 2018 at 05:25:50PM +0800, jiangyiwen wrote:
> >> Now vsock only support send/receive small packet, it can't achieve
> >> high performance. As previous discussed with Jason Wang, I revisit the
> >> idea of vhost-net about mergeable rx buffer and implement the mergeable
> >> rx buffer in vhost-vsock, it can allow big packet to be scattered in
> >> into different buffers and improve performance obviously.
> >>
> >> This series of patches mainly did three things:
> >> - mergeable buffer implementation
> >> - increase the max send pkt size
> >> - add used and signal guest in a batch
> >>
> >> And I write a tool to test the vhost-vsock performance, mainly send big
> >> packet(64K) included guest->Host and Host->Guest. I test performance
> >> independently and the result as follows:
> >>
> >> Before performance:
> >> Single socket Multiple sockets(Max Bandwidth)
> >> Guest->Host ~400MB/s ~480MB/s
> >> Host->Guest ~1450MB/s ~1600MB/s
> >>
> >> After performance only use implement mergeable rx buffer:
> >> Single socket Multiple sockets(Max Bandwidth)
> >> Guest->Host ~400MB/s ~480MB/s
> >> Host->Guest ~1280MB/s ~1350MB/s
> >>
> >> In this case, max send pkt size is still limited to 4K, so Host->Guest
> >> performance will worse than before.
> >
> > It's concerning though, what if application sends small packets?
> > What is the source of the slowdown? Do you know?
> >
>
> Hi Michael,
>
> To the two cases, I test the results included small and big packets as
> follows:
>
> 64K packets performance comparison:
> Single socket Multiple sockets
> Host->Guest(before) 1352.60MB/s 1436.33MB/s
>
>
> Host->Guest(only use mergeable rx buffer) 1290.08MB/s 1212.67MB/s
>
> 4K packets performance comparison:
> Single socket Multiple sockets
> Host->Guest(before) 535.47MB/s 688.67MB/s
> Host->Guest(only use mergeable rx buffer) 522.33MB/s 599.00MB/s
>
> 3K packets performance comparison:
> Single socket Multiple sockets
> Host->Guest(before) 359.74MB/s 442.00MB/s
> Host->Guest(only use mergeable rx buffer) 374.47MB/s 452.33MB/s
>
> We can see an interesting thing, for 64K and 4K packets,
> using mergeable buffer has a poor performance, for 3K packet,
> both have the same performance.
>
> I guess in mergeable mode, when host send a 4k packet to guest, we
> should call vhost_get_vq_desc() twice in host(hdr + 4k data),
> and in guest we also should call virtqueue_get_buf() twice. So
> when packet is smaller than (4k - hdr), it can be packed in a
> single page, so the performance is the same as before.
>
> So in the mergeable mode, the performance may be
> worse in ((4k - hdr), 4k] than before.
>
> Thanks,
> Yiwen.
The conclusion seems to be that mergeable buffers themselves
only hurt performance, but they allow batching which improves
performance. So let's add batching without mergeable buffers then?
> >> After performance increase the max send pkt size to 64K:
> >> Single socket Multiple sockets(Max Bandwidth)
> >> Guest->Host ~1700MB/s ~2900MB/s
> >> Host->Guest ~1500MB/s ~2440MB/s
> >>
> >> After performance all patches are used:
> >> Single socket Multiple sockets(Max Bandwidth)
> >> Guest->Host ~1700MB/s ~2900MB/s
> >> Host->Guest ~1700MB/s ~2900MB/s
> >>
> >> >From the test results, the performance is improved obviously, and guest
> >> memory will not be wasted.
> >>
> >> In addition, in order to support mergeable rx buffer in virtio-vsock,
> >> we need to add a qemu patch to support parse feature.
> >>
> >> ---
> >> v1 -> v2:
> >> * Addressed comments from Jason Wang.
> >> * Add performance test result independently.
> >> * Use Skb_page_frag_refill() which can use high order page and reduce
> >> the stress of page allocator.
> >> * Still use fixed size(PAGE_SIZE) to fill rx buffer, because too small
> >> size can't fill one full packet, we only 128 vq num now.
> >> * Use iovec to replace buf in struct virtio_vsock_pkt, keep tx and rx
> >> consistency.
> >> * Add virtio_transport ops to get max pkt len, in order to be compatible
> >> with old version.
> >> ---
> >>
> >> Yiwen Jiang (5):
> >> VSOCK: support fill mergeable rx buffer in guest
> >> VSOCK: support fill data to mergeable rx buffer in host
> >> VSOCK: support receive mergeable rx buffer in guest
> >> VSOCK: increase send pkt len in mergeable mode to improve performance
> >> VSOCK: batch sending rx buffer to increase bandwidth
> >>
> >> drivers/vhost/vsock.c | 183 ++++++++++++++++++++-----
> >> include/linux/virtio_vsock.h | 13 +-
> >> include/uapi/linux/virtio_vsock.h | 5 +
> >> net/vmw_vsock/virtio_transport.c | 229 +++++++++++++++++++++++++++-----
> >> net/vmw_vsock/virtio_transport_common.c | 66 ++++++---
> >> 5 files changed, 411 insertions(+), 85 deletions(-)
> >>
> >> --
> >> 1.8.3.1
> >
> > .
> >
>
* Re: [PATCH v2 0/5] VSOCK: support mergeable rx buffer in vhost-vsock
2018-12-12 15:09 ` Michael S. Tsirkin
2018-12-13 2:14 ` jiangyiwen
2018-12-13 2:14 ` jiangyiwen
@ 2018-12-14 10:24 ` jiangyiwen
2018-12-14 13:22 ` Michael S. Tsirkin
2018-12-14 13:22 ` Michael S. Tsirkin
2018-12-14 10:24 ` jiangyiwen
3 siblings, 2 replies; 14+ messages in thread
From: jiangyiwen @ 2018-12-14 10:24 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Stefan Hajnoczi, Jason Wang, netdev, kvm, virtualization
On 2018/12/12 23:09, Michael S. Tsirkin wrote:
> On Wed, Dec 12, 2018 at 05:25:50PM +0800, jiangyiwen wrote:
>> Now vsock only support send/receive small packet, it can't achieve
>> high performance. As previous discussed with Jason Wang, I revisit the
>> idea of vhost-net about mergeable rx buffer and implement the mergeable
>> rx buffer in vhost-vsock, it can allow big packet to be scattered in
>> into different buffers and improve performance obviously.
>>
>> This series of patches mainly did three things:
>> - mergeable buffer implementation
>> - increase the max send pkt size
>> - add used and signal guest in a batch
>>
>> And I write a tool to test the vhost-vsock performance, mainly send big
>> packet(64K) included guest->Host and Host->Guest. I test performance
>> independently and the result as follows:
>>
>> Before performance:
>> Single socket Multiple sockets(Max Bandwidth)
>> Guest->Host ~400MB/s ~480MB/s
>> Host->Guest ~1450MB/s ~1600MB/s
>>
>> After performance only use implement mergeable rx buffer:
>> Single socket Multiple sockets(Max Bandwidth)
>> Guest->Host ~400MB/s ~480MB/s
>> Host->Guest ~1280MB/s ~1350MB/s
>>
>> In this case, max send pkt size is still limited to 4K, so Host->Guest
>> performance will worse than before.
>
> It's concerning though, what if application sends small packets?
> What is the source of the slowdown? Do you know?
>
Hi Michael,
For the two cases, I tested with both small and big packets; the results
are as follows:
64K packets performance comparison:
                                            Single socket   Multiple sockets
Host->Guest (before)                        1352.60MB/s     1436.33MB/s
Host->Guest (only mergeable rx buffer)      1290.08MB/s     1212.67MB/s

4K packets performance comparison:
                                            Single socket   Multiple sockets
Host->Guest (before)                        535.47MB/s      688.67MB/s
Host->Guest (only mergeable rx buffer)      522.33MB/s      599.00MB/s

3K packets performance comparison:
                                            Single socket   Multiple sockets
Host->Guest (before)                        359.74MB/s      442.00MB/s
Host->Guest (only mergeable rx buffer)      374.47MB/s      452.33MB/s
There is an interesting result here: for 64K and 4K packets the mergeable
buffer path performs worse, while for 3K packets both paths perform the same.

My guess is that in mergeable mode, when the host sends a 4K packet to the
guest, the host has to call vhost_get_vq_desc() twice (hdr + 4K of data),
and the guest likewise has to call virtqueue_get_buf() twice. When the packet
is smaller than (4K - hdr), it fits in a single page, so performance is the
same as before.

So in mergeable mode, performance may be worse than before for packet sizes
in the range ((4K - hdr), 4K].
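
Put as a rough sketch (illustration only, not the patch;
copy_to_one_guest_buf() stands in for the copy_to_iter() work over vq->iov):

static int vhost_vsock_copy_pkt_mergeable(struct vhost_virtqueue *vq,
					  struct virtio_vsock_pkt *pkt)
{
	size_t remaining = sizeof(pkt->hdr) + pkt->len;
	unsigned int out, in;
	int head, nbufs = 0;

	/* With PAGE_SIZE guest buffers, hdr + 4K payload does not fit in a
	 * single buffer, so this loop runs twice; hdr + (4K - hdr) payload
	 * needs only one iteration. */
	while (remaining > 0) {
		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
					 &out, &in, NULL, NULL);
		if (head < 0 || head == vq->num)
			return -ENOBUFS;	/* simplified error handling */

		remaining -= copy_to_one_guest_buf(vq, pkt, remaining);
		nbufs++;
	}

	/* the num_buffers-style field added by this series would record nbufs */
	return nbufs;
}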
Thanks,
Yiwen.
>> After performance increase the max send pkt size to 64K:
>> Single socket Multiple sockets(Max Bandwidth)
>> Guest->Host ~1700MB/s ~2900MB/s
>> Host->Guest ~1500MB/s ~2440MB/s
>>
>> After performance all patches are used:
>> Single socket Multiple sockets(Max Bandwidth)
>> Guest->Host ~1700MB/s ~2900MB/s
>> Host->Guest ~1700MB/s ~2900MB/s
>>
>> >From the test results, the performance is improved obviously, and guest
>> memory will not be wasted.
>>
>> In addition, in order to support mergeable rx buffer in virtio-vsock,
>> we need to add a qemu patch to support parse feature.
>>
>> ---
>> v1 -> v2:
>> * Addressed comments from Jason Wang.
>> * Add performance test result independently.
>> * Use Skb_page_frag_refill() which can use high order page and reduce
>> the stress of page allocator.
>> * Still use fixed size(PAGE_SIZE) to fill rx buffer, because too small
>> size can't fill one full packet, we only 128 vq num now.
>> * Use iovec to replace buf in struct virtio_vsock_pkt, keep tx and rx
>> consistency.
>> * Add virtio_transport ops to get max pkt len, in order to be compatible
>> with old version.
>> ---
>>
>> Yiwen Jiang (5):
>> VSOCK: support fill mergeable rx buffer in guest
>> VSOCK: support fill data to mergeable rx buffer in host
>> VSOCK: support receive mergeable rx buffer in guest
>> VSOCK: increase send pkt len in mergeable mode to improve performance
>> VSOCK: batch sending rx buffer to increase bandwidth
>>
>> drivers/vhost/vsock.c | 183 ++++++++++++++++++++-----
>> include/linux/virtio_vsock.h | 13 +-
>> include/uapi/linux/virtio_vsock.h | 5 +
>> net/vmw_vsock/virtio_transport.c | 229 +++++++++++++++++++++++++++-----
>> net/vmw_vsock/virtio_transport_common.c | 66 ++++++---
>> 5 files changed, 411 insertions(+), 85 deletions(-)
>>
>> --
>> 1.8.3.1
>
> .
>
* Re: [PATCH v2 0/5] VSOCK: support mergeable rx buffer in vhost-vsock
2018-12-13 16:34 ` Stefan Hajnoczi
2018-12-14 9:39 ` jiangyiwen
@ 2018-12-14 9:39 ` jiangyiwen
1 sibling, 0 replies; 14+ messages in thread
From: jiangyiwen @ 2018-12-14 9:39 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Michael S. Tsirkin, Jason Wang, netdev, kvm, virtualization
On 2018/12/14 0:34, Stefan Hajnoczi wrote:
> On Wed, Dec 12, 2018 at 05:25:50PM +0800, jiangyiwen wrote:
>> Now vsock only support send/receive small packet, it can't achieve
>> high performance. As previous discussed with Jason Wang, I revisit the
>> idea of vhost-net about mergeable rx buffer and implement the mergeable
>> rx buffer in vhost-vsock, it can allow big packet to be scattered in
>> into different buffers and improve performance obviously.
>
> Sorry, I've been a bad maintainer. I was focussed on other projects and
> my email backlog is huge.
>
> I like the idea of trying out optimizations on virtio-vsock, seeing if
> code can be shared with virtio-net, and maybe later switching to a
> virtio-net transport for vsock (if it turns out enough code can be
> shared).
>
> Another optimization that could be interesting:
>
> Userspace processes reading from a socket sleep in
> vsock_stream_recvmsg(). I wonder if we can bypass struct
> virtio_vsock_pkt and copying the payload into pkt->buf in this case.
> (This doesn't improve poll(2)/select(2) though!)
>
> Imagine a userspace process waiting for data on a socket. When the
> virtqueue becomes ready, we can read in struct virtio_vsock_hdr and find
> the socket for that connection. Then we could copy the payload directly
> to userspace instead of creating a virtio_vsock_pkt and copying to
> pkt->buf first.
>
Great, I will also consider that optimization later.
I will then send the next version based on your suggestions.
Thanks,
Yiwen.
* Re: [PATCH v2 0/5] VSOCK: support mergeable rx buffer in vhost-vsock
2018-12-12 9:25 jiangyiwen
2018-12-12 15:09 ` Michael S. Tsirkin
2018-12-12 15:09 ` Michael S. Tsirkin
@ 2018-12-13 16:34 ` Stefan Hajnoczi
2018-12-14 9:39 ` jiangyiwen
2018-12-14 9:39 ` jiangyiwen
2018-12-13 16:34 ` Stefan Hajnoczi
3 siblings, 2 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2018-12-13 16:34 UTC (permalink / raw)
To: jiangyiwen; +Cc: Michael S. Tsirkin, Jason Wang, netdev, kvm, virtualization
On Wed, Dec 12, 2018 at 05:25:50PM +0800, jiangyiwen wrote:
> Now vsock only support send/receive small packet, it can't achieve
> high performance. As previous discussed with Jason Wang, I revisit the
> idea of vhost-net about mergeable rx buffer and implement the mergeable
> rx buffer in vhost-vsock, it can allow big packet to be scattered in
> into different buffers and improve performance obviously.
Sorry, I've been a bad maintainer. I was focussed on other projects and
my email backlog is huge.
I like the idea of trying out optimizations on virtio-vsock, seeing if
code can be shared with virtio-net, and maybe later switching to a
virtio-net transport for vsock (if it turns out enough code can be
shared).
Another optimization that could be interesting:
Userspace processes reading from a socket sleep in
vsock_stream_recvmsg(). I wonder if we can bypass struct
virtio_vsock_pkt and copying the payload into pkt->buf in this case.
(This doesn't improve poll(2)/select(2) though!)
Imagine a userspace process waiting for data on a socket. When the
virtqueue becomes ready, we can read in struct virtio_vsock_hdr and find
the socket for that connection. Then we could copy the payload directly
to userspace instead of creating a virtio_vsock_pkt and copying to
pkt->buf first.
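
A very rough sketch of that idea (nothing like this exists today;
reader_is_waiting() and copy_payload_to_reader() are invented placeholders,
and the plumbing to reach the sleeping reader's iov would still need to be
built):

static void vhost_vsock_recv_direct(struct vhost_virtqueue *vq,
				    struct iov_iter *guest_iter)
{
	struct virtio_vsock_hdr hdr;
	struct sockaddr_vm src, dst;
	struct sock *sk;

	/* read just the header from the guest's tx descriptor */
	if (copy_from_iter(&hdr, sizeof(hdr), guest_iter) != sizeof(hdr))
		return;

	vsock_addr_init(&src, le64_to_cpu(hdr.src_cid), le32_to_cpu(hdr.src_port));
	vsock_addr_init(&dst, le64_to_cpu(hdr.dst_cid), le32_to_cpu(hdr.dst_port));

	sk = vsock_find_connected_socket(&src, &dst);
	if (!sk)
		return;		/* fall back to the existing pkt path */

	if (reader_is_waiting(sk)) {
		/* copy the payload straight into the sleeping reader's iov,
		 * skipping the intermediate virtio_vsock_pkt allocation */
		copy_payload_to_reader(sk, guest_iter, le32_to_cpu(hdr.len));
	} else {
		/* no reader: queue a virtio_vsock_pkt as before */
	}
	sock_put(sk);
}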
* Re: [PATCH v2 0/5] VSOCK: support mergeable rx buffer in vhost-vsock
2018-12-12 15:09 ` Michael S. Tsirkin
@ 2018-12-13 2:14 ` jiangyiwen
2018-12-13 2:14 ` jiangyiwen
` (2 subsequent siblings)
3 siblings, 0 replies; 14+ messages in thread
From: jiangyiwen @ 2018-12-13 2:14 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Stefan Hajnoczi, Jason Wang, netdev, kvm, virtualization
On 2018/12/12 23:09, Michael S. Tsirkin wrote:
> On Wed, Dec 12, 2018 at 05:25:50PM +0800, jiangyiwen wrote:
>> Now vsock only support send/receive small packet, it can't achieve
>> high performance. As previous discussed with Jason Wang, I revisit the
>> idea of vhost-net about mergeable rx buffer and implement the mergeable
>> rx buffer in vhost-vsock, it can allow big packet to be scattered in
>> into different buffers and improve performance obviously.
>>
>> This series of patches mainly did three things:
>> - mergeable buffer implementation
>> - increase the max send pkt size
>> - add used and signal guest in a batch
>>
>> And I write a tool to test the vhost-vsock performance, mainly send big
>> packet(64K) included guest->Host and Host->Guest. I test performance
>> independently and the result as follows:
>>
>> Before performance:
>> Single socket Multiple sockets(Max Bandwidth)
>> Guest->Host ~400MB/s ~480MB/s
>> Host->Guest ~1450MB/s ~1600MB/s
>>
>> After performance only use implement mergeable rx buffer:
>> Single socket Multiple sockets(Max Bandwidth)
>> Guest->Host ~400MB/s ~480MB/s
>> Host->Guest ~1280MB/s ~1350MB/s
>>
>> In this case, max send pkt size is still limited to 4K, so Host->Guest
>> performance will worse than before.
>
> It's concerning though, what if application sends small packets?
> What is the source of the slowdown? Do you know?
>
Hi Michael,
The "before" numbers were measured a month ago and I did not retest them this
time, so the results may fluctuate somewhat. Today I will retest all of the
cases, with both small and big packets, and try to find the source of the
slowdown.
Thanks,
Yiwen.
>> After performance increase the max send pkt size to 64K:
>> Single socket Multiple sockets(Max Bandwidth)
>> Guest->Host ~1700MB/s ~2900MB/s
>> Host->Guest ~1500MB/s ~2440MB/s
>>
>> After performance all patches are used:
>> Single socket Multiple sockets(Max Bandwidth)
>> Guest->Host ~1700MB/s ~2900MB/s
>> Host->Guest ~1700MB/s ~2900MB/s
>>
>> >From the test results, the performance is improved obviously, and guest
>> memory will not be wasted.
>>
>> In addition, in order to support mergeable rx buffer in virtio-vsock,
>> we need to add a qemu patch to support parse feature.
>>
>> ---
>> v1 -> v2:
>> * Addressed comments from Jason Wang.
>> * Add performance test result independently.
>> * Use Skb_page_frag_refill() which can use high order page and reduce
>> the stress of page allocator.
>> * Still use fixed size(PAGE_SIZE) to fill rx buffer, because too small
>> size can't fill one full packet, we only 128 vq num now.
>> * Use iovec to replace buf in struct virtio_vsock_pkt, keep tx and rx
>> consistency.
>> * Add virtio_transport ops to get max pkt len, in order to be compatible
>> with old version.
>> ---
>>
>> Yiwen Jiang (5):
>> VSOCK: support fill mergeable rx buffer in guest
>> VSOCK: support fill data to mergeable rx buffer in host
>> VSOCK: support receive mergeable rx buffer in guest
>> VSOCK: increase send pkt len in mergeable mode to improve performance
>> VSOCK: batch sending rx buffer to increase bandwidth
>>
>> drivers/vhost/vsock.c | 183 ++++++++++++++++++++-----
>> include/linux/virtio_vsock.h | 13 +-
>> include/uapi/linux/virtio_vsock.h | 5 +
>> net/vmw_vsock/virtio_transport.c | 229 +++++++++++++++++++++++++++-----
>> net/vmw_vsock/virtio_transport_common.c | 66 ++++++---
>> 5 files changed, 411 insertions(+), 85 deletions(-)
>>
>> --
>> 1.8.3.1
>
> .
>
* Re: [PATCH v2 0/5] VSOCK: support mergeable rx buffer in vhost-vsock
2018-12-12 9:25 jiangyiwen
@ 2018-12-12 15:09 ` Michael S. Tsirkin
2018-12-13 2:14 ` jiangyiwen
` (3 more replies)
2018-12-12 15:09 ` Michael S. Tsirkin
` (2 subsequent siblings)
3 siblings, 4 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2018-12-12 15:09 UTC (permalink / raw)
To: jiangyiwen; +Cc: Stefan Hajnoczi, Jason Wang, netdev, kvm, virtualization
On Wed, Dec 12, 2018 at 05:25:50PM +0800, jiangyiwen wrote:
> Now vsock only support send/receive small packet, it can't achieve
> high performance. As previous discussed with Jason Wang, I revisit the
> idea of vhost-net about mergeable rx buffer and implement the mergeable
> rx buffer in vhost-vsock, it can allow big packet to be scattered in
> into different buffers and improve performance obviously.
>
> This series of patches mainly did three things:
> - mergeable buffer implementation
> - increase the max send pkt size
> - add used and signal guest in a batch
>
> And I write a tool to test the vhost-vsock performance, mainly send big
> packet(64K) included guest->Host and Host->Guest. I test performance
> independently and the result as follows:
>
> Before performance:
> Single socket Multiple sockets(Max Bandwidth)
> Guest->Host ~400MB/s ~480MB/s
> Host->Guest ~1450MB/s ~1600MB/s
>
> After performance only use implement mergeable rx buffer:
> Single socket Multiple sockets(Max Bandwidth)
> Guest->Host ~400MB/s ~480MB/s
> Host->Guest ~1280MB/s ~1350MB/s
>
> In this case, max send pkt size is still limited to 4K, so Host->Guest
> performance will worse than before.
It's concerning, though: what if an application sends small packets?
What is the source of the slowdown? Do you know?
> After performance increase the max send pkt size to 64K:
> Single socket Multiple sockets(Max Bandwidth)
> Guest->Host ~1700MB/s ~2900MB/s
> Host->Guest ~1500MB/s ~2440MB/s
>
> After performance all patches are used:
> Single socket Multiple sockets(Max Bandwidth)
> Guest->Host ~1700MB/s ~2900MB/s
> Host->Guest ~1700MB/s ~2900MB/s
>
> >From the test results, the performance is improved obviously, and guest
> memory will not be wasted.
>
> In addition, in order to support mergeable rx buffer in virtio-vsock,
> we need to add a qemu patch to support parse feature.
>
> ---
> v1 -> v2:
> * Addressed comments from Jason Wang.
> * Add performance test result independently.
> * Use Skb_page_frag_refill() which can use high order page and reduce
> the stress of page allocator.
> * Still use fixed size(PAGE_SIZE) to fill rx buffer, because too small
> size can't fill one full packet, we only 128 vq num now.
> * Use iovec to replace buf in struct virtio_vsock_pkt, keep tx and rx
> consistency.
> * Add virtio_transport ops to get max pkt len, in order to be compatible
> with old version.
> ---
>
> Yiwen Jiang (5):
> VSOCK: support fill mergeable rx buffer in guest
> VSOCK: support fill data to mergeable rx buffer in host
> VSOCK: support receive mergeable rx buffer in guest
> VSOCK: increase send pkt len in mergeable mode to improve performance
> VSOCK: batch sending rx buffer to increase bandwidth
>
> drivers/vhost/vsock.c | 183 ++++++++++++++++++++-----
> include/linux/virtio_vsock.h | 13 +-
> include/uapi/linux/virtio_vsock.h | 5 +
> net/vmw_vsock/virtio_transport.c | 229 +++++++++++++++++++++++++++-----
> net/vmw_vsock/virtio_transport_common.c | 66 ++++++---
> 5 files changed, 411 insertions(+), 85 deletions(-)
>
> --
> 1.8.3.1