[PATCH net-next rfc V2 0/2] basic busy polling support for vhost

* [PATCH net-next rfc V2 0/2] basic busy polling support for vhost_net
@ 2015-10-29  8:45 Jason Wang
  0 siblings, 0 replies; 5+ messages in thread
From: Jason Wang @ 2015-10-29  8:45 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev, linux-kernel

Hi all:

This series adds basic busy polling to vhost_net. The idea is simple:
at the end of tx processing, busy poll for a while for newly added tx
descriptors and for data on the rx socket. The maximum time (in us)
that can be spent busy polling is specified through a module
parameter.
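
The polling idea above can be sketched in user space as follows. This
is only an illustration, not code from the series: now_us(),
busy_poll(), work_ready_fn and the two example callbacks are
hypothetical names, clock_gettime() stands in for the kernel clock,
and the sketch does not model preemption or pending signals.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in microseconds (user-space stand-in for a kernel clock). */
static uint64_t now_us(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Hypothetical callback type: returns true once new work is available. */
typedef bool (*work_ready_fn)(void *ctx);

/*
 * Spin until ready(ctx) reports work or timeout_us has elapsed.
 * Returns true if work showed up, false on timeout.
 * With timeout_us == 0 the readiness check runs exactly once.
 */
static bool busy_poll(work_ready_fn ready, void *ctx, uint64_t timeout_us)
{
    assert(ready != NULL);

    uint64_t deadline = now_us() + timeout_us;

    do {
        if (ready(ctx))
            return true;
    } while (now_us() < deadline);

    return false;
}

/* Example check: reports work on the call that counts *ctx down to zero. */
static bool ready_after_n_calls(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0)
        (*remaining)--;
    return *remaining == 0;
}

/* Example check: never reports work, so busy_poll() runs to its timeout. */
static bool never_ready(void *ctx)
{
    (void)ctx;
    return false;
}
```

A caller would invoke, for example, busy_poll(never_ready, NULL, 50) to
spin for roughly 50 us before falling back to sleeping. The trade-off
is the usual one for busy polling: lower wakeup latency at the cost of
CPU time burned while waiting.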

Tests were done with:

- 50 us as busy loop timeout
- Netperf 2.6
- Two machines with back to back connected mlx4
- Guest with 8 vcpus and 1 queue

The results show a large improvement in both tx (up to 158%) and rr
(up to 53%), while rx is about the same as before. In most cases CPU
utilization also improves:

Guest TX:
size/session/+thu%/+normalize%
   64/     1/  +17%/   +6%
   64/     4/   +9%/  +17%
   64/     8/  +34%/  +21%
  512/     1/  +48%/  +40%
  512/     4/  +31%/  +20%
  512/     8/  +39%/  +22%
 1024/     1/ +158%/  +99%
 1024/     4/  +20%/  +11%
 1024/     8/  +40%/  +18%
 2048/     1/ +108%/  +74%
 2048/     4/  +21%/   +7%
 2048/     8/  +32%/  +14%
 4096/     1/  +94%/  +77%
 4096/     4/   +7%/   -6%
 4096/     8/   +9%/   -4%
16384/     1/  +33%/   +9%
16384/     4/  +10%/   -6%
16384/     8/  +19%/   +2%
65535/     1/  +15%/   -6%
65535/     4/   +8%/   -9%
65535/     8/  +14%/    0%

Guest RX:
size/session/+thu%/+normalize%
   64/     1/   -3%/   -3%
   64/     4/   +4%/  +20%
   64/     8/   -1%/   -1%
  512/     1/  +20%/  +12%
  512/     4/   +1%/   +3%
  512/     8/    0%/   -5%
 1024/     1/   +9%/   -2%
 1024/     4/    0%/   +5%
 1024/     8/   +1%/    0%
 2048/     1/    0%/   +3%
 2048/     4/   -2%/   +3%
 2048/     8/   -1%/   -3%
 4096/     1/   -8%/   +3%
 4096/     4/    0%/   +2%
 4096/     8/    0%/   +5%
16384/     1/   +3%/    0%
16384/     4/   +2%/   +2%
16384/     8/    0%/  +13%
65535/     1/    0%/   +3%
65535/     4/   +2%/   -1%
65535/     8/   +1%/  +14%

TCP_RR:
size/session/+thu%/+normalize%
    1/     1/   +8%/   -6%
    1/    50/  +18%/  +15%
    1/   100/  +22%/  +19%
    1/   200/  +25%/  +23%
   64/     1/   +2%/  -19%
   64/    50/  +46%/  +39%
   64/   100/  +47%/  +39%
   64/   200/  +50%/  +44%
  512/     1/    0%/  -28%
  512/    50/  +50%/  +44%
  512/   100/  +53%/  +47%
  512/   200/  +51%/  +58%
 1024/     1/   +3%/  -14%
 1024/    50/  +45%/  +37%
 1024/   100/  +53%/  +49%
 1024/   200/  +48%/  +55%

Changes from V1:
- Add a comment for vhost_has_work() to explain why it could be
  lockless
- Add param description for busyloop_timeout
- Split out the busy polling logic into a new helper
- Check and exit the loop when there's a pending signal
- Disable preemption during busy looping to make sure local_clock() is
  used correctly.

Todo:
- Make the busyloop timeout configurable per VM through an ioctl.

Please review.

Thanks

Jason Wang (2):
  vhost: introduce vhost_has_work()
  vhost_net: basic polling support

 drivers/vhost/net.c   | 54 +++++++++++++++++++++++++++++++++++++++++++++++----
 drivers/vhost/vhost.c |  7 +++++++
 drivers/vhost/vhost.h |  1 +
 3 files changed, 58 insertions(+), 4 deletions(-)

-- 
1.8.3.1


* Re: [PATCH net-next rfc V2 0/2] basic busy polling support for vhost_net
  2015-10-30 11:58 ` Jason Wang
@ 2015-11-03  7:46   ` Jason Wang
  2015-11-03  7:46   ` Jason Wang
  1 sibling, 0 replies; 5+ messages in thread
From: Jason Wang @ 2015-11-03  7:46 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev, linux-kernel



On 10/30/2015 07:58 PM, Jason Wang wrote:
>
> On 10/29/2015 04:45 PM, Jason Wang wrote:
>> Hi all:
>>
>> This series tries to add basic busy polling for vhost net. The idea is
>> simple: at the end of tx processing, busy polling for new tx added
>> descriptor and rx receive socket for a while. The maximum number of
>> time (in us) could be spent on busy polling was specified through
>> module parameter.
>>
>> Test were done through:
>>
>> - 50 us as busy loop timeout
>> - Netperf 2.6
>> - Two machines with back to back connected mlx4
>> - Guest with 8 vcpus and 1 queue
>>
>> Result shows very huge improvement on both tx (at most 158%) and rr
>> (at most 53%) while rx is as much as in the past. Most cases the cpu
>> utilization is also improved:
>>
> Just notice there's something wrong in the setup. So the numbers are
> incorrect here. Will re-run and post correct number here.
>
> Sorry.

Here are the updated test results:

1) 1 vcpu 1 queue:

TCP_RR
size/session/+thu%/+normalize%
    1/     1/    0%/  -25%
    1/    50/  +12%/    0%
    1/   100/  +12%/   +1%
    1/   200/   +9%/   -1%
   64/     1/   +3%/  -21%
   64/    50/   +8%/    0%
   64/   100/   +7%/    0%
   64/   200/   +9%/    0%
  256/     1/   +1%/  -25%
  256/    50/   +7%/   -2%
  256/   100/   +6%/   -2%
  256/   200/   +4%/   -2%
  512/     1/   +2%/  -19%
  512/    50/   +5%/   -2%
  512/   100/   +3%/   -3%
  512/   200/   +6%/   -2%
 1024/     1/   +2%/  -20%
 1024/    50/   +3%/   -3%
 1024/   100/   +5%/   -3%
 1024/   200/   +4%/   -2%
Guest RX
size/session/+thu%/+normalize%
   64/     1/   -4%/   -5%
   64/     4/   -3%/  -10%
   64/     8/   -3%/   -5%
  512/     1/  +15%/   +1%
  512/     4/   -5%/   -5%
  512/     8/   -2%/   -4%
 1024/     1/   -5%/  -16%
 1024/     4/   -2%/   -5%
 1024/     8/   -6%/   -6%
 2048/     1/  +10%/   +5%
 2048/     4/   -8%/   -4%
 2048/     8/   -1%/   -4%
 4096/     1/   -9%/  -11%
 4096/     4/   +1%/   -1%
 4096/     8/   +1%/    0%
16384/     1/  +20%/  +11%
16384/     4/    0%/   -3%
16384/     8/   +1%/    0%
65535/     1/  +36%/  +13%
65535/     4/  -10%/   -9%
65535/     8/   -3%/   -2%
Guest TX
size/session/+thu%/+normalize%
   64/     1/   -7%/  -16%
   64/     4/  -14%/  -23%
   64/     8/   -9%/  -20%
  512/     1/  -62%/  -56%
  512/     4/  -62%/  -56%
  512/     8/  -61%/  -53%
 1024/     1/  -66%/  -61%
 1024/     4/  -77%/  -73%
 1024/     8/  -73%/  -67%
 2048/     1/  -74%/  -75%
 2048/     4/  -77%/  -74%
 2048/     8/  -72%/  -68%
 4096/     1/  -65%/  -68%
 4096/     4/  -66%/  -63%
 4096/     8/  -62%/  -57%
16384/     1/  -25%/  -28%
16384/     4/  -28%/  -17%
16384/     8/  -24%/  -10%
65535/     1/  -17%/  -14%
65535/     4/  -22%/   -5%
65535/     8/  -25%/   -9%

- clear improvement in TCP_RR (up to 12%)
- improvement in guest RX
- large regression in Guest TX (up to -75%). This is probably because
the virtio-net driver suffers from buffer bloat, since it orphans each
skb before transmission. The faster vhost runs, the smaller the
packets that get produced. To reduce this effect, GSO was turned off
in the guest, which gives the following result:

Guest TX (gso off in guest)
size/session/+thu%/+normalize%
   64/     1/   +3%/  -11%
   64/     4/   +4%/  -10%
   64/     8/   +4%/  -10%
  512/     1/   +2%/   +5%
  512/     4/    0%/   -1%
  512/     8/    0%/    0%
 1024/     1/  +11%/    0%
 1024/     4/    0%/   -1%
 1024/     8/   +3%/   +1%
 2048/     1/   +4%/   -1%
 2048/     4/   +8%/   +3%
 2048/     8/    0%/   -1%
 4096/     1/   +4%/   -1%
 4096/     4/   +1%/    0%
 4096/     8/   +2%/    0%
16384/     1/   +2%/   -2%
16384/     4/   +3%/   +1%
16384/     8/    0%/   -1%
65535/     1/   +9%/   +7%
65535/     4/    0%/   -3%
65535/     8/   -1%/   -1%

2) 8 vcpus 1 queue:

TCP_RR
size/session/+thu%/+normalize%
    1/     1/   +5%/  -14%
    1/    50/   +2%/   +1%
    1/   100/    0%/   -1%
    1/   200/    0%/    0%
   64/     1/    0%/  -25%
   64/    50/   +5%/   +5%
   64/   100/    0%/    0%
   64/   200/    0%/   -1%
  256/     1/    0%/  -30%
  256/    50/    0%/    0%
  256/   100/   -2%/   -2%
  256/   200/    0%/    0%
  512/     1/   +1%/  -23%
  512/    50/   +1%/   +1%
  512/   100/   +1%/    0%
  512/   200/   +1%/   +1%
 1024/     1/   +1%/  -23%
 1024/    50/   +5%/   +5%
 1024/   100/    0%/   -1%
 1024/   200/    0%/    0%
Guest RX
size/session/+thu%/+normalize%
   64/     1/   +1%/   +1%
   64/     4/   -2%/   +1%
   64/     8/   +6%/  +19%
  512/     1/   +5%/   -7%
  512/     4/   -4%/   -4%
  512/     8/    0%/    0%
 1024/     1/   +1%/   +2%
 1024/     4/   -2%/   -2%
 1024/     8/   -1%/   +7%
 2048/     1/   +8%/   -2%
 2048/     4/    0%/   +5%
 2048/     8/   -1%/  +13%
 4096/     1/   -1%/   +2%
 4096/     4/    0%/   +6%
 4096/     8/   -2%/  +15%
16384/     1/   -1%/    0%
16384/     4/   -2%/   -1%
16384/     8/   -2%/   +2%
65535/     1/   -2%/    0%
65535/     4/   -3%/   -3%
65535/     8/   -2%/   +2%
Guest TX
size/session/+thu%/+normalize%
   64/     1/   +6%/   +3%
   64/     4/  +11%/   +8%
   64/     8/    0%/    0%
  512/     1/  +19%/  +18%
  512/     4/   -4%/   +1%
  512/     8/   -1%/   -1%
 1024/     1/    0%/   +8%
 1024/     4/   -1%/   -1%
 1024/     8/    0%/   +1%
 2048/     1/   +1%/    0%
 2048/     4/   -1%/   -2%
 2048/     8/    0%/    0%
 4096/     1/  +12%/  +14%
 4096/     4/    0%/   -1%
 4096/     8/   -2%/   -1%
16384/     1/   +9%/   +6%
16384/     4/   +3%/   -1%
16384/     8/   +2%/   -1%
65535/     1/   +1%/   -2%
65535/     4/    0%/   -4%
65535/     8/    0%/   -2%

- latency improves slightly
- small improvement in single-session rx
- no other obvious changes
- this may be because 8 vcpus put enough load on a single vhost
thread that busy polling is rarely triggered (except under light
load, e.g. 1-session TCP_RR).

3) 8 vcpus 8 queues

TCP_RR
size/session/+thu%/+normalize%
    1/     1/   +6%/  -16%
    1/    50/  +14%/   +1%
    1/   100/  +17%/   +3%
    1/   200/  +16%/   +2%
   64/     1/   +2%/  -19%
   64/    50/  +10%/    0%
   64/   100/  +17%/   +5%
   64/   200/  +15%/   +3%
  256/     1/    0%/  -19%
  256/    50/   +5%/   -3%
  256/   100/   +4%/   -3%
  256/   200/   +2%/   -4%
  512/     1/   +4%/  -19%
  512/    50/   +7%/   -2%
  512/   100/   +4%/   -4%
  512/   200/   +3%/   -4%
 1024/     1/   +9%/  -19%
 1024/    50/   +6%/   -2%
 1024/   100/   +5%/   -3%
 1024/   200/   +5%/   -3%
Guest RX
size/session/+thu%/+normalize%
   64/     1/  +18%/  +13%
   64/     4/    0%/   -1%
   64/     8/   -4%/  -11%
  512/     1/   +3%/   -6%
  512/     4/   +1%/  -11%
  512/     8/   -1%/   -7%
 1024/     1/    0%/   -9%
 1024/     4/   +9%/  -16%
 1024/     8/   -1%/  -11%
 2048/     1/    0%/   -2%
 2048/     4/    0%/  -16%
 2048/     8/   -1%/   -2%
 4096/     1/   +3%/    0%
 4096/     4/   -1%/  -12%
 4096/     8/    0%/   -5%
16384/     1/   -2%/   -6%
16384/     4/    0%/   -6%
16384/     8/    0%/   -6%
65535/     1/    0%/    0%
65535/     4/    0%/   -9%
65535/     8/    0%/   +1%
Guest TX
size/session/+thu%/+normalize%
   64/     1/   +7%/   +3%
   64/     4/   +6%/    0%
   64/     8/  +10%/   +5%
  512/     1/    0%/  +14%
  512/     4/   +9%/   -1%
  512/     8/  +14%/   +4%
 1024/     1/  +44%/  +37%
 1024/     4/   +6%/   +2%
 1024/     8/  +19%/  +12%
 2048/     1/  -14%/  -16%
 2048/     4/  +11%/   +8%
 2048/     8/  +26%/  +28%
 4096/     1/  +21%/  +19%
 4096/     4/   +2%/  +10%
 4096/     8/  +14%/   +7%
16384/     1/  +12%/   +4%
16384/     4/   +7%/   +2%
16384/     8/   +2%/   +9%
65535/     1/   -3%/   -5%
65535/     4/   +9%/   +5%
65535/     8/    0%/   -8%

- TCP_RR clearly improves (up to 17%)
- clear improvement in Guest TX (up to 44%)


* Re: [PATCH net-next rfc V2 0/2] basic busy polling support for vhost_net
  2015-10-29  8:45 Jason Wang
@ 2015-10-30 11:58 ` Jason Wang
  2015-11-03  7:46   ` Jason Wang
  2015-11-03  7:46   ` Jason Wang
  0 siblings, 2 replies; 5+ messages in thread
From: Jason Wang @ 2015-10-30 11:58 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev, linux-kernel



On 10/29/2015 04:45 PM, Jason Wang wrote:
> Hi all:
>
> This series tries to add basic busy polling for vhost net. The idea is
> simple: at the end of tx processing, busy polling for new tx added
> descriptor and rx receive socket for a while. The maximum number of
> time (in us) could be spent on busy polling was specified through
> module parameter.
>
> Test were done through:
>
> - 50 us as busy loop timeout
> - Netperf 2.6
> - Two machines with back to back connected mlx4
> - Guest with 8 vcpus and 1 queue
>
> Result shows very huge improvement on both tx (at most 158%) and rr
> (at most 53%) while rx is as much as in the past. Most cases the cpu
> utilization is also improved:
>

I just noticed that something was wrong in the setup, so the numbers
here are incorrect. I will re-run the tests and post the correct
numbers.

Sorry.

