* [PATCH V2 0/3] basic busy polling support for vhost_net
@ 2015-12-01 6:39 Jason Wang
0 siblings, 0 replies; 7+ messages in thread
From: Jason Wang @ 2015-12-01 6:39 UTC (permalink / raw)
To: mst, kvm, virtualization, netdev, linux-kernel
Hi all:
This series tries to add basic busy polling for vhost_net. The idea is
simple: at the end of tx/rx processing, busy poll for a while for newly
added tx descriptors and for data on the rx socket. The maximum time (in
us) that may be spent busy polling is specified through an ioctl.
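For illustration only, the core idea could be sketched roughly as below.
vhost_vq_more_avail() is one of the helpers introduced by this series (see
the patch list at the end), but its signature and the surrounding loop are
assumptions for this sketch, not the actual patch code:

	/* Rough sketch of busy polling at the end of tx/rx processing;
	 * this is not the actual patch code. */
	static void busy_poll_sketch(struct vhost_dev *dev,
				     struct vhost_virtqueue *vq,
				     unsigned long timeout_us)
	{
		u64 end;

		if (!timeout_us)
			return;			/* busy polling disabled */

		preempt_disable();		/* keep local_clock() consistent */
		end = (local_clock() >> 10) + timeout_us;	/* ~ns -> us */

		/* Spin until new tx descriptors (or, for rx, socket data)
		 * show up or the user-configured timeout expires. */
		while (!vhost_vq_more_avail(dev, vq) &&
		       (local_clock() >> 10) < end)
			cpu_relax();

		preempt_enable();
	}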
Test A was done with:
- 50 us as busy loop timeout
- Netperf 2.6
- Two machines with back to back connected ixgbe
- Guest with 1 vcpu and 1 queue
Results:
- For stream workloads, ioexits were reduced dramatically for medium tx
sizes (1024-2048, at most -43%) and for almost all rx sizes (at most
-84%) as a result of polling. This more or less compensates for the
possibly wasted cpu cycles, which is probably why we can still see an
increase in normalized throughput in some cases.
- Tx throughput increased (at most 50%) except for the huge write
(16384), and we can send more packets in those cases (+tpkts
increased).
- Very minor rx regression in some cases.
- Improvement on TCP_RR (at most 17%).
Guest TX:
size/session/+thu%/+normalize%/+tpkts%/+rpkts%/+ioexits%/
64/ 1/ +18%/ -10%/ +7%/ +11%/ 0%
64/ 2/ +14%/ -13%/ +7%/ +10%/ 0%
64/ 4/ +8%/ -17%/ +7%/ +9%/ 0%
64/ 8/ +11%/ -15%/ +7%/ +10%/ 0%
256/ 1/ +35%/ +9%/ +21%/ +12%/ -11%
256/ 2/ +26%/ +2%/ +20%/ +9%/ -10%
256/ 4/ +23%/ 0%/ +21%/ +10%/ -9%
256/ 8/ +23%/ 0%/ +21%/ +9%/ -9%
512/ 1/ +31%/ +9%/ +23%/ +18%/ -12%
512/ 2/ +30%/ +8%/ +24%/ +15%/ -10%
512/ 4/ +26%/ +5%/ +24%/ +14%/ -11%
512/ 8/ +32%/ +9%/ +23%/ +15%/ -11%
1024/ 1/ +39%/ +16%/ +29%/ +22%/ -26%
1024/ 2/ +35%/ +14%/ +30%/ +21%/ -22%
1024/ 4/ +34%/ +13%/ +32%/ +21%/ -25%
1024/ 8/ +36%/ +14%/ +32%/ +19%/ -26%
2048/ 1/ +50%/ +27%/ +34%/ +26%/ -42%
2048/ 2/ +43%/ +21%/ +36%/ +25%/ -43%
2048/ 4/ +41%/ +20%/ +37%/ +27%/ -43%
2048/ 8/ +40%/ +18%/ +35%/ +25%/ -42%
16384/ 1/ 0%/ -12%/ -1%/ +8%/ +15%
16384/ 2/ 0%/ -10%/ +1%/ +4%/ +5%
16384/ 4/ 0%/ -11%/ -3%/ 0%/ +3%
16384/ 8/ 0%/ -10%/ -4%/ 0%/ +1%
Guest RX:
size/session/+thu%/+normalize%/+tpkts%/+rpkts%/+ioexits%/
64/ 1/ -2%/ -21%/ +1%/ +2%/ -75%
64/ 2/ +1%/ -9%/ +12%/ 0%/ -55%
64/ 4/ 0%/ -6%/ +5%/ -1%/ -44%
64/ 8/ -5%/ -5%/ +7%/ -23%/ -50%
256/ 1/ -8%/ -18%/ +16%/ +15%/ -63%
256/ 2/ 0%/ -8%/ +9%/ -2%/ -26%
256/ 4/ 0%/ -7%/ -8%/ +20%/ -41%
256/ 8/ -8%/ -11%/ -9%/ -24%/ -78%
512/ 1/ -6%/ -19%/ +20%/ +18%/ -29%
512/ 2/ 0%/ -10%/ -14%/ -8%/ -31%
512/ 4/ -1%/ -5%/ -11%/ -9%/ -38%
512/ 8/ -7%/ -9%/ -17%/ -22%/ -81%
1024/ 1/ 0%/ -16%/ +12%/ +9%/ -11%
1024/ 2/ 0%/ -11%/ 0%/ +3%/ -30%
1024/ 4/ 0%/ -4%/ +2%/ +6%/ -15%
1024/ 8/ -3%/ -4%/ -8%/ -8%/ -70%
2048/ 1/ -8%/ -23%/ +36%/ +22%/ -11%
2048/ 2/ 0%/ -12%/ +1%/ +3%/ -29%
2048/ 4/ 0%/ -3%/ -17%/ -15%/ -84%
2048/ 8/ 0%/ -3%/ +1%/ -3%/ +10%
16384/ 1/ 0%/ -11%/ +4%/ +7%/ -22%
16384/ 2/ 0%/ -7%/ +4%/ +4%/ -33%
16384/ 4/ 0%/ -2%/ -2%/ -4%/ -23%
16384/ 8/ -1%/ -2%/ +1%/ -22%/ -40%
TCP_RR:
size/session/+thu%/+normalize%/+tpkts%/+rpkts%/+ioexits%/
1/ 1/ +11%/ -26%/ +11%/ +11%/ +10%
1/ 25/ +11%/ -15%/ +11%/ +11%/ 0%
1/ 50/ +9%/ -16%/ +10%/ +10%/ 0%
1/ 100/ +9%/ -15%/ +9%/ +9%/ 0%
64/ 1/ +11%/ -31%/ +11%/ +11%/ +11%
64/ 25/ +12%/ -14%/ +12%/ +12%/ 0%
64/ 50/ +11%/ -14%/ +12%/ +12%/ 0%
64/ 100/ +11%/ -15%/ +11%/ +11%/ 0%
256/ 1/ +11%/ -27%/ +11%/ +11%/ +10%
256/ 25/ +17%/ -11%/ +16%/ +16%/ -1%
256/ 50/ +16%/ -11%/ +17%/ +17%/ +1%
256/ 100/ +17%/ -11%/ +18%/ +18%/ +1%
Test B was done with:
- 50us as busy loop timeout
- Netperf 2.6
- Two machines with back to back connected ixgbe
- Two guests, each with 1 vcpu and 1 queue
- Two vhost threads pinned to the same host cpu to simulate cpu
contention
Results:
- Even in this extreme case, we can still get at most a 14% improvement
on TCP_RR.
- For the guest tx stream, a minor improvement, with at most a 5%
regression in the one-byte case. For the guest rx stream, at most a 5%
regression was seen.
Guest TX:
size /-+% /
1 /-5.55%/
64 /+1.11%/
256 /+2.33%/
512 /-0.03%/
1024 /+1.14%/
4096 /+0.00%/
16384/+0.00%/
Guest RX:
size /-+% /
1 /-5.11%/
64 /-0.55%/
256 /-2.35%/
512 /-3.39%/
1024 /+6.8% /
4096 /-0.01%/
16384/+0.00%/
TCP_RR:
size /-+% /
1 /+9.79% /
64 /+4.51% /
256 /+6.47% /
512 /-3.37% /
1024 /+6.15% /
4096 /+14.88%/
16384/-2.23% /
Changes from V1:
- Remove the buggy vq_error() in vhost_vq_more_avail().
- Leave vhost_enable_notify() untouched.
Changes from RFC V3:
- small tweak to the code to avoid duplicate conditions in the
critical path when busy polling is not enabled.
- Add test results for multiple VMs
Changes from RFC V2:
- poll also at the end of rx handling
- factor out the polling logic and optimize the code a little bit
- add two ioctls to get and set the busy poll timeout (see the sketch
after this list)
- test on ixgbe (which can give more stable and reproducible numbers)
instead of mlx4.
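Purely as an illustration of how userspace might drive such ioctls, here
is a small sketch. The ioctl name VHOST_SET_VRING_BUSYLOOP_TIMEOUT and
the use of struct vhost_vring_state are assumptions based on the
description above, not necessarily the exact uapi added by these patches:

	/* Sketch: set the busy-poll timeout for one virtqueue from
	 * userspace. Ioctl name and argument type are assumptions; check
	 * include/uapi/linux/vhost.h in the patch for the real uapi. */
	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <linux/vhost.h>

	static int set_busyloop_timeout(int vhost_fd, unsigned int vq_index,
					unsigned int timeout_us)
	{
		struct vhost_vring_state state = {
			.index = vq_index,   /* which virtqueue to tune */
			.num   = timeout_us, /* timeout in us, 0 disables it */
		};

		if (ioctl(vhost_fd, VHOST_SET_VRING_BUSYLOOP_TIMEOUT, &state) < 0) {
			perror("VHOST_SET_VRING_BUSYLOOP_TIMEOUT");
			return -1;
		}
		return 0;
	}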
Changes from RFC V1:
- Add a comment for vhost_has_work() to explain why it could be
lockless
- Add param description for busyloop_timeout
- Split out the busy polling logic into a new helper
- Check and exit the loop when there's a pending signal
- Disable preemption during busy polling to make sure local_clock() is
used correctly.
Jason Wang (3):
vhost: introduce vhost_has_work()
vhost: introduce vhost_vq_more_avail()
vhost_net: basic polling support
drivers/vhost/net.c | 72 ++++++++++++++++++++++++++++++++++++++++++----
drivers/vhost/vhost.c | 35 ++++++++++++++++++++++
drivers/vhost/vhost.h | 3 ++
include/uapi/linux/vhost.h | 11 +++++++
4 files changed, 116 insertions(+), 5 deletions(-)
--
2.5.0
* Re: [PATCH V2 0/3] basic busy polling support for vhost_net
2015-12-01 6:39 Jason Wang
@ 2016-01-24 9:00 ` Mike Rapoport
2016-01-25 3:00 ` Jason Wang
0 siblings, 1 reply; 7+ messages in thread
From: Mike Rapoport @ 2016-01-24 9:00 UTC (permalink / raw)
To: linux-kernel
Hi Jason,
> Jason Wang <jasowang <at> redhat.com> writes:
>
> Hi all:
>
> This series tries to add basic busy polling for vhost_net. The idea is
> simple: at the end of tx/rx processing, busy poll for a while for newly
> added tx descriptors and for data on the rx socket.
There were several concerns Michael raised about Razya's attempt to add
polling to vhost-net ([1], [2]). Some of them seem relevant for these
patches as well:
- What happens in overcommit scenarios?
- Have you checked the effect of polling on some macro benchmarks?
> The maximum time (in us) that may be spent busy polling is specified
> through an ioctl.
Although an ioctl is definitely a more appropriate interface for letting
the user tune polling, it's still not clear to me how the *end user* will
interact with it and how easy that would be for him/her.
[1] http://thread.gmane.org/gmane.linux.kernel/1765593
[2] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/131343
--
Sincerely yours,
Mike.
* Re: [PATCH V2 0/3] basic busy polling support for vhost_net
2016-01-24 9:00 ` Mike Rapoport
@ 2016-01-25 3:00 ` Jason Wang
2016-01-25 7:58 ` Michael Rapoport
0 siblings, 1 reply; 7+ messages in thread
From: Jason Wang @ 2016-01-25 3:00 UTC (permalink / raw)
To: Mike Rapoport, linux-kernel
On 01/24/2016 05:00 PM, Mike Rapoport wrote:
> Hi Jason,
>
>> Jason Wang <jasowang <at> redhat.com> writes:
>>
>> Hi all:
>>
>> This series tries to add basic busy polling for vhost_net. The idea is
>> simple: at the end of tx/rx processing, busy poll for a while for newly
>> added tx descriptors and for data on the rx socket.
> There were several concerns Michael raised about Razya's attempt to add
> polling to vhost-net ([1], [2]). Some of them seem relevant for these
> patches as well:
>
> - What happens in overcommit scenarios?
We have an optimization here: busy polling ends if more than one process
is runnable on the local cpu. This is done by checking
single_task_running() in each iteration, so in the worst case busy
polling should be as fast as, or only a minor regression compared to,
the normal case. You can see this in the last test result.
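As a hedged illustration of that exit condition (the helper below is made
up for this sketch; vhost_has_work() is introduced by this series, while
single_task_running() and signal_pending() are existing kernel APIs), each
polling iteration could check roughly:

	/* Illustrative only: true if the busy loop should keep spinning.
	 * single_task_running() turns false as soon as another task becomes
	 * runnable on this cpu, which bounds the cost in overcommit
	 * scenarios. */
	static bool busy_poll_should_continue(struct vhost_dev *dev)
	{
		return !vhost_has_work(dev) &&		/* no vhost work queued */
		       !signal_pending(current) &&	/* no pending signal */
		       single_task_running();		/* cpu not contended */
	}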
> - Have you checked the effect of polling on some macro benchmarks?
I'm not sure I get the question. The cover letter shows some netperf
benchmark results. What do you mean by "macro benchmarks"?
>
>> The maximum time (in us) that may be spent busy polling is specified
>> through an ioctl.
> Although an ioctl is definitely a more appropriate interface for letting
> the user tune polling, it's still not clear to me how the *end user* will
> interact with it and how easy that would be for him/her.
There will be a QEMU part of the code for the end user, e.g. a
vhost_poll_us parameter for tap like:
-netdev tap,id=hn0,vhost=on,vhost_poll_us=20
Thanks
>
> [1] http://thread.gmane.org/gmane.linux.kernel/1765593
> [2] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/131343
>
> --
> Sincerely yours,
> Mike.
>
>
* Re: [PATCH V2 0/3] basic busy polling support for vhost_net
2016-01-25 3:00 ` Jason Wang
@ 2016-01-25 7:58 ` Michael Rapoport
2016-01-25 8:41 ` Jason Wang
0 siblings, 1 reply; 7+ messages in thread
From: Michael Rapoport @ 2016-01-25 7:58 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, virtualization, linux-kernel, kvm, Michael S. Tsirkin
(restored CC, sorry for dropping it originally, Notes is still hard for
me)
> Jason Wang <jasowang@redhat.com> wrote on 01/25/2016 05:00:05 AM:
> On 01/24/2016 05:00 PM, Mike Rapoport wrote:
> > Hi Jason,
> >
> >> Jason Wang <jasowang <at> redhat.com> writes:
> >>
> >> Hi all:
> >>
> >> This series tries to add basic busy polling for vhost_net. The idea
> >> is simple: at the end of tx/rx processing, busy poll for a while for
> >> newly added tx descriptors and for data on the rx socket.
> > There were several concerns Michael raised about Razya's attempt to
> > add polling to vhost-net ([1], [2]). Some of them seem relevant for
> > these patches as well:
> >
> > - What happens in overcommit scenarios?
>
> We have an optimization here: busy polling ends if more than one
> process is runnable on the local cpu. This is done by checking
> single_task_running() in each iteration, so in the worst case busy
> polling should be as fast as, or only a minor regression compared to,
> the normal case. You can see this in the last test result.
>
> > - Have you checked the effect of polling on some macro benchmarks?
>
> I'm not sure I get the question. The cover letter shows some netperf
> benchmark results. What do you mean by "macro benchmarks"?
Back then, when Razya posted her polling implementation, Michael had
concerns about the macro effect ([3]), so I was wondering whether this
concern is also valid for your implementation.
Now, after I've reread your changes, I think it's not that relevant...
> >> The maximum time (in us) that may be spent busy polling is specified
> >> through an ioctl.
> > Although an ioctl is definitely a more appropriate interface for
> > letting the user tune polling, it's still not clear to me how the
> > *end user* will interact with it and how easy that would be for
> > him/her.
>
> There will be a QEMU part of the code for the end user, e.g. a
> vhost_poll_us parameter for tap like:
>
> -netdev tap,id=hn0,vhost=on,vhost_poll_us=20
Not strictly related, but I'd like to try polling + vhost thread
sharing and polling + workqueues.
Do you mind sharing the scripts you used to test the polling?
Thanks,
Mike.
> Thanks
>
> >
> > [1] http://thread.gmane.org/gmane.linux.kernel/1765593
> > [2] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/131343
> >
> > --
> > Sincerely yours,
> > Mike.
> >
[3] https://www.mail-archive.com/kvm@vger.kernel.org/msg109703.html
* Re: [PATCH V2 0/3] basic busy polling support for vhost_net
2016-01-25 7:58 ` Michael Rapoport
@ 2016-01-25 8:41 ` Jason Wang
0 siblings, 0 replies; 7+ messages in thread
From: Jason Wang @ 2016-01-25 8:41 UTC (permalink / raw)
To: Michael Rapoport
Cc: Michael S. Tsirkin, kvm, virtualization, netdev, linux-kernel
On 01/25/2016 03:58 PM, Michael Rapoport wrote:
> (restored 'CC, sorry for dropping it originally, Notes is still hard
> for me)
>
> > Jason Wang <jasowang@redhat.com> wrote on 01/25/2016 05:00:05 AM:
> > On 01/24/2016 05:00 PM, Mike Rapoport wrote:
> > > Hi Jason,
> > >
> > >> Jason Wang <jasowang <at> redhat.com> writes:
> > >>
> > >> Hi all:
> > >>
> > >> This series tries to add basic busy polling for vhost_net. The
> > >> idea is simple: at the end of tx/rx processing, busy poll for a
> > >> while for newly added tx descriptors and for data on the rx socket.
> > > There were several concerns Michael raised about Razya's attempt
> > > to add polling to vhost-net ([1], [2]). Some of them seem relevant
> > > for these patches as well:
> > >
> > > - What happens in overcommit scenarios?
> >
> > We have an optimization here: busy polling ends if more than one
> > process is runnable on the local cpu. This is done by checking
> > single_task_running() in each iteration, so in the worst case busy
> > polling should be as fast as, or only a minor regression compared to,
> > the normal case. You can see this in the last test result.
> >
> > > - Have you checked the effect of polling on some macro benchmarks?
> >
> > I'm not sure I get the question. The cover letter shows some netperf
> > benchmark results. What do you mean by "macro benchmarks"?
>
> Back then, when Razya posted her polling implementation, Michael had
> concerns about the macro effect ([3]), so I was wondering whether this
> concern is also valid for your implementation.
> Now, after I've reread your changes, I think it's not that relevant...
More benchmarks are good, but lots of kernel patches were accepted with
only simple netperf results. Anyway, busy polling is disabled by
default; I will try to do macro benchmarks in the future if I have time.
>
>
> > >> The maximum time (in us) that may be spent busy polling is
> > >> specified through an ioctl.
> > > Although an ioctl is definitely a more appropriate interface for
> > > letting the user tune polling, it's still not clear to me how the
> > > *end user* will interact with it and how easy that would be for
> > > him/her.
> >
> > There will be a QEMU part of the code for the end user, e.g. a
> > vhost_poll_us parameter for tap like:
> >
> > -netdev tap,id=hn0,vhost=on,vhost_poll_us=20
>
> Not strictly related, but I'd like to try polling + vhost thread
> sharing and polling + workqueues.
> Do you mind sharing the scripts you used to test the polling?
Sure, it was a subtest of autotest[1].
[1]
https://github.com/autotest/tp-qemu/blob/7cf589b490aff7511eccbf2e1336ecf8d9fa9cb9/generic/tests/netperf.py
>
>
> Thanks,
> Mike.
>
> > Thanks
> >
> > >
> > > [1] http://thread.gmane.org/gmane.linux.kernel/1765593
> > > [2] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/131343
> > >
> > > --
> > > Sincerely yours,
> > > Mike.
> > >
>
> [3] https://www.mail-archive.com/kvm@vger.kernel.org/msg109703.html