* net_sched strange in 4.11
@ 2017-05-08 7:15 Anton Ivanov
2017-05-09 7:46 ` DQL and TCQ_F_CAN_BYPASS destroy performance under virtualizaiton (Was: "Re: net_sched strange in 4.11") Anton Ivanov
0 siblings, 1 reply; 11+ messages in thread
From: Anton Ivanov @ 2017-05-08 7:15 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev
Hi all,
I was revising some of my old work for UML to prepare it for submission
and I noticed that skb->xmit_more does not seem to be set any more.
I traced the issue as far as net/sched/sch_generic.c
try_bulk_dequeue_skb() is never invoked (the drivers I am working on are
dql enabled so that is not the problem).
More interestingly, if I put a breakpoint and debug output into
dequeue_skb() around line 147, right before the bulk: label, the skb
there is always NULL. ???
Similarly, debug in pfifo_fast_dequeue shows only NULLs being dequeued.
Again - ???
First and foremost, I apologize for the silly question, but how can this
work at all? I see the skbs showing up at the driver level, why are
NULLs being returned at qdisc dequeue and where do the skbs at the
driver level come from?
Second, where should I look to fix it?
A.
^ permalink raw reply [flat|nested] 11+ messages in thread
* DQL and TCQ_F_CAN_BYPASS destroy performance under virtualizaiton (Was: "Re: net_sched strange in 4.11")
2017-05-08 7:15 net_sched strange in 4.11 Anton Ivanov
@ 2017-05-09 7:46 ` Anton Ivanov
2017-05-09 8:00 ` [uml-devel] Fwd: " Anton Ivanov
2017-05-09 15:11 ` Stefan Hajnoczi
0 siblings, 2 replies; 11+ messages in thread
From: Anton Ivanov @ 2017-05-09 7:46 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Stefan Hajnoczi
I have figured it out. Two issues.
1) skb->xmit_more is hardly ever set under virtualization because the
qdisc is usually bypassed because of TCQ_F_CAN_BYPASS. Once
TCQ_F_CAN_BYPASS is set a virtual NIC driver is not likely to see
skb->xmit_more (this answers my "how does this work at all" question).
2) If that flag is turned off (I patched sch_generic to turn it off in
pfifo_fast while testing), DQL keeps xmit_more from being set. If the
driver is not DQL enabled xmit_more is never ever set. If the driver is
DQL enabled the queue is adjusted to ensure xmit_more stops happening
within 10-15 xmit cycles.
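To make point 1 concrete, here is a minimal userspace sketch of the bypass decision as I understand it. It is a toy model, not kernel code: the struct, the enum, the MODEL_F_CAN_BYPASS bit and the function name are all made up for illustration, and only the idea (flag set, queue empty, nobody else draining the queue) is taken from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's TCQ_F_CAN_BYPASS flag bit. */
#define MODEL_F_CAN_BYPASS 0x1u

enum tx_path { TX_PATH_DIRECT, TX_PATH_ENQUEUE };

/* Toy qdisc state; the field names are assumptions of this sketch. */
struct model_qdisc {
    unsigned int flags;   /* MODEL_F_* bits */
    size_t qlen;          /* packets currently sitting in the queue */
    bool running;         /* another context is already draining the queue */
};

/* Decide how one packet reaches the driver in this toy model. */
enum tx_path model_choose_tx_path(const struct model_qdisc *q)
{
    assert(q != NULL);

    /* Bypass only when allowed, the queue is empty and nobody is draining it. */
    if ((q->flags & MODEL_F_CAN_BYPASS) && q->qlen == 0 && !q->running)
        return TX_PATH_DIRECT;

    /* Otherwise the packet is queued and later pulled out by dequeue. */
    return TX_PATH_ENQUEUE;
}
```

In this model a lightly loaded queue always takes TX_PATH_DIRECT, so the packet never sits in the queue. That would be consistent with the dequeue side returning only NULLs while skbs still show up at the driver.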
That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc.
There, the BIG cost is telling the hypervisor that it needs to "kick"
the packets. The cost of putting them into the vNIC buffers is
negligible. You want xmit_more to happen - it makes a difference of
between 50% and 300% (depending on vNIC design). If there is no
xmit_more the vNIC will immediately "kick" the hypervisor and try to
signal that the packet needs to move straight away (as for example in
virtio_net).
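A small sketch of why this matters, assuming a driver that kicks whenever xmit_more is clear. The function name and the array-of-flags interface are invented for this example; a real driver sees the flag one skb at a time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Count how many times a toy vNIC would kick the hypervisor for a
 * sequence of packets, given the xmit_more flag seen with each one.
 */
size_t model_count_kicks(const bool *xmit_more, size_t n)
{
    size_t kicks = 0;

    assert(xmit_more != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        bool kick = !xmit_more[i];   /* same shape as the driver-side test */

        if (kick)
            kicks++;
    }
    return kicks;
}
```

With four packets and xmit_more never set, the model kicks four times. With xmit_more set on the first three and clear on the last, it kicks once.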
In addition to that, the perceived line rate is proportional to this
cost, so I am not sure that the current dql math holds. In fact, I think
it does not - it is trying to adjust something which influences the
perceived line rate.
So - how do we turn off BOTH bypass and DQL adjustment while under
virtualization and set them to be "always qdisc" + "always xmit_more
allowed"?
A.
P.S. Cc-ing virtio maintainer
A.
On 08/05/17 08:15, Anton Ivanov wrote:
> Hi all,
>
> I was revising some of my old work for UML to prepare it for
> submission and I noticed that skb->xmit_more does not seem to be set
> any more.
>
> I traced the issue as far as net/sched/sched_generic.c
>
> try_bulk_dequeue_skb() is never invoked (the drivers I am working on
> are dql enabled so that is not the problem).
>
> More interestingly, if I put a breakpoint and debug output into
> dequeue_skb() around line 147 - right before the bulk: tag that skb
> there is always NULL. ???
>
> Similarly, debug in pfifo_fast_dequeue shows only NULLs being
> dequeued. Again - ???
>
> First and foremost, I apologize for the silly question, but how can
> this work at all? I see the skbs showing up at the driver level, why
> are NULLs being returned at qdisc dequeue and where do the skbs at the
> driver level come from?
>
> Second, where should I look to fix it?
>
> A.
>
--
Anton R. Ivanov
Cambridge Greys Limited, England company No 10273661
http://www.cambridgegreys.com/
^ permalink raw reply [flat|nested] 11+ messages in thread
* [uml-devel] Fwd: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualizaiton (Was: "Re: net_sched strange in 4.11")
2017-05-09 7:46 ` DQL and TCQ_F_CAN_BYPASS destroy performance under virtualizaiton (Was: "Re: net_sched strange in 4.11") Anton Ivanov
@ 2017-05-09 8:00 ` Anton Ivanov
2017-05-09 15:11 ` Stefan Hajnoczi
1 sibling, 0 replies; 11+ messages in thread
From: Anton Ivanov @ 2017-05-09 8:00 UTC (permalink / raw)
To: user-mode-linux-devel
[-- Attachment #1.1: Type: text/plain, Size: 3534 bytes --]
Once I get some ideas on how to sort out THIS (forwarded) mess I will
submit the vector drivers and the epoll controller they depend on.
I got the RX to > 1.7Gbit (for reference, kvm on the same machine just
about manages 1.4 using tap). I cannot get TX done because of the
wonderful bufferbloat optimizations in the recent kernels.
As usual, the glorious quest against too many buffers is doing more
harm than good.
I can of course just #ifdef CONFIG_UML the relevant bits in the packet
scheduler, but this is vandalism. We should not be doing it and it
affects kvm as well.
A.
-------- Forwarded Message --------
Subject: DQL and TCQ_F_CAN_BYPASS destroy performance under
virtualizaiton (Was: "Re: net_sched strange in 4.11")
Date: Tue, 9 May 2017 08:46:46 +0100
From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Organization: Cambridge Greys Limited
To: David S. Miller <davem@davemloft.net>
CC: netdev@vger.kernel.org, Stefan Hajnoczi <stefanha@redhat.com>
I have figured it out. Two issues.
1) skb->xmit_more is hardly ever set under virtualization because the
qdisc is usually bypassed because of TCQ_F_CAN_BYPASS. Once
TCQ_F_CAN_BYPASS is set a virtual NIC driver is not likely see
skb->xmit_more (this answers my "how does this work at all" question).
2) If that flag is turned off (I patched sched_generic to turn it off in
pfifo_fast while testing), DQL keeps xmit_more from being set. If the
driver is not DQL enabled xmit_more is never ever set. If the driver is
DQL enabled the queue is adjusted to ensure xmit_more stops happening
within 10-15 xmit cycles.
That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc.
There, the BIG cost is telling the hypervisor that it needs to "kick"
the packets. The cost of putting them into the vNIC buffers is
negligible. You want xmit_more to happen - it makes between 50% and 300%
(depending on vNIC design) difference. If there is no xmit_more the vNIC
will immediately "kick" the hypervisor and try to signal that the
packet needs to move straight away (as for example in virtio_net).
In addition to that, the perceived line rate is proportional to this
cost, so I am not sure that the current dql math holds. In fact, I think
it does not - it is trying to adjust something which influences the
perceived line rate.
So - how do we turn BOTH bypass and DQL adjustment while under
virtualization and set them to be "always qdisc" + "always xmit_more
allowed"
A.
P.S. Cc-ing virtio maintainer
A.
On 08/05/17 08:15, Anton Ivanov wrote:
> Hi all,
>
> I was revising some of my old work for UML to prepare it for
> submission and I noticed that skb->xmit_more does not seem to be set
> any more.
>
> I traced the issue as far as net/sched/sched_generic.c
>
> try_bulk_dequeue_skb() is never invoked (the drivers I am working on
> are dql enabled so that is not the problem).
>
> More interestingly, if I put a breakpoint and debug output into
> dequeue_skb() around line 147 - right before the bulk: tag that skb
> there is always NULL. ???
>
> Similarly, debug in pfifo_fast_dequeue shows only NULLs being
> dequeued. Again - ???
>
> First and foremost, I apologize for the silly question, but how can
> this work at all? I see the skbs showing up at the driver level, why
> are NULLs being returned at qdisc dequeue and where do the skbs at the
> driver level come from?
>
> Second, where should I look to fix it?
>
> A.
>
--
Anton R. Ivanov
Cambridge Greys Limited, England company No 10273661
http://www.cambridgegreys.com/
[-- Attachment #1.2: Type: text/html, Size: 5366 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualizaiton (Was: "Re: net_sched strange in 4.11")
2017-05-09 7:46 ` DQL and TCQ_F_CAN_BYPASS destroy performance under virtualizaiton (Was: "Re: net_sched strange in 4.11") Anton Ivanov
2017-05-09 8:00 ` [uml-devel] Fwd: " Anton Ivanov
@ 2017-05-09 15:11 ` Stefan Hajnoczi
2017-05-10 2:18 ` Jason Wang
1 sibling, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2017-05-09 15:11 UTC (permalink / raw)
To: Anton Ivanov; +Cc: David S. Miller, netdev, Michael S. Tsirkin, jasowang
[-- Attachment #1: Type: text/plain, Size: 2995 bytes --]
On Tue, May 09, 2017 at 08:46:46AM +0100, Anton Ivanov wrote:
> I have figured it out. Two issues.
>
> 1) skb->xmit_more is hardly ever set under virtualization because the qdisc
> is usually bypassed because of TCQ_F_CAN_BYPASS. Once TCQ_F_CAN_BYPASS is
> set a virtual NIC driver is not likely see skb->xmit_more (this answers my
> "how does this work at all" question).
>
> 2) If that flag is turned off (I patched sched_generic to turn it off in
> pfifo_fast while testing), DQL keeps xmit_more from being set. If the driver
> is not DQL enabled xmit_more is never ever set. If the driver is DQL enabled
> the queue is adjusted to ensure xmit_more stops happening within 10-15 xmit
> cycles.
>
> That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc. There,
> the BIG cost is telling the hypervisor that it needs to "kick" the packets.
> The cost of putting them into the vNIC buffers is negligible. You want
> xmit_more to happen - it makes between 50% and 300% (depending on vNIC
> design) difference. If there is no xmit_more the vNIC will immediately
> "kick" the hypervisor and try to signal that the packet needs to move
> straight away (as for example in virtio_net).
>
> In addition to that, the perceived line rate is proportional to this cost,
> so I am not sure that the current dql math holds. In fact, I think it does
> not - it is trying to adjust something which influences the perceived line
> rate.
>
> So - how do we turn BOTH bypass and DQL adjustment while under
> virtualization and set them to be "always qdisc" + "always xmit_more
> allowed"
>
> A.
>
> P.S. Cc-ing virtio maintainer
CCing Michael Tsirkin and Jason Wang, who are the core virtio and
virtio-net maintainers. (I maintain the vsock driver - it's unrelated
to this discussion.)
>
> A.
>
>
> On 08/05/17 08:15, Anton Ivanov wrote:
> > Hi all,
> >
> > I was revising some of my old work for UML to prepare it for submission
> > and I noticed that skb->xmit_more does not seem to be set any more.
> >
> > I traced the issue as far as net/sched/sched_generic.c
> >
> > try_bulk_dequeue_skb() is never invoked (the drivers I am working on are
> > dql enabled so that is not the problem).
> >
> > More interestingly, if I put a breakpoint and debug output into
> > dequeue_skb() around line 147 - right before the bulk: tag that skb
> > there is always NULL. ???
> >
> > Similarly, debug in pfifo_fast_dequeue shows only NULLs being dequeued.
> > Again - ???
> >
> > First and foremost, I apologize for the silly question, but how can this
> > work at all? I see the skbs showing up at the driver level, why are
> > NULLs being returned at qdisc dequeue and where do the skbs at the
> > driver level come from?
> >
> > Second, where should I look to fix it?
> >
> > A.
> >
>
>
> --
> Anton R. Ivanov
>
> Cambridge Greys Limited, England company No 10273661
> http://www.cambridgegreys.com/
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualizaiton (Was: "Re: net_sched strange in 4.11")
2017-05-09 15:11 ` Stefan Hajnoczi
@ 2017-05-10 2:18 ` Jason Wang
2017-05-10 5:28 ` Anton Ivanov
2017-05-10 5:35 ` Anton Ivanov
0 siblings, 2 replies; 11+ messages in thread
From: Jason Wang @ 2017-05-10 2:18 UTC (permalink / raw)
To: Stefan Hajnoczi, Anton Ivanov; +Cc: David S. Miller, netdev, Michael S. Tsirkin
On 2017-05-09 23:11, Stefan Hajnoczi wrote:
> On Tue, May 09, 2017 at 08:46:46AM +0100, Anton Ivanov wrote:
>> I have figured it out. Two issues.
>>
>> 1) skb->xmit_more is hardly ever set under virtualization because the qdisc
>> is usually bypassed because of TCQ_F_CAN_BYPASS. Once TCQ_F_CAN_BYPASS is
>> set a virtual NIC driver is not likely see skb->xmit_more (this answers my
>> "how does this work at all" question).
>>
>> 2) If that flag is turned off (I patched sched_generic to turn it off in
>> pfifo_fast while testing), DQL keeps xmit_more from being set. If the driver
>> is not DQL enabled xmit_more is never ever set. If the driver is DQL enabled
>> the queue is adjusted to ensure xmit_more stops happening within 10-15 xmit
>> cycles.
>>
>> That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc. There,
>> the BIG cost is telling the hypervisor that it needs to "kick" the packets.
>> The cost of putting them into the vNIC buffers is negligible. You want
>> xmit_more to happen - it makes between 50% and 300% (depending on vNIC
>> design) difference. If there is no xmit_more the vNIC will immediately
>> "kick" the hypervisor and try to signal that the packet needs to move
>> straight away (as for example in virtio_net).
How do you measure the performance? TCP or just measure pps?
>>
>> In addition to that, the perceived line rate is proportional to this cost,
>> so I am not sure that the current dql math holds. In fact, I think it does
>> not - it is trying to adjust something which influences the perceived line
>> rate.
>>
>> So - how do we turn BOTH bypass and DQL adjustment while under
>> virtualization and set them to be "always qdisc" + "always xmit_more
>> allowed"
Virtio-net does not support BQL. Before commit ea7735d97ba9
("virtio-net: move free_old_xmit_skbs"), it was even impossible to support
it since we did not have a tx interrupt for each packet. I haven't
measured the impact of xmit_more. I may be wrong, but I think it may
help in some cases since it may improve batching on the host to some degree.
Thanks
>>
>> A.
>>
>> P.S. Cc-ing virtio maintainer
> CCing Michael Tsirkin and Jason Wang, who are the core virtio and
> virtio-net maintainers. (I maintain the vsock driver - it's unrelated
> to this discussion.)
>
>> A.
>>
>>
>> On 08/05/17 08:15, Anton Ivanov wrote:
>>> Hi all,
>>>
>>> I was revising some of my old work for UML to prepare it for submission
>>> and I noticed that skb->xmit_more does not seem to be set any more.
>>>
>>> I traced the issue as far as net/sched/sched_generic.c
>>>
>>> try_bulk_dequeue_skb() is never invoked (the drivers I am working on are
>>> dql enabled so that is not the problem).
>>>
>>> More interestingly, if I put a breakpoint and debug output into
>>> dequeue_skb() around line 147 - right before the bulk: tag that skb
>>> there is always NULL. ???
>>>
>>> Similarly, debug in pfifo_fast_dequeue shows only NULLs being dequeued.
>>> Again - ???
>>>
>>> First and foremost, I apologize for the silly question, but how can this
>>> work at all? I see the skbs showing up at the driver level, why are
>>> NULLs being returned at qdisc dequeue and where do the skbs at the
>>> driver level come from?
>>>
>>> Second, where should I look to fix it?
>>>
>>> A.
>>>
>>
>> --
>> Anton R. Ivanov
>>
>> Cambridge Greys Limited, England company No 10273661
>> http://www.cambridgegreys.com/
>>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualizaiton (Was: "Re: net_sched strange in 4.11")
2017-05-10 2:18 ` Jason Wang
@ 2017-05-10 5:28 ` Anton Ivanov
2017-05-10 8:56 ` Jason Wang
2017-05-10 5:35 ` Anton Ivanov
1 sibling, 1 reply; 11+ messages in thread
From: Anton Ivanov @ 2017-05-10 5:28 UTC (permalink / raw)
To: Jason Wang, Stefan Hajnoczi; +Cc: David S. Miller, netdev, Michael S. Tsirkin
On 10/05/17 03:18, Jason Wang wrote:
>
>
> On 2017-05-09 23:11, Stefan Hajnoczi wrote:
>> On Tue, May 09, 2017 at 08:46:46AM +0100, Anton Ivanov wrote:
>>> I have figured it out. Two issues.
>>>
>>> 1) skb->xmit_more is hardly ever set under virtualization because
>>> the qdisc
>>> is usually bypassed because of TCQ_F_CAN_BYPASS. Once
>>> TCQ_F_CAN_BYPASS is
>>> set a virtual NIC driver is not likely see skb->xmit_more (this
>>> answers my
>>> "how does this work at all" question).
>>>
>>> 2) If that flag is turned off (I patched sched_generic to turn it
>>> off in
>>> pfifo_fast while testing), DQL keeps xmit_more from being set. If
>>> the driver
>>> is not DQL enabled xmit_more is never ever set. If the driver is DQL
>>> enabled
>>> the queue is adjusted to ensure xmit_more stops happening within
>>> 10-15 xmit
>>> cycles.
>>>
>>> That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc.
>>> There,
>>> the BIG cost is telling the hypervisor that it needs to "kick" the
>>> packets.
>>> The cost of putting them into the vNIC buffers is negligible. You want
>>> xmit_more to happen - it makes between 50% and 300% (depending on vNIC
>>> design) difference. If there is no xmit_more the vNIC will immediately
>>> "kick" the hypervisor and try to signal that the packet needs to move
>>> straight away (as for example in virtio_net).
>
> How do you measure the performance? TCP or just measure pps?
In this particular case - tcp from guest. I have a couple of other
benchmarks (forwarding, etc).
>
>>>
>>> In addition to that, the perceived line rate is proportional to this
>>> cost,
>>> so I am not sure that the current dql math holds. In fact, I think
>>> it does
>>> not - it is trying to adjust something which influences the
>>> perceived line
>>> rate.
>>>
>>> So - how do we turn BOTH bypass and DQL adjustment while under
>>> virtualization and set them to be "always qdisc" + "always xmit_more
>>> allowed"
>
> Virtio-net net does not support BQL. Before commit ea7735d97ba9
> ("virtio-net: move free_old_xmit_skbs"), it's even impossible to
> support that since we don't have tx interrupt for each packet. I
> haven't measured the impact of xmit_more, maybe I was wrong but I
> think it may help in some cases since it may improve the batching on
> host more or less.
If you do not support BQL, you might as well look at the xmit_more part
of the kick code path. Line 1127.
bool kick = !skb->xmit_more; effectively means kick = true;
It will never be triggered. You will be kicking on each and every
packet. xmit_more is now set only out of BQL. If BQL is not enabled you
never get it. Now, will the current dql code work correctly if you do
not have a defined line rate and completion interrupts - no idea.
Probably not. IMHO instead of trying to fix it there should be a way for
a device or architecture to turn it off.
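To illustrate the "no BQL, no xmit_more" point, here is a simplified userspace model of a byte-budget bulk dequeue. It is a sketch under my own assumptions, not the kernel function: the name, the signature and the idea of passing packet lengths as an array are all invented here. The available budget is whatever the byte queue limit still allows, and I assume it is zero when the driver has no BQL support.

```c
#include <assert.h>
#include <stddef.h>

/*
 * lens[0] is the packet already dequeued; lens[1..n-1] are the packets
 * still waiting in the queue. Returns how many packets end up chained
 * together for one pass to the driver (always at least 1).
 */
size_t model_bulk_dequeue(long avail_bytes, const unsigned int *lens, size_t n)
{
    assert(lens != NULL && n >= 1);

    /* The first packet is charged against the budget up front. */
    long budget = avail_bytes - (long)lens[0];
    size_t chained = 1;

    /* Keep chaining packets while there is budget left and packets queued. */
    while (budget > 0 && chained < n) {
        budget -= (long)lens[chained];
        chained++;
    }
    return chained;
}
```

With five queued packets of 1000 bytes each, a budget of 0 yields a chain of one packet, so in this model nothing would ever carry xmit_more. A budget of 3000 yields a chain of three, and every packet in the chain except the last would be a candidate for xmit_more.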
To be clear - I ran into this working on my own drivers for UML, you are
cc-ed because you are likely to be one of the most affected.
A.
>
> Thanks
>
>>>
>>> A.
>>>
>>> P.S. Cc-ing virtio maintainer
>> CCing Michael Tsirkin and Jason Wang, who are the core virtio and
>> virtio-net maintainers. (I maintain the vsock driver - it's unrelated
>> to this discussion.)
>>
>>> A.
>>>
>>>
>>> On 08/05/17 08:15, Anton Ivanov wrote:
>>>> Hi all,
>>>>
>>>> I was revising some of my old work for UML to prepare it for
>>>> submission
>>>> and I noticed that skb->xmit_more does not seem to be set any more.
>>>>
>>>> I traced the issue as far as net/sched/sched_generic.c
>>>>
>>>> try_bulk_dequeue_skb() is never invoked (the drivers I am working
>>>> on are
>>>> dql enabled so that is not the problem).
>>>>
>>>> More interestingly, if I put a breakpoint and debug output into
>>>> dequeue_skb() around line 147 - right before the bulk: tag that skb
>>>> there is always NULL. ???
>>>>
>>>> Similarly, debug in pfifo_fast_dequeue shows only NULLs being
>>>> dequeued.
>>>> Again - ???
>>>>
>>>> First and foremost, I apologize for the silly question, but how can
>>>> this
>>>> work at all? I see the skbs showing up at the driver level, why are
>>>> NULLs being returned at qdisc dequeue and where do the skbs at the
>>>> driver level come from?
>>>>
>>>> Second, where should I look to fix it?
>>>>
>>>> A.
>>>>
>>>
>>> --
>>> Anton R. Ivanov
>>>
>>> Cambridge Greys Limited, England company No 10273661
>>> http://www.cambridgegreys.com/
>>>
>
>
--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualizaiton (Was: "Re: net_sched strange in 4.11")
2017-05-10 2:18 ` Jason Wang
2017-05-10 5:28 ` Anton Ivanov
@ 2017-05-10 5:35 ` Anton Ivanov
1 sibling, 0 replies; 11+ messages in thread
From: Anton Ivanov @ 2017-05-10 5:35 UTC (permalink / raw)
To: Jason Wang, Stefan Hajnoczi; +Cc: David S. Miller, netdev, Michael S. Tsirkin
[snip]
> Virtio-net net does not support BQL. Before commit ea7735d97ba9
> ("virtio-net: move free_old_xmit_skbs"), it's even impossible to
> support that since we don't have tx interrupt for each packet. I
> haven't measured the impact of xmit_more, maybe I was wrong but I
> think it may help in some cases since it may improve the batching on
> host more or less.
Sorry, hit send too soon.
The impact of xmit_more depends on your transport.
If, for example, you are using sendmmsg on the outer side which can
consume the bulked data "as is", the impact is quite significant. If
your transport does not support bulking, the fact that there was bulking
earlier in the chain has little impact.
There is some, but not a lot.
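For reference, a minimal sketch of what "consuming the bulked data as is" can look like with sendmmsg(2) on Linux. The helper name, the BATCH_MAX cap and the use of NUL-terminated strings as payloads are assumptions made for this example; a real transport would pass packet buffers and lengths.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE           /* needed for sendmmsg() and struct mmsghdr on glibc */
#endif
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH_MAX 16          /* arbitrary cap chosen for this sketch */

/*
 * Send up to BATCH_MAX datagrams on a connected socket with a single
 * sendmmsg() call. Returns the number of datagrams the kernel accepted,
 * or -1 on error.
 */
int send_batch(int fd, const char *const *payloads, unsigned int count)
{
    struct mmsghdr msgs[BATCH_MAX];
    struct iovec iov[BATCH_MAX];

    assert(count <= BATCH_MAX);
    memset(msgs, 0, sizeof(msgs));

    for (unsigned int i = 0; i < count; i++) {
        /* One iovec per datagram; the payload is not modified. */
        iov[i].iov_base = (void *)payloads[i];
        iov[i].iov_len = strlen(payloads[i]);
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* One system call for the whole batch instead of one per packet. */
    return sendmmsg(fd, msgs, count, 0);
}
```

The point of the sketch is the last line: a batch that arrives together can leave together in one system call, whereas a transport limited to one send per packet gains little from bulking earlier in the chain.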
[snip]
--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualizaiton (Was: "Re: net_sched strange in 4.11")
2017-05-10 5:28 ` Anton Ivanov
@ 2017-05-10 8:56 ` Jason Wang
2017-05-10 9:42 ` Anton Ivanov
0 siblings, 1 reply; 11+ messages in thread
From: Jason Wang @ 2017-05-10 8:56 UTC (permalink / raw)
To: Anton Ivanov, Stefan Hajnoczi; +Cc: David S. Miller, netdev, Michael S. Tsirkin
On 2017-05-10 13:28, Anton Ivanov wrote:
> On 10/05/17 03:18, Jason Wang wrote:
>>
>> On 2017-05-09 23:11, Stefan Hajnoczi wrote:
>>> On Tue, May 09, 2017 at 08:46:46AM +0100, Anton Ivanov wrote:
>>>> I have figured it out. Two issues.
>>>>
>>>> 1) skb->xmit_more is hardly ever set under virtualization because
>>>> the qdisc
>>>> is usually bypassed because of TCQ_F_CAN_BYPASS. Once
>>>> TCQ_F_CAN_BYPASS is
>>>> set a virtual NIC driver is not likely see skb->xmit_more (this
>>>> answers my
>>>> "how does this work at all" question).
>>>>
>>>> 2) If that flag is turned off (I patched sched_generic to turn it
>>>> off in
>>>> pfifo_fast while testing), DQL keeps xmit_more from being set. If
>>>> the driver
>>>> is not DQL enabled xmit_more is never ever set. If the driver is DQL
>>>> enabled
>>>> the queue is adjusted to ensure xmit_more stops happening within
>>>> 10-15 xmit
>>>> cycles.
>>>>
>>>> That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc.
>>>> There,
>>>> the BIG cost is telling the hypervisor that it needs to "kick" the
>>>> packets.
>>>> The cost of putting them into the vNIC buffers is negligible. You want
>>>> xmit_more to happen - it makes between 50% and 300% (depending on vNIC
>>>> design) difference. If there is no xmit_more the vNIC will immediately
>>>> "kick" the hypervisor and try to signal that the packet needs to move
>>>> straight away (as for example in virtio_net).
>> How do you measure the performance? TCP or just measure pps?
> In this particular case - tcp from guest. I have a couple of other
> benchmarks (forwarding, etc).
One more question, is the number for virtio-net or other emulated vNIC?
>
>>>> In addition to that, the perceived line rate is proportional to this
>>>> cost,
>>>> so I am not sure that the current dql math holds. In fact, I think
>>>> it does
>>>> not - it is trying to adjust something which influences the
>>>> perceived line
>>>> rate.
>>>>
>>>> So - how do we turn BOTH bypass and DQL adjustment while under
>>>> virtualization and set them to be "always qdisc" + "always xmit_more
>>>> allowed"
>> Virtio-net net does not support BQL. Before commit ea7735d97ba9
>> ("virtio-net: move free_old_xmit_skbs"), it's even impossible to
>> support that since we don't have tx interrupt for each packet. I
>> haven't measured the impact of xmit_more, maybe I was wrong but I
>> think it may help in some cases since it may improve the batching on
>> host more or less.
> If you do not support BQL, you might as well look the xmit_more part
> kick code path. Line 1127.
>
> bool kick = !skb->xmit_more; effectively means kick = true;
>
> It will never be triggered. You will be kicking each packet and per
> packet.
Probably not. We have several ways to try to suppress this in the virtio
layer; the host can give hints to disable the kicks by:
- explicitly setting a flag
- implicitly not publishing a new event idx
FYI, I can get 100-200 packets per vm exit when testing 64 byte
TCP_STREAM using netperf.
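For readers not familiar with the event idx mechanism, the check below follows the vring_need_event() helper given in the virtio specification, rewritten here as a standalone function with a different name. Treat it as a sketch for illustration rather than the driver's actual code path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static_assert(sizeof(uint16_t) == 2, "ring indices are 16 bits wide");

/*
 * event_idx: the index at which the other side asked to be notified
 * old_idx:   our ring index before adding the new buffers
 * new_idx:   our ring index after adding the new buffers
 *
 * Returns true when the index moved past event_idx, i.e. a kick is
 * wanted. The 16-bit casts make the comparison safe across wraparound.
 */
bool model_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}
```

If the host leaves event_idx where it was while it is still busy draining the ring, the guest keeps adding buffers and this check keeps returning false, which is how many packets can be sent per vm exit even without xmit_more.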
> xmit_more is now set only out of BQL. If BQL is not enabled you
> never get it. Now, will the current dql code work correctly if you do
> not have a defined line rate and completion interrupts - no idea.
> Probably not. IMHO instead of trying to fix it there should be a way for
> a device or architecture to turn it off.
In fact BQL is not the only user of xmit_more. Pktgen with burst is
another. Testing does not show an obvious difference if I set burst from
0 to 64, since we already have other ways to avoid kicking the host.
>
> To be clear - I ran into this working on my own drivers for UML, you are
> cc-ed because you are likely to be one of the most affected.
I'm still not quite sure what the issue is. It looks like virtio-net is
ok since BQL is not supported and the impact of xmit_more can be ignored.
Thanks
>
> A.
>
>> Thanks
>>
>>>> A.
>>>>
>>>> P.S. Cc-ing virtio maintainer
>>> CCing Michael Tsirkin and Jason Wang, who are the core virtio and
>>> virtio-net maintainers. (I maintain the vsock driver - it's unrelated
>>> to this discussion.)
>>>
>>>> A.
>>>>
>>>>
>>>> On 08/05/17 08:15, Anton Ivanov wrote:
>>>>> Hi all,
>>>>>
>>>>> I was revising some of my old work for UML to prepare it for
>>>>> submission
>>>>> and I noticed that skb->xmit_more does not seem to be set any more.
>>>>>
>>>>> I traced the issue as far as net/sched/sched_generic.c
>>>>>
>>>>> try_bulk_dequeue_skb() is never invoked (the drivers I am working
>>>>> on are
>>>>> dql enabled so that is not the problem).
>>>>>
>>>>> More interestingly, if I put a breakpoint and debug output into
>>>>> dequeue_skb() around line 147 - right before the bulk: tag that skb
>>>>> there is always NULL. ???
>>>>>
>>>>> Similarly, debug in pfifo_fast_dequeue shows only NULLs being
>>>>> dequeued.
>>>>> Again - ???
>>>>>
>>>>> First and foremost, I apologize for the silly question, but how can
>>>>> this
>>>>> work at all? I see the skbs showing up at the driver level, why are
>>>>> NULLs being returned at qdisc dequeue and where do the skbs at the
>>>>> driver level come from?
>>>>>
>>>>> Second, where should I look to fix it?
>>>>>
>>>>> A.
>>>>>
>>>> --
>>>> Anton R. Ivanov
>>>>
>>>> Cambridge Greys Limited, England company No 10273661
>>>> http://www.cambridgegreys.com/
>>>>
>>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualizaiton (Was: "Re: net_sched strange in 4.11")
2017-05-10 8:56 ` Jason Wang
@ 2017-05-10 9:42 ` Anton Ivanov
2017-05-11 2:43 ` Jason Wang
0 siblings, 1 reply; 11+ messages in thread
From: Anton Ivanov @ 2017-05-10 9:42 UTC (permalink / raw)
To: Jason Wang, Stefan Hajnoczi; +Cc: David S. Miller, netdev, Michael S. Tsirkin
On 10/05/17 09:56, Jason Wang wrote:
>
>
> On 2017-05-10 13:28, Anton Ivanov wrote:
>> On 10/05/17 03:18, Jason Wang wrote:
>>>
>>> On 2017-05-09 23:11, Stefan Hajnoczi wrote:
>>>> On Tue, May 09, 2017 at 08:46:46AM +0100, Anton Ivanov wrote:
>>>>> I have figured it out. Two issues.
>>>>>
>>>>> 1) skb->xmit_more is hardly ever set under virtualization because
>>>>> the qdisc
>>>>> is usually bypassed because of TCQ_F_CAN_BYPASS. Once
>>>>> TCQ_F_CAN_BYPASS is
>>>>> set a virtual NIC driver is not likely see skb->xmit_more (this
>>>>> answers my
>>>>> "how does this work at all" question).
>>>>>
>>>>> 2) If that flag is turned off (I patched sched_generic to turn it
>>>>> off in
>>>>> pfifo_fast while testing), DQL keeps xmit_more from being set. If
>>>>> the driver
>>>>> is not DQL enabled xmit_more is never ever set. If the driver is DQL
>>>>> enabled
>>>>> the queue is adjusted to ensure xmit_more stops happening within
>>>>> 10-15 xmit
>>>>> cycles.
>>>>>
>>>>> That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc.
>>>>> There,
>>>>> the BIG cost is telling the hypervisor that it needs to "kick" the
>>>>> packets.
>>>>> The cost of putting them into the vNIC buffers is negligible. You
>>>>> want
>>>>> xmit_more to happen - it makes between 50% and 300% (depending on
>>>>> vNIC
>>>>> design) difference. If there is no xmit_more the vNIC will
>>>>> immediately
>>>>> "kick" the hypervisor and try to signal that the packet needs to
>>>>> move
>>>>> straight away (as for example in virtio_net).
>>> How do you measure the performance? TCP or just measure pps?
>> In this particular case - tcp from guest. I have a couple of other
>> benchmarks (forwarding, etc).
>
> One more question, is the number for virtio-net or other emulated vNIC?
Other for now - you are cc-ed to keep you in the loop.
Virtio is next on my list - I am revisiting the l2tpv3.c driver in QEMU
and looking at how to preserve bulking by adding back sendmmsg (as well
as a list of other features/transports).
We had sendmmsg removed for the final inclusion in QEMU 2.1. It
presently uses only recvmmsg, so for the time being it does not care.
That will most likely change once it starts using sendmmsg as well.
>
>>
>>>>> In addition to that, the perceived line rate is proportional to this
>>>>> cost,
>>>>> so I am not sure that the current dql math holds. In fact, I think
>>>>> it does
>>>>> not - it is trying to adjust something which influences the
>>>>> perceived line
>>>>> rate.
>>>>>
>>>>> So - how do we turn BOTH bypass and DQL adjustment while under
>>>>> virtualization and set them to be "always qdisc" + "always xmit_more
>>>>> allowed"
>>> Virtio-net net does not support BQL. Before commit ea7735d97ba9
>>> ("virtio-net: move free_old_xmit_skbs"), it's even impossible to
>>> support that since we don't have tx interrupt for each packet. I
>>> haven't measured the impact of xmit_more, maybe I was wrong but I
>>> think it may help in some cases since it may improve the batching on
>>> host more or less.
>> If you do not support BQL, you might as well look the xmit_more part
>> kick code path. Line 1127.
>>
>> bool kick = !skb->xmit_more; effectively means kick = true;
>>
>> It will never be triggered. You will be kicking for each and every
>> packet.
>
> Probably not, we have several ways to try to suppress this on the
> virtio layer, host can give hints to disable the kicks through:
>
> - explicitly set a flag
> - implicitly by not publishing a new event idx
>
> FYI, I can get 100-200 packets per vm exit when testing 64 byte
> TCP_STREAM using netperf.
I am aware of that. If, however, the host is providing a hint we might
as well use it.
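
As background on the "event idx" hint mentioned above: with the event index feature negotiated, the driver decides whether to notify by comparing the index the other side asked to be woken at against the range of indices it has just published. The sketch below reproduces that comparison as it is defined for vring_need_event() in the virtio ring header; everything around it (count_kicks_event_idx() and the host policy of asking for a wakeup every `stride` buffers) is a hypothetical model, not virtio_net or vhost code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same arithmetic as vring_need_event(): notify only if event_idx lies in
 * the half-open window [old_idx, new_idx) of freshly published entries.
 * The uint16_t casts make the comparison safe across index wrap-around. */
bool need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}

/* Hypothetical model: the guest publishes `total` buffers one at a time.
 * After each kick the host asks to be woken again only once `stride` more
 * buffers have been published. Returns the number of kicks. */
unsigned int count_kicks_event_idx(unsigned int total, uint16_t stride)
{
	uint16_t avail = 0, event_idx = 0;
	unsigned int kicks = 0;

	assert(stride >= 1);
	for (unsigned int i = 0; i < total; i++) {
		uint16_t old = avail;

		avail++;
		if (need_event(event_idx, avail, old)) {
			kicks++;
			/* Assumed host policy, for illustration only. */
			event_idx = (uint16_t)(avail + stride - 1);
		}
	}
	return kicks;
}
```

With a stride of 1 every buffer causes a kick; with a larger stride most buffers are published without any notification, independently of whether xmit_more was set.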
>
>> xmit_more is now set only out of BQL. If BQL is not enabled you
>> never get it. Now, will the current dql code work correctly if you do
>> not have a defined line rate and completion interrupts - no idea.
>> Probably not. IMHO instead of trying to fix it there should be a way for
>> a device or architecture to turn it off.
>
> In fact BQL is not the only user for xmit_more. Pktgen with burst is
> another. Test does not show obvious difference if I set burst from 0
> to 64 since we already had other ways to avoid kicking host.
That, and the fact that this is not wired to a bulk transport.
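
To make the relationship between bursts, xmit_more and kicks concrete, here is a toy model. It only mirrors the quoted `bool kick = !skb->xmit_more;` decision and deliberately ignores the host-side suppression discussed above; struct model_skb, should_kick() and kicks_for_burst() are hypothetical names, and treating a burst below 1 as 1 is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the one sk_buff field that matters here. */
struct model_skb {
	bool xmit_more;
};

/* Mirrors the quoted driver logic: kick unless more packets are promised. */
bool should_kick(const struct model_skb *skb)
{
	return !skb->xmit_more;
}

/* Count kicks for `packets` packets submitted in bursts of `burst`.
 * Within a burst every packet except the last carries xmit_more. */
unsigned int kicks_for_burst(unsigned int packets, unsigned int burst)
{
	unsigned int kicks = 0;

	if (burst < 1)
		burst = 1; /* assumption: no bursting means one packet per burst */

	for (unsigned int i = 0; i < packets; i++) {
		struct model_skb skb;
		bool last_in_burst = ((i + 1) % burst == 0);
		bool last_overall = (i + 1 == packets);

		skb.xmit_more = !(last_in_burst || last_overall);
		if (should_kick(&skb))
			kicks++;
	}
	return kicks;
}
```

In this model 64 packets cost 64 kicks without bursting and 4 kicks with a burst of 16. Whether that shows up as a throughput difference depends on what else already suppresses kicks, which is the point being made for virtio-net.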
>
>>
>> To be clear - I ran into this working on my own drivers for UML, you are
>> cc-ed because you are likely to be one of the most affected.
>
> I'm still not quite sure what the issue is. Looks like virtio-net is ok since
> BQL is not supported and the impact of xmit_more could be ignored.
Presently - yes. If you have bulk-aware transports to wire into, that is
likely to make a difference.
>
> Thanks
>
>>
>> A.
>>
>>> Thanks
>>>
>>>>> A.
>>>>>
>>>>> P.S. Cc-ing virtio maintainer
>>>> CCing Michael Tsirkin and Jason Wang, who are the core virtio and
>>>> virtio-net maintainers. (I maintain the vsock driver - it's unrelated
>>>> to this discussion.)
>>>>
>>>>> A.
>>>>>
>>>>>
>>>>> On 08/05/17 08:15, Anton Ivanov wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I was revising some of my old work for UML to prepare it for
>>>>>> submission and I noticed that skb->xmit_more does not seem to be set
>>>>>> any more.
>>>>>>
>>>>>> I traced the issue as far as net/sched/sch_generic.c
>>>>>>
>>>>>> try_bulk_dequeue_skb() is never invoked (the drivers I am working on
>>>>>> are dql enabled so that is not the problem).
>>>>>>
>>>>>> More interestingly, if I put a breakpoint and debug output into
>>>>>> dequeue_skb() around line 147 - right before the bulk: tag, the skb
>>>>>> there is always NULL. ???
>>>>>>
>>>>>> Similarly, debug in pfifo_fast_dequeue shows only NULLs being
>>>>>> dequeued. Again - ???
>>>>>>
>>>>>> First and foremost, I apologize for the silly question, but how can
>>>>>> this work at all? I see the skbs showing up at the driver level, why
>>>>>> are NULLs being returned at qdisc dequeue and where do the skbs at
>>>>>> the driver level come from?
>>>>>>
>>>>>> Second, where should I look to fix it?
>>>>>>
>>>>>> A.
>>>>>>
>>>>> --
>>>>> Anton R. Ivanov
>>>>>
>>>>> Cambridge Greys Limited, England company No 10273661
>>>>> http://www.cambridgegreys.com/
>>>>>
>>>
>>
>
>
--
Anton R. Ivanov
Cambridge Greys Limited, England company No 10273661
http://www.cambridgegreys.com/
* Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualizaiton (Was: "Re: net_sched strange in 4.11")
2017-05-10 9:42 ` Anton Ivanov
@ 2017-05-11 2:43 ` Jason Wang
2017-05-11 5:43 ` Anton Ivanov
0 siblings, 1 reply; 11+ messages in thread
From: Jason Wang @ 2017-05-11 2:43 UTC (permalink / raw)
To: Anton Ivanov, Stefan Hajnoczi; +Cc: David S. Miller, netdev, Michael S. Tsirkin
On 2017-05-10 17:42, Anton Ivanov wrote:
> On 10/05/17 09:56, Jason Wang wrote:
>>
>>
>> On 2017-05-10 13:28, Anton Ivanov wrote:
>>> On 10/05/17 03:18, Jason Wang wrote:
>>>>
>>>> On 2017-05-09 23:11, Stefan Hajnoczi wrote:
>>>>> On Tue, May 09, 2017 at 08:46:46AM +0100, Anton Ivanov wrote:
>>>>>> I have figured it out. Two issues.
>>>>>>
>>>>>> 1) skb->xmit_more is hardly ever set under virtualization because
>>>>>> the qdisc is usually bypassed because of TCQ_F_CAN_BYPASS. Once
>>>>>> TCQ_F_CAN_BYPASS is set, a virtual NIC driver is not likely to see
>>>>>> skb->xmit_more (this answers my "how does this work at all"
>>>>>> question).
>>>>>>
>>>>>> 2) If that flag is turned off (I patched sch_generic to turn it off
>>>>>> in pfifo_fast while testing), DQL keeps xmit_more from being set. If
>>>>>> the driver is not DQL enabled, xmit_more is never set. If the driver
>>>>>> is DQL enabled, the queue is adjusted to ensure xmit_more stops
>>>>>> happening within 10-15 xmit cycles.
>>>>>>
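
The two points quoted above can be restated as a small model. This is a simplified sketch from a reading of the 4.11-era transmit path, not kernel code, and the details should be checked against the source: a packet skips the qdisc when the bypass flag is set, the queue is empty and nobody else is running the qdisc, and a bulk dequeue keeps chaining packets only while a byte budget supplied by the queue limit code remains. MODEL_CAN_BYPASS, struct model_qdisc, takes_bypass_path() and bulk_chain_len() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CAN_BYPASS 0x1 /* stand-in for TCQ_F_CAN_BYPASS */

struct model_qdisc {
	unsigned int flags;
	unsigned int qlen; /* packets currently queued */
	bool running;      /* another context is already draining the qdisc */
};

/* Point 1: with an idle, empty qdisc the packet goes straight to the
 * driver, so nothing is ever enqueued and dequeue has nothing to return. */
bool takes_bypass_path(const struct model_qdisc *q)
{
	return (q->flags & MODEL_CAN_BYPASS) && q->qlen == 0 && !q->running;
}

/* Point 2: number of packets chained by one bulk dequeue. The first packet
 * always goes out; further ones are added only while the byte budget is
 * positive. A budget of 0 models a driver without byte queue limits. */
unsigned int bulk_chain_len(const unsigned int *lens, size_t n, int byte_budget)
{
	unsigned int chained;
	int bytelimit;

	if (n == 0)
		return 0;

	chained = 1;
	bytelimit = byte_budget - (int)lens[0];
	while (bytelimit > 0 && chained < n) {
		bytelimit -= (int)lens[chained];
		chained++;
	}
	return chained;
}
```

A chain of length 1 means the single packet is sent without xmit_more. In this model that is always the outcome when the budget is 0, and it becomes the outcome again once the limit code shrinks the budget below one packet.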
>>>>>> That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc.
>>>>>> There, the BIG cost is telling the hypervisor that it needs to "kick"
>>>>>> the packets. The cost of putting them into the vNIC buffers is
>>>>>> negligible. You want xmit_more to happen - it makes between 50% and
>>>>>> 300% (depending on vNIC design) difference. If there is no xmit_more
>>>>>> the vNIC will immediately "kick" the hypervisor and try to signal
>>>>>> that the packet needs to move straight away (as for example in
>>>>>> virtio_net).
>>>> How do you measure the performance? TCP or just measure pps?
>>> In this particular case - tcp from guest. I have a couple of other
>>> benchmarks (forwarding, etc).
>>
>> One more question, is the number for virtio-net or other emulated vNIC?
>
> Other for now - you are cc-ed to keep you in the loop.
>
> Virtio is next on my list - I am revisiting the l2tpv3.c driver in
> QEMU and looking at how to preserve bulking by adding back sendmmsg
> (as well as a list of other features/transports).
>
> We had sendmmsg removed for the final inclusion in QEMU 2.1, it
> presently uses only recvmmsg so for the time being it does not care.
> That will most likely change once it starts using sendmmsg as well.
One issue is that the QEMU net API does not support bulking. Do you plan
to add it?
Thanks
* Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualizaiton (Was: "Re: net_sched strange in 4.11")
2017-05-11 2:43 ` Jason Wang
@ 2017-05-11 5:43 ` Anton Ivanov
0 siblings, 0 replies; 11+ messages in thread
From: Anton Ivanov @ 2017-05-11 5:43 UTC (permalink / raw)
To: Jason Wang, Stefan Hajnoczi; +Cc: David S. Miller, netdev, Michael S. Tsirkin
On 11/05/17 03:43, Jason Wang wrote:
>
>
> On 2017-05-10 17:42, Anton Ivanov wrote:
>> On 10/05/17 09:56, Jason Wang wrote:
>>>
>>>
>>> On 2017-05-10 13:28, Anton Ivanov wrote:
>>>> On 10/05/17 03:18, Jason Wang wrote:
>>>>>
>>>>> On 2017-05-09 23:11, Stefan Hajnoczi wrote:
>>>>>> On Tue, May 09, 2017 at 08:46:46AM +0100, Anton Ivanov wrote:
>>>>>>> I have figured it out. Two issues.
>>>>>>>
>>>>>>> 1) skb->xmit_more is hardly ever set under virtualization because
>>>>>>> the qdisc is usually bypassed because of TCQ_F_CAN_BYPASS. Once
>>>>>>> TCQ_F_CAN_BYPASS is set, a virtual NIC driver is not likely to see
>>>>>>> skb->xmit_more (this answers my "how does this work at all"
>>>>>>> question).
>>>>>>>
>>>>>>> 2) If that flag is turned off (I patched sch_generic to turn it off
>>>>>>> in pfifo_fast while testing), DQL keeps xmit_more from being set. If
>>>>>>> the driver is not DQL enabled, xmit_more is never set. If the driver
>>>>>>> is DQL enabled, the queue is adjusted to ensure xmit_more stops
>>>>>>> happening within 10-15 xmit cycles.
>>>>>>>
>>>>>>> That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc.
>>>>>>> There, the BIG cost is telling the hypervisor that it needs to
>>>>>>> "kick" the packets. The cost of putting them into the vNIC buffers
>>>>>>> is negligible. You want xmit_more to happen - it makes between 50%
>>>>>>> and 300% (depending on vNIC design) difference. If there is no
>>>>>>> xmit_more the vNIC will immediately "kick" the hypervisor and try to
>>>>>>> signal that the packet needs to move straight away (as for example
>>>>>>> in virtio_net).
>>>>> How do you measure the performance? TCP or just measure pps?
>>>> In this particular case - tcp from guest. I have a couple of other
>>>> benchmarks (forwarding, etc).
>>>
>>> One more question, is the number for virtio-net or other emulated vNIC?
>>
>> Other for now - you are cc-ed to keep you in the loop.
>>
>> Virtio is next on my list - I am revisiting the l2tpv3.c driver in
>> QEMU and looking at how to preserve bulking by adding back sendmmsg
>> (as well as a list of other features/transports).
>>
>> We had sendmmsg removed for the final inclusion in QEMU 2.1, it
>> presently uses only recvmmsg so for the time being it does not care.
>> That will most likely change once it starts using sendmmsg as well.
>
> One issue is that the QEMU net API does not support bulking. Do you plan
> to add it?
Yes :)
A.
>
> Thanks
>
--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
Thread overview: 11+ messages
2017-05-08 7:15 net_sched strange in 4.11 Anton Ivanov
2017-05-09 7:46 ` DQL and TCQ_F_CAN_BYPASS destroy performance under virtualizaiton (Was: "Re: net_sched strange in 4.11") Anton Ivanov
2017-05-09 8:00 ` [uml-devel] Fwd: " Anton Ivanov
2017-05-09 15:11 ` Stefan Hajnoczi
2017-05-10 2:18 ` Jason Wang
2017-05-10 5:28 ` Anton Ivanov
2017-05-10 8:56 ` Jason Wang
2017-05-10 9:42 ` Anton Ivanov
2017-05-11 2:43 ` Jason Wang
2017-05-11 5:43 ` Anton Ivanov
2017-05-10 5:35 ` Anton Ivanov