* net_sched strange in 4.11
@ 2017-05-08  7:15 Anton Ivanov
  2017-05-09  7:46 ` DQL and TCQ_F_CAN_BYPASS destroy performance under virtualization (Was: "Re: net_sched strange in 4.11") Anton Ivanov
  0 siblings, 1 reply; 11+ messages in thread
From: Anton Ivanov @ 2017-05-08  7:15 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

Hi all,

I was revising some of my old work for UML to prepare it for submission 
and I noticed that skb->xmit_more does not seem to be set any more.

I traced the issue as far as net/sched/sch_generic.c.

try_bulk_dequeue_skb() is never invoked (the drivers I am working on are 
DQL enabled, so that is not the problem).

More interestingly, if I put a breakpoint and debug output into 
dequeue_skb() around line 147 - right before the bulk: label - the skb 
there is always NULL. ???

Similarly, debug output in pfifo_fast_dequeue() shows only NULLs being 
dequeued. Again - ???

First and foremost, I apologize for the silly question, but how can this 
work at all? I see the skbs showing up at the driver level, so why are 
NULLs being returned at qdisc dequeue, and where do the skbs at the 
driver level come from?

Second, where should I look to fix it?
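
For reference, the "debug output" mentioned above is nothing more exotic 
than a printk dropped into the dequeue path. A sketch of the kind of 
instrumentation - the dequeue_skb() signature is from my reading of the 
4.11-era net/sched/sch_generic.c and may not match your tree exactly:

static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
				   int *packets)
{
	struct sk_buff *skb;

	/* ... original gso_skb / requeue handling ... */
	skb = q->dequeue(q);

	/* debug: see what the qdisc actually hands back towards the driver */
	pr_info("%s: dequeued skb %p\n", q->ops->id, skb);

	/* ... bulk dequeue attempt, etc. ... */
	return skb;
}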

A.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* DQL and TCQ_F_CAN_BYPASS destroy performance under virtualization (Was: "Re: net_sched strange in 4.11")
  2017-05-08  7:15 net_sched strange in 4.11 Anton Ivanov
@ 2017-05-09  7:46 ` Anton Ivanov
  2017-05-09  8:00   ` [uml-devel] Fwd: " Anton Ivanov
  2017-05-09 15:11   ` Stefan Hajnoczi
  0 siblings, 2 replies; 11+ messages in thread
From: Anton Ivanov @ 2017-05-09  7:46 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Stefan Hajnoczi

I have figured it out. Two issues.

1) skb->xmit_more is hardly ever set under virtualization, because the 
qdisc is usually bypassed thanks to TCQ_F_CAN_BYPASS. Once 
TCQ_F_CAN_BYPASS is set, a virtual NIC driver is not likely to see 
skb->xmit_more (this answers my "how does this work at all" question).
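
(For readers following along: the bypass lives in __dev_xmit_skb() in 
net/core/dev.c. Paraphrased from memory - so treat the exact conditions 
and locking as approximate - the fast path looks roughly like this, which 
is why a lightly loaded vNIC gets its packets one at a time with 
xmit_more always 0:)

	if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) &&
	    qdisc_run_begin(q)) {
		/* queue empty: hand the skb straight to the driver,
		 * skipping enqueue/dequeue - so there is no bulk
		 * dequeue and no chance for skb->xmit_more to be set */
		sch_direct_xmit(skb, q, dev, txq, root_lock, true);
		qdisc_run_end(q);
	} else {
		/* normal path: enqueue into the qdisc, then run it */
		q->enqueue(skb, q, &to_free);
		qdisc_run(q);
	}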

2) If that flag is turned off (I patched sch_generic to turn it off in 
pfifo_fast while testing), DQL keeps xmit_more from being set. If the 
driver is not DQL enabled, xmit_more is never set. If the driver is 
DQL enabled, the queue limit is adjusted so that xmit_more stops 
happening within 10-15 xmit cycles.
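
(For concreteness, "DQL enabled" here means the driver uses the BQL 
accounting helpers. The helpers below are the real API; the my_vnic_* 
driver around them is made up for illustration. As I understand it, the 
qdisc only keeps bulking - and therefore setting xmit_more - while the 
bytes reported via netdev_tx_sent_queue() stay under the dynamic limit, 
and the completions reported via netdev_tx_completed_queue() keep 
re-tuning that limit down on a "fast" device:)

#include <linux/netdevice.h>
#include <linux/skbuff.h>

static netdev_tx_t my_vnic_start_xmit(struct sk_buff *skb,
				      struct net_device *dev)
{
	struct netdev_queue *txq = netdev_get_tx_queue(dev, 0);

	/* account the bytes against the dynamic queue limit */
	netdev_tx_sent_queue(txq, skb->len);

	/* ... place skb into the vNIC ring ... */
	return NETDEV_TX_OK;
}

static void my_vnic_tx_complete(struct net_device *dev,
				unsigned int pkts, unsigned int bytes)
{
	/* completions feed the DQL estimator, which shrinks the limit
	 * until bulking (and with it xmit_more) stops happening */
	netdev_tx_completed_queue(netdev_get_tx_queue(dev, 0),
				  pkts, bytes);
}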

That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc. 
There, the BIG cost is telling the hypervisor that it needs to "kick" 
the packets. The cost of putting them into the vNIC buffers is 
negligible. You want xmit_more to happen - it makes between 50% and 300% 
difference (depending on vNIC design). If there is no xmit_more, the 
vNIC will immediately "kick" the hypervisor, trying to signal that the 
packet needs to move straight away (as, for example, in virtio_net).
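
(To make the cost asymmetry concrete, this is the shape of transmit path 
a vNIC wants. A sketch only: vnic_queue_to_ring() and 
vnic_notify_hypervisor() are stand-ins for the driver's real ring-fill 
and doorbell code:)

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* stand-ins for the driver's real ring-fill and doorbell code */
extern void vnic_queue_to_ring(struct net_device *dev, struct sk_buff *skb);
extern void vnic_notify_hypervisor(struct net_device *dev);

static netdev_tx_t vnic_start_xmit(struct sk_buff *skb,
				   struct net_device *dev)
{
	/* only notify on the last skb of a burst */
	bool kick = !skb->xmit_more;

	/* cheap: just fill descriptors in shared memory */
	vnic_queue_to_ring(dev, skb);

	if (kick)
		/* expensive: vm exit / syscall into the hypervisor */
		vnic_notify_hypervisor(dev);

	return NETDEV_TX_OK;
}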

In addition to that, the perceived line rate is proportional to this 
cost, so I am not sure that the current DQL math holds. In fact, I think 
it does not - it is trying to adjust something which itself influences 
the perceived line rate, so the adjustment chases its own effect.

So - how do we turn off BOTH the bypass and the DQL adjustment while 
under virtualization, and set them to "always qdisc" + "always xmit_more 
allowed"?

A.

P.S. Cc-ing virtio maintainer

A.


On 08/05/17 08:15, Anton Ivanov wrote:
> Hi all,
>
> I was revising some of my old work for UML to prepare it for 
> submission and I noticed that skb->xmit_more does not seem to be set 
> any more.
>
> I traced the issue as far as net/sched/sched_generic.c
>
> try_bulk_dequeue_skb() is never invoked (the drivers I am working on 
> are dql enabled so that is not the problem).
>
> More interestingly, if I put a breakpoint and debug output into 
> dequeue_skb() around line 147 - right before the bulk: tag that skb 
> there is always NULL. ???
>
> Similarly, debug in pfifo_fast_dequeue shows only NULLs being 
> dequeued. Again - ???
>
> First and foremost, I apologize for the silly question, but how can 
> this work at all? I see the skbs showing up at the driver level, why 
> are NULLs being returned at qdisc dequeue and where do the skbs at the 
> driver level come from?
>
> Second, where should I look to fix it?
>
> A.
>


-- 
Anton R. Ivanov

Cambridge Greys Limited, England company No 10273661
http://www.cambridgegreys.com/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [uml-devel] Fwd: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualization (Was: "Re: net_sched strange in 4.11")
  2017-05-09  7:46 ` DQL and TCQ_F_CAN_BYPASS destroy performance under virtualization (Was: "Re: net_sched strange in 4.11") Anton Ivanov
@ 2017-05-09  8:00   ` Anton Ivanov
  2017-05-09 15:11   ` Stefan Hajnoczi
  1 sibling, 0 replies; 11+ messages in thread
From: Anton Ivanov @ 2017-05-09  8:00 UTC (permalink / raw)
  To: user-mode-linux-devel


[-- Attachment #1.1: Type: text/plain, Size: 3534 bytes --]

Once I get some ideas on how to sort out THIS (forwarded) mess I will 
submit the vector drivers and the epoll controller they depend on.

I got the RX to > 1.7 Gbit (for reference, KVM on the same machine just 
about manages 1.4 using tap). I cannot get TX sorted because of the 
wonderful bufferbloat optimizations in the recent kernels.

As usual, the glorious quest against too many buffers is doing more 
harm than good.

I can of course just #ifdef CONFIG_UML the relevant bits in the packet 
scheduler, but that is vandalism. We should not be doing it, and the 
problem affects KVM as well.

A.



-------- Forwarded Message --------
Subject: 	DQL and TCQ_F_CAN_BYPASS destroy performance under 
virtualization (Was: "Re: net_sched strange in 4.11")
Date: 	Tue, 9 May 2017 08:46:46 +0100
From: 	Anton Ivanov <anton.ivanov@cambridgegreys.com>
Organization: 	Cambridge Greys Limited
To: 	David S. Miller <davem@davemloft.net>
CC: 	netdev@vger.kernel.org, Stefan Hajnoczi <stefanha@redhat.com>



I have figured it out. Two issues.

1) skb->xmit_more is hardly ever set under virtualization, because the
qdisc is usually bypassed thanks to TCQ_F_CAN_BYPASS. Once
TCQ_F_CAN_BYPASS is set, a virtual NIC driver is not likely to see
skb->xmit_more (this answers my "how does this work at all" question).

2) If that flag is turned off (I patched sch_generic to turn it off in
pfifo_fast while testing), DQL keeps xmit_more from being set. If the
driver is not DQL enabled, xmit_more is never set. If the driver is DQL
enabled, the queue limit is adjusted so that xmit_more stops happening
within 10-15 xmit cycles.

That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc.
There, the BIG cost is telling the hypervisor that it needs to "kick"
the packets. The cost of putting them into the vNIC buffers is
negligible. You want xmit_more to happen - it makes between 50% and 300%
difference (depending on vNIC design). If there is no xmit_more, the
vNIC will immediately "kick" the hypervisor, trying to signal that the
packet needs to move straight away (as, for example, in virtio_net).

In addition to that, the perceived line rate is proportional to this
cost, so I am not sure that the current DQL math holds. In fact, I think
it does not - it is trying to adjust something which itself influences
the perceived line rate, so the adjustment chases its own effect.

So - how do we turn off BOTH the bypass and the DQL adjustment while
under virtualization, and set them to "always qdisc" + "always xmit_more
allowed"?

A.

P.S. Cc-ing virtio maintainer

A.


On 08/05/17 08:15, Anton Ivanov wrote:
> Hi all,
>
> I was revising some of my old work for UML to prepare it for
> submission and I noticed that skb->xmit_more does not seem to be set
> any more.
>
> I traced the issue as far as net/sched/sched_generic.c
>
> try_bulk_dequeue_skb() is never invoked (the drivers I am working on
> are dql enabled so that is not the problem).
>
> More interestingly, if I put a breakpoint and debug output into
> dequeue_skb() around line 147 - right before the bulk: tag that skb
> there is always NULL. ???
>
> Similarly, debug in pfifo_fast_dequeue shows only NULLs being
> dequeued. Again - ???
>
> First and foremost, I apologize for the silly question, but how can
> this work at all? I see the skbs showing up at the driver level, why
> are NULLs being returned at qdisc dequeue and where do the skbs at the
> driver level come from?
>
> Second, where should I look to fix it?
>
> A.
>


-- 
Anton R. Ivanov

Cambridge Greys Limited, England company No 10273661
http://www.cambridgegreys.com/


[-- Attachment #1.2: Type: text/html, Size: 5366 bytes --]


[-- Attachment #3: Type: text/plain, Size: 194 bytes --]

_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualization (Was: "Re: net_sched strange in 4.11")
  2017-05-09  7:46 ` DQL and TCQ_F_CAN_BYPASS destroy performance under virtualization (Was: "Re: net_sched strange in 4.11") Anton Ivanov
  2017-05-09  8:00   ` [uml-devel] Fwd: " Anton Ivanov
@ 2017-05-09 15:11   ` Stefan Hajnoczi
  2017-05-10  2:18     ` Jason Wang
  1 sibling, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2017-05-09 15:11 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: David S. Miller, netdev, Michael S. Tsirkin, jasowang

[-- Attachment #1: Type: text/plain, Size: 2995 bytes --]

On Tue, May 09, 2017 at 08:46:46AM +0100, Anton Ivanov wrote:
> I have figured it out. Two issues.
> 
> 1) skb->xmit_more is hardly ever set under virtualization because the qdisc
> is usually bypassed because of TCQ_F_CAN_BYPASS. Once TCQ_F_CAN_BYPASS is
> set a virtual NIC driver is not likely see skb->xmit_more (this answers my
> "how does this work at all" question).
> 
> 2) If that flag is turned off (I patched sched_generic to turn it off in
> pfifo_fast while testing), DQL keeps xmit_more from being set. If the driver
> is not DQL enabled xmit_more is never ever set. If the driver is DQL enabled
> the queue is adjusted to ensure xmit_more stops happening within 10-15 xmit
> cycles.
> 
> That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc. There,
> the BIG cost is telling the hypervisor that it needs to "kick" the packets.
> The cost of putting them into the vNIC buffers is negligible. You want
> xmit_more to happen - it makes between 50% and 300% (depending on vNIC
> design) difference. If there is no xmit_more the vNIC will immediately
> "kick" the hypervisor and try to signal that  the packet needs to move
> straight away (as for example in virtio_net).
> 
> In addition to that, the perceived line rate is proportional to this cost,
> so I am not sure that the current dql math holds. In fact, I think it does
> not - it is trying to adjust something which influences the perceived line
> rate.
> 
> So - how do we turn BOTH bypass and DQL adjustment while under
> virtualization and set them to be "always qdisc" + "always xmit_more
> allowed"
> 
> A.
> 
> P.S. Cc-ing virtio maintainer

CCing Michael Tsirkin and Jason Wang, who are the core virtio and
virtio-net maintainers.  (I maintain the vsock driver - it's unrelated
to this discussion.)

> 
> A.
> 
> 
> On 08/05/17 08:15, Anton Ivanov wrote:
> > Hi all,
> > 
> > I was revising some of my old work for UML to prepare it for submission
> > and I noticed that skb->xmit_more does not seem to be set any more.
> > 
> > I traced the issue as far as net/sched/sched_generic.c
> > 
> > try_bulk_dequeue_skb() is never invoked (the drivers I am working on are
> > dql enabled so that is not the problem).
> > 
> > More interestingly, if I put a breakpoint and debug output into
> > dequeue_skb() around line 147 - right before the bulk: tag that skb
> > there is always NULL. ???
> > 
> > Similarly, debug in pfifo_fast_dequeue shows only NULLs being dequeued.
> > Again - ???
> > 
> > First and foremost, I apologize for the silly question, but how can this
> > work at all? I see the skbs showing up at the driver level, why are
> > NULLs being returned at qdisc dequeue and where do the skbs at the
> > driver level come from?
> > 
> > Second, where should I look to fix it?
> > 
> > A.
> > 
> 
> 
> -- 
> Anton R. Ivanov
> 
> Cambridge Greys Limited, England company No 10273661
> http://www.cambridgegreys.com/
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualization (Was: "Re: net_sched strange in 4.11")
  2017-05-09 15:11   ` Stefan Hajnoczi
@ 2017-05-10  2:18     ` Jason Wang
  2017-05-10  5:28       ` Anton Ivanov
  2017-05-10  5:35       ` Anton Ivanov
  0 siblings, 2 replies; 11+ messages in thread
From: Jason Wang @ 2017-05-10  2:18 UTC (permalink / raw)
  To: Stefan Hajnoczi, Anton Ivanov; +Cc: David S. Miller, netdev, Michael S. Tsirkin



On 2017年05月09日 23:11, Stefan Hajnoczi wrote:
> On Tue, May 09, 2017 at 08:46:46AM +0100, Anton Ivanov wrote:
>> I have figured it out. Two issues.
>>
>> 1) skb->xmit_more is hardly ever set under virtualization because the qdisc
>> is usually bypassed because of TCQ_F_CAN_BYPASS. Once TCQ_F_CAN_BYPASS is
>> set a virtual NIC driver is not likely see skb->xmit_more (this answers my
>> "how does this work at all" question).
>>
>> 2) If that flag is turned off (I patched sched_generic to turn it off in
>> pfifo_fast while testing), DQL keeps xmit_more from being set. If the driver
>> is not DQL enabled xmit_more is never ever set. If the driver is DQL enabled
>> the queue is adjusted to ensure xmit_more stops happening within 10-15 xmit
>> cycles.
>>
>> That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc. There,
>> the BIG cost is telling the hypervisor that it needs to "kick" the packets.
>> The cost of putting them into the vNIC buffers is negligible. You want
>> xmit_more to happen - it makes between 50% and 300% (depending on vNIC
>> design) difference. If there is no xmit_more the vNIC will immediately
>> "kick" the hypervisor and try to signal that  the packet needs to move
>> straight away (as for example in virtio_net).

How do you measure the performance? TCP throughput, or just pps?

>>
>> In addition to that, the perceived line rate is proportional to this cost,
>> so I am not sure that the current dql math holds. In fact, I think it does
>> not - it is trying to adjust something which influences the perceived line
>> rate.
>>
>> So - how do we turn BOTH bypass and DQL adjustment while under
>> virtualization and set them to be "always qdisc" + "always xmit_more
>> allowed"

Virtio-net does not support BQL. Before commit ea7735d97ba9 
("virtio-net: move free_old_xmit_skbs"), it was even impossible to 
support that, since we did not have a tx interrupt for each packet.  I 
haven't measured the impact of xmit_more; maybe I am wrong, but I think 
it may help in some cases since it may improve the batching on the host 
more or less.

Thanks

>>
>> A.
>>
>> P.S. Cc-ing virtio maintainer
> CCing Michael Tsirkin and Jason Wang, who are the core virtio and
> virtio-net maintainers.  (I maintain the vsock driver - it's unrelated
> to this discussion.)
>
>> A.
>>
>>
>> On 08/05/17 08:15, Anton Ivanov wrote:
>>> Hi all,
>>>
>>> I was revising some of my old work for UML to prepare it for submission
>>> and I noticed that skb->xmit_more does not seem to be set any more.
>>>
>>> I traced the issue as far as net/sched/sched_generic.c
>>>
>>> try_bulk_dequeue_skb() is never invoked (the drivers I am working on are
>>> dql enabled so that is not the problem).
>>>
>>> More interestingly, if I put a breakpoint and debug output into
>>> dequeue_skb() around line 147 - right before the bulk: tag that skb
>>> there is always NULL. ???
>>>
>>> Similarly, debug in pfifo_fast_dequeue shows only NULLs being dequeued.
>>> Again - ???
>>>
>>> First and foremost, I apologize for the silly question, but how can this
>>> work at all? I see the skbs showing up at the driver level, why are
>>> NULLs being returned at qdisc dequeue and where do the skbs at the
>>> driver level come from?
>>>
>>> Second, where should I look to fix it?
>>>
>>> A.
>>>
>>
>> -- 
>> Anton R. Ivanov
>>
>> Cambridge Greys Limited, England company No 10273661
>> http://www.cambridgegreys.com/
>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualization (Was: "Re: net_sched strange in 4.11")
  2017-05-10  2:18     ` Jason Wang
@ 2017-05-10  5:28       ` Anton Ivanov
  2017-05-10  8:56         ` Jason Wang
  2017-05-10  5:35       ` Anton Ivanov
  1 sibling, 1 reply; 11+ messages in thread
From: Anton Ivanov @ 2017-05-10  5:28 UTC (permalink / raw)
  To: Jason Wang, Stefan Hajnoczi; +Cc: David S. Miller, netdev, Michael S. Tsirkin

On 10/05/17 03:18, Jason Wang wrote:
>
>
> On 2017年05月09日 23:11, Stefan Hajnoczi wrote:
>> On Tue, May 09, 2017 at 08:46:46AM +0100, Anton Ivanov wrote:
>>> I have figured it out. Two issues.
>>>
>>> 1) skb->xmit_more is hardly ever set under virtualization because
>>> the qdisc
>>> is usually bypassed because of TCQ_F_CAN_BYPASS. Once
>>> TCQ_F_CAN_BYPASS is
>>> set a virtual NIC driver is not likely see skb->xmit_more (this
>>> answers my
>>> "how does this work at all" question).
>>>
>>> 2) If that flag is turned off (I patched sched_generic to turn it
>>> off in
>>> pfifo_fast while testing), DQL keeps xmit_more from being set. If
>>> the driver
>>> is not DQL enabled xmit_more is never ever set. If the driver is DQL
>>> enabled
>>> the queue is adjusted to ensure xmit_more stops happening within
>>> 10-15 xmit
>>> cycles.
>>>
>>> That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc.
>>> There,
>>> the BIG cost is telling the hypervisor that it needs to "kick" the
>>> packets.
>>> The cost of putting them into the vNIC buffers is negligible. You want
>>> xmit_more to happen - it makes between 50% and 300% (depending on vNIC
>>> design) difference. If there is no xmit_more the vNIC will immediately
>>> "kick" the hypervisor and try to signal that  the packet needs to move
>>> straight away (as for example in virtio_net).
>
> How do you measure the performance? TCP or just measure pps?

In this particular case - TCP from the guest. I have a couple of other
benchmarks (forwarding, etc.).

>
>>>
>>> In addition to that, the perceived line rate is proportional to this
>>> cost,
>>> so I am not sure that the current dql math holds. In fact, I think
>>> it does
>>> not - it is trying to adjust something which influences the
>>> perceived line
>>> rate.
>>>
>>> So - how do we turn BOTH bypass and DQL adjustment while under
>>> virtualization and set them to be "always qdisc" + "always xmit_more
>>> allowed"
>
> Virtio-net net does not support BQL. Before commit ea7735d97ba9
> ("virtio-net: move free_old_xmit_skbs"), it's even impossible to
> support that since we don't have tx interrupt for each packet.  I
> haven't measured the impact of xmit_more, maybe I was wrong but I
> think it may help in some cases since it may improve the batching on
> host more or less.

If you do not support BQL, you might as well look at the xmit_more part
of the kick code path. Line 1127.

bool kick = !skb->xmit_more; effectively means kick = true;

It will never be triggered any other way - you will be kicking on each
and every packet. xmit_more is now set only from the BQL path; if BQL is
not enabled you never get it. Now, will the current DQL code work
correctly if you do not have a defined line rate and completion
interrupts - no idea. Probably not. IMHO, instead of trying to fix it,
there should be a way for a device or architecture to turn it off.
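
(Abridged from memory of the 4.11-era drivers/net/virtio_net.c
start_xmit() - line numbers and surrounding details may be off, but the
kick decision is the relevant part:)

static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct virtnet_info *vi = netdev_priv(dev);
	int qnum = skb_get_queue_mapping(skb);
	struct send_queue *sq = &vi->sq[qnum];
	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
	bool kick = !skb->xmit_more;	/* with no bulking, always true */

	/* ... add skb to the tx virtqueue ... */

	if (kick || netif_xmit_stopped(txq))
		virtqueue_kick(sq->vq);	/* one notification per packet */

	return NETDEV_TX_OK;
}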

To be clear - I ran into this while working on my own drivers for UML;
you are cc-ed because you are likely to be one of the most affected.

A.

>
> Thanks
>
>>>
>>> A.
>>>
>>> P.S. Cc-ing virtio maintainer
>> CCing Michael Tsirkin and Jason Wang, who are the core virtio and
>> virtio-net maintainers.  (I maintain the vsock driver - it's unrelated
>> to this discussion.)
>>
>>> A.
>>>
>>>
>>> On 08/05/17 08:15, Anton Ivanov wrote:
>>>> Hi all,
>>>>
>>>> I was revising some of my old work for UML to prepare it for
>>>> submission
>>>> and I noticed that skb->xmit_more does not seem to be set any more.
>>>>
>>>> I traced the issue as far as net/sched/sched_generic.c
>>>>
>>>> try_bulk_dequeue_skb() is never invoked (the drivers I am working
>>>> on are
>>>> dql enabled so that is not the problem).
>>>>
>>>> More interestingly, if I put a breakpoint and debug output into
>>>> dequeue_skb() around line 147 - right before the bulk: tag that skb
>>>> there is always NULL. ???
>>>>
>>>> Similarly, debug in pfifo_fast_dequeue shows only NULLs being
>>>> dequeued.
>>>> Again - ???
>>>>
>>>> First and foremost, I apologize for the silly question, but how can
>>>> this
>>>> work at all? I see the skbs showing up at the driver level, why are
>>>> NULLs being returned at qdisc dequeue and where do the skbs at the
>>>> driver level come from?
>>>>
>>>> Second, where should I look to fix it?
>>>>
>>>> A.
>>>>
>>>
>>> -- 
>>> Anton R. Ivanov
>>>
>>> Cambridge Greys Limited, England company No 10273661
>>> http://www.cambridgegreys.com/
>>>
>
>


-- 
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualization (Was: "Re: net_sched strange in 4.11")
  2017-05-10  2:18     ` Jason Wang
  2017-05-10  5:28       ` Anton Ivanov
@ 2017-05-10  5:35       ` Anton Ivanov
  1 sibling, 0 replies; 11+ messages in thread
From: Anton Ivanov @ 2017-05-10  5:35 UTC (permalink / raw)
  To: Jason Wang, Stefan Hajnoczi; +Cc: David S. Miller, netdev, Michael S. Tsirkin

[snip]

> Virtio-net net does not support BQL. Before commit ea7735d97ba9
> ("virtio-net: move free_old_xmit_skbs"), it's even impossible to
> support that since we don't have tx interrupt for each packet.  I
> haven't measured the impact of xmit_more, maybe I was wrong but I
> think it may help in some cases since it may improve the batching on
> host more or less.

Sorry, hit send too soon.

The impact of xmit_more depends on your transport.

If, for example, you are using sendmmsg on the outer side, which can
consume the bulked data "as is", the impact is quite significant. If
your transport does not support bulking, the fact that there was bulking
earlier in the chain has little impact.

There is some, but not a lot.
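
(A minimal userspace sketch of what "consume the bulked data as is" means
on the outer side - one sendmmsg(2) call flushes the whole burst in a
single kernel crossing instead of one sendto() per packet. Buffer setup,
error handling and partial-send retry are omitted:)

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/socket.h>

static void flush_burst(int fd, struct mmsghdr *msgs, unsigned int n)
{
	/* n packets, one kernel crossing - this is where bulking that
	 * survived from the guest's xmit_more pays off on the host side */
	int sent = sendmmsg(fd, msgs, n, 0);

	if (sent < 0)
		perror("sendmmsg");	/* real code: handle EAGAIN and partial sends */
}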

[snip]

-- 
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualization (Was: "Re: net_sched strange in 4.11")
  2017-05-10  5:28       ` Anton Ivanov
@ 2017-05-10  8:56         ` Jason Wang
  2017-05-10  9:42           ` Anton Ivanov
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Wang @ 2017-05-10  8:56 UTC (permalink / raw)
  To: Anton Ivanov, Stefan Hajnoczi; +Cc: David S. Miller, netdev, Michael S. Tsirkin



On 2017年05月10日 13:28, Anton Ivanov wrote:
> On 10/05/17 03:18, Jason Wang wrote:
>>
>> On 2017年05月09日 23:11, Stefan Hajnoczi wrote:
>>> On Tue, May 09, 2017 at 08:46:46AM +0100, Anton Ivanov wrote:
>>>> I have figured it out. Two issues.
>>>>
>>>> 1) skb->xmit_more is hardly ever set under virtualization because
>>>> the qdisc
>>>> is usually bypassed because of TCQ_F_CAN_BYPASS. Once
>>>> TCQ_F_CAN_BYPASS is
>>>> set a virtual NIC driver is not likely see skb->xmit_more (this
>>>> answers my
>>>> "how does this work at all" question).
>>>>
>>>> 2) If that flag is turned off (I patched sched_generic to turn it
>>>> off in
>>>> pfifo_fast while testing), DQL keeps xmit_more from being set. If
>>>> the driver
>>>> is not DQL enabled xmit_more is never ever set. If the driver is DQL
>>>> enabled
>>>> the queue is adjusted to ensure xmit_more stops happening within
>>>> 10-15 xmit
>>>> cycles.
>>>>
>>>> That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc.
>>>> There,
>>>> the BIG cost is telling the hypervisor that it needs to "kick" the
>>>> packets.
>>>> The cost of putting them into the vNIC buffers is negligible. You want
>>>> xmit_more to happen - it makes between 50% and 300% (depending on vNIC
>>>> design) difference. If there is no xmit_more the vNIC will immediately
>>>> "kick" the hypervisor and try to signal that  the packet needs to move
>>>> straight away (as for example in virtio_net).
>> How do you measure the performance? TCP or just measure pps?
> In this particular case - tcp from guest. I have a couple of other
> benchmarks (forwarding, etc).

One more question: is the number for virtio-net or for another emulated vNIC?

>
>>>> In addition to that, the perceived line rate is proportional to this
>>>> cost,
>>>> so I am not sure that the current dql math holds. In fact, I think
>>>> it does
>>>> not - it is trying to adjust something which influences the
>>>> perceived line
>>>> rate.
>>>>
>>>> So - how do we turn BOTH bypass and DQL adjustment while under
>>>> virtualization and set them to be "always qdisc" + "always xmit_more
>>>> allowed"
>> Virtio-net net does not support BQL. Before commit ea7735d97ba9
>> ("virtio-net: move free_old_xmit_skbs"), it's even impossible to
>> support that since we don't have tx interrupt for each packet.  I
>> haven't measured the impact of xmit_more, maybe I was wrong but I
>> think it may help in some cases since it may improve the batching on
>> host more or less.
> If you do not support BQL, you might as well look the xmit_more part
> kick code path. Line 1127.
>
> bool kick = !skb->xmit_more; effectively means kick = true;
>
> It will never be triggered. You will be kicking each packet and per
> packet.

Probably not - we have several ways to try to suppress this at the virtio 
layer; the host can give hints to disable the kicks by:

- explicitly setting a flag
- implicitly not publishing a new event idx

FYI, I can get 100-200 packets per vm exit when testing a 64-byte 
TCP_STREAM using netperf.
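
(For reference, those hints surface to the guest driver through the
virtio core: virtqueue_kick_prepare() consults the flag / event index
published by the host and returns false when no notification is needed.
The API calls are real; the surrounding tx_one() helper is only a sketch,
with error handling of virtqueue_add_outbuf() omitted:)

#include <linux/virtio.h>
#include <linux/scatterlist.h>

static void tx_one(struct virtqueue *vq, struct scatterlist *sg, void *data)
{
	virtqueue_add_outbuf(vq, sg, 1, data, GFP_ATOMIC);

	/* returns false when the host asked not to be notified */
	if (virtqueue_kick_prepare(vq))
		virtqueue_notify(vq);	/* the expensive vm exit */
}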

> xmit_more is now set only out of BQL. If BQL is not enabled you
> never get it. Now, will the current dql code work correctly if you do
> not have a defined line rate and completion interrupts - no idea.
> Probably not. IMHO instead of trying to fix it there should be a way for
> a device or architecture to turn it off.

In fact, BQL is not the only user of xmit_more - pktgen with burst is 
another. Tests do not show an obvious difference if I set burst from 0 to 
64, since we already have other ways to avoid kicking the host.

>
> To be clear - I ran into this working on my own drivers for UML, you are
> cc-ed because you are likely to be one of the most affected.

I'm still not quite sure what the issue is. It looks like virtio-net is 
OK, since BQL is not supported and the impact of xmit_more can be ignored.

Thanks

>
> A.
>
>> Thanks
>>
>>>> A.
>>>>
>>>> P.S. Cc-ing virtio maintainer
>>> CCing Michael Tsirkin and Jason Wang, who are the core virtio and
>>> virtio-net maintainers.  (I maintain the vsock driver - it's unrelated
>>> to this discussion.)
>>>
>>>> A.
>>>>
>>>>
>>>> On 08/05/17 08:15, Anton Ivanov wrote:
>>>>> Hi all,
>>>>>
>>>>> I was revising some of my old work for UML to prepare it for
>>>>> submission
>>>>> and I noticed that skb->xmit_more does not seem to be set any more.
>>>>>
>>>>> I traced the issue as far as net/sched/sched_generic.c
>>>>>
>>>>> try_bulk_dequeue_skb() is never invoked (the drivers I am working
>>>>> on are
>>>>> dql enabled so that is not the problem).
>>>>>
>>>>> More interestingly, if I put a breakpoint and debug output into
>>>>> dequeue_skb() around line 147 - right before the bulk: tag that skb
>>>>> there is always NULL. ???
>>>>>
>>>>> Similarly, debug in pfifo_fast_dequeue shows only NULLs being
>>>>> dequeued.
>>>>> Again - ???
>>>>>
>>>>> First and foremost, I apologize for the silly question, but how can
>>>>> this
>>>>> work at all? I see the skbs showing up at the driver level, why are
>>>>> NULLs being returned at qdisc dequeue and where do the skbs at the
>>>>> driver level come from?
>>>>>
>>>>> Second, where should I look to fix it?
>>>>>
>>>>> A.
>>>>>
>>>> -- 
>>>> Anton R. Ivanov
>>>>
>>>> Cambridge Greys Limited, England company No 10273661
>>>> http://www.cambridgegreys.com/
>>>>
>>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualization (Was: "Re: net_sched strange in 4.11")
  2017-05-10  8:56         ` Jason Wang
@ 2017-05-10  9:42           ` Anton Ivanov
  2017-05-11  2:43             ` Jason Wang
  0 siblings, 1 reply; 11+ messages in thread
From: Anton Ivanov @ 2017-05-10  9:42 UTC (permalink / raw)
  To: Jason Wang, Stefan Hajnoczi; +Cc: David S. Miller, netdev, Michael S. Tsirkin

On 10/05/17 09:56, Jason Wang wrote:
>
>
> On 2017年05月10日 13:28, Anton Ivanov wrote:
>> On 10/05/17 03:18, Jason Wang wrote:
>>>
>>> On 2017年05月09日 23:11, Stefan Hajnoczi wrote:
>>>> On Tue, May 09, 2017 at 08:46:46AM +0100, Anton Ivanov wrote:
>>>>> I have figured it out. Two issues.
>>>>>
>>>>> 1) skb->xmit_more is hardly ever set under virtualization because
>>>>> the qdisc
>>>>> is usually bypassed because of TCQ_F_CAN_BYPASS. Once
>>>>> TCQ_F_CAN_BYPASS is
>>>>> set a virtual NIC driver is not likely see skb->xmit_more (this
>>>>> answers my
>>>>> "how does this work at all" question).
>>>>>
>>>>> 2) If that flag is turned off (I patched sched_generic to turn it
>>>>> off in
>>>>> pfifo_fast while testing), DQL keeps xmit_more from being set. If
>>>>> the driver
>>>>> is not DQL enabled xmit_more is never ever set. If the driver is DQL
>>>>> enabled
>>>>> the queue is adjusted to ensure xmit_more stops happening within
>>>>> 10-15 xmit
>>>>> cycles.
>>>>>
>>>>> That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc.
>>>>> There,
>>>>> the BIG cost is telling the hypervisor that it needs to "kick" the
>>>>> packets.
>>>>> The cost of putting them into the vNIC buffers is negligible. You 
>>>>> want
>>>>> xmit_more to happen - it makes between 50% and 300% (depending on 
>>>>> vNIC
>>>>> design) difference. If there is no xmit_more the vNIC will 
>>>>> immediately
>>>>> "kick" the hypervisor and try to signal that  the packet needs to 
>>>>> move
>>>>> straight away (as for example in virtio_net).
>>> How do you measure the performance? TCP or just measure pps?
>> In this particular case - tcp from guest. I have a couple of other
>> benchmarks (forwarding, etc).
>
> One more question, is the number for virtio-net or other emulated vNIC?

Other for now - you are cc-ed to keep you in the loop.

Virtio is next on my list - I am revisiting the l2tpv3.c driver in QEMU 
and looking at how to preserve bulking by adding back sendmmsg (as well 
as a list of other features/transports).

We had sendmmsg removed for the final inclusion in QEMU 2.1; it 
presently uses only recvmmsg, so for the time being it does not care. 
That will most likely change once it starts using sendmmsg as well.
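
(The receive side already batches this way - a minimal sketch of the
recvmmsg(2) pattern, assuming a datagram socket; deliver_packet() is a
stand-in for whatever hands the frames on, not a real QEMU function:)

#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BURST 16

/* stand-in for whatever hands the frame on to the device */
extern void deliver_packet(void *buf, unsigned int len);

static void drain_socket(int fd)
{
	char bufs[BURST][2048];
	struct iovec iov[BURST];
	struct mmsghdr msgs[BURST];
	int i, n;

	memset(msgs, 0, sizeof(msgs));
	for (i = 0; i < BURST; i++) {
		iov[i].iov_base = bufs[i];
		iov[i].iov_len = sizeof(bufs[i]);
		msgs[i].msg_hdr.msg_iov = &iov[i];
		msgs[i].msg_hdr.msg_iovlen = 1;
	}

	/* one syscall pulls up to BURST datagrams off the socket */
	n = recvmmsg(fd, msgs, BURST, MSG_DONTWAIT, NULL);
	for (i = 0; i < n; i++)
		deliver_packet(bufs[i], msgs[i].msg_len);
}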

>
>>
>>>>> In addition to that, the perceived line rate is proportional to this
>>>>> cost,
>>>>> so I am not sure that the current dql math holds. In fact, I think
>>>>> it does
>>>>> not - it is trying to adjust something which influences the
>>>>> perceived line
>>>>> rate.
>>>>>
>>>>> So - how do we turn BOTH bypass and DQL adjustment while under
>>>>> virtualization and set them to be "always qdisc" + "always xmit_more
>>>>> allowed"
>>> Virtio-net net does not support BQL. Before commit ea7735d97ba9
>>> ("virtio-net: move free_old_xmit_skbs"), it's even impossible to
>>> support that since we don't have tx interrupt for each packet.  I
>>> haven't measured the impact of xmit_more, maybe I was wrong but I
>>> think it may help in some cases since it may improve the batching on
>>> host more or less.
>> If you do not support BQL, you might as well look the xmit_more part
>> kick code path. Line 1127.
>>
>> bool kick = !skb->xmit_more; effectively means kick = true;
>>
>> It will never be triggered. You will be kicking each packet and per
>> packet.
>
> Probably not, we have several ways to try to suppress this on the 
> virtio layer, host can give hints to disable the kicks through:
>
> - explicitly set a flag
> - implicitly by not publishing a new event idx
>
> FYI, I can get 100-200 packets per vm exit when testing 64 byte 
> TCP_STREAM using netperf.

I am aware of that. If, however, the host is providing a hint, we might 
as well use it.

>
>> xmit_more is now set only out of BQL. If BQL is not enabled you
>> never get it. Now, will the current dql code work correctly if you do
>> not have a defined line rate and completion interrupts - no idea.
>> Probably not. IMHO instead of trying to fix it there should be a way for
>> a device or architecture to turn it off.
>
> In fact BQL is not the only user for xmit_more. Pktgen with burst is 
> another. Test does not show obvious difference if I set burst from 0 
> to 64 since we already had other ways to avoid kicking host.

That, as well as the fact that this is not wired into a bulk transport.

>
>>
>> To be clear - I ran into this working on my own drivers for UML, you are
>> cc-ed because you are likely to be one of the most affected.
>
> I'm still not quite sure the issue. Looks like virtio-net is ok since 
> BQL is not supported and the impact of xmit_more could be ignored.

Presently - yes. If you have bulk-aware transports to wire into, that is 
likely to make a difference.

>
> Thanks
>
>>
>> A.
>>
>>> Thanks
>>>
>>>>> A.
>>>>>
>>>>> P.S. Cc-ing virtio maintainer
>>>> CCing Michael Tsirkin and Jason Wang, who are the core virtio and
>>>> virtio-net maintainers.  (I maintain the vsock driver - it's unrelated
>>>> to this discussion.)
>>>>
>>>>> A.
>>>>>
>>>>>
>>>>> On 08/05/17 08:15, Anton Ivanov wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I was revising some of my old work for UML to prepare it for
>>>>>> submission
>>>>>> and I noticed that skb->xmit_more does not seem to be set any more.
>>>>>>
>>>>>> I traced the issue as far as net/sched/sched_generic.c
>>>>>>
>>>>>> try_bulk_dequeue_skb() is never invoked (the drivers I am working
>>>>>> on are
>>>>>> dql enabled so that is not the problem).
>>>>>>
>>>>>> More interestingly, if I put a breakpoint and debug output into
>>>>>> dequeue_skb() around line 147 - right before the bulk: tag that skb
>>>>>> there is always NULL. ???
>>>>>>
>>>>>> Similarly, debug in pfifo_fast_dequeue shows only NULLs being
>>>>>> dequeued.
>>>>>> Again - ???
>>>>>>
>>>>>> First and foremost, I apologize for the silly question, but how can
>>>>>> this
>>>>>> work at all? I see the skbs showing up at the driver level, why are
>>>>>> NULLs being returned at qdisc dequeue and where do the skbs at the
>>>>>> driver level come from?
>>>>>>
>>>>>> Second, where should I look to fix it?
>>>>>>
>>>>>> A.
>>>>>>
>>>>> -- 
>>>>> Anton R. Ivanov
>>>>>
>>>>> Cambridge Greys Limited, England company No 10273661
>>>>> http://www.cambridgegreys.com/
>>>>>
>>>
>>
>
>


-- 
Anton R. Ivanov

Cambridge Greys Limited, England company No 10273661
http://www.cambridgegreys.com/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualization (Was: "Re: net_sched strange in 4.11")
  2017-05-10  9:42           ` Anton Ivanov
@ 2017-05-11  2:43             ` Jason Wang
  2017-05-11  5:43               ` Anton Ivanov
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Wang @ 2017-05-11  2:43 UTC (permalink / raw)
  To: Anton Ivanov, Stefan Hajnoczi; +Cc: David S. Miller, netdev, Michael S. Tsirkin



On 2017年05月10日 17:42, Anton Ivanov wrote:
> On 10/05/17 09:56, Jason Wang wrote:
>>
>>
>> On 2017年05月10日 13:28, Anton Ivanov wrote:
>>> On 10/05/17 03:18, Jason Wang wrote:
>>>>
>>>> On 2017年05月09日 23:11, Stefan Hajnoczi wrote:
>>>>> On Tue, May 09, 2017 at 08:46:46AM +0100, Anton Ivanov wrote:
>>>>>> I have figured it out. Two issues.
>>>>>>
>>>>>> 1) skb->xmit_more is hardly ever set under virtualization because
>>>>>> the qdisc
>>>>>> is usually bypassed because of TCQ_F_CAN_BYPASS. Once
>>>>>> TCQ_F_CAN_BYPASS is
>>>>>> set a virtual NIC driver is not likely see skb->xmit_more (this
>>>>>> answers my
>>>>>> "how does this work at all" question).
>>>>>>
>>>>>> 2) If that flag is turned off (I patched sched_generic to turn it
>>>>>> off in
>>>>>> pfifo_fast while testing), DQL keeps xmit_more from being set. If
>>>>>> the driver
>>>>>> is not DQL enabled xmit_more is never ever set. If the driver is DQL
>>>>>> enabled
>>>>>> the queue is adjusted to ensure xmit_more stops happening within
>>>>>> 10-15 xmit
>>>>>> cycles.
>>>>>>
>>>>>> That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc.
>>>>>> There,
>>>>>> the BIG cost is telling the hypervisor that it needs to "kick" the
>>>>>> packets.
>>>>>> The cost of putting them into the vNIC buffers is negligible. You 
>>>>>> want
>>>>>> xmit_more to happen - it makes between 50% and 300% (depending on 
>>>>>> vNIC
>>>>>> design) difference. If there is no xmit_more the vNIC will 
>>>>>> immediately
>>>>>> "kick" the hypervisor and try to signal that  the packet needs to 
>>>>>> move
>>>>>> straight away (as for example in virtio_net).
>>>> How do you measure the performance? TCP or just measure pps?
>>> In this particular case - tcp from guest. I have a couple of other
>>> benchmarks (forwarding, etc).
>>
>> One more question, is the number for virtio-net or other emulated vNIC?
>
> Other for now - you are cc-ed to keep you in the loop.
>
> Virtio is next on my list - I am revisiting the l2tpv3.c driver in 
> QEMU and looking at how to preserve bulking by adding back sendmmsg 
> (as well as a list of other features/transports).
>
> We had sendmmsg removed for the final inclusion in QEMU 2.1, it 
> presently uses only recvmmsg so for the time being it does not care. 
> That will most likely change once it starts using sendmmsg as well.

An issue is that the QEMU net API does not support bulking - do you plan 
to add it?

Thanks

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualization (Was: "Re: net_sched strange in 4.11")
  2017-05-11  2:43             ` Jason Wang
@ 2017-05-11  5:43               ` Anton Ivanov
  0 siblings, 0 replies; 11+ messages in thread
From: Anton Ivanov @ 2017-05-11  5:43 UTC (permalink / raw)
  To: Jason Wang, Stefan Hajnoczi; +Cc: David S. Miller, netdev, Michael S. Tsirkin

On 11/05/17 03:43, Jason Wang wrote:
>
>
> On 2017年05月10日 17:42, Anton Ivanov wrote:
>> On 10/05/17 09:56, Jason Wang wrote:
>>>
>>>
>>> On 2017年05月10日 13:28, Anton Ivanov wrote:
>>>> On 10/05/17 03:18, Jason Wang wrote:
>>>>>
>>>>> On 2017年05月09日 23:11, Stefan Hajnoczi wrote:
>>>>>> On Tue, May 09, 2017 at 08:46:46AM +0100, Anton Ivanov wrote:
>>>>>>> I have figured it out. Two issues.
>>>>>>>
>>>>>>> 1) skb->xmit_more is hardly ever set under virtualization because
>>>>>>> the qdisc
>>>>>>> is usually bypassed because of TCQ_F_CAN_BYPASS. Once
>>>>>>> TCQ_F_CAN_BYPASS is
>>>>>>> set a virtual NIC driver is not likely see skb->xmit_more (this
>>>>>>> answers my
>>>>>>> "how does this work at all" question).
>>>>>>>
>>>>>>> 2) If that flag is turned off (I patched sched_generic to turn it
>>>>>>> off in
>>>>>>> pfifo_fast while testing), DQL keeps xmit_more from being set. If
>>>>>>> the driver
>>>>>>> is not DQL enabled xmit_more is never ever set. If the driver is
>>>>>>> DQL
>>>>>>> enabled
>>>>>>> the queue is adjusted to ensure xmit_more stops happening within
>>>>>>> 10-15 xmit
>>>>>>> cycles.
>>>>>>>
>>>>>>> That is plain *wrong* for virtual NICs - virtio, emulated NICs,
>>>>>>> etc.
>>>>>>> There,
>>>>>>> the BIG cost is telling the hypervisor that it needs to "kick" the
>>>>>>> packets.
>>>>>>> The cost of putting them into the vNIC buffers is negligible.
>>>>>>> You want
>>>>>>> xmit_more to happen - it makes between 50% and 300% (depending
>>>>>>> on vNIC
>>>>>>> design) difference. If there is no xmit_more the vNIC will
>>>>>>> immediately
>>>>>>> "kick" the hypervisor and try to signal that  the packet needs
>>>>>>> to move
>>>>>>> straight away (as for example in virtio_net).
>>>>> How do you measure the performance? TCP or just measure pps?
>>>> In this particular case - tcp from guest. I have a couple of other
>>>> benchmarks (forwarding, etc).
>>>
>>> One more question, is the number for virtio-net or other emulated vNIC?
>>
>> Other for now - you are cc-ed to keep you in the loop.
>>
>> Virtio is next on my list - I am revisiting the l2tpv3.c driver in
>> QEMU and looking at how to preserve bulking by adding back sendmmsg
>> (as well as a list of other features/transports).
>>
>> We had sendmmsg removed for the final inclusion in QEMU 2.1, it
>> presently uses only recvmmsg so for the time being it does not care.
>> That will most likely change once it starts using sendmmsg as well.
>
> An issue is that qemu net API does not support bulking, do you plan to
> add it?

Yes :)

A.

>
> Thanks
>


-- 
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-05-11  5:43 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-08  7:15 net_sched strange in 4.11 Anton Ivanov
2017-05-09  7:46 ` DQL and TCQ_F_CAN_BYPASS destroy performance under virtualization (Was: "Re: net_sched strange in 4.11") Anton Ivanov
2017-05-09  8:00   ` [uml-devel] Fwd: " Anton Ivanov
2017-05-09 15:11   ` Stefan Hajnoczi
2017-05-10  2:18     ` Jason Wang
2017-05-10  5:28       ` Anton Ivanov
2017-05-10  8:56         ` Jason Wang
2017-05-10  9:42           ` Anton Ivanov
2017-05-11  2:43             ` Jason Wang
2017-05-11  5:43               ` Anton Ivanov
2017-05-10  5:35       ` Anton Ivanov
