* [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy
@ 2013-10-30  0:50 Zoltan Kiss
  0 siblings, 0 replies; 21+ messages in thread
From: Zoltan Kiss @ 2013-10-30  0:50 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

A long-known problem of the upstream netback implementation is that on the TX
path (from guest to Dom0) it copies the whole packet from guest memory into
Dom0. That simply became a bottleneck with 10Gb NICs, and generally it's a
huge performance penalty. The classic kernel version of netback used grant
mapping, and to get notified when the page can be unmapped, it used page
destructors. Unfortunately that destructor is not an upstreamable solution.
Ian Campbell's skb fragment destructor patch series
(http://lwn.net/Articles/491522/) tried to solve this problem, however it
seems to be very invasive on the network stack's code, and therefore hasn't
progressed very well.
This patch series uses the SKBTX_DEV_ZEROCOPY flag to tell the stack it needs
to know when the skb is freed up. That is the way KVM solved the same problem,
and based on my initial tests it can do the same for us. Avoiding the extra
copy boosted TX throughput from 6.8 Gbps to 7.9 Gbps (I used a slower
Interlagos box, both Dom0 and guest on upstream kernel, on the same NUMA node,
running iperf 2.0.5, and the remote end was a bare metal box on the same 10Gb
switch).
Based on my investigations the packet only gets copied if it is delivered to
the Dom0 stack, which is due to this patch:
https://lkml.org/lkml/2012/7/20/363
That's a bit unfortunate, but as far as I know for the huge majority this use
case is not too important. There are a couple of things which need more
polishing, see the FIXME comments. I will run some more extensive tests, but
in the meantime I would like to hear comments about what I've done so far.
I've tried to break it down into smaller patches, with mixed results, so I
welcome suggestions on that part as well:
1/5: Introduce TX grant map definitions
2/5: Change TX path from grant copy to mapping
3/5: Remove old TX grant copy definitions
4/5: Fix indentations
5/5: Change RX path for mapped SKB fragments

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy
  2013-12-12 22:08           ` Zoltan Kiss
  2013-12-16 10:14             ` Ian Campbell
@ 2013-12-16 10:14             ` Ian Campbell
  1 sibling, 0 replies; 21+ messages in thread
From: Ian Campbell @ 2013-12-16 10:14 UTC (permalink / raw)
  To: Zoltan Kiss; +Cc: wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies

On Thu, 2013-12-12 at 22:08 +0000, Zoltan Kiss wrote:
> On 28/11/13 17:43, Ian Campbell wrote:
> > On Thu, 2013-11-28 at 17:37 +0000, Zoltan Kiss wrote:
> > Routing/firewalling domUs is as valid as bridging. There is nothing in
> > the slightest bit suboptimal about it.
> >
> > If this use case regresses with this approach then I'm afraid that
> > either needs to be addressed or a different approach considered.
> >
> >> Anyway, I will try this out, and see if it really copies everything, and
> >> get some numbers as well.
> >
> > Thanks.
> 
> Now I managed to try it out. As I expected, Dom0 does copy the mapped
> page. The peak throughput I could get was 6.6 Gbps, however it could keep
> that only for short periods, I guess when the unmapping was ideally
> batched. The average was 5.53 Gbps.
> On the same machine, the same 10 min iperf session without my patches
> peaked at 5.9 Gbps, while the average was 5.65 Gbps. Do you think it is an
> acceptable regression?

Well, it would of course be preferable to avoid it. I'm quite reluctant
to see this scenario become a second class citizen.

> I used a 3.12 Dom0 and guest kernel; the guest transmitted through a 10Gb
> card to a bare metal box.
> I plan to look further into whether we can somehow avoid this:
> 
> https://lkml.org/lkml/2012/7/20/363
> 
> So then this scenario can benefit from grant mapping.
> 
> Regards,
> 
> Zoli



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy
  2013-11-28 17:43         ` Ian Campbell
@ 2013-12-12 22:08           ` Zoltan Kiss
  2013-12-16 10:14             ` Ian Campbell
  2013-12-16 10:14             ` Ian Campbell
  2013-12-12 22:08           ` Zoltan Kiss
  1 sibling, 2 replies; 21+ messages in thread
From: Zoltan Kiss @ 2013-12-12 22:08 UTC (permalink / raw)
  To: Ian Campbell; +Cc: wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies

On 28/11/13 17:43, Ian Campbell wrote:
> On Thu, 2013-11-28 at 17:37 +0000, Zoltan Kiss wrote:
> Routing/firewalling domUs is as valid as bridging. There is nothing in
> the slightest bit suboptimal about it.
>
> If this use case regresses with this approach then I'm afraid that
> either needs to be addressed or a different approach considered.
>
>> Anyway, I will try this out, and see if it really copies everything, and
>> get some numbers as well.
>
> Thanks.

Now I managed to try it out. As I expected, Dom0 does copy the mapped
page. The peak throughput I could get was 6.6 Gbps, however it could keep
that only for short periods, I guess when the unmapping was ideally
batched. The average was 5.53 Gbps.
On the same machine, the same 10 min iperf session without my patches
peaked at 5.9 Gbps, while the average was 5.65 Gbps. Do you think it is an
acceptable regression?
I used a 3.12 Dom0 and guest kernel; the guest transmitted through a 10Gb
card to a bare metal box.
I plan to look further into whether we can somehow avoid this:

https://lkml.org/lkml/2012/7/20/363

So then this scenario can benefit from grant mapping.
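
For reference, the relative size of the change in the figures quoted above can be worked out with a few lines of Python. This is only arithmetic on the numbers in this mail (peak 6.6 vs 5.9 Gbps, average 5.53 vs 5.65 Gbps); the helper name `percent_change` is made up for the illustration.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures from the routed (Dom0-delivered) test, in Gbps.
peak_without, peak_with = 5.9, 6.6
avg_without, avg_with = 5.65, 5.53

peak_delta = percent_change(peak_without, peak_with)  # about +11.9 %
avg_delta = percent_change(avg_without, avg_with)     # about -2.1 %

print(f"peak:    {peak_delta:+.1f} %")
print(f"average: {avg_delta:+.1f} %")
```

So the sustained throughput in this scenario drops by roughly 2%, while the short-lived peak is roughly 12% higher.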

Regards,

Zoli

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy
  2013-11-28 17:37       ` Zoltan Kiss
  2013-11-28 17:43         ` Ian Campbell
@ 2013-11-28 17:43         ` Ian Campbell
  2013-12-12 22:08           ` Zoltan Kiss
  2013-12-12 22:08           ` Zoltan Kiss
  1 sibling, 2 replies; 21+ messages in thread
From: Ian Campbell @ 2013-11-28 17:43 UTC (permalink / raw)
  To: Zoltan Kiss; +Cc: wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies

On Thu, 2013-11-28 at 17:37 +0000, Zoltan Kiss wrote:
> On 07/11/13 10:52, Ian Campbell wrote:
> > On Fri, 2013-11-01 at 19:00 +0000, Zoltan Kiss wrote:
> >> On 01/11/13 10:50, Ian Campbell wrote:
> >>> Does this always avoid copying when bridging/openvswitching/forwarding
> >>> (e.g. masquerading etc)? For both domU->domU and domU->physical NIC?
> >> I've tested the domU->domU and domU->physical use cases with bridge and
> >> openvswitch, and now I've created a new stat counter to see how often a
> >> copy happens (the callback's second parameter tells you whether the skb
> >> was freed or copied). It doesn't copy in any of these scenarios.
> >> What do you mean by forwarding? The scenario where you use a bridge and
> >> iptables mangles the packet, not just filters it?
> >
> > I mean using L3 routing rather than L2 bridging. Which might involve
> > NAT/MASQUERADE or might just be normal IP routing.
> I still couldn't find time to try out this scenario, but I think in this
> case the packet goes through deliver_skb, which means it will get copied. So
> performance would be a bit worse due to the extra map/unmap. And I'm 
> afraid we can't help that too much due to this:
> https://lkml.org/lkml/2012/7/20/363
> However I think using Dom0 as a router/firewall is already a suboptimal 
> solution, so maybe a small performance regression is acceptable?

Routing/firewalling domUs is as valid as bridging. There is nothing in
the slightest bit suboptimal about it.

If this use case regresses with this approach then I'm afraid that
either needs to be addressed or a different approach considered.

> Anyway, I will try this out, and see if it really copies everything, and 
> get some numbers as well.

Thanks.

> >>> How does it deal with broadcast traffic?
> Now I had time to check it: broadcast packets get copied only once, when 
> cloning happens. It will swap out the frags with local ones, so any 
> subsequent cloning will have a local SKB.

That's good.

Ian.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy
  2013-11-07 10:52     ` Ian Campbell
  2013-11-28 17:37       ` Zoltan Kiss
@ 2013-11-28 17:37       ` Zoltan Kiss
  2013-11-28 17:43         ` Ian Campbell
  2013-11-28 17:43         ` Ian Campbell
  1 sibling, 2 replies; 21+ messages in thread
From: Zoltan Kiss @ 2013-11-28 17:37 UTC (permalink / raw)
  To: Ian Campbell; +Cc: wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies

On 07/11/13 10:52, Ian Campbell wrote:
> On Fri, 2013-11-01 at 19:00 +0000, Zoltan Kiss wrote:
>> On 01/11/13 10:50, Ian Campbell wrote:
>>> Does this always avoid copying when bridging/openvswitching/forwarding
>>> (e.g. masquerading etc)? For both domU->domU and domU->physical NIC?
>> I've tested the domU->domU and domU->physical use cases with bridge and
>> openvswitch, and now I've created a new stat counter to see how often a
>> copy happens (the callback's second parameter tells you whether the skb
>> was freed or copied). It doesn't copy in any of these scenarios.
>> What do you mean by forwarding? The scenario where you use a bridge and
>> iptables mangles the packet, not just filters it?
>
> I mean using L3 routing rather than L2 bridging. Which might involve
> NAT/MASQUERADE or might just be normal IP routing.
I still couldn't find time to try out this scenario, but I think in this
case the packet goes through deliver_skb, which means it will get copied. So
performance would be a bit worse due to the extra map/unmap. And I'm 
afraid we can't help that too much due to this:
https://lkml.org/lkml/2012/7/20/363
However I think using Dom0 as a router/firewall is already a suboptimal 
solution, so maybe a small performance regression is acceptable?
Anyway, I will try this out, and see if it really copies everything, and 
get some numbers as well.

>>> How does it deal with broadcast traffic?
Now I had time to check it: broadcast packets get copied only once, when 
cloning happens. It will swap out the frags with local ones, so any 
subsequent cloning will have a local SKB.
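
The behaviour described above can be modelled in a few lines of Python. This is a toy model under stated assumptions, not kernel code: the `Skb` class, its `frags_foreign` flag and the `clone` helper are hypothetical names, and the only property being illustrated is that the first clone swaps the original's frags for local copies, so later clones find nothing left to copy.

```python
class Skb:
    """Toy stand-in for an skb; only tracks where its frags live."""

    def __init__(self, frags_foreign: bool) -> None:
        # True while the frags still point at grant-mapped guest pages.
        self.frags_foreign = frags_foreign


def clone(skb: Skb, stats: dict) -> Skb:
    """Clone `skb`, copying its frags to local memory first if needed."""
    if skb.frags_foreign:
        stats["copies"] += 1
        # The frags of the ORIGINAL are swapped for local ones in place.
        skb.frags_foreign = False
    return Skb(frags_foreign=False)


def count_copies(n_clones: int) -> int:
    """Clone one guest-backed packet n_clones times; return copies made."""
    stats = {"copies": 0}
    original = Skb(frags_foreign=True)
    for _ in range(n_clones):
        clone(original, stats)
    return stats["copies"]


print(count_copies(5))
```

In this model the number of copies stays at one no matter how many receivers a broadcast packet is cloned for.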

Zoli

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy
  2013-11-01 19:00   ` Zoltan Kiss
  2013-11-05 17:01     ` Zoltan Kiss
  2013-11-05 17:01     ` Zoltan Kiss
@ 2013-11-07 10:52     ` Ian Campbell
  2013-11-28 17:37       ` Zoltan Kiss
  2013-11-28 17:37       ` Zoltan Kiss
  2013-11-07 10:52     ` Ian Campbell
  3 siblings, 2 replies; 21+ messages in thread
From: Ian Campbell @ 2013-11-07 10:52 UTC (permalink / raw)
  To: Zoltan Kiss; +Cc: wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies

On Fri, 2013-11-01 at 19:00 +0000, Zoltan Kiss wrote:
> On 01/11/13 10:50, Ian Campbell wrote:
> > Does this always avoid copying when bridging/openvswitching/forwarding
> > (e.g. masquerading etc)? For both domU->domU and domU->physical NIC?
> I've tested the domU->domU and domU->physical use cases with bridge and
> openvswitch, and now I've created a new stat counter to see how often a copy
> happens (the callback's second parameter tells you whether the skb was
> freed or copied). It doesn't copy in any of these scenarios.
> What do you mean by forwarding? The scenario where you use a bridge and
> iptables mangles the packet, not just filters it?

I mean using L3 routing rather than L2 bridging. Which might involve
NAT/MASQUERADE or might just be normal IP routing.

> > How does it deal with broadcast traffic?
> Most of the real broadcast traffic is actually small packets that fit in the
> PKT_PROT_LEN sized linear space, so it doesn't make any difference,
> apart from doing a mapping before the copy. But that will be eliminated
> later on; I plan to add an incremental improvement to grant copy the
> linear part.

OK. If I were a malicious guest and decided to start sending out loads
of huge broadcasts would that lead to a massive spike of activity in
dom0?

> I haven't spent too much time on that, but I couldn't find any broadcast
> protocol which uses large enough packets and is easy to test, so I'm open to
> ideas.

I guess you could hack something up using raw sockets?
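
A rough sketch of what such a test could look like, in Python and assuming a Linux guest: it builds an Ethernet broadcast frame with a payload well above the linear area and pushes it out through an AF_PACKET raw socket. The function names, the zero-filled payload and the choice of the IEEE "local experimental" EtherType 0x88B5 are all assumptions made for the illustration; sending needs CAP_NET_RAW (typically root), and the frame has to stay within the interface MTU.

```python
import socket

BROADCAST_MAC = b"\xff" * 6
ETH_P_EXPERIMENTAL = 0x88B5  # IEEE "local experimental" EtherType (assumed choice)


def build_broadcast_frame(src_mac: bytes, payload_len: int) -> bytes:
    """Return an Ethernet broadcast frame carrying a zero-filled payload."""
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    header = BROADCAST_MAC + src_mac + ETH_P_EXPERIMENTAL.to_bytes(2, "big")
    return header + bytes(payload_len)


def send_broadcasts(ifname: str, src_mac: bytes, payload_len: int, count: int) -> None:
    """Send `count` broadcast frames on `ifname` (Linux only, needs CAP_NET_RAW)."""
    frame = build_broadcast_frame(src_mac, payload_len)
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as sock:
        sock.bind((ifname, 0))
        for _ in range(count):
            sock.send(frame)


# Building a frame needs no privileges; only send_broadcasts() does.
frame = build_broadcast_frame(bytes.fromhex("020000000001"), 1400)
print(len(frame))
```

Run inside the guest as, for example, `send_broadcasts("eth0", mac, 1400, 100000)`, where the interface name and MAC are whatever the guest actually uses.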

> What I already know is that skb_clone triggers a copy, and if the caller uses
> the original skb for every cloning, it will do several copies. I think that
> could be fixed by using the first clone to do any further clones.

Yes. I suppose doing this automatically might be an interesting
extension to SKBTX_DEV_ZEROCOPY?

Ian.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy
  2013-11-01 19:00   ` Zoltan Kiss
  2013-11-05 17:01     ` Zoltan Kiss
@ 2013-11-05 17:01     ` Zoltan Kiss
  2013-11-07 10:52     ` Ian Campbell
  2013-11-07 10:52     ` Ian Campbell
  3 siblings, 0 replies; 21+ messages in thread
From: Zoltan Kiss @ 2013-11-05 17:01 UTC (permalink / raw)
  To: Ian Campbell; +Cc: wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies

On 01/11/13 19:00, Zoltan Kiss wrote:
>>> Based on my investigations the packet only gets copied if it is delivered to
>>> the Dom0 stack, which is due to this patch:
>>> https://lkml.org/lkml/2012/7/20/363
>>> That's a bit unfortunate, but as far as I know for the huge majority this use
>>> case is not too important.
>> Likely to be true, but it would still be interesting to know how badly
>> this use case suffers with this change, and any increase in CPU usage
>> would be interesting to know about as well.
> I can't find my numbers, but as far as I remember it wasn't
> significantly worse than grant copy. I will check that again.
I've measured it now: with my patch it was 5.2 Gbps, without it 5.4 Gbps.
In both cases iperf in Dom0 maxed out its CPU, mostly in soft interrupt
context, based on top.

Zoli


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy
  2013-11-01 10:50 ` Ian Campbell
  2013-11-01 19:00   ` Zoltan Kiss
@ 2013-11-01 19:00   ` Zoltan Kiss
  2013-11-05 17:01     ` Zoltan Kiss
                       ` (3 more replies)
  1 sibling, 4 replies; 21+ messages in thread
From: Zoltan Kiss @ 2013-11-01 19:00 UTC (permalink / raw)
  To: Ian Campbell; +Cc: wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies

On 01/11/13 10:50, Ian Campbell wrote:
> Does this always avoid copying when bridging/openvswitching/forwarding
> (e.g. masquerading etc)? For both domU->domU and domU->physical NIC?
I've tested the domU->domU and domU->physical use cases with bridge and
openvswitch, and now I've created a new stat counter to see how often a copy
happens (the callback's second parameter tells you whether the skb was
freed or copied). It doesn't copy in any of these scenarios.
What do you mean by forwarding? The scenario where you use a bridge and
iptables mangles the packet, not just filters it?
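
To illustrate the counter idea, here is a minimal Python model; it is not the netback code. The class and field names are hypothetical. The only thing borrowed from the kernel is the shape of the zerocopy callback, whose boolean second argument says whether the skb was freed without its frags being copied.

```python
from dataclasses import dataclass


@dataclass
class ZerocopyStats:
    """Hypothetical per-vif counters; names are illustrative only."""

    freed_without_copy: int = 0
    copied: int = 0

    def callback(self, zerocopy_success: bool) -> None:
        # True:  the skb was freed while still using the mapped pages.
        # False: the stack copied the frags to local memory first.
        if zerocopy_success:
            self.freed_without_copy += 1
        else:
            self.copied += 1

    def copy_ratio(self) -> float:
        """Fraction of completed skbs that needed a copy."""
        total = self.freed_without_copy + self.copied
        return self.copied / total if total else 0.0


stats = ZerocopyStats()
for success in (True, True, True, False):
    stats.callback(success)

print(stats.copied, stats.freed_without_copy, stats.copy_ratio())
```

A counter like this makes it easy to check a given forwarding setup: if `copied` stays at zero during a run, the mapped pages went all the way through without a copy.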

> How does it deal with broadcast traffic?
Most of the real broadcast traffic is actually small packets that fit in the
PKT_PROT_LEN sized linear space, so it doesn't make any difference,
apart from doing a mapping before the copy. But that will be eliminated
later on; I plan to add an incremental improvement to grant copy the
linear part.
I haven't spent too much time on that, but I couldn't find any broadcast
protocol which uses large enough packets and is easy to test, so I'm open to
ideas.
What I already know is that skb_clone triggers a copy, and if the caller uses
the original skb for every cloning, it will do several copies. I think that
could be fixed by using the first clone to do any further clones.
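
The point about the linear space can be sketched as follows. This is a simplified Python model, assuming PKT_PROT_LEN is 128 bytes (an assumption for the illustration, check the netback source for the real value); the function names are hypothetical. It only shows how much of a packet of a given size would end up in the linear area and how much would remain in frags backed by mapped guest pages.

```python
PKT_PROT_LEN = 128  # ASSUMED size of the linear area, in bytes


def split_packet(total_len: int, linear_max: int = PKT_PROT_LEN) -> tuple[int, int]:
    """Return (bytes in the linear area, bytes left in frags)."""
    linear = min(total_len, linear_max)
    return linear, total_len - linear


def has_mapped_frags(total_len: int) -> bool:
    """True if part of the packet would stay in grant-mapped frags."""
    return split_packet(total_len)[1] > 0


for size in (64, 128, 1500):
    print(size, split_packet(size), has_mapped_frags(size))
```

Under that assumption a typical small broadcast (an ARP request, for example) fits entirely in the linear area, while a full-size 1500-byte packet leaves 1372 bytes in frags.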

> Do you have any numbers for the dom0 cpu usage impact?
DomU->NIC: the vif took 40% according to top, I guess the bottleneck
there is the TLB flushing.
DomU->DomU: the vif of the RX side causes the bottleneck due to grant
copy to the guest.

> Aggregate throughput for many guests would be a useful datapoint too.
I will do measurements about that.

>> Based on my investigations the packet only gets copied if it is delivered to
>> the Dom0 stack, which is due to this patch:
>> https://lkml.org/lkml/2012/7/20/363
>> That's a bit unfortunate, but as far as I know for the huge majority this use
>> case is not too important.
> Likely to be true, but it would still be interesting to know how badly
> this use case suffers with this change, and any increase in CPU usage
> would be interesting to know about as well.
I can't find my numbers, but as far as I remember it wasn't 
significantly worse than grant copy. I will check that again.

Zoli

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy
  2013-10-30  0:50 Zoltan Kiss
  2013-10-30 19:16 ` Konrad Rzeszutek Wilk
  2013-10-30 19:16 ` [Xen-devel] " Konrad Rzeszutek Wilk
@ 2013-11-01 10:50 ` Ian Campbell
  2013-11-01 19:00   ` Zoltan Kiss
  2013-11-01 19:00   ` Zoltan Kiss
  2013-11-01 10:50 ` Ian Campbell
  3 siblings, 2 replies; 21+ messages in thread
From: Ian Campbell @ 2013-11-01 10:50 UTC (permalink / raw)
  To: Zoltan Kiss; +Cc: wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies

On Wed, 2013-10-30 at 00:50 +0000, Zoltan Kiss wrote:
> This patch series uses the SKBTX_DEV_ZEROCOPY flag to tell the stack it needs
> to know when the skb is freed up.

Does this always avoid copying when bridging/openvswitching/forwarding
(e.g. masquerading etc)? For both domU->domU and domU->physical NIC?

How does it deal with broadcast traffic?

>  That is the way KVM solved the same problem,
> and based on my initial tests it can do the same for us. Avoiding the extra
> copy boosted up TX throughput from 6.8 Gbps to 7.9 (I used a slower
> Interlagos box, both Dom0 and guest on upstream kernel, on the same NUMA node,
> running iperf 2.0.5, and the remote end was a bare metal box on the same 10Gb
> switch)

Do you have any numbers for the dom0 cpu usage impact?

Aggregate throughput for many guests would be a useful datapoint too.

> Based on my investigations the packet get only copied if it is delivered to
> Dom0 stack, which is due to this patch:
> https://lkml.org/lkml/2012/7/20/363
> That's a bit unfortunate, but as far as I know for the huge majority this use
> case is not too important.

Likely to be true, but it would still be interesting to know how badly
this use case suffers with this change, and any increase in CPU usage
would be interesting to know about as well.

>  There are a couple of things which need more
> polishing, see the FIXME comments. I will run some more extensive tests, but
> in the meantime I would like to hear comments about what I've done so far.
> I've tried to broke it down to smaller patches, with mixed results, so I
> welcome suggestions on that part as well:
> 1/5: Introduce TX grant map definitions
> 2/5: Change TX path from grant copy to mapping
> 3/5: Remove old TX grant copy definitons
> 4/5: Fix indentations
> 5/5: Change RX path for mapped SKB fragments
> 
> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> 



* Re: [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy
  2013-10-30 19:17   ` [Xen-devel] " Konrad Rzeszutek Wilk
@ 2013-10-30 21:14     ` Zoltan Kiss
  0 siblings, 0 replies; 21+ messages in thread
From: Zoltan Kiss @ 2013-10-30 21:14 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: jonathan.davies, wei.liu2, ian.campbell, netdev, linux-kernel, xen-devel

On 30/10/13 19:17, Konrad Rzeszutek Wilk wrote:
> On Wed, Oct 30, 2013 at 03:16:17PM -0400, Konrad Rzeszutek Wilk wrote:
>> Odd. I don't see #5 patch patch?
>
> Ah, you have two #4 patches:
>
> [PATCH net-next RFC 4/5] xen-netback: Change RX path for mapped SKB fragments
> [PATCH net-next RFC 4/5] xen-netback: Fix indentations

Yep, sorry, I will fix it up in the next version!

Zoli

* Re: [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy
  2013-10-30 19:16 ` [Xen-devel] " Konrad Rzeszutek Wilk
@ 2013-10-30 19:17   ` Konrad Rzeszutek Wilk
  2013-10-30 19:17   ` [Xen-devel] " Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 21+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-10-30 19:17 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: jonathan.davies, wei.liu2, ian.campbell, netdev, linux-kernel, xen-devel

On Wed, Oct 30, 2013 at 03:16:17PM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Oct 30, 2013 at 12:50:15AM +0000, Zoltan Kiss wrote:
> > A long known problem of the upstream netback implementation that on the TX
> > path (from guest to Dom0) it copies the whole packet from guest memory into
> > Dom0. That simply became a bottleneck with 10Gb NICs, and generally it's a
> > huge perfomance penalty. The classic kernel version of netback used grant
> > mapping, and to get notified when the page can be unmapped, it used page
> > destructors. Unfortunately that destructor is not an upstreamable solution.
> > Ian Campbell's skb fragment destructor patch series
> > (http://lwn.net/Articles/491522/) tried to solve this problem, however it
> > seems to be very invasive on the network stack's code, and therefore haven't
> > progressed very well.
> > This patch series use SKBTX_DEV_ZEROCOPY flags to tell the stack it needs to
> > know when the skb is freed up. That is the way KVM solved the same problem,
> > and based on my initial tests it can do the same for us. Avoiding the extra
> > copy boosted up TX throughput from 6.8 Gbps to 7.9 (I used a slower
> > Interlagos box, both Dom0 and guest on upstream kernel, on the same NUMA node,
> > running iperf 2.0.5, and the remote end was a bare metal box on the same 10Gb
> > switch)
> > Based on my investigations the packet get only copied if it is delivered to
> > Dom0 stack, which is due to this patch:
> > https://lkml.org/lkml/2012/7/20/363
> > That's a bit unfortunate, but as far as I know for the huge majority this use
> > case is not too important. There are a couple of things which need more
> > polishing, see the FIXME comments. I will run some more extensive tests, but
> > in the meantime I would like to hear comments about what I've done so far.
> > I've tried to broke it down to smaller patches, with mixed results, so I
> > welcome suggestions on that part as well:
> > 1/5: Introduce TX grant map definitions
> > 2/5: Change TX path from grant copy to mapping
> > 3/5: Remove old TX grant copy definitons
> > 4/5: Fix indentations
> > 5/5: Change RX path for mapped SKB fragments
> 
> Odd. I don't see #5 patch patch?

Ah, you have two #4 patches:

[PATCH net-next RFC 4/5] xen-netback: Change RX path for mapped SKB fragments
[PATCH net-next RFC 4/5] xen-netback: Fix indentations

!
> > 
> > Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> > 
> > 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel

* Re: [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy
  2013-10-30  0:50 Zoltan Kiss
@ 2013-10-30 19:16 ` Konrad Rzeszutek Wilk
  2013-10-30 19:16 ` [Xen-devel] " Konrad Rzeszutek Wilk
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-10-30 19:16 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: jonathan.davies, wei.liu2, ian.campbell, netdev, linux-kernel, xen-devel

On Wed, Oct 30, 2013 at 12:50:15AM +0000, Zoltan Kiss wrote:
> A long known problem of the upstream netback implementation that on the TX
> path (from guest to Dom0) it copies the whole packet from guest memory into
> Dom0. That simply became a bottleneck with 10Gb NICs, and generally it's a
> huge perfomance penalty. The classic kernel version of netback used grant
> mapping, and to get notified when the page can be unmapped, it used page
> destructors. Unfortunately that destructor is not an upstreamable solution.
> Ian Campbell's skb fragment destructor patch series
> (http://lwn.net/Articles/491522/) tried to solve this problem, however it
> seems to be very invasive on the network stack's code, and therefore haven't
> progressed very well.
> This patch series use SKBTX_DEV_ZEROCOPY flags to tell the stack it needs to
> know when the skb is freed up. That is the way KVM solved the same problem,
> and based on my initial tests it can do the same for us. Avoiding the extra
> copy boosted up TX throughput from 6.8 Gbps to 7.9 (I used a slower
> Interlagos box, both Dom0 and guest on upstream kernel, on the same NUMA node,
> running iperf 2.0.5, and the remote end was a bare metal box on the same 10Gb
> switch)
> Based on my investigations the packet get only copied if it is delivered to
> Dom0 stack, which is due to this patch:
> https://lkml.org/lkml/2012/7/20/363
> That's a bit unfortunate, but as far as I know for the huge majority this use
> case is not too important. There are a couple of things which need more
> polishing, see the FIXME comments. I will run some more extensive tests, but
> in the meantime I would like to hear comments about what I've done so far.
> I've tried to broke it down to smaller patches, with mixed results, so I
> welcome suggestions on that part as well:
> 1/5: Introduce TX grant map definitions
> 2/5: Change TX path from grant copy to mapping
> 3/5: Remove old TX grant copy definitons
> 4/5: Fix indentations
> 5/5: Change RX path for mapped SKB fragments

Odd. I don't see the #5 patch?
> 
> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

* [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy
@ 2013-10-30  0:50 Zoltan Kiss
  2013-10-30 19:16 ` Konrad Rzeszutek Wilk
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Zoltan Kiss @ 2013-10-30  0:50 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

A long-known problem of the upstream netback implementation is that on the TX
path (from guest to Dom0) it copies the whole packet from guest memory into
Dom0. That simply became a bottleneck with 10Gb NICs, and generally it's a
huge performance penalty. The classic kernel version of netback used grant
mapping, and to get notified when the page can be unmapped, it used page
destructors. Unfortunately that destructor is not an upstreamable solution.
Ian Campbell's skb fragment destructor patch series
(http://lwn.net/Articles/491522/) tried to solve this problem; however, it
seems to be very invasive on the network stack's code, and therefore hasn't
progressed very well.
This patch series uses the SKBTX_DEV_ZEROCOPY flag to tell the stack it needs
to know when the skb is freed up. That is the way KVM solved the same problem,
and based on my initial tests it can do the same for us. Avoiding the extra
copy boosted TX throughput from 6.8 Gbps to 7.9 Gbps (I used a slower
Interlagos box, both Dom0 and guest on upstream kernel, on the same NUMA node,
running iperf 2.0.5, and the remote end was a bare metal box on the same 10Gb
switch).
Based on my investigations the packet only gets copied if it is delivered to
the Dom0 stack, which is due to this patch:
https://lkml.org/lkml/2012/7/20/363
That's a bit unfortunate, but as far as I know this use case is not too
important for the huge majority. There are a couple of things which need more
polishing, see the FIXME comments. I will run some more extensive tests, but
in the meantime I would like to hear comments about what I've done so far.
I've tried to break it down into smaller patches, with mixed results, so I
welcome suggestions on that part as well:
1/5: Introduce TX grant map definitions
2/5: Change TX path from grant copy to mapping
3/5: Remove old TX grant copy definitions
4/5: Fix indentations
5/5: Change RX path for mapped SKB fragments

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
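
A toy user-space sketch of the notification idea described above (this is
neither the netback code nor the kernel API): an skb carries a pointer to a
callback record, and the callback runs once the last reference to the skb is
dropped, at which point the backend can unmap the granted page. All toy_*
names, the reference count and the grant struct are assumptions made for
illustration; the real kernel structures differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback record attached to a zerocopy skb. */
struct toy_ubuf_info {
	void (*callback)(struct toy_ubuf_info *ubuf, bool zerocopy_success);
	void *ctx;	/* backend private data, here the grant mapping */
};

/* Toy stand-in for an skb; NOT the kernel's struct sk_buff. */
struct toy_skb {
	struct toy_ubuf_info *ubuf;	/* non-NULL models the zerocopy flag */
	unsigned int users;		/* simple reference count */
};

/* Toy stand-in for a guest page mapped into the backend. */
struct toy_grant {
	bool mapped;
};

/* Backend callback: the stack is done with the page, so unmap it. */
void toy_zerocopy_done(struct toy_ubuf_info *ubuf, bool zerocopy_success)
{
	struct toy_grant *grant = ubuf->ctx;

	(void)zerocopy_success;
	grant->mapped = false;
}

/* Take another reference to the skb. */
void toy_skb_get(struct toy_skb *skb)
{
	skb->users++;
}

/* Drop a reference; notify the backend when the last one goes away. */
void toy_kfree_skb(struct toy_skb *skb)
{
	assert(skb->users > 0);
	if (--skb->users == 0 && skb->ubuf)
		skb->ubuf->callback(skb->ubuf, true);
}

/* Returns true if the page stays mapped until the final free. */
bool toy_demo_unmapped_only_after_last_free(void)
{
	struct toy_grant grant = { .mapped = true };
	struct toy_ubuf_info ubuf = {
		.callback = toy_zerocopy_done,
		.ctx = &grant,
	};
	struct toy_skb skb = { .ubuf = &ubuf, .users = 1 };
	bool mapped_while_in_use;

	toy_skb_get(&skb);		/* a second user holds the skb */
	toy_kfree_skb(&skb);		/* first user is done */
	mapped_while_in_use = grant.mapped;
	toy_kfree_skb(&skb);		/* last user is done, callback fires */
	return mapped_while_in_use && !grant.mapped;
}
```

The point of the sketch is only the ordering: the mapping has to stay in
place while anyone still holds the skb, and the callback is what tells the
backend that it is safe to unmap.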


end of thread, other threads:[~2013-12-16 10:14 UTC | newest]

Thread overview: 21+ messages
2013-10-30  0:50 [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy Zoltan Kiss
2013-10-30  0:50 Zoltan Kiss
2013-10-30 19:16 ` Konrad Rzeszutek Wilk
2013-10-30 19:16 ` [Xen-devel] " Konrad Rzeszutek Wilk
2013-10-30 19:17   ` Konrad Rzeszutek Wilk
2013-10-30 19:17   ` [Xen-devel] " Konrad Rzeszutek Wilk
2013-10-30 21:14     ` Zoltan Kiss
2013-11-01 10:50 ` Ian Campbell
2013-11-01 19:00   ` Zoltan Kiss
2013-11-01 19:00   ` Zoltan Kiss
2013-11-05 17:01     ` Zoltan Kiss
2013-11-05 17:01     ` Zoltan Kiss
2013-11-07 10:52     ` Ian Campbell
2013-11-28 17:37       ` Zoltan Kiss
2013-11-28 17:37       ` Zoltan Kiss
2013-11-28 17:43         ` Ian Campbell
2013-11-28 17:43         ` Ian Campbell
2013-12-12 22:08           ` Zoltan Kiss
2013-12-16 10:14             ` Ian Campbell
2013-12-16 10:14             ` Ian Campbell
2013-12-12 22:08           ` Zoltan Kiss
2013-11-07 10:52     ` Ian Campbell
2013-11-01 10:50 ` Ian Campbell
