All of lore.kernel.org
* [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy
@ 2013-12-12 23:48 Zoltan Kiss
  0 siblings, 0 replies; 12+ messages in thread
From: Zoltan Kiss @ 2013-12-12 23:48 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

A long-known problem of the upstream netback implementation is that on the TX
path (from guest to Dom0) it copies the whole packet from guest memory into
Dom0. That became a real bottleneck with 10Gb NICs, and in general it is a
huge performance penalty. The classic kernel version of netback used grant
mapping, and to get notified when a page could be unmapped it used page
destructors. Unfortunately that destructor is not an upstreamable solution.
Ian Campbell's skb fragment destructor patch series [1] tried to solve this
problem, but it is quite invasive on the network stack's code and therefore
hasn't progressed very well.
This patch series uses the SKBTX_DEV_ZEROCOPY flag to tell the stack that the
backend needs to know when the skb is freed. That is how KVM solved the same
problem, and based on my initial tests it can do the same for us. Avoiding the
extra copy boosted TX throughput from 6.8 Gbps to 7.9 Gbps (measured on a
slower Interlagos box, with both Dom0 and the guest on an upstream kernel on
the same NUMA node, running iperf 2.0.5; the remote end was a bare-metal box
on the same 10Gb switch).
Based on my investigation the packet only gets copied if it is delivered to
the Dom0 stack, which is due to this [2] patch. That is a bit unfortunate, but
luckily it doesn't cause a major regression for this use case. In the future
we should try to eliminate that copy somehow.
There are a few spinoff tasks which will be addressed in separate patches:
- grant copy the header directly instead of mapping and memcpy'ing it. This
  should help us avoid TLB flushing
- use something other than ballooned pages
- fix grant map to use page->index properly
I will run some more extensive tests, but some basic XenRT tests have already
passed with good results.
I've tried to break the series down into smaller patches, with mixed results,
so I welcome suggestions on that part as well:
1: Introduce TX grant map definitions
2: Change TX path from grant copy to mapping
3: Remove old TX grant copy definitions and fix indentations
4: Change RX path for mapped SKB fragments
5: Add stat counters for zerocopy
6: Handle guests with too many frags
7: Add stat counters for frag_list skbs
8: Timeout packets in RX path
9: Aggregate TX unmap operations

v2: I've fixed some smaller things, see the individual patches. I've added a
few new stat counters and handling for the important use case when an older
guest sends lots of slots. Instead of a delayed copy we now time out packets
on the RX path, based on the assumption that packets shouldn't otherwise get
stuck anywhere else. Finally, there is some unmap batching to avoid too many
TLB flushes.

[1] http://lwn.net/Articles/491522/
[2] https://lkml.org/lkml/2012/7/20/363

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>


* Re: [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy
  2014-01-08 14:43 ` Wei Liu
  2014-01-08 14:44   ` Zoltan Kiss
@ 2014-01-08 14:44   ` Zoltan Kiss
  1 sibling, 0 replies; 12+ messages in thread
From: Zoltan Kiss @ 2014-01-08 14:44 UTC (permalink / raw)
  To: Wei Liu; +Cc: ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies

On 08/01/14 14:43, Wei Liu wrote:
> You once mentioned that you have a trick to avoid touching the TLB; is
> it in this series?
>
> (I haven't really looked at this series yet, as I'm in today. I will
> have a closer look tonight; I'm just curious now.)
>
> Wei.
>
No, I'm currently working on that; it will be a separate series, as it
also needs some Xen modifications which haven't reached upstream yet, AFAIK.

Zoli


* Re: [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy
  2014-01-08  0:10 Zoltan Kiss
                   ` (2 preceding siblings ...)
  2014-01-08 14:43 ` Wei Liu
@ 2014-01-08 14:43 ` Wei Liu
  2014-01-08 14:44   ` Zoltan Kiss
  2014-01-08 14:44   ` Zoltan Kiss
  3 siblings, 2 replies; 12+ messages in thread
From: Wei Liu @ 2014-01-08 14:43 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies

You once mentioned that you have a trick to avoid touching the TLB; is it
in this series?

(I haven't really looked at this series yet, as I'm in today. I will have a
closer look tonight; I'm just curious now.)

Wei.


* Re: [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy
  2014-01-08  0:10 Zoltan Kiss
@ 2014-01-08  0:16 ` Zoltan Kiss
  2014-01-08  0:16 ` Zoltan Kiss
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Zoltan Kiss @ 2014-01-08  0:16 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

Sorry, the version number in the subject should be v3

Zoli



* [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy
@ 2014-01-08  0:10 Zoltan Kiss
  2014-01-08  0:16 ` Zoltan Kiss
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Zoltan Kiss @ 2014-01-08  0:10 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

A long-known problem of the upstream netback implementation is that on the TX
path (from guest to Dom0) it copies the whole packet from guest memory into
Dom0. That became a real bottleneck with 10Gb NICs, and in general it is a
huge performance penalty. The classic kernel version of netback used grant
mapping, and to get notified when a page could be unmapped it used page
destructors. Unfortunately that destructor is not an upstreamable solution.
Ian Campbell's skb fragment destructor patch series [1] tried to solve this
problem, but it is quite invasive on the network stack's code and therefore
hasn't progressed very well.
This patch series uses the SKBTX_DEV_ZEROCOPY flag to tell the stack that the
backend needs to know when the skb is freed. That is how KVM solved the same
problem, and based on my initial tests it can do the same for us. Avoiding the
extra copy boosted TX throughput from 6.8 Gbps to 7.9 Gbps (measured on a
slower Interlagos box, with both Dom0 and the guest on an upstream kernel on
the same NUMA node, running iperf 2.0.5; the remote end was a bare-metal box
on the same 10Gb switch).
Based on my investigation the packet only gets copied if it is delivered to
the Dom0 stack, which is due to this [2] patch. That is a bit unfortunate, but
luckily it doesn't cause a major regression for this use case. In the future
we should try to eliminate that copy somehow.
There are a few spinoff tasks which will be addressed in separate patches:
- grant copy the header directly instead of mapping and memcpy'ing it. This
  should help us avoid TLB flushing
- use something other than ballooned pages
- fix grant map to use page->index properly
I will run some more extensive tests, but some basic XenRT tests have already
passed with good results.
I've tried to break the series down into smaller patches, with mixed results,
so I welcome suggestions on that part as well:
1: Introduce TX grant map definitions
2: Change TX path from grant copy to mapping
3: Remove old TX grant copy definitions and fix indentations
4: Change RX path for mapped SKB fragments
5: Add stat counters for zerocopy
6: Handle guests with too many frags
7: Add stat counters for frag_list skbs
8: Timeout packets in RX path
9: Aggregate TX unmap operations

v2: I've fixed some smaller things, see the individual patches. I've added a
few new stat counters and handling for the important use case when an older
guest sends lots of slots. Instead of a delayed copy we now time out packets
on the RX path, based on the assumption that packets shouldn't otherwise get
stuck anywhere else. Finally, there is some unmap batching to avoid too many
TLB flushes.

v3: Apart from fixing a few things mentioned in the responses, the important
change is using the hypercall directly for grant [un]mapping, so we can avoid
the m2p override.

[1] http://lwn.net/Articles/491522/
[2] https://lkml.org/lkml/2012/7/20/363

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>



* Re: [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy
  2013-12-16  6:32 ` [Xen-devel] " annie li
@ 2013-12-16 16:13   ` Zoltan Kiss
  0 siblings, 0 replies; 12+ messages in thread
From: Zoltan Kiss @ 2013-12-16 16:13 UTC (permalink / raw)
  To: annie li
  Cc: jonathan.davies, wei.liu2, ian.campbell, netdev, linux-kernel, xen-devel

On 16/12/13 06:32, annie li wrote:
>
> On 2013/12/13 7:48, Zoltan Kiss wrote:
>> A long-known problem of the upstream netback implementation is that on
>> the TX path (from guest to Dom0) it copies the whole packet from guest
>> memory into Dom0. That became a real bottleneck with 10Gb NICs, and in
>> general it is a huge performance penalty. The classic kernel version of
>> netback used grant mapping, and to get notified when a page could be
>> unmapped it used page destructors. Unfortunately that destructor is not
>> an upstreamable solution.
>> Ian Campbell's skb fragment destructor patch series [1] tried to solve
>> this problem, but it is quite invasive on the network stack's code and
>> therefore hasn't progressed very well.
>> This patch series uses the SKBTX_DEV_ZEROCOPY flag to tell the stack
>> that the backend needs to know when the skb is freed. That is how KVM
>> solved the same problem, and based on my initial tests it can do the
>> same for us. Avoiding the extra copy boosted TX throughput from 6.8
>> Gbps to 7.9 Gbps (measured on a slower Interlagos box, with both Dom0
>> and the guest on an upstream kernel on the same NUMA node, running
>> iperf 2.0.5; the remote end was a bare-metal box on the same 10Gb
>> switch)
> Sounds good.
> Was the TX throughput measured between one VM and one bare-metal box, or
> between multiple VMs and bare metal? Do you have any test results with
> netperf?
One VM and a bare-metal box. I've only used iperf.

Regards,

Zoli


* Re: [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy
  2013-12-12 23:48 Zoltan Kiss
  2013-12-16  6:32 ` [Xen-devel] " annie li
@ 2013-12-16  6:32 ` annie li
  1 sibling, 0 replies; 12+ messages in thread
From: annie li @ 2013-12-16  6:32 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: jonathan.davies, wei.liu2, ian.campbell, netdev, linux-kernel, xen-devel


On 2013/12/13 7:48, Zoltan Kiss wrote:
> A long-known problem of the upstream netback implementation is that on the
> TX path (from guest to Dom0) it copies the whole packet from guest memory
> into Dom0. That became a real bottleneck with 10Gb NICs, and in general it
> is a huge performance penalty. The classic kernel version of netback used
> grant mapping, and to get notified when a page could be unmapped it used
> page destructors. Unfortunately that destructor is not an upstreamable
> solution.
> Ian Campbell's skb fragment destructor patch series [1] tried to solve this
> problem, but it is quite invasive on the network stack's code and therefore
> hasn't progressed very well.
> This patch series uses the SKBTX_DEV_ZEROCOPY flag to tell the stack that
> the backend needs to know when the skb is freed. That is how KVM solved the
> same problem, and based on my initial tests it can do the same for us.
> Avoiding the extra copy boosted TX throughput from 6.8 Gbps to 7.9 Gbps
> (measured on a slower Interlagos box, with both Dom0 and the guest on an
> upstream kernel on the same NUMA node, running iperf 2.0.5; the remote end
> was a bare-metal box on the same 10Gb switch)
Sounds good.
Was the TX throughput measured between one VM and one bare-metal box, or
between multiple VMs and bare metal? Do you have any test results with
netperf?

Thanks
Annie
> Based on my investigation the packet only gets copied if it is delivered to
> the Dom0 stack, which is due to this [2] patch. That is a bit unfortunate,
> but luckily it doesn't cause a major regression for this use case. In the
> future we should try to eliminate that copy somehow.
> There are a few spinoff tasks which will be addressed in separate patches:
> - grant copy the header directly instead of mapping and memcpy'ing it. This
>   should help us avoid TLB flushing
> - use something other than ballooned pages
> - fix grant map to use page->index properly
> I will run some more extensive tests, but some basic XenRT tests have
> already passed with good results.
> I've tried to break the series down into smaller patches, with mixed
> results, so I welcome suggestions on that part as well:
> 1: Introduce TX grant map definitions
> 2: Change TX path from grant copy to mapping
> 3: Remove old TX grant copy definitions and fix indentations
> 4: Change RX path for mapped SKB fragments
> 5: Add stat counters for zerocopy
> 6: Handle guests with too many frags
> 7: Add stat counters for frag_list skbs
> 8: Timeout packets in RX path
> 9: Aggregate TX unmap operations
>
> v2: I've fixed some smaller things, see the individual patches. I've added
> a few new stat counters and handling for the important use case when an
> older guest sends lots of slots. Instead of a delayed copy we now time out
> packets on the RX path, based on the assumption that packets shouldn't
> otherwise get stuck anywhere else. Finally, there is some unmap batching to
> avoid too many TLB flushes.
>
> [1] http://lwn.net/Articles/491522/
> [2] https://lkml.org/lkml/2012/7/20/363
>
> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>

