* [PATCH net-next 0/3] v3 GRE: TCP segmentation offload
@ 2013-02-14 19:44 Pravin B Shelar
  2013-02-15 20:18 ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Pravin B Shelar @ 2013-02-14 19:44 UTC (permalink / raw)
  To: netdev; +Cc: edumazet, jesse, bhutchings, mirqus, Pravin B Shelar

The following patches add TCP segmentation offload to GRE. These
patches show a 20-25% performance improvement in the netperf
single-process TCP_STREAM test on a 10G network.

Pravin B Shelar (3):
  net: Add skb_unclone() helper function.
  net: factor out skb_mac_gso_segment() from skb_gso_segment()
  GRE: Add TCP segmentation offload for GRE

 drivers/net/ppp/ppp_generic.c           |    3 +-
 include/linux/netdev_features.h         |    3 +-
 include/linux/netdevice.h               |    2 +
 include/linux/skbuff.h                  |   27 +++++++
 net/core/dev.c                          |   80 ++++++++++++--------
 net/core/ethtool.c                      |    1 +
 net/core/skbuff.c                       |    6 +-
 net/ipv4/af_inet.c                      |    1 +
 net/ipv4/ah4.c                          |    3 +-
 net/ipv4/gre.c                          |  122 +++++++++++++++++++++++++++++++
 net/ipv4/ip_fragment.c                  |    2 +-
 net/ipv4/ip_gre.c                       |   82 +++++++++++++++++++--
 net/ipv4/tcp.c                          |    1 +
 net/ipv4/tcp_output.c                   |    2 +-
 net/ipv4/udp.c                          |    3 +-
 net/ipv4/xfrm4_input.c                  |    2 +-
 net/ipv4/xfrm4_mode_tunnel.c            |    3 +-
 net/ipv6/ah6.c                          |    3 +-
 net/ipv6/ip6_offload.c                  |    1 +
 net/ipv6/netfilter/nf_conntrack_reasm.c |    2 +-
 net/ipv6/reassembly.c                   |    2 +-
 net/ipv6/udp_offload.c                  |    3 +-
 net/ipv6/xfrm6_mode_tunnel.c            |    3 +-
 net/sched/act_ipt.c                     |    6 +-
 net/sched/act_pedit.c                   |    3 +-
 25 files changed, 303 insertions(+), 63 deletions(-)


* Re: [PATCH net-next 0/3] v3 GRE: TCP segmentation offload
  2013-02-14 19:44 [PATCH net-next 0/3] v3 GRE: TCP segmentation offload Pravin B Shelar
@ 2013-02-15 20:18 ` David Miller
  2013-02-16  0:52   ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2013-02-15 20:18 UTC (permalink / raw)
  To: pshelar; +Cc: netdev, edumazet, jesse, bhutchings, mirqus

From: Pravin B Shelar <pshelar@nicira.com>
Date: Thu, 14 Feb 2013 11:44:41 -0800

> The following patches add TCP segmentation offload to GRE. These
> patches show a 20-25% performance improvement in the netperf
> single-process TCP_STREAM test on a 10G network.
> 
> Pravin B Shelar (3):
>   net: Add skb_unclone() helper function.
>   net: factor out skb_mac_gso_segment() from skb_gso_segment()
>   GRE: Add TCP segmentation offload for GRE

All applied, incorporating the suggestions/fixes from Eric.  Specifically,
using skb_reset_mac_len() in patch #2 and computing pkt_len before ip_local_out()
in patch #3.

Thanks.


* Re: [PATCH net-next 0/3] v3 GRE: TCP segmentation offload
  2013-02-15 20:18 ` David Miller
@ 2013-02-16  0:52   ` Eric Dumazet
  2013-02-16  1:41     ` Pravin Shelar
  2013-02-16  1:53     ` David Miller
  0 siblings, 2 replies; 6+ messages in thread
From: Eric Dumazet @ 2013-02-16  0:52 UTC (permalink / raw)
  To: David Miller; +Cc: pshelar, netdev, edumazet, jesse, bhutchings, mirqus

On Fri, 2013-02-15 at 15:18 -0500, David Miller wrote:

> All applied, incorporating the suggestions/fixes from Eric.  Specifically,
> using skb_reset_mac_len() in patch #2 and computing pkt_len before ip_local_out()
> in patch #3.

Thanks David

There is this "tx-nocache-copy" issue : 

We currently enable the nocache copy for all devices but loopback.

But it's a loss of performance with tunnel devices.

Actually, it seems a loss even for regular ethernet devices :(



# ethtool -K gre1 tx-nocache-copy on 
# perf stat netperf -H 7.7.8.84
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.00    4252.42   

 Performance counter stats for 'netperf -H 7.7.8.84':

       9967.965824 task-clock                #    0.996 CPUs utilized          
                54 context-switches          #    0.005 K/sec                  
                 3 CPU-migrations            #    0.000 K/sec                  
               261 page-faults               #    0.026 K/sec                  
    27,964,187,393 cycles                    #    2.805 GHz                    
    20,902,040,632 stalled-cycles-frontend   #   74.75% frontend cycles idle   
    13,524,565,776 stalled-cycles-backend    #   48.36% backend  cycles idle   
    15,929,463,578 instructions              #    0.57  insns per cycle        
                                             #    1.31  stalled cycles per insn
     2,065,830,063 branches                  #  207.247 M/sec                  
        35,891,035 branch-misses             #    1.74% of all branches        

      10.003882959 seconds time elapsed


Now we use a regular memory copy:

# ethtool -K gre1 tx-nocache-copy off
# perf stat netperf -H 7.7.8.84
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.00    7706.50   

 Performance counter stats for 'netperf -H 7.7.8.84':

       5708.284991 task-clock                #    0.571 CPUs utilized          
             5,138 context-switches          #    0.900 K/sec                  
                24 CPU-migrations            #    0.004 K/sec                  
               260 page-faults               #    0.046 K/sec                  
    15,990,404,388 cycles                    #    2.801 GHz                    
    10,903,764,099 stalled-cycles-frontend   #   68.19% frontend cycles idle   
     6,089,332,139 stalled-cycles-backend    #   38.08% backend  cycles idle   
    10,680,845,426 instructions              #    0.67  insns per cycle        
                                             #    1.02  stalled cycles per insn
     1,401,663,288 branches                  #  245.549 M/sec                  
        15,380,428 branch-misses             #    1.10% of all branches        

      10.004025020 seconds time elapsed
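
To put the two runs side by side, here is a small Python sketch (not part
of the original measurements) that derives the throughput ratio and a rough
cycles-per-byte figure from the numbers above. It assumes the perf stat
cycle count, which covers only the netperf process, is a fair proxy for
sender CPU cost, and that the 10.00 s elapsed time applies to the whole
transfer. The names RUNS, bytes_sent and cycles_per_byte are made up for
illustration.

```python
# Figures copied from the two netperf / perf stat runs above.
RUNS = {
    "tx-nocache-copy on": {
        "mbit_per_s": 4252.42,      # netperf throughput, 10^6 bits/sec
        "secs": 10.00,              # netperf elapsed time
        "cycles": 27_964_187_393,   # perf stat 'cycles' for netperf
    },
    "tx-nocache-copy off": {
        "mbit_per_s": 7706.50,
        "secs": 10.00,
        "cycles": 15_990_404_388,
    },
}


def bytes_sent(run):
    """Approximate payload bytes moved during the run."""
    return run["mbit_per_s"] * 1e6 * run["secs"] / 8


def cycles_per_byte(run):
    """Cycles charged to the netperf process per payload byte (rough)."""
    return run["cycles"] / bytes_sent(run)


on = RUNS["tx-nocache-copy on"]
off = RUNS["tx-nocache-copy off"]

# How much faster the regular copy is than the nocache copy.
speedup = off["mbit_per_s"] / on["mbit_per_s"]

for name, run in RUNS.items():
    print(f"{name}: {cycles_per_byte(run):.2f} cycles/byte")
print(f"throughput ratio (off/on): {speedup:.2f}x")
```

The cycles-per-byte value is only indicative: work done in softirq context
on other CPUs is not counted by this perf stat invocation.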


* Re: [PATCH net-next 0/3] v3 GRE: TCP segmentation offload
  2013-02-16  0:52   ` Eric Dumazet
@ 2013-02-16  1:41     ` Pravin Shelar
  2013-02-16  1:43       ` Eric Dumazet
  2013-02-16  1:53     ` David Miller
  1 sibling, 1 reply; 6+ messages in thread
From: Pravin Shelar @ 2013-02-16  1:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, edumazet, jesse, bhutchings, mirqus

On Fri, Feb 15, 2013 at 4:52 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2013-02-15 at 15:18 -0500, David Miller wrote:
>
>> All applied, incorporating the suggestions/fixes from Eric.  Specifically,
>> using skb_reset_mac_len() in patch #2 and computing pkt_len before ip_local_out()
>> in patch #3.
>
> Thanks David
>
> There is this "tx-nocache-copy" issue :
>
> We currently enable the nocache copy for all devices but loopback.
>
> But it's a loss of performance with tunnel devices.
>
> Actually, it seems a loss even for regular ethernet devices :(
>
>
>
> # ethtool -K gre1 tx-nocache-copy on
> # perf stat netperf -H 7.7.8.84
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
>
>  87380  16384  16384    10.00    4252.42
>
>  Performance counter stats for 'netperf -H 7.7.8.84':
>
>        9967.965824 task-clock                #    0.996 CPUs utilized
>                 54 context-switches          #    0.005 K/sec
>                  3 CPU-migrations            #    0.000 K/sec
>                261 page-faults               #    0.026 K/sec
>     27,964,187,393 cycles                    #    2.805 GHz
>     20,902,040,632 stalled-cycles-frontend   #   74.75% frontend cycles idle
>     13,524,565,776 stalled-cycles-backend    #   48.36% backend  cycles idle
>     15,929,463,578 instructions              #    0.57  insns per cycle
>                                              #    1.31  stalled cycles per insn
>      2,065,830,063 branches                  #  207.247 M/sec
>         35,891,035 branch-misses             #    1.74% of all branches
>
>       10.003882959 seconds time elapsed
>
>
> Now we use a regular memory copy:
>
> # ethtool -K gre1 tx-nocache-copy off
> # perf stat netperf -H 7.7.8.84
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
>
>  87380  16384  16384    10.00    7706.50
>
>  Performance counter stats for 'netperf -H 7.7.8.84':
>
>        5708.284991 task-clock                #    0.571 CPUs utilized
>              5,138 context-switches          #    0.900 K/sec
>                 24 CPU-migrations            #    0.004 K/sec
>                260 page-faults               #    0.046 K/sec
>     15,990,404,388 cycles                    #    2.801 GHz
>     10,903,764,099 stalled-cycles-frontend   #   68.19% frontend cycles idle
>      6,089,332,139 stalled-cycles-backend    #   38.08% backend  cycles idle
>     10,680,845,426 instructions              #    0.67  insns per cycle
>                                              #    1.02  stalled cycles per insn
>      1,401,663,288 branches                  #  245.549 M/sec
>         15,380,428 branch-misses             #    1.10% of all branches
>
>       10.004025020 seconds time elapsed
>
>

I am not seeing such a big difference with these settings. Are you
running this test on special hardware or in a VM?

Thanks,
Pravin.


* Re: [PATCH net-next 0/3] v3 GRE: TCP segmentation offload
  2013-02-16  1:41     ` Pravin Shelar
@ 2013-02-16  1:43       ` Eric Dumazet
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2013-02-16  1:43 UTC (permalink / raw)
  To: Pravin Shelar; +Cc: David Miller, netdev, edumazet, jesse, bhutchings, mirqus

On Fri, 2013-02-15 at 17:41 -0800, Pravin Shelar wrote:

> 
> I am not seeing such a big difference with these settings. Are you
> running this test on special hardware or in a VM?

That's bare metal actually...

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 44
model name	: Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
stepping	: 2
microcode	: 0x13
cpu MHz		: 2800.330
cache size	: 12288 KB

...


* Re: [PATCH net-next 0/3] v3 GRE: TCP segmentation offload
  2013-02-16  0:52   ` Eric Dumazet
  2013-02-16  1:41     ` Pravin Shelar
@ 2013-02-16  1:53     ` David Miller
  1 sibling, 0 replies; 6+ messages in thread
From: David Miller @ 2013-02-16  1:53 UTC (permalink / raw)
  To: eric.dumazet; +Cc: pshelar, netdev, edumazet, jesse, bhutchings, mirqus

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 15 Feb 2013 16:52:35 -0800

> There is this "tx-nocache-copy" issue : 

That scheme has so many system and device dependencies, but when
it does help it's nice to have.

Unfortunately I don't know the best way to proceed about that.


