From: Eric Dumazet <eric.dumazet@gmail.com>
To: Francois Romieu <romieu@fr.zoreil.com>
Cc: netdev@vger.kernel.org, Hayes Wang <hayeswang@realtek.com>
Subject: Re: [RFC] r8169 : why SG / TX checksum are default disabled
Date: Wed, 18 Jul 2012 10:55:53 +0200
Message-ID: <1342601753.2626.2040.camel@edumazet-glaptop>
In-Reply-To: <20120717234037.GA26972@electric-eye.fr.zoreil.com>

On Wed, 2012-07-18 at 01:40 +0200, Francois Romieu wrote:

> > (I found that activating them with ethtool automatically enables GSO,
> >  and performance with GSO is not good)
> 
> It's still an improvement though, isn't it ?
> 

On an old AMD machine, I can get line rate with the default configuration,
but it uses nearly all CPU cycles.

The following test is only partial; a real one should use forwarding,
for example...


# perf stat netperf -H eric -C -c -t OMNI
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to eric () port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1000 tcpi_rttvar 750 tcpi_snd_ssthresh 16 tpci_snd_cwnd 62
tcpi_reordering 3 tcpi_total_retrans 0
Local       Remote      Local  Elapsed Throughput Throughput  Local  Local  Remote Remote Local   Remote  Service  
Send Socket Recv Socket Send   Time               Units       CPU    CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util   Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %      Method %      Method                          
290160      549032      16384  10.00   915.44     10^6bits/s  44.93  S      3.61   S      8.042   7.755   usec/KB  

 Performance counter stats for 'netperf -H eric -C -c -t OMNI':

       5206,301186 task-clock                #    0,520 CPUs utilized          
            16 568 context-switches          #    0,003 M/sec                  
                 2 CPU-migrations            #    0,000 K/sec                  
               366 page-faults               #    0,070 K/sec                  
    12 362 775 266 cycles                    #    2,375 GHz                     [66,99%]
     2 529 275 760 stalled-cycles-frontend   #   20,46% frontend cycles idle    [67,00%]
     6 878 915 080 stalled-cycles-backend    #   55,64% backend  cycles idle    [66,24%]
     5 272 222 150 instructions              #    0,43  insns per cycle        
                                             #    1,30  stalled cycles per insn [66,85%]
       819 922 185 branches                  #  157,487 M/sec                   [66,79%]
        50 135 423 branch-misses             #    6,11% of all branches         [66,15%]

      10,019141027 seconds time elapsed
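The "Service Demand" columns in the netperf output are CPU time spent per KB of payload. A minimal sketch of that arithmetic, assuming netperf counts KB as 1024 bytes and reports utilization averaged over all CPUs. The local CPU count (2) is not stated in the mail; it is an assumption inferred because it is the value that reproduces the reported 8.042 usec/KB. The function name is hypothetical.

```python
def service_demand_usec_per_kb(throughput_mbps: float, cpu_util_pct: float, ncpus: int) -> float:
    """CPU microseconds spent per KB (1024 bytes) of payload sent.

    cpu_util_pct is assumed to be the average utilization over all ncpus CPUs.
    """
    kb_per_sec = throughput_mbps * 1e6 / 8 / 1024        # payload KB moved per second
    cpu_usec_per_sec = cpu_util_pct / 100 * ncpus * 1e6  # CPU microseconds burned per second
    return cpu_usec_per_sec / kb_per_sec


# Default configuration (first run): 915.44 Mbit/s at 44.93% local CPU.
# ncpus=2 is an assumption inferred from the output, not stated in the mail.
print(f"{service_demand_usec_per_kb(915.44, 44.93, ncpus=2):.3f} usec/KB")
```

The result differs from the printed 8.042 only in the last digit, most likely because the utilization is rounded to two decimals. The same formula, with the same CPU count, reproduces the local service demand of the other two runs.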


If I switch to SG+TX (GSO is automatically enabled), bandwidth is lower.

# ethtool -K eth1 tx on sg on
Actual changes:
tx-checksumming: on
	tx-checksum-ipv4: on
scatter-gather: on
	tx-scatter-gather: on
generic-segmentation-offload: on
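The "Actual changes" listing shows GSO being switched on as a side effect of enabling tx and sg. To spot such side effects in a script, a minimal sketch that parses "name: on|off" lines like the ones ethtool prints. parse_features is a hypothetical helper; the regex assumes lowercase feature names and ignores trailing annotations such as "[fixed]".

```python
import re

# Matches lines such as "tx-checksumming: on" or "\ttx-checksum-ipv4: off [fixed]".
_FEATURE_RE = re.compile(r"^\s*([a-z0-9_-]+):\s+(on|off)\b")


def parse_features(text: str) -> dict:
    """Map feature name -> True (on) / False (off) from an ethtool feature listing."""
    features = {}
    for line in text.splitlines():
        m = _FEATURE_RE.match(line)
        if m:
            features[m.group(1)] = m.group(2) == "on"
    return features


# The "Actual changes" block printed by ethtool above.
changes = """Actual changes:
tx-checksumming: on
\ttx-checksum-ipv4: on
scatter-gather: on
\ttx-scatter-gather: on
generic-segmentation-offload: on
"""
print(parse_features(changes)["generic-segmentation-offload"])
```

Running it on `ethtool -k eth1` output captured before and after an `ethtool -K` call and comparing the two dicts shows which features changed implicitly.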

# perf stat netperf -H eric -C -c -t OMNI
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to eric () port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 21 tpci_snd_cwnd 169
tcpi_reordering 3 tcpi_total_retrans 0
Local       Remote      Local  Elapsed Throughput Throughput  Local  Local  Remote Remote Local   Remote  Service  
Send Socket Recv Socket Send   Time               Units       CPU    CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util   Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %      Method %      Method                          
790920      704640      16384  10.01   762.29     10^6bits/s  38.00  S      3.38   S      8.167   8.720   usec/KB  

 Performance counter stats for 'netperf -H eric -C -c -t OMNI':

       4526,838736 task-clock                #    0,452 CPUs utilized          
             2 031 context-switches          #    0,449 K/sec                  
                 3 CPU-migrations            #    0,001 K/sec                  
               366 page-faults               #    0,081 K/sec                  
     4 476 876 825 cycles                    #    0,989 GHz                     [66,41%]
       899 080 378 stalled-cycles-frontend   #   20,08% frontend cycles idle    [66,56%]
     2 430 763 937 stalled-cycles-backend    #   54,30% backend  cycles idle    [66,87%]
     1 685 481 163 instructions              #    0,38  insns per cycle        
                                             #    1,44  stalled cycles per insn [66,93%]
       280 404 977 branches                  #   61,943 M/sec                   [66,73%]
        15 608 497 branch-misses             #    5,57% of all branches         [66,54%]

      10,025486268 seconds time elapsed
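The annotations perf stat prints next to the raw counters can be recomputed from the counters themselves, which makes the two runs easier to compare. A minimal sketch; to our understanding perf uses the larger of the frontend and backend stall counts for "stalled cycles per insn", and the numbers in both runs are consistent with that. The helper name is hypothetical.

```python
def perf_ratios(task_clock_ms, cycles, stalled_front, stalled_back, instructions):
    """Recompute perf stat's GHz, insns-per-cycle and stalled-cycles-per-insn annotations."""
    ghz = cycles / (task_clock_ms * 1e-3) / 1e9          # cycles per second of task time
    ipc = instructions / cycles                          # "insns per cycle"
    stalled_per_insn = max(stalled_front, stalled_back) / instructions
    return ghz, ipc, stalled_per_insn


# Counters copied from the two runs above, with the locale separators removed.
default = perf_ratios(5206.301186, 12_362_775_266, 2_529_275_760, 6_878_915_080, 5_272_222_150)
sg_tx_gso = perf_ratios(4526.838736, 4_476_876_825, 899_080_378, 2_430_763_937, 1_685_481_163)

for name, (ghz, ipc, spi) in (("default", default), ("SG+TX+GSO", sg_tx_gso)):
    print(f"{name:10s} {ghz:.3f} GHz  {ipc:.2f} IPC  {spi:.2f} stalled/insn")
```

Because the GHz figure is cycles divided by task time, the much lower cycle count of the GSO run shows up directly as a lower apparent clock rate.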

Since most frames need two or three segments
(one for the IP/TCP headers, and one or two frags for the payload), this
might be an MMIO issue, which Alexander tried to solve recently...
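To put "two or three segments per frame" in numbers, a rough estimate of frame and descriptor rates at the measured throughput. This is a sketch under stated assumptions, not figures from the mail: one TX descriptor per segment, full-size frames, and 1448 bytes of payload per frame (1500-byte MTU minus 20-byte IPv4 header, 20-byte TCP header and 12 bytes of TCP timestamp option). tx_rates is a hypothetical helper.

```python
MTU = 1500
# Assumption: IPv4 (20) + TCP (20) + TCP timestamp option (12) bytes of headers per frame.
MSS = MTU - 20 - 20 - 12


def tx_rates(goodput_mbps: float, segments_per_frame: int):
    """Return (frames/s, descriptors/s), assuming full-size frames and one descriptor per segment."""
    frames = goodput_mbps * 1e6 / 8 / MSS
    return frames, frames * segments_per_frame


for segs in (2, 3):
    frames, descs = tx_rates(916.61, segs)
    print(f"{segs} segments/frame: {frames:,.0f} frames/s, {descs:,.0f} descriptors/s")
```

Any fixed per-frame or per-descriptor cost in the driver is multiplied by these rates, which is why a small per-packet overhead can matter at line rate.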

If I switch on only SG+TX and turn GSO off, it's OK:

# ethtool -K eth1 gso off

# perf stat netperf -H eric -C -c -t OMNI
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to eric () port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1000 tcpi_rttvar 750 tcpi_snd_ssthresh 18 tpci_snd_cwnd 60
tcpi_reordering 3 tcpi_total_retrans 0
Local       Remote      Local  Elapsed Throughput Throughput  Local  Local  Remote Remote Local   Remote  Service  
Send Socket Recv Socket Send   Time               Units       CPU    CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util   Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %      Method %      Method                          
280800      549032      16384  10.00   916.61     10^6bits/s  40.05  S      3.62   S      7.159   7.774   usec/KB  

 Performance counter stats for 'netperf -H eric -C -c -t OMNI':

       4827,259625 task-clock                #    0,482 CPUs utilized          
            17 988 context-switches          #    0,004 M/sec                  
                 3 CPU-migrations            #    0,001 K/sec                  
               366 page-faults               #    0,076 K/sec                  
    11 448 148 411 cycles                    #    2,372 GHz                     [66,57%]
     2 278 563 777 stalled-cycles-frontend   #   19,90% frontend cycles idle    [66,38%]
     6 420 123 655 stalled-cycles-backend    #   56,08% backend  cycles idle    [66,38%]
     4 471 468 064 instructions              #    0,39  insns per cycle        
                                             #    1,44  stalled cycles per insn [67,48%]
       757 302 269 branches                  #  156,880 M/sec                   [67,08%]
        44 320 435 branch-misses             #    5,85% of all branches         [66,16%]

      10,020331031 seconds time elapsed
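A side-by-side view of the three runs, using the netperf figures copied from the output above and the default configuration as the baseline. The relative changes are plain arithmetic on those figures; pct_change is a hypothetical helper.

```python
def pct_change(new: float, old: float) -> float:
    """Relative change of new versus old, in percent."""
    return (new / old - 1.0) * 100.0


# (throughput Mbit/s, local CPU %, local service demand usec/KB), copied from the runs above.
runs = {
    "default (SG/TX off)": (915.44, 44.93, 8.042),
    "SG+TX, GSO on": (762.29, 38.00, 8.167),
    "SG+TX, GSO off": (916.61, 40.05, 7.159),
}

base_tput, _, base_sd = runs["default (SG/TX off)"]
for name, (tput, cpu, sd) in runs.items():
    print(f"{name:20s} {tput:7.2f} Mbit/s ({pct_change(tput, base_tput):+6.2f}%)  "
          f"CPU {cpu:5.2f}%  {sd:.3f} usec/KB ({pct_change(sd, base_sd):+6.2f}%)")
```

On these numbers, SG+TX with GSO off keeps the default throughput while spending about 11% less CPU per KB, whereas leaving GSO on costs about 17% of the throughput.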
