From: Eric Dumazet <eric.dumazet@gmail.com>
To: Francois Romieu <romieu@fr.zoreil.com>
Cc: netdev@vger.kernel.org, Hayes Wang <hayeswang@realtek.com>
Subject: Re: [RFC] r8169 : why SG / TX checksum are default disabled
Date: Wed, 18 Jul 2012 10:55:53 +0200
Message-ID: <1342601753.2626.2040.camel@edumazet-glaptop>
In-Reply-To: <20120717234037.GA26972@electric-eye.fr.zoreil.com>
On Wed, 2012-07-18 at 01:40 +0200, Francois Romieu wrote:
> > (I found that activating them with ethtool automatically enables GSO,
> > and performance with GSO is not good)
>
> It's still an improvement though, isn't it ?
>
With the default configuration on an old AMD machine, I can get line rate,
but it uses nearly all CPU cycles.
The following test is only partial; a real one should use forwarding, for
example...
# perf stat netperf -H eric -C -c -t OMNI
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to eric () port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1000 tcpi_rttvar 750 tcpi_snd_ssthresh 16 tpci_snd_cwnd 62
tcpi_reordering 3 tcpi_total_retrans 0
Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
Final       Final                                             %     Method %      Method
290160      549032      16384  10.00   915.44     10^6bits/s  44.93 S      3.61   S      8.042   7.755   usec/KB
Performance counter stats for 'netperf -H eric -C -c -t OMNI':

       5206,301186 task-clock              #    0,520 CPUs utilized
            16 568 context-switches        #    0,003 M/sec
                 2 CPU-migrations          #    0,000 K/sec
               366 page-faults             #    0,070 K/sec
    12 362 775 266 cycles                  #    2,375 GHz                      [66,99%]
     2 529 275 760 stalled-cycles-frontend #   20,46% frontend cycles idle     [67,00%]
     6 878 915 080 stalled-cycles-backend  #   55,64% backend cycles idle      [66,24%]
     5 272 222 150 instructions            #    0,43  insns per cycle
                                           #    1,30  stalled cycles per insn  [66,85%]
       819 922 185 branches                #  157,487 M/sec                    [66,79%]
        50 135 423 branch-misses           #    6,11% of all branches          [66,15%]

      10,019141027 seconds time elapsed
If I enable SG and TX checksumming (GSO is then enabled automatically),
bandwidth drops to about 762 Mbit/s:
# ethtool -K eth1 tx on sg on
Actual changes:
tx-checksumming: on
	tx-checksum-ipv4: on
scatter-gather: on
	tx-scatter-gather: on
generic-segmentation-offload: on
# perf stat netperf -H eric -C -c -t OMNI
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to eric () port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 21 tpci_snd_cwnd 169
tcpi_reordering 3 tcpi_total_retrans 0
Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
Final       Final                                             %     Method %      Method
790920      704640      16384  10.01   762.29     10^6bits/s  38.00 S      3.38   S      8.167   8.720   usec/KB
Performance counter stats for 'netperf -H eric -C -c -t OMNI':

       4526,838736 task-clock              #    0,452 CPUs utilized
             2 031 context-switches        #    0,449 K/sec
                 3 CPU-migrations          #    0,001 K/sec
               366 page-faults             #    0,081 K/sec
     4 476 876 825 cycles                  #    0,989 GHz                      [66,41%]
       899 080 378 stalled-cycles-frontend #   20,08% frontend cycles idle     [66,56%]
     2 430 763 937 stalled-cycles-backend  #   54,30% backend cycles idle      [66,87%]
     1 685 481 163 instructions            #    0,38  insns per cycle
                                           #    1,44  stalled cycles per insn  [66,93%]
       280 404 977 branches                #   61,943 M/sec                    [66,73%]
        15 608 497 branch-misses           #    5,57% of all branches          [66,54%]

      10,025486268 seconds time elapsed
Since most frames need two or three segments (one for the IP/TCP headers,
and one or two frags for the payload), this might be an MMIO issue, which
Alexander tried to solve recently...
If I enable only SG and TX checksumming (with GSO turned off), it's OK:
# ethtool -K eth1 gso off
# perf stat netperf -H eric -C -c -t OMNI
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to eric () port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1000 tcpi_rttvar 750 tcpi_snd_ssthresh 18 tpci_snd_cwnd 60
tcpi_reordering 3 tcpi_total_retrans 0
Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
Final       Final                                             %     Method %      Method
280800      549032      16384  10.00   916.61     10^6bits/s  40.05 S      3.62   S      7.159   7.774   usec/KB
Performance counter stats for 'netperf -H eric -C -c -t OMNI':

       4827,259625 task-clock              #    0,482 CPUs utilized
            17 988 context-switches        #    0,004 M/sec
                 3 CPU-migrations          #    0,001 K/sec
               366 page-faults             #    0,076 K/sec
    11 448 148 411 cycles                  #    2,372 GHz                      [66,57%]
     2 278 563 777 stalled-cycles-frontend #   19,90% frontend cycles idle     [66,38%]
     6 420 123 655 stalled-cycles-backend  #   56,08% backend cycles idle      [66,38%]
     4 471 468 064 instructions            #    0,39  insns per cycle
                                           #    1,44  stalled cycles per insn  [67,48%]
       757 302 269 branches                #  156,880 M/sec                    [67,08%]
        44 320 435 branch-misses           #    5,85% of all branches          [66,16%]

      10,020331031 seconds time elapsed