From: Bill Fink <billfink@mindspring.com>
To: Willy Tarreau <w@1wt.eu>
Cc: David Miller <davem@davemloft.net>,
herbert@gondor.apana.org.au, zbr@ioremap.net, jarkao2@gmail.com,
dada1@cosmosbay.com, ben@zeus.com, mingo@elte.hu,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
jens.axboe@oracle.com
Subject: Re: [PATCH v2] tcp: splice as many packets as possible at once
Date: Thu, 5 Feb 2009 03:32:41 -0500
Message-ID: <20090205033241.a99121fe.billfink@mindspring.com>
In-Reply-To: <20090204091217.GA21385@1wt.eu>

On Wed, 4 Feb 2009, Willy Tarreau wrote:
> On Wed, Feb 04, 2009 at 01:01:46AM -0800, David Miller wrote:
> > From: Herbert Xu <herbert@gondor.apana.org.au>
> > Date: Wed, 4 Feb 2009 19:59:07 +1100
> >
> > > On Wed, Feb 04, 2009 at 09:54:32AM +0100, Willy Tarreau wrote:
> > > >
> > > > My server is running 2.4 :-), but I observed the same issues with older
> > > > 2.6 as well. I can certainly imagine that things have changed a lot since,
> > > > but the initial point remains : jumbo frames are expensive to deal with,
> > > > and with recent NICs and drivers, we might get close performance for
> > > > little additional cost. After all, initial justification for jumbo frames
> > > > was the devastating interrupt rate and all NICs coalesce interrupts now.
> > >
> > > This is total crap! Jumbo frames are way better than any of the
> > > hacks (such as GSO) that people have come up with to get around it.
> > > The only reason we are not using it as much is because of this
> > > nasty thing called the Internet.
> >
> > Completely agreed.
> >
> > If Jumbo frames are slower, it is NOT some fundamental issue. It is
> > rather due to some misdesign of the hardware or it's driver.
>
> Agreed we can't use them *because* of the internet, but this
> limitation has forced hardware designers to find valid alternatives.
> For instance, having the ability to reach 10 Gbps with 1500 bytes
> frames on myri10ge with a low CPU usage is a real achievement. This
> is "only" 800 kpps after all.
>
> And the arbitrary choice of 9k for jumbo frames was total crap too.
> It's clear that no hardware designer was involved in the process.
> They have to stuff 16kB of RAM on a NIC to use only 9. And we need
> to allocate 3 pages for slightly more than 2. 7.5 kB would have been
> better in this regard.
>
> I still find it nice to lower CPU usage with frames larger than 1500,
> but given the fact that this is rarely used (even in datacenters), I
> think our efforts should concentrate on where the real users are, ie
> <1500.
Those in the HPC realm use 9000 byte jumbo frames because it makes
a major performance difference, especially across large RTT paths,
and the Internet2 backbone fully supports 9000 byte jumbo frames
(with some wishing we could support much larger frame sizes).

Local environment:

9000 byte jumbo frames:

[root@lang2 ~]# nuttcp -w10m 192.168.88.16
11818.1875 MB / 10.01 sec = 9905.9707 Mbps 100 %TX 76 %RX 0 retrans 0.15 msRTT

4080 byte MTU:

[root@lang2 ~]# nuttcp -w10m 192.168.88.16
9171.6875 MB / 10.02 sec = 7680.7663 Mbps 100 %TX 99 %RX 0 retrans 0.19 msRTT
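
For a rough sense of scale, the two local results above can be converted
to packet rates. This is a back-of-the-envelope sketch, not part of the
measurements: it assumes every packet is a full MTU-sized IP datagram and
ignores Ethernet framing and TCP/IP header overhead, and the function name
is made up for illustration.

```python
def packets_per_second(throughput_mbps, mtu_bytes):
    """Approximate packet rate, assuming every packet is exactly mtu_bytes."""
    return throughput_mbps * 1e6 / (mtu_bytes * 8)


# Throughput figures taken from the two nuttcp runs above.
jumbo_pps = packets_per_second(9905.9707, 9000)
small_pps = packets_per_second(7680.7663, 4080)

print(f"9000 byte MTU: {jumbo_pps / 1e3:.0f} kpps")
print(f"4080 byte MTU: {small_pps / 1e3:.0f} kpps")
```

Under those assumptions the 4080 byte run moves roughly 1.7 times as many
packets per second as the 9000 byte run while delivering less throughput,
which fits the receiver sitting at 99 %RX in that run.
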
The performance impact is even more pronounced on a large RTT path
such as the following netem emulated 80 ms RTT path:

9000 byte jumbo frames:

[root@lang2 ~]# nuttcp -T30 -w80m 192.168.89.15
25904.2500 MB / 30.16 sec = 7205.8755 Mbps 96 %TX 55 %RX 0 retrans 82.73 msRTT

4080 byte MTU:

[root@lang2 ~]# nuttcp -T30 -w80m 192.168.89.15
8650.0129 MB / 30.25 sec = 2398.8862 Mbps 33 %TX 19 %RX 2371 retrans 81.98 msRTT
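
The window size used on this path can be related to the bandwidth-delay
product with a little arithmetic. The sketch below is only illustrative:
it assumes that -w80m means 80 * 2**20 bytes and that the effective window
equals the requested one, neither of which is stated above, and the helper
names are hypothetical.

```python
def bdp_bytes(rate_bps, rtt_s):
    """Bandwidth-delay product: bytes in flight needed to fill the path."""
    return rate_bps * rtt_s / 8


def window_limited_mbps(window_bytes, rtt_s):
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_s / 1e6


rtt = 0.08273              # RTT reported for the jumbo-frame run, in seconds
window = 80 * 1024 * 1024  # assumption: -w80m means 80 * 2**20 bytes

print(f"BDP at 10 Gbps: {bdp_bytes(10e9, rtt) / 1e6:.1f} MB")
print(f"window ceiling: {window_limited_mbps(window, rtt):.0f} Mbps")
```

With these assumptions the window caps throughput at a bit over 8 Gbps, so
the 7.2 Gbps seen with 9000 byte frames is close to what the window allows,
while the 4080 byte run falls far short of it.
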
And if there's any loss in the path, the performance difference is also
dramatic, such as here across a real MAN environment with about a 1 ms RTT:

9000 byte jumbo frames:

[root@chance9 ~]# nuttcp -w20m 192.168.88.8
7711.8750 MB / 10.05 sec = 6436.2406 Mbps 82 %TX 96 %RX 261 retrans 0.92 msRTT

4080 byte MTU:

[root@chance9 ~]# nuttcp -w20m 192.168.88.8
4551.0625 MB / 10.08 sec = 3786.2108 Mbps 50 %TX 95 %RX 42 retrans 0.95 msRTT
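
One common rule of thumb for why segment size matters under loss is the
Mathis et al. approximation for Reno-style TCP, rate ~ (MSS / RTT) * C /
sqrt(p). The sketch below is an illustration only and not a model of the
runs above: the congestion control in use is not stated, the loss rate p
is a made-up value, and MSS = MTU - 40 assumes IPv4 and TCP headers
without options.

```python
import math


def mathis_mbps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(1.5)):
    """Mathis et al. steady-state estimate for Reno-style TCP, in Mbps."""
    return mss_bytes * 8 / rtt_s * c / math.sqrt(loss_rate) / 1e6


rtt = 0.001   # about 1 ms, as on the MAN path
loss = 1e-6   # hypothetical loss rate, not a measured value

for mtu in (9000, 4080):
    mss = mtu - 40  # assumption: 20 byte IPv4 + 20 byte TCP header, no options
    print(f"MTU {mtu}: about {mathis_mbps(mss, rtt, loss):.0f} Mbps")

# In this model throughput scales linearly with the MSS.
ratio = mathis_mbps(9000 - 40, rtt, loss) / mathis_mbps(4080 - 40, rtt, loss)
print(f"predicted ratio: {ratio:.2f}")
```

The model predicts a throughput ratio equal to the MSS ratio for a given
loss rate and RTT, independent of the particular values chosen for them.
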
All testing was with myri10ge on the transmitter side (2.6.20.7 kernel).

So my experience has definitely been that 9000 byte jumbo frames are a
major performance win for high throughput applications.

-Bill