From: Andi Kleen <ak@suse.de>
To: "David S. Miller" <davem@redhat.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Fire Engine??
Date: 26 Nov 2003 10:53:21 +0100
Message-ID: <p73fzgbzca6.fsf@verdi.suse.de>
In-Reply-To: <20031125183035.1c17185a.davem@redhat.com.suse.lists.linux.kernel>

"David S. Miller" <davem@redhat.com> writes:
> 
> So his claim is that, in their measurements, "CPU utilization"
> was lower in their stack.  Was he using 2.6.x and TSO-capable
> cards on the Linux side?  If not, it's not apples to apples
> against our current upcoming technology.

Maybe they just have a better copy_to_user(). That eats most of the time anyway.

I think there are definitely areas of improvement left in the current TCP
stack. It has gotten quite fat over the last few years.

Some issues, just off the top of my head. I have not done detailed profiling
recently and don't know whether any of this would help significantly; it is
just what I remember right now.

- Window computation for incoming packets is coded quite dumbly right now
and could be optimized (a toy version is sketched after this list).
- I suspect the copy/process-in-user-context setup needs to be rethought/
rebenchmarked in gigabit setups.  There was at least one test case
where tcp_low_latency=1 helped. It just adds latency that might hurt,
and it is not very useful when you have hardware checksums anyway.
- If they tested NFS-over-TCP then I'm pretty sure Linux lost badly, because
the current paths for that are just awfully inefficient.
- Overall, IP/TCP could probably have some more instructions, and hopefully
some cache misses, shaved off by carefully going over the fast paths.
- There are too many locks. That hurts when you have slow atomic operations
(like on the P4), and it compounds the next issue.
- We do most things one packet at a time. This means the locking and
multiple-layer overhead multiplies. Most network operations come in packet
bursts, and it would be much more efficient to batch operations: always
process lists of packets instead of single packets (sketched after this
list). This could probably lower locking overhead a lot.
- On TX we are inefficient for the same reason. TCP builds one packet
at a time, then goes down through all the layers taking all the locks
(queue, device driver etc.) and submits the single packet; then it repeats
that for lots of packets, because many TCP writes are > MTU. Batching that
would likely help a lot, as was done in the 2.6 VFS. I think it could
also make hard_start_xmit() in many drivers significantly faster (see the
TX sketch below).
- The hash tables are too big. This causes unnecessary cache misses all the 
time.
- Doing gettimeofday() on each incoming packet is just dumb, especially
when gettimeofday() is backed by a slow southbridge timer. This shows up
quite badly in many profile logs. I still think the right solution would
be to take timestamps only when there is a user for them (= no timestamps
on 99% of all systems); a sketch of that follows below.
- User copy and checksum could probably also be done faster if they were
batched over multiple packets; it is hard to optimize properly for
<= 1.5K copies (see the last sketch below). This is especially true for
4/4-split kernels, which eat a page table lookup + lock for each
individual copy, but it applies to others too.
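
To make a few of these points concrete, here are some toy user-space
sketches. None of this is kernel code; every name in them is invented,
and they only show the shape of each fix. First the window computation
(the real logic lives in tcp_select_window(); this is just the skeleton
of the question answered for every incoming packet):

/* Toy model of per-packet receive window selection.  The real
 * logic is in tcp_select_window(); this only shows the shape of
 * the computation done for every incoming packet. */
static unsigned int toy_select_window(unsigned int free_space,
                                      unsigned int mss,
                                      unsigned int max_window)
{
        unsigned int win = free_space;

        if (win > max_window)
                win = max_window;
        /* advertise whole segments only (silly window avoidance) */
        win -= win % mss;
        return win;
}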
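The RX batching argument in user-space form. struct pkt, deliver_one()
and deliver_burst() are all made up; the point is that a burst pays for
one lock round trip instead of one per packet:

#include <pthread.h>
#include <stddef.h>

/* Invented toy queue; packets are pushed LIFO for brevity. */
struct pkt { struct pkt *next; /* payload omitted */ };

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static struct pkt *queue_head;

void deliver_one(struct pkt *p)         /* today: per-packet locking */
{
        pthread_mutex_lock(&queue_lock);
        p->next = queue_head;
        queue_head = p;
        pthread_mutex_unlock(&queue_lock);
}

void deliver_burst(struct pkt *list)    /* batched: one round trip */
{
        pthread_mutex_lock(&queue_lock);
        while (list) {
                struct pkt *p = list;

                list = list->next;      /* detach head of burst */
                p->next = queue_head;
                queue_head = p;
        }
        pthread_mutex_unlock(&queue_lock);
}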
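The same idea on TX, as a hypothetical driver hook. Today the stack
calls dev->hard_start_xmit() once per packet; the invented xmit_list()
below takes the TX lock and rings the device doorbell once per burst:

#include <pthread.h>

/* Invented batched TX interface; nothing here exists in a driver. */
struct txpkt { struct txpkt *next; };

static pthread_mutex_t tx_lock = PTHREAD_MUTEX_INITIALIZER;

static void post_descriptor(struct txpkt *p) { (void)p; /* fill TX ring */ }
static void ring_doorbell(void) { /* touch the device register once */ }

int xmit_list(struct txpkt *list)
{
        pthread_mutex_lock(&tx_lock);
        for (; list; list = list->next)
                post_descriptor(list);
        ring_doorbell();                /* once per burst, not per packet */
        pthread_mutex_unlock(&tx_lock);
        return 0;
}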
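For the timestamps, a sketch of stamping only when somebody asked.
timestamp_users is an invented counter that a socket enabling receive
timestamps would bump:

#include <sys/time.h>

static int timestamp_users;     /* 0 on 99% of all systems */

void stamp_packet(struct timeval *stamp)
{
        if (timestamp_users > 0)
                gettimeofday(stamp, NULL);      /* slow timer access */
        else {
                stamp->tv_sec = 0;              /* nobody will read it */
                stamp->tv_usec = 0;
        }
}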
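And the batched user copy+checksum. copy_and_sum() below uses a toy
byte sum, not the real ones' complement checksum, and stands in for
csum_and_copy_to_user(); the win is that per-call setup (and on a
4/4-split kernel the per-copy page table walk) is paid once per burst:

#include <stdint.h>
#include <stddef.h>
#include <string.h>

struct seg { const void *data; size_t len; };

static uint32_t copy_and_sum(void *dst, const void *src, size_t len,
                             uint32_t sum)
{
        const uint8_t *s = src;

        memcpy(dst, src, len);
        while (len--)                   /* toy byte-wise checksum */
                sum += *s++;
        return sum;
}

/* One call per burst instead of one per <= 1.5K packet. */
uint32_t copy_burst(void *ubuf, const struct seg *segs, int n)
{
        uint32_t sum = 0;
        uint8_t *dst = ubuf;
        int i;

        for (i = 0; i < n; i++) {
                sum = copy_and_sum(dst, segs[i].data, segs[i].len, sum);
                dst += segs[i].len;
        }
        return sum;
}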

-Andi

Thread overview: 38+ messages
     [not found] <BAY1-DAV15JU71pROHD000040e2@hotmail.com.suse.lists.linux.kernel>
     [not found] ` <20031125183035.1c17185a.davem@redhat.com.suse.lists.linux.kernel>
2003-11-26  9:53   ` Andi Kleen [this message]
2003-11-26 11:35     ` Fire Engine?? John Bradford
2003-11-26 18:50       ` Mike Fedyk
2003-11-26 19:19         ` Diego Calleja García
2003-11-26 19:59           ` Mike Fedyk
2003-11-27  3:54           ` Bill Huey
2003-11-26 15:00     ` Trond Myklebust
2003-11-26 23:01       ` Andi Kleen
2003-11-26 23:23         ` Trond Myklebust
2003-11-26 23:38           ` Andi Kleen
2003-11-26 19:30     ` David S. Miller
2003-11-26 19:58       ` Paul Menage
2003-11-26 20:03         ` David S. Miller
2003-11-26 22:29           ` Andi Kleen
2003-11-26 22:36             ` David S. Miller
2003-11-26 22:56               ` Andi Kleen
2003-11-26 23:13                 ` David S. Miller
2003-11-26 23:29                   ` Andi Kleen
2003-11-26 23:41                   ` Ben Greear
2003-11-27  0:01                     ` Fast timestamps David S. Miller
2003-11-27  0:30                       ` Mitchell Blank Jr
2003-11-27  1:57                       ` Ben Greear
2003-11-26 20:01       ` Fire Engine?? Jamie Lokier
2003-11-26 20:04         ` David S. Miller
2003-11-26 21:54         ` Pekka Pietikainen
2003-11-26 20:22       ` Theodore Ts'o
2003-11-26 21:02         ` David S. Miller
2003-11-26 21:24           ` Jamie Lokier
2003-11-26 21:38             ` David S. Miller
2003-11-26 23:43               ` Jamie Lokier
2003-11-26 21:34       ` Arjan van de Ven
2003-11-26 22:58         ` Andi Kleen
2003-11-27 12:16           ` Ingo Oeser
2003-11-26 22:39       ` Andi Kleen
2003-11-26 22:46         ` David S. Miller
2003-11-26  0:15 Mr. BOFH
2003-11-26  2:30 ` David S. Miller
2003-11-26  5:41 ` Valdis.Kletnieks
