From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S263463AbTKZTbV (ORCPT ); Wed, 26 Nov 2003 14:31:21 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S264283AbTKZTbV (ORCPT ); Wed, 26 Nov 2003 14:31:21 -0500 Received: from pizda.ninka.net ([216.101.162.242]:33430 "EHLO pizda.ninka.net") by vger.kernel.org with ESMTP id S263463AbTKZTbS (ORCPT ); Wed, 26 Nov 2003 14:31:18 -0500 Date: Wed, 26 Nov 2003 11:30:40 -0800 From: "David S. Miller" To: Andi Kleen Cc: linux-kernel@vger.kernel.org Subject: Re: Fire Engine?? Message-Id: <20031126113040.3b774360.davem@redhat.com> In-Reply-To: References: <20031125183035.1c17185a.davem@redhat.com.suse.lists.linux.kernel> X-Mailer: Sylpheed version 0.9.7 (GTK+ 1.2.6; sparc-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On 26 Nov 2003 10:53:21 +0100 Andi Kleen wrote: > Some issues just from the top of my head. I have not done detailed profiling > recently and don't know if any of this would help significantly. It is > just what I remember right now. Thanks for the list Andi, I'll keep it around. I'd like to comment on one entry though. > - On TX we are inefficient for the same reason. TCP builds one packet > at a time and then goes down through all layers taking all locks (queue, > device driver etc.) and submits the single packet. Then repeats that for > lots of packets because many TCP writes are > MTU. Batching that would > likely help a lot, like it was done in the 2.6 VFS. I think it could > also make hard_start_xmit in many drivers significantly faster. This is tricky, because of getting all of the queueing stuff right. All of the packet scheduler APIs would need to change, as would the classification stuff, not to mention netfilter et al. You're talking about basically redoing the whole TX path if you want to really support this. I'm not saying "don't do this", just that we should be sure we know what we're getting if we invest the time into this. > - The hash tables are too big. This causes unnecessary cache misses all the > time. I agree. See my comments on this topic in another recent linux-kernel thread wrt. huge hash tables on numa systems. > - Doing gettimeofday on each incoming packet is just dumb, especially > when you have gettimeofday backed with a slow southbridge timer. > This shows quite badly on many profile logs. > I still think right solution for that would be to only take time stamps > when there is any user for it (= no timestamps in 99% of all systems) Andi, I know this is a problem, but for the millionth time your idea does not work because we don't know if the user asked for the timestamp until we are deep within the recvmsg() processing, which is long after the packet has arrived. > - user copy and checksum could probably also done faster if they were > batched for multiple packets. It is hard to optimize properly for > <= 1.5K copies. > This is especially true for 4/4 split kernels which will eat an > page table look up + lock for each individual copy, but also for others. I disagree partially, especially in the presence of a chip that provides proper implementations of software initiated prefetching.