From mboxrd@z Thu Jan 1 00:00:00 1970
From: Benjamin LaHaise
Subject: Re: [0/14] GRO: Lots of microoptimisations
Date: Tue, 16 Jun 2009 12:35:47 -0400
Message-ID: <20090616163547.GC5013@neterion.com>
References: <20090529162312.GA5191@neterion.com> <20090610054449.GB16984@gondor.apana.org.au> <20090612160926.GA4290@neterion.com> <20090612.164833.35888001.davem@davemloft.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: herbert@gondor.apana.org.au, netdev@vger.kernel.org
To: David Miller
Return-path:
Received: from barracuda.s2io.com ([72.1.205.138]:59976 "EHLO barracuda.s2io.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1760319AbZFPQfx (ORCPT ); Tue, 16 Jun 2009 12:35:53 -0400
Content-Disposition: inline
In-Reply-To: <20090612.164833.35888001.davem@davemloft.net>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Fri, Jun 12, 2009 at 04:48:33PM -0700, David Miller wrote:
> I find a 500Mbps difference, due to just one single cache miss on
> every packet, simply astounding and unbelievable. But hey, it is
> what you are seeing, so something has to account for it. :)

The cache miss only accounts for ~50Mbps; it'd be nice if there was an easy
way to get the whole 500Mbps back. The rest seems to be in the general
overhead of the GRO code vs the normal NAPI rx path. The P4 Xeon is
substantially worse at string operations than the Core 2 / Core i7 based
Xeons, so I'm hoping to test and see if they do any better with the GRO
code when I get access to a new machine soon.

		-ben