From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Miller
Subject: Re: rps perfomance WAS(Re: rps: question
Date: Thu, 15 Apr 2010 01:48:57 -0700 (PDT)
Message-ID: <20100415.014857.168270765.davem@davemloft.net>
References: <1271268242.16881.1719.camel@edumazet-laptop> <1271271222.4567.51.camel@bigi>
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: eric.dumazet@gmail.com, therbert@google.com, netdev@vger.kernel.org, robert@herjulf.net, xiaosuo@gmail.com, andi@firstfloor.org
To: hadi@cyberus.ca
Return-path:
Received: from 74-93-104-97-Washington.hfc.comcastbusiness.net ([74.93.104.97]:56938 "EHLO sunset.davemloft.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757529Ab0DOIsy (ORCPT ); Thu, 15 Apr 2010 04:48:54 -0400
In-Reply-To: <1271271222.4567.51.camel@bigi>
Sender: netdev-owner@vger.kernel.org
List-ID:

From: jamal
Date: Wed, 14 Apr 2010 14:53:42 -0400

> On Wed, 2010-04-14 at 20:04 +0200, Eric Dumazet wrote:
>
>> Yes, multiqueue is far better of course, but in case of hardware lacking
>> multiqueue, RPS can help many workloads, where application has _some_
>> work to do, not only counting frames or so...
>
> Agreed. So to enumerate, the benefits come in if:
> a) you have many processors
> b) you have single-queue nic
> c) at sub-threshold traffic you don't care about a little latency
> d) you have a specific cache hierarchy
> e) app is working hard to process incoming messages

A single-queue NIC is actually not a requirement. RPS also helps in
cases where you have 'N' application threads and N is less than the
number of CPUs your multi-queue NIC is distributing traffic to.

Moving the bulk of the input packet processing to the CPUs where
the applications actually sit had a non-trivial benefit. RFS takes
this aspect to yet another level.

> I think the main challenge for my pedantic mind is missing details. Is
> there a paper on rps? Example for #d above, the commit log mentions that
> rps benefits if you have certain types of "cache hierarchy". Probably
> some arch with large shared L2/3 (maybe inclusive) cache will benefit.
> example: it does well on Nehalem and probably opterons (as long as you
> don't start stacking these things on some interconnect like QPI or HT).
> But what happens when you have FSB sharing across cores (still a very
> common setup)? etc etc

I think for the case where application locality is important, RPS/RFS
can help regardless of cache details.
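
To make the locality point concrete, here is a rough toy model in Python of the two steering ideas discussed above. It is only a sketch under stated assumptions, not the kernel code: rps_cpu(), RfsTable and the CRC32 hash are all hypothetical stand-ins chosen for illustration. The RPS-like function spreads flows over a configured CPU list by hashing the flow tuple; the RFS-like table instead prefers the CPU where the application last consumed that flow.

```python
import zlib


def rps_cpu(flow, rps_cpus):
    """RPS-like choice: hash the flow tuple onto one of the allowed CPUs.

    flow is a (src_ip, src_port, dst_ip, dst_port) tuple.
    CRC32 and the modulo step are illustrative stand-ins, not the
    hash or scaling the kernel actually uses.
    """
    key = "|".join(str(field) for field in flow).encode()
    return rps_cpus[zlib.crc32(key) % len(rps_cpus)]


class RfsTable:
    """RFS-like choice: remember which CPU the application read a flow on."""

    def __init__(self):
        self.flow_to_cpu = {}

    def record_recv(self, flow, cpu):
        # Called (in this toy model) when the application reads from the
        # flow's socket while running on `cpu`.
        self.flow_to_cpu[flow] = cpu

    def steer(self, flow, rps_cpus):
        # Prefer the CPU where the consumer last ran; fall back to the
        # plain hash-based choice for flows not seen yet.
        if flow in self.flow_to_cpu:
            return self.flow_to_cpu[flow]
        return rps_cpu(flow, rps_cpus)


if __name__ == "__main__":
    flow = ("10.0.0.1", 12345, "10.0.0.2", 80)
    table = RfsTable()
    print("hash-based CPU:", table.steer(flow, [2, 3]))
    table.record_recv(flow, 5)
    print("after app ran on CPU 5:", table.steer(flow, [2, 3]))
```

In this model the hash-based choice keeps every packet of a flow on one CPU, while the table lookup moves that work to wherever the consuming thread actually sits, which is the application-locality effect that does not depend on the cache details.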