From: Jesper Dangaard Brouer
Subject: Re: RFC crap-patch [PATCH] net: Per CPU separate frag mem accounting
Date: Thu, 14 Mar 2013 09:59:21 +0100
Message-ID: <1363251561.14913.33.camel@localhost>
References: <20130308221647.5312.33631.stgit@dragon> <20130308221744.5312.14924.stgit@dragon> <1363245955.14913.21.camel@localhost>
To: Eric Dumazet
Cc: Hannes Frederic Sowa, netdev@vger.kernel.org, yoshfuji@linux-ipv6.org
In-Reply-To: <1363245955.14913.21.camel@localhost>

On Thu, 2013-03-14 at 08:25 +0100, Jesper Dangaard Brouer wrote:
> This is NOT the patch I just mentioned in the other thread, of removing
> the LRU list. This patch does real per cpu mem acct, and LRU per CPU.
>
> I get really good performance number with this patch, but I still think
> this might not be the correct solution.

The reason is that this depends on fragments entering the same HW queue:
some NICs might not put the first fragment (which has the full header
tuples) and the remaining fragments on the same queue, in which case this
patch will lose its performance gain.
> My current best results, which got applied recently, compared to this
> patch:
> - Test-type: Test-20G64K     Test-20G3F  20G64K+DoS  20G3F+DoS
> - Patch-06:  18486.7 Mbit/s  10723.20     3657.85     4560.64 Mbit/s
> - curr-best: 19041.0 Mbit/s  12105.20    10160.40    11179.30 Mbit/s

I noticed that this also included some other patches in my stack:

New patchset-B:
 - PatchB-07: Fix LRU list head multi CPU race
 - PatchB-07: 18731.5 Mbit/s  10721.9  4079.22  5208.73 Mbit/s
 - PatchB-08: Per hash bucket locking
 - PatchB-08: 15959.5 Mbit/s  10968.9  4294.63  6365.16 Mbit/s

As you can see, I'm looking into why "PatchB-08", which implements per
hash bucket locking, is reducing throughput in Test-20G64K.

> Thus, I have almost solved DoS effect Test-20G3F 12GBit/s -> 11Gbit/s
> under DoS. The 64K+DoS case is not perfect yet, 19Gbit/s -> 11 Gbit/s.
>
> --Jesper