From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753404AbaFMSWd (ORCPT );
	Fri, 13 Jun 2014 14:22:33 -0400
Received: from mail-oa0-f46.google.com ([209.85.219.46]:40660 "EHLO
	mail-oa0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751588AbaFMSWa (ORCPT );
	Fri, 13 Jun 2014 14:22:30 -0400
MIME-Version: 1.0
In-Reply-To:
References: <20140611133919.GZ4581@linux.vnet.ibm.com>
	<539879B8.4010204@canonical.com>
	<20140611161857.GC4581@linux.vnet.ibm.com>
	<53989F7B.6000004@canonical.com>
	<874mzr41kf.fsf@x220.int.ebiederm.org>
	<20140611225228.GO4581@linux.vnet.ibm.com>
	<87ioo7vy5s.fsf@x220.int.ebiederm.org>
	<20140611234902.GQ4581@linux.vnet.ibm.com>
	<87bntzt24g.fsf@x220.int.ebiederm.org>
	<874mzrszlk.fsf@x220.int.ebiederm.org>
Date: Fri, 13 Jun 2014 15:22:30 -0300
Message-ID:
Subject: Re: Possible netns creation and execution performance/scalability
	regression since v3.8 due to rcu callbacks being offloaded to multiple cpus
From: Rafael Tinoco
To: "Eric W. Biederman"
Cc: Paul McKenney , Dave Chiluk , linux-kernel@vger.kernel.org,
	davem@davemloft.net, Christopher Arges , Jay Vosburgh
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Okay,

I ran the same tests with the same script.
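Since the script itself was never pasted in this thread, here is a rough
Python sketch of the benchmark loop as I understand it (namespace and veth
names such as testns%d are illustrative, not the real script): it creates n
namespaces, runs `ip netns exec` and moves veth peers into each, and records
the cumulative rate at every 250-namespace mark. With dry_run=True it only
records the commands, so it can be sanity-checked without root:

```python
import subprocess
import time

def bench_netns(n, execs=0, links=0, dry_run=True):
    """Hypothetical reconstruction of the benchmark loop.

    Creates n network namespaces, runs `ip netns exec` `execs` times in
    each, adds `links` veth pairs and moves one peer of each pair into
    the namespace, and records the cumulative rate (namespaces/sec) at
    every 250-namespace mark. With dry_run=True commands are recorded
    instead of executed, so no root privileges are needed.
    """
    cmds, rates = [], []
    start = time.time()

    def run(*argv):
        cmds.append(list(argv))
        if not dry_run:
            subprocess.check_call(argv)

    for i in range(1, n + 1):
        ns = "testns%d" % i
        run("ip", "netns", "add", ns)
        for _ in range(execs):
            run("ip", "netns", "exec", ns, "/bin/true")
        for k in range(links):
            run("ip", "link", "add", "veth%d_%d" % (i, k), "type", "veth",
                "peer", "name", "vpeer%d_%d" % (i, k))
            run("ip", "link", "set", "vpeer%d_%d" % (i, k), "netns", ns)
        if i % 250 == 0:
            elapsed = time.time() - start
            rates.append(i / elapsed)
    return cmds, rates

# Dry run: 2 namespaces, 1 exec and 1 link each -> 4 commands per namespace.
cmds, rates = bench_netns(2, execs=1, links=1)
print(len(cmds))  # -> 8
```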
I'm comparing the following versions:

1) master + suggested patch
2) 3.15.0-rc5 (last rcu commit in my clone)
3) 3.9-rc2 (last bisect good)

        master + sug patch         3.15.0-rc5 (last rcu)      3.9-rc2 (bisect good)
mark    no      none    all        no      none    all        no

# (netns add) / sec

 250    125.00  250.00  250.00     20.83   22.73   50.00       83.33
 500    250.00  250.00  250.00     22.73   22.73   50.00      125.00
 750    250.00  125.00  125.00     20.83   22.73   62.50      125.00
1000    125.00  250.00  125.00     20.83   20.83   50.00      250.00
1250    125.00  125.00  250.00     22.73   22.73   50.00      125.00
1500    125.00  125.00  125.00     22.73   22.73   41.67      125.00
1750    125.00  125.00   83.33     22.73   22.73   50.00       83.33
2000    125.00   83.33  125.00     22.73   25.00   50.00      125.00

-> From 3.15 to the patched tree, netns add performance was *** restored/improved *** OK

# (netns add + 1 x exec) / sec

 250    11.90   14.71   31.25      5.00    6.76    15.63       62.50
 500    11.90   13.89   31.25      5.10    7.14    15.63       41.67
 750    11.90   13.89   27.78      5.10    7.14    15.63       50.00
1000    11.90   13.16   25.00      4.90    6.41    15.63       35.71
1250    11.90   13.89   25.00      4.90    6.58    15.63       27.78
1500    11.36   13.16   25.00      4.72    6.25    15.63       25.00
1750    11.90   12.50   22.73      4.63    5.56    14.71       20.83
2000    11.36   12.50   22.73      4.55    5.43    13.89       17.86

-> From 3.15 to the patched tree, performance improves by +100%, but is still -50% of 3.9-rc2

# (netns add + 2 x exec) / sec

 250    6.58    8.62    16.67      2.81    3.97     9.26       41.67
 500    6.58    8.33    15.63      2.78    4.10     9.62       31.25
 750    5.95    7.81    15.63      2.69    3.85     8.93       25.00
1000    5.95    7.35    13.89      2.60    3.73     8.93       20.83
1250    5.81    7.35    13.89      2.55    3.52     8.62       16.67
1500    5.81    7.35    13.16      0.00    3.47     8.62       13.89
1750    5.43    6.76    13.16      0.00    3.47     8.62       11.36
2000    5.32    6.58    12.50      0.00    3.38     8.33        9.26

-> Same as before.
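One clarification on the no / none / all columns above: as far as my test
matrix goes, they refer to the RCU callback-offloading ("no-CBs CPU")
configuration, with "no" meaning the offloading code not built in at all. A
rough sketch of the knobs involved in this kernel era (option names as in
the 3.10-3.15 Kconfig; please correct me if I mislabeled a column):

```
# Build-time default for which CPUs have RCU callbacks offloaded to
# rcuo kthreads ("no" column = CONFIG_RCU_NOCB_CPU not set at all):
CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_NONE=y      # "none": offload no CPUs by default
# CONFIG_RCU_NOCB_CPU_ALL=y     # "all":  offload every CPU

# Boot-time alternative: offload an explicit CPU list, e.g. all 4 cpus:
rcu_nocbs=0-3
```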
# netns add + 2 x exec + 1 x ip link to netns

 250    7.14    8.33    14.71      2.87    3.97     8.62       35.71
 500    6.94    8.33    13.89      2.91    3.91     8.93       25.00
 750    6.10    7.58    13.89      2.75    3.79     8.06       19.23
1000    5.56    6.94    12.50      2.69    3.85     8.06       14.71
1250    5.68    6.58    11.90      2.58    3.57     7.81       11.36
1500    5.56    6.58    10.87      0.00    3.73     7.58       10.00
1750    5.43    6.41    10.42      0.00    3.57     7.14        8.62
2000    5.21    6.25    10.00      0.00    3.33     7.14        6.94

-> Adding the ip link to the netns did not change the performance proportion much.

# netns add + 2 x exec + 2 x ip link to netns

 250    7.35    8.62    13.89      2.94    4.03     8.33       31.25
 500    7.14    8.06    12.50      2.94    4.03     8.06       20.83
 750    6.41    7.58    11.90      2.81    3.85     7.81       15.63
1000    5.95    7.14    10.87      2.69    3.79     7.35       12.50
1250    5.81    6.76    10.00      2.66    3.62     7.14       10.00
1500    5.68    6.41     9.62      -       3.73     6.76        8.06
1750    5.32    6.25     8.93      -       3.68     6.58        7.35
2000    5.43    6.10     8.33      -       3.42     6.10        6.41

("-" = value missing from my results for these rows)

-> Same as before.

OBS:

1) It seems that performance improved for network namespace addition, but
there may also be room for improvement in netns execution. That way we
might achieve the same performance that 3.9-rc2 (good bisect) had.

2) These tests were made with 4 cpus only.

3) Initial charts showed that the 1 cpu case with all cpus as no-cb
(without this patch) had something like 50% of bisect-good performance.
The 4 cpu (nocball) case had 26% of bisect good (as shown above in the
last case: 8.33 vs 31.25).

4) With the patch, using 4 cpus and nocball, we now have 44% of the
bisect-good performance (against the 26% we had before).

5) NOCB_* is still an issue. It is clear that only the NOCB_CPU_ALL
option gives us something near the last good commit's performance.

Thank you

Rafael