Date: Fri, 10 Apr 2009 22:42:06 -0700
From: "Paul E. McKenney"
Reply-To: paulmck@linux.vnet.ibm.com
To: Jan Engelhardt
Cc: Linus Torvalds, David Miller, Ingo Molnar, Lai Jiangshan,
	shemminger@vyatta.com, jeff.chua.linux@gmail.com, dada1@cosmosbay.com,
	kaber@trash.net, r000n@r000n.net, Linux Kernel Mailing List,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: iptables very slow after commit 784544739a25c30637397ace5489eeb6e15d7d49
Message-ID: <20090411054206.GC6822@linux.vnet.ibm.com>
References: <20090410095246.4fdccb56@s6510>
	<20090410.182507.140306636.davem@davemloft.net>
	<20090411041533.GB6822@linux.vnet.ibm.com>

On Sat, Apr 11, 2009 at 07:14:50AM +0200, Jan Engelhardt wrote:
>
> On Saturday 2009-04-11 06:15, Paul E. McKenney wrote:
> >On Fri, Apr 10, 2009 at 06:39:18PM -0700, Linus Torvalds wrote:
> >>An unhappy user reported:
> >>>>> Adding 200 records in iptables took 6.0sec in 2.6.30-rc1 compared to
> >>>>> 0.2sec in 2.6.29. I've bisected it down to this commit:
> >>>>> 784544739a25c30637397ace5489eeb6e15d7d49
> >>
> >> I wonder if we should bring in the RCU people too, for them to tell you
> >> that the networking people are being silly, and should not synchronize
> >> with the very heavy-handed
> >>
> >>	synchronize_net()
> >>
> >> but instead of doing synchronization (which is probably why adding a few
> >> hundred rules then takes several seconds - each one synchronizes, and that
> >> takes a timer tick or so), add the rules to be free'd to some RCU-freeing
> >> list for later freeing.
>
> iptables works in whole tables. Userspace submits a table, checkentry is
> called for all rules in the new table, things are swapped, then destroy
> is called for all rules in the old table. By that logic (which has existed
> since the dawn, I think), only the swap operation needs to be locked.
>
> Jeff Chua wrote:
> >So, to make it easy for testing, you can do a loop like this ...
> >	for ((i = 1; i < 100; i++))
> >	do
> >		iptables -A block -s 10.0.0.$i -j ACCEPT
> >	done
>
> The fact that `iptables -A` is called a hundred times means you are
> doing 100 table replacements -- instead of one. And calling
> synchronize_net at least 100 times.
>
> "Wanna use iptables-restore?"
>
> >1.	Assuming that the synchronize_net() is intended to guarantee
> >	that the new rules will be in effect before returning to
> >	user space:
>
> As I read the new code, it seems that synchronize_net is only
> used when copying the rules from kernel to userspace,
> not when updating them from userspace:
>
> IPT_SO_GET_ENTRIES -> get_entries -> copy_entries_to_user ->
> alloc_counters -> synchronize_net.

OK.

> >3.	For the alloc_counters() case, the comments indicate that we
> >	really truly do want an atomic sampling of the counters.
> >	The counters are 64-bit entities, which is a bit inconvenient.
> >	Though people using this functionality are no doubt quite happy
> >	to never have to worry about overflow, I hasten to add!
> >
> >	I will nevertheless suggest the following egregious hack to
> >	get a consistent sample of one counter for some other CPU:
> >	[...]
>
> Would a seqlock suffice, as it does for the 64-bit jiffies?

The 64-bit jiffies counter is not updated often, so write-acquiring a
seqlock on each update is OK.  From what I understand, these counters
are updated quite often (once per packet transmission or reception?),
so write-acquiring a seqlock on each update would be quite painful.

Or did you have something else in mind here?

							Thanx, Paul
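
P.S.  To make the comparison concrete, here is a minimal sketch of the
jiffies-style seqlock pattern under discussion.  The names (ctr,
ctr_lock, ctr_read(), ctr_add()) are illustrative only, not the actual
xt counter code; the point is to show where the per-update write-side
cost comes from, not to propose an implementation:

#include <linux/seqlock.h>
#include <linux/types.h>

/* Illustrative only: not the actual iptables/xt counter code. */
static DEFINE_SEQLOCK(ctr_lock);
static u64 ctr;

/* Reader: retry until a consistent 64-bit snapshot is observed. */
static u64 ctr_read(void)
{
	unsigned int seq;
	u64 val;

	do {
		seq = read_seqbegin(&ctr_lock);
		val = ctr;
	} while (read_seqretry(&ctr_lock, seq));

	return val;
}

/*
 * Writer: every update must write-acquire the seqlock.  That is fine
 * for jiffies_64 (one update per tick), but doing it for every packet
 * would put a lock acquisition on the hot path, which is the concern
 * raised above.
 */
static void ctr_add(u64 delta)
{
	write_seqlock(&ctr_lock);
	ctr += delta;
	write_sequnlock(&ctr_lock);
}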