From: Andrew Morton <akpm@osdl.org>
To: Ravikiran G Thirumalai
Cc: dada1@cosmosbay.com, davem@davemloft.net, linux-kernel@vger.kernel.org,
	shai@scalex86.org, netdev@vger.kernel.org, pravins@calsoftinc.com
Subject: Re: [patch 3/4] net: Percpufy frequently used variables -- proto.sockets_allocated
Date: Fri, 27 Jan 2006 15:01:06 -0800
Message-Id: <20060127150106.38b9e041.akpm@osdl.org>
In-Reply-To: <20060127224433.GB3565@localhost.localdomain>

Ravikiran G Thirumalai wrote:
>
> On Fri, Jan 27, 2006 at 12:16:02PM -0800, Andrew Morton wrote:
> > Ravikiran G Thirumalai wrote:
> > >
> > > which can be assumed as not frequent.
> > > At sk_stream_mem_schedule(), read_sockets_allocated() is invoked only
> > > under certain conditions, under memory pressure -- on a large CPU
> > > count machine you'd have large memory, and I don't think
> > > read_sockets_allocated would get called often.  It did not, at least
> > > on our 8cpu/16G box.  So this should be OK, I think.
> >
> > That being said, the percpu_counters aren't a terribly successful
> > concept and probably do need a revisit due to the high inaccuracy at
> > high CPU counts.  It might be better to do some generic version of
> > vm_acct_memory() instead.
>
> AFAICS vm_acct_memory is no better.  The deviation on large cpu counts
> is the same as percpu_counters: each cpu can defer up to NR_CPUS * 2
> counts, so the worst case is NR_CPUS * NR_CPUS * 2 ...

I suppose so.  Except vm_acct_memory() has

	#define ACCT_THRESHOLD	max(16, NR_CPUS * 2)

But if we were to apply similar tuning to percpu_counter then yes,
they're pretty similar.

Oh, and because vm_acct_memory() is counting a singleton object, it can
use DEFINE_PER_CPU rather than alloc_percpu(), so it saves a bit of
kmalloc overhead.
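For reference, the batching in there is tiny - from mm/swap.c, roughly
(quoting from memory, so treat this as a sketch rather than gospel):

	/* mm/swap.c, approximately */
	static DEFINE_PER_CPU(long, committed_space) = 0;

	void vm_acct_memory(long pages)
	{
		long *local;

		preempt_disable();
		local = &__get_cpu_var(committed_space);
		*local += pages;
		/*
		 * Only touch the shared atomic_t once the local delta
		 * has exceeded the batching threshold.
		 */
		if (*local > ACCT_THRESHOLD || *local < -ACCT_THRESHOLD) {
			atomic_add(*local, &vm_committed_space);
			*local = 0;
		}
		preempt_enable();
	}

So each cpu accumulates its delta locally and dirties the shared
cacheline only once per ACCT_THRESHOLD-ish updates.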
> > If the benchmarks say that we need to.  If we cannot observe any
> > problems in testing of existing code and if we can't demonstrate any
> > benefit from the patched code then one option is to go off and do
> > something else ;)
>
> We first tried plain per-CPU counters for memory_allocated, found that
> reads on memory_allocated were causing cacheline transfers, and then
> switched over to batching.  So batching reads is useful.  To avoid
> inaccuracy, we can maybe change percpu_counter_init to:
>
> 	void percpu_counter_init(struct percpu_counter *fbc, int maxdev)
>
> The per-cpu batching limit would then be maxdev / num_possible_cpus().
> One would use batching counters only when both reads and writes are
> frequent.  With the above scheme, reads would frequently pull
> cachelines from other cpus only at large cpu counts, which is no worse
> than the global counter alternative, while smaller machines would
> still benefit, without sacrificing a preset maximum deviation.
>
> Comments?

Sounds sane.
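So something like this?  (Untested sketch: the field layout and the
locking are my guesses, only the init signature is from your proposal.)

	struct percpu_counter {
		spinlock_t lock;
		long count;	/* authoritative global value */
		long batch;	/* maxdev / num_possible_cpus() */
		long *counters;	/* alloc_percpu'ed per-cpu deltas */
	};

	void percpu_counter_init(struct percpu_counter *fbc, int maxdev)
	{
		spin_lock_init(&fbc->lock);
		fbc->count = 0;
		fbc->batch = max(1, maxdev / num_possible_cpus());
		fbc->counters = alloc_percpu(long);
	}

	void percpu_counter_mod(struct percpu_counter *fbc, long amount)
	{
		long *pcount = per_cpu_ptr(fbc->counters, get_cpu());

		*pcount += amount;
		/*
		 * Fold into the global count once the local delta exceeds
		 * this cpu's share of the permitted deviation.
		 */
		if (*pcount >= fbc->batch || *pcount <= -fbc->batch) {
			spin_lock(&fbc->lock);
			fbc->count += *pcount;
			*pcount = 0;
			spin_unlock(&fbc->lock);
		}
		put_cpu();
	}

That caps the total error at maxdev regardless of cpu count, at the
cost of more frequent folding as the number of cpus grows.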