From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751139AbWA2T5L (ORCPT );
	Sun, 29 Jan 2006 14:57:11 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1751137AbWA2T5L (ORCPT );
	Sun, 29 Jan 2006 14:57:11 -0500
Received: from kanga.kvack.org ([66.96.29.28]:38786 "EHLO kanga.kvack.org")
	by vger.kernel.org with ESMTP id S1751133AbWA2T5J (ORCPT );
	Sun, 29 Jan 2006 14:57:09 -0500
Date: Sun, 29 Jan 2006 14:52:42 -0500
From: Benjamin LaHaise
To: Eric Dumazet
Cc: Andrew Morton , kiran@scalex86.org, davem@davemloft.net,
	linux-kernel@vger.kernel.org, shai@scalex86.org,
	netdev@vger.kernel.org, pravins@calsoftinc.com
Subject: Re: [patch 3/4] net: Percpufy frequently used variables -- proto.sockets_allocated
Message-ID: <20060129195242.GC28400@kvack.org>
References: <20060126190357.GE3651@localhost.localdomain>
	<43D9DFA1.9070802@cosmosbay.com>
	<20060127195227.GA3565@localhost.localdomain>
	<20060127121602.18bc3f25.akpm@osdl.org>
	<20060127224433.GB3565@localhost.localdomain>
	<43DAA586.5050609@cosmosbay.com>
	<20060127151635.3a149fe2.akpm@osdl.org>
	<43DABAA4.8040208@cosmosbay.com>
	<20060129004459.GA24099@kvack.org>
	<43DC6691.9000001@cosmosbay.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <43DC6691.9000001@cosmosbay.com>
User-Agent: Mutt/1.4.1i
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Jan 29, 2006 at 07:54:09AM +0100, Eric Dumazet wrote:
> Well, I think that might be doable, maybe RCU magic ?
>
> 1) local_t are not that nice on all archs.

It is for the users that matter, and the hooks are there if someone finds
it to be a performance problem.

> 2) The consolidation phase (summing all the cpus local offset to
> consolidate the central counter) might be more difficult to do (we would
> need kind of 2 counters per cpu, and a index that can be changed by the cpu
> that wants a consolidation (still 'expensive'))

For the vast majority of these sorts of statistics counters, we don't
need 100% accurate counts.  And I think it should be possible to choose
between a lightweight implementation and the expensive implementation.
On a chip like the Core Duo the cost of bouncing between the two cores
is minimal, so all the extra code and data is a waste.

> 3) Are the locked ops so expensive if done on a cache line that is mostly
> in exclusive state in cpu cache ?

Yes.  What happens on the P4 is that it forces outstanding memory
transactions in the reorder buffer to be flushed so that the memory barrier
semantics of the lock prefix are observed.  This can take a long time as
there can be over a hundred instructions in flight.

		-ben
-- 
"Ladies and gentlemen, I'm sorry to interrupt, but the police are here
and they've asked us to stop the party."  Don't Email: .
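
For illustration only, the "lightweight, not 100% accurate" kind of counter
discussed above could be sketched in user-space C roughly as follows. This is
not the patch under discussion and not a kernel API: the names
approx_counter, approx_add, approx_read, exact_read, NR_SLOTS and BATCH are
all hypothetical, and the sketch assumes each slot is only ever written by
the one thread (or CPU) that owns it. The hot path is a plain add on a
private cache line; the shared counter sees an atomic (locked) add only once
per BATCH worth of updates.

```c
#include <stdatomic.h>

#define NR_SLOTS 4   /* hypothetical: one slot per CPU/thread */
#define BATCH    32  /* hypothetical: fold into the shared counter at +/-BATCH */

/* Each slot sits on its own 64-byte line so owners do not bounce lines. */
struct slot {
    _Alignas(64) long delta;    /* written only by the slot's owner */
};

struct approx_counter {
    atomic_long global;         /* shared total, updated rarely */
    struct slot local[NR_SLOTS];
};

static void approx_init(struct approx_counter *c)
{
    atomic_init(&c->global, 0);
    for (int i = 0; i < NR_SLOTS; i++)
        c->local[i].delta = 0;
}

/* Hot path: plain add on the caller's own slot. The atomic add on the
 * shared counter happens only when the local delta reaches the batch
 * size. Assumes the caller is the sole writer of `slot`. */
static void approx_add(struct approx_counter *c, int slot, long n)
{
    long v = c->local[slot].delta + n;

    if (v >= BATCH || v <= -BATCH) {
        atomic_fetch_add_explicit(&c->global, v, memory_order_relaxed);
        v = 0;
    }
    c->local[slot].delta = v;
}

/* Cheap read: ignores unfolded deltas, so the result can be off by
 * up to NR_SLOTS * (BATCH - 1) in either direction. */
static long approx_read(struct approx_counter *c)
{
    return atomic_load_explicit(&c->global, memory_order_relaxed);
}

/* Expensive read: walks every slot. Only exact while no owner is
 * updating its slot concurrently. */
static long exact_read(struct approx_counter *c)
{
    long sum = atomic_load_explicit(&c->global, memory_order_relaxed);

    for (int i = 0; i < NR_SLOTS; i++)
        sum += c->local[i].delta;
    return sum;
}
```

Under these assumptions BATCH is the knob between the two ends mentioned
above: a small value keeps approx_read() close to the true count at the
price of more atomic operations, a large value does the opposite.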