Message-ID: <49C87017.1000408@cosmosbay.com>
Date: Tue, 24 Mar 2009 06:31:03 +0100
From: Eric Dumazet
To: Ravikiran G Thirumalai
CC: linux-kernel@vger.kernel.org, Ingo Molnar, shai@scalex86.org, Andrew Morton
Subject: Re: [rfc] [patch 1/2 ] Process private hash tables for private futexes
References: <20090321044637.GA7278@localdomain> <49C4AE64.4060400@cosmosbay.com> <20090322045414.GD7278@localdomain> <49C5F3FD.9010606@cosmosbay.com> <20090323202837.GE7278@localdomain> <49C805E3.4060808@cosmosbay.com>
In-Reply-To: <20090324031925.GF7278@localdomain>

Ravikiran G Thirumalai wrote:
> On Mon, Mar 23, 2009 at 10:57:55PM +0100, Eric Dumazet wrote:
>> Ravikiran G Thirumalai wrote:
> [...]
>>> Hmm! How about
>>> a) Reduce hash table size for private futex hash and increase hash
>>>    table size for the global hash?
>>>
>>> OR, better,
>>>
>>> b) Since it is multiple spinlocks on the same cacheline which is a PITA
>>>    here, how about keeping the global table, but just adding a dereference
>>>    to each hash slot, and interleaving the adjacent hash buckets between
>>>    nodes/cpus? So even without needing to lose memory to padding, we
>>>    avoid false sharing on cachelines due to unrelated futexes hashing
>>>    onto adjacent hash buckets?
>>>
>> Because of jhash(), futex slots are almost random. No need to try to
>> interleave them. Since you have a "cache line" of 4096 bytes, you need
>> almost 4 pages per cpu to avoid false sharing in a statistical way.
>
> How did you come up with that number? So there is no way adjacent
> cachelines will never ever be used in the global hash??
>

There is no way, unless you use one chain per 4096-byte block, and that
would be a waste of memory.

It's about statistics (assuming jhash gives us uniformly distributed
values), not hard and guaranteed constraints.

http://en.wikipedia.org/wiki/Standard_deviation

For a particular process, you can carefully choose (knowing the hash
function in use by the kernel) userland futexes that will *all* be hashed
into a *single* hash chain, protected by a single spinlock. This process
will then suffer from contention, whether it uses private futexes and a
private or a shared hash table. Even without knowing the hash function,
there is still a small chance that this can happen.

Please note that the hash table size serves several purposes, the first
being to store potentially many futexes (say 30000 threads are waiting on
different futexes), and the second being to avoid cache line ping-pongs
or misses. This is why I chose at least 256 slots per cpu, and not 4
cache lines per cpu.