Date: Fri, 6 May 2016 14:56:34 -0700
From: Darren Hart
To: Thomas Gleixner
Cc: LKML, Sebastian Andrzej Siewior, Linus Torvalds, Darren Hart,
    Peter Zijlstra, Ingo Molnar, Michael Kerrisk, Davidlohr Bueso,
    Chris Mason, "Carlos O'Donell", Torvald Riegel, Eric Dumazet
Subject: Re: [patch V2 2/7] futex: Hash private futexes per process
Message-ID: <20160506215634.GH48432@f23x64.localdomain>
References: <20160505204230.932454245@linutronix.de>
    <20160505204353.973009518@linutronix.de>
    <20160506180933.GE48432@f23x64.localdomain>
In-Reply-To: <20160506180933.GE48432@f23x64.localdomain>

On Fri, May 06, 2016 at 11:09:33AM -0700, Darren Hart wrote:
> On Thu, May 05, 2016 at 08:44:04PM -0000, Thomas Gleixner wrote:
> > From: Sebastian Siewior
> >
> > The standard futex mechanism in the Linux kernel uses a global hash to store
> > transient state. Collisions on that hash can lead to performance degradation
> > especially on NUMA systems and on real-time enabled kernels even to priority
> > inversions.
>
> I think it is worth noting how this causes an unbounded priority inversion
> as it wasn't obvious to me. At least mention that "CPU pinning" can result in an
> unbounded priority inversion.
>
> >
> > To mitigate that problem we provide per process private hashing. On the first
> > futex operation in a process the kernel allocates a hash table. The hash table
> > is accessible via the process mm_struct. On NUMA systems the hash is allocated
> > node local.
> >
> > If the allocation fails then the global hash table is used as fallback, so
> > there is no user space visible impact of this feature.
> >
>
> It would be good to have a way to detect that the process private hash table was
> successfully created. Perhaps a /proc/pid/ feature? This would allow us to write
> a functional futex test for tools/testing/selftests/futex.

I suppose we could just use FUTEX_PREALLOC_HASH for this purpose, passing in the
default hash size. This will either return the default, the previously set
value, or 0, indicating the global hash is being used. That should be sufficient
for programmatically determining the state of the system.

The /proc/pid/futex_hash_size option may still be convenient for other purposes.
Perhaps with a -1 indicating it hasn't been set yet.

-- 
Darren Hart
Intel Open Source Technology Center
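
To make the return-value convention above concrete, here is a minimal sketch of
how a selftest might interpret it. It only classifies a value that has already
been obtained; the FUTEX_PREALLOC_HASH call itself is left out because its op
number and argument layout are not given in this mail. The enum and both
function names are hypothetical, and the treatment of negative values as errors
(other than the suggested -1 for the /proc file) is an assumption.

```c
#include <stddef.h>

/* Hypothetical states a test could distinguish; names are illustrative only. */
enum futex_hash_state {
	FUTEX_HASH_ERROR,	/* negative value: the query itself failed (assumed) */
	FUTEX_HASH_UNSET,	/* -1 from the /proc file: not set yet */
	FUTEX_HASH_GLOBAL,	/* 0: the global hash is being used */
	FUTEX_HASH_PRIVATE,	/* > 0: private hash with that many slots */
};

/*
 * Classify the value returned by a FUTEX_PREALLOC_HASH-style query.
 * If slots is non-NULL it receives the hash size, or 0 when no
 * private hash is in use.
 */
static enum futex_hash_state
classify_prealloc_result(long ret, unsigned long *slots)
{
	if (slots)
		*slots = 0;
	if (ret < 0)
		return FUTEX_HASH_ERROR;
	if (ret == 0)
		return FUTEX_HASH_GLOBAL;
	if (slots)
		*slots = (unsigned long)ret;
	return FUTEX_HASH_PRIVATE;
}

/*
 * Classify a value read from the suggested /proc/pid/futex_hash_size
 * file, where -1 would mean the size has not been set yet.
 */
static enum futex_hash_state
classify_proc_value(long val, unsigned long *slots)
{
	if (val == -1) {
		if (slots)
			*slots = 0;
		return FUTEX_HASH_UNSET;
	}
	return classify_prealloc_result(val, slots);
}
```

A functional test could then pass once classify_prealloc_result() reports
FUTEX_HASH_PRIVATE, and skip rather than fail on FUTEX_HASH_GLOBAL, since
falling back to the global hash is documented as having no user space visible
impact.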