From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754729AbcE3LJd (ORCPT ); Mon, 30 May 2016 07:09:33 -0400 Received: from www.linutronix.de ([62.245.132.108]:53176 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752561AbcE3LJc (ORCPT ); Mon, 30 May 2016 07:09:32 -0400 Subject: Re: [patch V2 2/7] futex: Hash private futexes per process To: Peter Zijlstra , Sebastian Andrzej Siewior References: <20160505204230.932454245@linutronix.de> <20160505204353.973009518@linutronix.de> <20160519122406.GA3192@twins.programming.kicks-ass.net> <20160527171001.GC28561@breakpoint.cc> <20160530085820.GN3192@twins.programming.kicks-ass.net> Cc: Thomas Gleixner , LKML , Linus Torvalds , Darren Hart , Ingo Molnar , Michael Kerrisk , Davidlohr Bueso , Chris Mason , "Carlos O'Donell" , Torvald Riegel , Eric Dumazet From: Sebastian Andrzej Siewior Message-ID: <9e7e4fb5-7b95-7676-d0f9-8e5dc5dc3ba9@linutronix.de> Date: Mon, 30 May 2016 13:08:53 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Icedove/45.1.0 MIME-Version: 1.0 In-Reply-To: <20160530085820.GN3192@twins.programming.kicks-ass.net> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 05/30/2016 10:58 AM, Peter Zijlstra wrote: > On Fri, May 27, 2016 at 07:10:01PM +0200, Sebastian Andrzej Siewior wrote: >> On 2016-05-19 14:24:06 [+0200], Peter Zijlstra wrote: >>> On Thu, May 05, 2016 at 08:44:04PM -0000, Thomas Gleixner wrote: >>>> +static struct futex_hash_bucket *hash_futex(union futex_key *key) >>>> +{ >>>> +#ifdef CONFIG_FUTEX_PRIVATE_HASH >>>> + struct mm_struct *mm = current->mm; >>>> + unsigned int slot; >>>> + >>>> + /* >>>> + * Futexes which use the per process hash have the lower bits cleared >>>> + */ >>>> + if (key->both.offset & (FUT_OFF_INODE | FUT_OFF_MMSHARED)) >>>> + return hash_global_futex(key); >>>> + >>>> + slot = hash_long(key->private.address, mm->futex_hash.hash_bits); >>>> + return &mm->futex_hash.hash[slot]; >>> >>> Do we want the option to WARN if we get collisions in this per-process >>> hash? >>> >>> Because afaiu there is no guarantee what so ever this doesn't happen, >>> and collisions here can create the very same priority inversions as are >>> possible in the global hash. >>> >>> Less likely etc.. more contained since its only the threads of the one >>> process that get tangled up, but still possible. >> >> Since the collision is contained the same process it is less dramatic. > > Right, but can still cause significant malfunction inside the process. > So its not something to completely ignore. If your room sized CNC > machine gets the priorities of the logging thread and the motor control > thread confused bad things could happen. > >> But how do you want to warn the user? A trace-event would be handy to >> dump the uaddr and slot. > > So I think there's a number of cases: > > - PREALLOC_HASH finds a taken bucket; in this case we can simply return > an error. > - PREALLOC_HASH succeeds, but an on demand hash later hits the same > bucket. This is harder; we could maybe mark all buckets taken by > PREALLOC_HASH and allow for a signal when this collision hits. Dunno. PREALLOC_HASH happens once before any (contended) lock operation. We never rehash the hash. To rehash the hash on runtime we would need an empty futex hash and some locking in the fast path. And rehash seems not to be required since we tried to come up with a sane default value and the user/RT task can set it to the current max value. So back to when does the collision happen. Since glibc visits the kernel only on contention we might learn about the collision when it is too late. We could have a lock operation by thread1 followed by lock operation by thread2 on different uaddr resulting in the same bucket. In this case we learn about this once the spin_lock() operation blocks. Also marking a bucket as taken (on contention) might give false positive results since we know nothing about lock's lifetime (i.e. the lock might have been free()ed). But if I may bring some ideas from v1. In v1 we had "tickets / IDs" for the futex per thread. In v2 we don't have them anymore. We still have the "private" futex hash buckets but per process this time. We could introduce the "tickets / IDs" back and make them process wide. We could hide them in pthread_mutex_init() and pthread_mutex_destroy() since their IDs are no longer thread unique. I think I had something in glibc's pthread variable where we could store 16bit if I split another 32bit variable. That would be guaranteed collision free and hidden in glibc. But it would take some time to get it used since it does no longer work out of the box by updating the kernel. We could also add it later if people scream for it since we can't change the behavior of "PRIVATE" futex (uaddr vs ticket number). >> But due to ASLR >> the same application might result in a different behaviour on each run. > > Yeah, ASLR makes this all somewhat non deterministic, which is why you > really don't want a silent collision for your PREALLOC_HASH buckets. > Because once every 100 runs it does weird,.. I think that you might learn about your collision too late and there is nothing you can do about it. Logging in a logfile or restart the application after 5 minutes of runtime? Not to mention the locks which are contended in an error situation. A futex op returning the kernel's hash value might be sane. You would expose some implementation detail but the application could check its mutex for collision before starting (doing the real work). On collision it would have to restart and hope for the best if the collision from ASLR. But since you don't know about all mutexes, like those which are part of library, you wouldn't never be 100% collision free here. So it is probably a bad idea. >> However, it might be good for a indication about the size of the private >> hash… > > Yeah, now if online resize wasn't such a pain ;-) My point was during development / testing to figure out which initial value is sane / reasonable. Start your APP with 32 slot. Collisions. Again with 128 slots. Oh better. Sebastian