Date: Fri, 10 Apr 2015 19:44:00 +0200
From: Ingo Molnar
To: "Paul E. McKenney"
Cc: Linus Torvalds, Jason Low, Peter Zijlstra, Davidlohr Bueso, Tim Chen,
 Aswin Chandramouleeswaran, LKML
Subject: Re: [PATCH] mutex: Speed up mutex_spin_on_owner() by not taking the RCU lock
Message-ID: <20150410174400.GA6563@gmail.com>
References: <20150409053725.GB13871@gmail.com>
 <1428561611.3506.78.camel@j-VirtualBox>
 <20150409075311.GA4645@gmail.com>
 <20150409175652.GI6464@linux.vnet.ibm.com>
 <20150409183926.GM6464@linux.vnet.ibm.com>
 <20150410090051.GA28549@gmail.com>
 <20150410142024.GY6464@linux.vnet.ibm.com>
In-Reply-To: <20150410142024.GY6464@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

* Paul E. McKenney wrote:

> > No RCU overhead, and this is the access to owner->on_cpu:
> >
> >   69:   49 8b 81 10 c0 ff ff    mov    -0x3ff0(%r9),%rax
> >
> > Totally untested and all that, I only built the mutex.o.
> >
> > What do you think? Am I missing anything?
>
> I suspect it is good, but let's take a look at Linus' summary of the code:
>
>         rcu_read_lock();
>         while (sem->owner == owner) {
>                 if (!owner->on_cpu || need_resched())
>                         break;
>                 cpu_relax_lowlatency();
>         }
>         rcu_read_unlock();

Note that I patched the mutex case as a prototype, which is more
commonly used than rwsem-xadd. But the rwsem case is similar as well.
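
For reference, the shape of the loop quoted above can be sketched as a
standalone userspace C function. This is only an illustration, not the
kernel code: struct task, struct lock, spin_on_owner() and the max_spins
budget (standing in for need_resched()) are hypothetical names, and
barrier() is defined here as a GCC/Clang compiler barrier standing in for
cpu_relax_lowlatency(). It also ignores the task-lifetime problem that
rcu_read_lock() is there to address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Compiler-only barrier (GCC/Clang syntax). The "memory" clobber tells the
 * compiler that memory may have changed, so values read before it must be
 * reloaded afterwards. It emits no CPU instruction.
 */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for the relevant part of task_struct. */
struct task {
	int on_cpu;		/* nonzero while the task runs on a CPU */
};

/* Hypothetical stand-in for the mutex or rwsem being spun on. */
struct lock {
	struct task *owner;	/* current holder, or NULL if unowned */
};

/*
 * Spin while 'owner' still holds 'lock' and is still running.
 *
 * Returns true if the lock changed hands (worth retrying the acquire),
 * false if the owner went off-CPU or the spin budget ran out.
 * 'max_spins' is an assumption of this sketch; it replaces need_resched().
 */
bool spin_on_owner(struct lock *lock, struct task *owner,
		   unsigned long max_spins)
{
	assert(lock != NULL && owner != NULL);

	while (lock->owner == owner) {
		if (!owner->on_cpu || max_spins-- == 0)
			return false;
		/* Force lock->owner and owner->on_cpu to be re-read. */
		barrier();
	}
	return true;
}
```

With an owner that keeps the lock and stays on_cpu, the function gives up
once max_spins is exhausted; it returns early if on_cpu drops to zero, and
returns true as soon as lock->owner no longer matches. A real multi-threaded
userspace version would want C11 atomics rather than a bare compiler barrier.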

> The cpu_relax_lowlatency() looks to have barrier() semantics, so the
> sem->owner should get reloaded every time through the loop.  This is
> needed, because otherwise the task structure could get freed and
> reallocated as something else that happened to have the field at the
> ->on_cpu offset always zero, resulting in an infinite loop.

So at least with the get_kernel(..., &owner->on_cpu) approach, the
get_kernel() copy has barrier semantics as well (it's in assembly), so
it will be reloaded in every iteration in a natural fashion.

Thanks,

	Ingo