From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752113AbbEDUPF (ORCPT );
	Mon, 4 May 2015 16:15:05 -0400
Received: from mx1.redhat.com ([209.132.183.28]:47039 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751012AbbEDUPD (ORCPT );
	Mon, 4 May 2015 16:15:03 -0400
Message-ID: <5547D2FE.9010806@redhat.com>
Date: Mon, 04 May 2015 16:13:50 -0400
From: Rik van Riel
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0
MIME-Version: 1.0
To: paulmck@linux.vnet.ibm.com
CC: Paolo Bonzini , Ingo Molnar , Andy Lutomirski ,
	"linux-kernel@vger.kernel.org" , X86 ML , williams@redhat.com,
	Andrew Lutomirski , fweisbec@redhat.com, Peter Zijlstra ,
	Heiko Carstens , Thomas Gleixner , Ingo Molnar , Linus Torvalds
Subject: Re: question about RCU dynticks_nesting
References: <20150501163431.GB1327@gmail.com> <5543C05E.9040209@redhat.com>
	<20150501184025.GA2114@gmail.com> <5543CFE5.1030509@redhat.com>
	<20150502052733.GA9983@gmail.com> <55473B47.6080600@redhat.com>
	<55479749.7070608@redhat.com> <20150504183906.GS5381@linux.vnet.ibm.com>
	<5547CAED.9010201@redhat.com> <20150504200232.GB5381@linux.vnet.ibm.com>
In-Reply-To: <20150504200232.GB5381@linux.vnet.ibm.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 05/04/2015 04:02 PM, Paul E. McKenney wrote:
> On Mon, May 04, 2015 at 03:39:25PM -0400, Rik van Riel wrote:
>> On 05/04/2015 02:39 PM, Paul E. McKenney wrote:
>>> On Mon, May 04, 2015 at 11:59:05AM -0400, Rik van Riel wrote:
>>
>>>> In fact, would we be able to simply use tsk->rcu_read_lock_nesting
>>>> as an indicator of whether or not we should bother waiting on that
>>>> task or CPU when doing synchronize_rcu?
>>>
>>> Depends on exactly what you are asking. If you are asking if I could add
>>> a few more checks to preemptible RCU and speed up grace-period detection
>>> in a number of cases, the answer is very likely "yes". This is on my
>>> list, but not particularly high priority. If you are asking whether
>>> CPU 0 could access ->rcu_read_lock_nesting of some task running on
>>> some other CPU, in theory, the answer is "yes", but in practice that
>>> would require putting full memory barriers in both rcu_read_lock()
>>> and rcu_read_unlock(), so the real answer is "no".
>>>
>>> Or am I missing your point?
>>
>> The main question is "how can we greatly reduce the overhead
>> of nohz_full, by simplifying the RCU extended quiescent state
>> code called in the syscall fast path, and maybe piggyback on
>> that to do time accounting for remote CPUs?"
>>
>> Your memory barrier answer above makes it clear we will still
>> want to do the RCU stuff at syscall entry & exit time, at least
>> on x86, where we already have automatic and implicit memory
>> barriers.
>
> We do need to keep in mind that x86's automatic and implicit memory
> barriers do not order prior stores against later loads.
>
> Hmmm... But didn't earlier performance measurements show that the bulk of
> the overhead was the delta-time computations rather than RCU accounting?

The bulk of the overhead was disabling and re-enabling irqs around
the calls to rcu_user_exit and rcu_user_enter :)

Of the remaining time, about 2/3 seems to be the vtime stuff, and
the other 1/3 the rcu code.
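For reference, here is a rough sketch of where that cost sits in the
kernel->user transition path. This is simplified and from memory, so
the exact function, struct, and state names may not match what is in
the tree today:

/*
 * Simplified sketch of context tracking at kernel->user transition,
 * NOT the literal kernel code.  The local_irq_save/restore pair is
 * the biggest cost; of what remains, the vtime accounting is roughly
 * 2/3 and the RCU extended quiescent state code roughly 1/3.
 */
void context_tracking_user_enter(void)
{
	unsigned long flags;

	local_irq_save(flags);
	if (__this_cpu_read(context_tracking.state) != IN_USER) {
		if (__this_cpu_read(context_tracking.active)) {
			vtime_user_enter(current);	/* delta-time accounting */
			rcu_user_enter();		/* RCU extended quiescent state */
		}
		__this_cpu_write(context_tracking.state, IN_USER);
	}
	local_irq_restore(flags);
}

The return-to-kernel side (vtime_user_exit + rcu_user_exit) is the
mirror image, with the same irq disable/re-enable pair around it.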
I suspect it makes sense to optimize both, though the vtime code
may be the easiest :)

-- 
All rights reversed