From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932930AbbEEFy1 (ORCPT ); Tue, 5 May 2015 01:54:27 -0400
Received: from e39.co.us.ibm.com ([32.97.110.160]:48851 "EHLO
	e39.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752256AbbEEFyT (ORCPT );
	Tue, 5 May 2015 01:54:19 -0400
Date: Mon, 4 May 2015 22:54:13 -0700
From: "Paul E. McKenney"
To: Rik van Riel
Cc: Paolo Bonzini, Ingo Molnar, Andy Lutomirski,
	"linux-kernel@vger.kernel.org", X86 ML, williams@redhat.com,
	Andrew Lutomirski, fweisbec@redhat.com, Peter Zijlstra,
	Heiko Carstens, Thomas Gleixner, Ingo Molnar, Linus Torvalds
Subject: Re: question about RCU dynticks_nesting
Message-ID: <20150505055413.GJ5381@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20150502052733.GA9983@gmail.com>
	<55473B47.6080600@redhat.com>
	<55479749.7070608@redhat.com>
	<20150504183906.GS5381@linux.vnet.ibm.com>
	<5547CAED.9010201@redhat.com>
	<20150504200232.GB5381@linux.vnet.ibm.com>
	<5547D2FE.9010806@redhat.com>
	<20150504203801.GG5381@linux.vnet.ibm.com>
	<5547DC3C.1000504@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <5547DC3C.1000504@redhat.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 15050505-0033-0000-0000-0000046A1B90
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 04, 2015 at 04:53:16PM -0400, Rik van Riel wrote:
> On 05/04/2015 04:38 PM, Paul E. McKenney wrote:
> > On Mon, May 04, 2015 at 04:13:50PM -0400, Rik van Riel wrote:
> >> On 05/04/2015 04:02 PM, Paul E. McKenney wrote:
>
> >>> Hmmm... But didn't earlier performance measurements show that the bulk of
> >>> the overhead was the delta-time computations rather than RCU accounting?
> >>
> >> The bulk of the overhead was disabling and re-enabling
> >> irqs around the calls to rcu_user_exit and rcu_user_enter :)
> >
> > Really??? OK... How about software irq masking? (I know, that is
> > probably a bit of a scary change as well.)
> >
> >> Of the remaining time, about 2/3 seems to be the vtime
> >> stuff, and the other 1/3 the rcu code.
> >
> > OK, worth some thought, then.
> >
> >> I suspect it makes sense to optimize both, though the
> >> vtime code may be the easiest :)
> >
> > Making a crude version that does jiffies (or whatever) instead of
> > fine-grained computations might give good bang for the buck. ;-)
>
> Ingo's idea is to simply have cpu 0 check the current task
> on all other CPUs, see whether that task is running in system
> mode, user mode, guest mode, irq mode, etc and update that
> task's vtime accordingly.
>
> I suspect the runqueue lock is probably enough to do that,
> and between rcu state and PF_VCPU we probably have enough
> information to see what mode the task is running in, with
> just remote memory reads.
>
> I looked at implementing the vtime bits (and am pretty sure
> how to do those now), and then spent some hours looking at
> the RCU bits, to see if we could not simplify both things at
> once, especially considering that the current RCU context
> tracking bits need to be called with irqs disabled.

Remotely sampling the vtime info without memory barriers makes sense.
After all, the result is statistical anyway. Unfortunately, as noted
earlier, RCU correctness depends on ordering.

The current RCU idle entry/exit code most definitely absolutely
requires irqs be disabled. However, I will see if that can be changed.
No promises, especially no short-term promises, but it does not feel
impossible.

You have RCU_FAST_NO_HZ=y, correct? Could you please try measuring with
RCU_FAST_NO_HZ=n?
If that has a significant effect, easy quick win is turning it off -- and
I could then make it a boot parameter to get you back to one kernel for
everyone. (The existing tick_nohz_active boot parameter already turns it
off, but also turns off dyntick idle, which might be a bit excessive.)
Or if there is some way that the kernel can know that the system is
currently running on battery or some such.

							Thanx, Paul