Date: Wed, 1 Jul 2015 18:16:40 +0200
From: Peter Zijlstra
To: "Paul E. McKenney"
Cc: Oleg Nesterov, tj@kernel.org, mingo@redhat.com,
	linux-kernel@vger.kernel.org, der.herr@hofr.at, dave@stgolabs.net,
	riel@redhat.com, viro@ZenIV.linux.org.uk, torvalds@linux-foundation.org
Subject: Re: [RFC][PATCH 12/13] stop_machine: Remove lglock
Message-ID: <20150701161640.GK3644@twins.programming.kicks-ass.net>
In-Reply-To: <20150701155655.GG3717@linux.vnet.ibm.com>

On Wed, Jul 01, 2015 at 08:56:55AM -0700, Paul E. McKenney wrote:
> On Wed, Jul 01, 2015 at 01:56:42PM +0200, Peter Zijlstra wrote:
> Odd that you have four of eight of the rcuos CPUs with higher consumption
> than the others. I would expect three of eight. Are you by chance running
> an eight-core system with hyperthreading disabled in hardware, via boot
> parameter, or via explicit offline? The real question I have is "is
> nr_cpu_ids equal to 16 rather than to 8?"

It should not be, but I'd have to instrument to be sure. It's a regular
4-core + HT part.

model name	: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz

> Also, do you have nohz_full set?

Nope..

> Just wondering why callback offloading
> is enabled. (If you want it enabled, fine, but from what I can see your
> workload isn't being helped by it and it does have higher overhead.)

I think this is a distro .config; every time I strip the desktop kernel
I end up needing a driver I hadn't built. Clearly I've not really paid
attention to the RCU options.

> Even if you don't want offloading and do disable it, it would be good to
> reduce the penalty. Is there something I can do to reduce the overhead
> of waking several kthreads? Right now, I just do a series of wake_up()
> calls, one for each leader rcuos kthread.
>
> Oh, are you running v3.10 or some such? If so, there are some more
> recent RCU changes that can help with this. They are called out here:

Not that old, but not something recent either. I'll upgrade and see if
it goes away. I really detest rebooting the desktop, but it needs to
happen every so often.

> > Yah, if only we could account it back to whomever caused it :/
>
> It could be done, but would require increasing the size of rcu_head.
> And would require costly fine-grained timing of callback execution.
> Not something for production systems, I would guess.

Nope :/ I know.
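
(FWIW, just to make concrete what "increasing the size of rcu_head"
would mean -- a rough sketch only; the struct and field names below are
invented, not a proposal:)

	/*
	 * Hypothetical debug-only variant of rcu_head that records who
	 * queued a callback and when, so its execution cost could be
	 * charged back.  Adds 16 bytes per callback on 64-bit, plus
	 * fine-grained timing around every invocation -- i.e. not for
	 * production.
	 */
	struct rcu_head_acct {
		struct rcu_head head;	/* existing callback linkage */
		void *owner;		/* whoever queued it, e.g. current task */
		u64 queue_ns;		/* timestamp taken at call_rcu() time */
	};

Which is exactly the kind of per-object bloat nobody wants on every
structure embedding an rcu_head, so yes, agreed.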
> > What I was talking about was the interaction between the force
> > quiescent state and the poking detecting that a QS had indeed been
> > started.
>
> It gets worse.
>
> Suppose that a grace period is already in progress. You cannot leverage
> its use of the combining tree because some of the CPUs might have already
> indicated a quiescent state, which means that the current grace period
> won't necessarily wait for all of the CPUs that the concurrent expedited
> grace period needs to wait on. So you need to kick the current grace
> period, wait for it to complete, wait for the next one to start (with
> all the fun and exciting issues called out earlier), do the expedited
> grace period, then wait for completion.

Ah yes. You do find the fun cases :-)

> > If you wake it unconditionally, even if there's nothing to do, then yes
> > that'd be a waste of cycles.
>
> Heh! You are already complaining about rcu_sched consuming 0.7%
> of your system, and rightfully so. Increasing this overhead still
> further therefore cannot be considered a good thing unless there is some
> overwhelming benefit. And I am not seeing that benefit. Perhaps due
> to a failure of imagination, but until someone enlightens me, I have to
> throttle the wakeups -- or, perhaps better, omit the wakeups entirely.
>
> Actually, I am not convinced that I should push any of the patches that
> leverage expedited grace periods to help out normal grace periods.

It would seem a shame not to.. I've not yet had time to form a coherent
reply to that thread though.
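
(And just to spell out, for my own benefit, the ordering you describe
above for the case where a grace period is already in flight -- the
helper names below are invented; this is pseudo-code naming the steps,
not the actual RCU internals:)

	/* Pseudo-code: expedited GP while a normal GP is in progress. */
	static void expedited_gp_with_gp_in_flight(void)
	{
		kick_current_grace_period();		/* force-quiescent-state the one in flight */
		wait_for_current_gp_to_complete();	/* it may not cover all CPUs we need */
		wait_for_next_gp_to_start();		/* with the races called out earlier */
		do_expedited_grace_period();		/* this one waits on every required CPU */
		wait_for_expedited_gp_to_complete();
	}

Five steps end to end, so yes, "it gets worse" is fair.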