Date: Wed, 1 Jul 2015 18:16:40 +0200
From: Peter Zijlstra
To: "Paul E. McKenney"
Cc: Oleg Nesterov, tj@kernel.org, mingo@redhat.com,
	linux-kernel@vger.kernel.org, der.herr@hofr.at, dave@stgolabs.net,
	riel@redhat.com, viro@ZenIV.linux.org.uk, torvalds@linux-foundation.org
Subject: Re: [RFC][PATCH 12/13] stop_machine: Remove lglock
Message-ID: <20150701161640.GK3644@twins.programming.kicks-ass.net>
In-Reply-To: <20150701155655.GG3717@linux.vnet.ibm.com>

On Wed, Jul 01, 2015 at 08:56:55AM -0700, Paul E. McKenney wrote:
> On Wed, Jul 01, 2015 at 01:56:42PM +0200, Peter Zijlstra wrote:
> Odd that you have four of eight of the rcuos CPUs with higher consumption
> than the others. I would expect three of eight. Are you by chance running
> an eight-core system with hyperthreading disabled in hardware, via boot
> parameter, or via explicit offline? The real question I have is "is
> nr_cpu_ids equal to 16 rather than to 8?"

It should not be, but I'd have to instrument to be sure. It's a regular
4-core + HT part.

model name	: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz

> Also, do you have nohz_full set?

Nope..

> Just wondering why callback offloading
> is enabled. (If you want it enabled, fine, but from what I can see your
> workload isn't being helped by it and it does have higher overhead.)

I think this is a distro .config; every time I strip the desktop kernel
I end up needing a driver I hadn't built. Clearly I've not really paid
attention to the RCU options.

> Even if you don't want offloading and do disable it, it would be good to
> reduce the penalty. Is there something I can do to reduce the overhead
> of waking several kthreads? Right now, I just do a series of wake_up()
> calls, one for each leader rcuos kthread.
>
> Oh, are you running v3.10 or some such? If so, there are some more
> recent RCU changes that can help with this. They are called out here:

Not that old, but not something recent either. I'll upgrade and see if
it goes away. I really detest rebooting the desktop, but it needs to
happen every so often.

> > Yah, if only we could account it back to whomever caused it :/
>
> It could be done, but would require increasing the size of rcu_head.
> And would require costly fine-grained timing of callback execution.
> Not something for production systems, I would guess.

Nope :/ I know.
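
(FWIW, just to make concrete what "increasing the size of rcu_head"
would mean -- a rough sketch only; the struct and field names below are
invented, not a proposal:)

	/*
	 * Hypothetical debug-only variant of rcu_head that records who
	 * queued a callback and when, so its execution cost could be
	 * charged back.  Adds 16 bytes per callback on 64-bit, plus
	 * fine-grained timing around every invocation -- i.e. not for
	 * production.
	 */
	struct rcu_head_acct {
		struct rcu_head head;	/* existing callback linkage */
		void *owner;		/* whoever queued it, e.g. current task */
		u64 queue_ns;		/* timestamp taken at call_rcu() time */
	};

Which is exactly the kind of per-object bloat nobody wants on every
structure embedding an rcu_head, so yes, agreed.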
> > What I was talking about was the interaction between the force
> > quiescent state and the poking detecting that a QS had indeed been
> > started.
>
> It gets worse.
>
> Suppose that a grace period is already in progress. You cannot leverage
> its use of the combining tree because some of the CPUs might have already
> indicated a quiescent state, which means that the current grace period
> won't necessarily wait for all of the CPUs that the concurrent expedited
> grace period needs to wait on. So you need to kick the current grace
> period, wait for it to complete, wait for the next one to start (with
> all the fun and exciting issues called out earlier), do the expedited
> grace period, then wait for completion.

Ah yes. You do find the fun cases :-)

> > If you wake it unconditionally, even if there's nothing to do, then yes
> > that'd be a waste of cycles.
>
> Heh! You are already complaining about rcu_sched consuming 0.7%
> of your system, and rightfully so. Increasing this overhead still
> further therefore cannot be considered a good thing unless there is some
> overwhelming benefit. And I am not seeing that benefit. Perhaps due
> to a failure of imagination, but until someone enlightens me, I have to
> throttle the wakeups -- or, perhaps better, omit the wakeups entirely.
>
> Actually, I am not convinced that I should push any of the patches that
> leverage expedited grace periods to help out normal grace periods.

It would seem a shame not to.. I've not yet had time to form a coherent
reply to that thread though.
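
(And just to spell out, for my own benefit, the ordering you describe
above for the case where a grace period is already in flight -- the
helper names below are invented; this is pseudo-code naming the steps,
not the actual RCU internals:)

	/* Pseudo-code: expedited GP while a normal GP is in progress. */
	static void expedited_gp_with_gp_in_flight(void)
	{
		kick_current_grace_period();		/* force-quiescent-state the one in flight */
		wait_for_current_gp_to_complete();	/* it may not cover all CPUs we need */
		wait_for_next_gp_to_start();		/* with the races called out earlier */
		do_expedited_grace_period();		/* this one waits on every required CPU */
		wait_for_expedited_gp_to_complete();
	}

Five steps end to end, so yes, "it gets worse" is fair.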