Date: Wed, 1 Jul 2015 13:56:42 +0200
From: Peter Zijlstra
To: "Paul E. McKenney"
Cc: Oleg Nesterov, tj@kernel.org, mingo@redhat.com,
	linux-kernel@vger.kernel.org, der.herr@hofr.at, dave@stgolabs.net,
	riel@redhat.com, viro@ZenIV.linux.org.uk, torvalds@linux-foundation.org
Subject: Re: [RFC][PATCH 12/13] stop_machine: Remove lglock
Message-ID: <20150701115642.GU19282@twins.programming.kicks-ass.net>
References: <20150624175830.GS3644@twins.programming.kicks-ass.net>
 <20150625032303.GO3717@linux.vnet.ibm.com>
 <20150625110734.GX3644@twins.programming.kicks-ass.net>
 <20150625134726.GR3717@linux.vnet.ibm.com>
 <20150625142011.GU19282@twins.programming.kicks-ass.net>
 <20150625145133.GT3717@linux.vnet.ibm.com>
 <20150626123207.GZ19282@twins.programming.kicks-ass.net>
 <20150626161415.GY3717@linux.vnet.ibm.com>
 <20150629075645.GD19282@twins.programming.kicks-ass.net>
 <20150630213258.GO3717@linux.vnet.ibm.com>
In-Reply-To: <20150630213258.GO3717@linux.vnet.ibm.com>

On Tue, Jun 30, 2015 at 02:32:58PM -0700, Paul E. McKenney wrote:
> > I had indeed forgotten that got farmed out to the kthread; on which, my
> > poor desktop seems to have spent ~140 minutes of its (most recent)
> > existence poking RCU things.
> >
> >     7 root      20   0       0      0      0 S  0.0  0.0  56:34.66 rcu_sched
> >     8 root      20   0       0      0      0 S  0.0  0.0  20:58.19 rcuos/0
> >     9 root      20   0       0      0      0 S  0.0  0.0  18:50.75 rcuos/1
> >    10 root      20   0       0      0      0 S  0.0  0.0  18:30.62 rcuos/2
> >    11 root      20   0       0      0      0 S  0.0  0.0  17:33.24 rcuos/3
> >    12 root      20   0       0      0      0 S  0.0  0.0   2:43.54 rcuos/4
> >    13 root      20   0       0      0      0 S  0.0  0.0   3:00.31 rcuos/5
> >    14 root      20   0       0      0      0 S  0.0  0.0   3:09.27 rcuos/6
> >    15 root      20   0       0      0      0 S  0.0  0.0   2:52.98 rcuos/7
> >
> > Which is almost as much time as my konsole:
> >
> >  2853 peterz    20   0  586240 103664  41848 S  1.0  0.3 147:39.50 konsole
> >
> > Which seems somewhat excessive. But who knows.
>
> No idea.  How long has that system been up?  What has it been doing?

Some 40-odd days it seems. It's my desktop; I read email (in mutt in
Konsole), I type patches (in vim in Konsole), I compile kernels (in
Konsole), etc.

Now konsole is threaded and each new window/tab is just another thread
in the same process, so runtime should accumulate. However, I just found
that for some obscure reason there are two konsole processes around, and
the other one is the one I'm using most; it also has significantly more
runtime.

 3264 ?        Sl   452:43  \_ /usr/bin/konsole

Must be some of that brain-damaged desktop shite that confused things --
I see the one is started with some -session argument. Some day I'll
discover how to destroy all that nonsense and make things behave as they
should.

> The rcu_sched overhead is expected behavior if the system has run between
> ten and one hundred million grace periods, give or take an order of
> magnitude depending on the number of idle CPUs and so on.
>
> The overhead for the RCU offload kthreads is what it is.  A kfree() takes
> as much time as a kfree() does, and they are all nicely counted up for you.

Yah, if only we could account it back to whomever caused it :/

> > Although here I'll once again go ahead and say something ignorant; how
> > come that's a problem? Surely if we know the kthread thing has finished
> > starting a GP, any one CPU issuing a full memory barrier (as would be
> > implied by switching to the stop worker) must then indeed observe that
> > global state, due to that transitivity thing?
> >
> > That is, I'm having a wee bit of bother seeing how you'd need
> > manipulation of global variables, as you allude to below.
>
> Well, I thought that you wanted to leverage the combining tree to
> determine when the grace period had completed.  If a given CPU isn't
> pushing its quiescent states up the combining tree, then the combining
> tree can't do much for you.

Right, that is what I wanted, and sure, the combining thing needs to
happen with atomics, but that's not new; it already does that.

What I was talking about was the interaction between the
force-quiescent-state machinery and the poking that detects that a QS
had indeed been started.

> Well, I do have something that seems reasonably straightforward.  Sending
> the patches along separately.  Not sure that it is worth its weight.
>
> The idea is that we keep the expedited grace periods working as they do
> now, independently of the normal grace period.  The normal grace period
> takes a sequence number just after initialization, and checks to see
> if an expedited grace period happened in the meantime at the beginning
> of each quiescent-state forcing episode.  This saves the last one or
> two quiescent-state forcing scans in the case where an expedited grace
> period really did happen.
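
If I'm following, the shape would be something like the below -- purely
my sketch, the names are invented and it glosses over how you'd treat an
expedited GP that was already in flight when the snapshot got taken:

static unsigned long exp_gp_seq;        /* bumped when an expedited GP completes */
static unsigned long exp_gp_snap;       /* snapshot taken at normal-GP init */

static void normal_gp_init(void)
{
        /* Take the sequence number just after initialization. */
        exp_gp_snap = READ_ONCE(exp_gp_seq);

        /* ... the rest of normal GP initialization ... */
}

/*
 * True when a full expedited GP has completed since the snapshot, in
 * which case every CPU has passed through a quiescent state since this
 * normal GP began and the remaining forcing scans can be skipped.
 *
 * (Assumes no expedited GP was already in flight when the snapshot was
 * taken; presumably the real counter encodes "in progress" somehow.)
 */
static bool exp_gp_completed_since_snap(void)
{
        return READ_ONCE(exp_gp_seq) != exp_gp_snap;
}

At which point the GP kthread can report the outstanding quiescent
states itself instead of scanning for them, right?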

> It is possible for the expedited grace period to help things along by
> waking up the grace-period kthread, but of course doing this too much
> further increases the time consumed by your rcu_sched kthread.

Ah, so that is the purpose of that patch.

Still, I'm having trouble seeing how you can do this too much; you would
only be waking it if there was a GP pending completion, right? At which
point waking it is the right thing. If you wake it unconditionally, even
if there's nothing to do, then yes, that'd be a waste of cycles.
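
IOW I'd expect the wakeup on the expedited side to be guarded by
something like this -- again just a sketch, hand-waving the exact
helpers:

        /*
         * Only poke the GP kthread when there actually is a normal GP
         * in flight for the expedited pass to help along; waking it
         * unconditionally just adds to the rcu_sched runtime.
         */
        if (rcu_gp_in_progress(rsp))
                rcu_gp_kthread_wake(rsp);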