Date: Tue, 23 Jun 2015 19:23:44 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra
Cc: Oleg Nesterov, tj@kernel.org, mingo@redhat.com,
	linux-kernel@vger.kernel.org, der.herr@hofr.at, dave@stgolabs.net,
	riel@redhat.com, viro@ZenIV.linux.org.uk, torvalds@linux-foundation.org
Subject: Re: [RFC][PATCH 12/13] stop_machine: Remove lglock
Message-ID: <20150624022332.GA16620@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
In-Reply-To: <20150623190506.GA7731@linux.vnet.ibm.com>
References: <20150622122256.765619039@infradead.org>
 <20150622222152.GA4460@redhat.com>
 <20150623100932.GB3644@twins.programming.kicks-ass.net>
 <20150623105548.GE18673@twins.programming.kicks-ass.net>
 <20150623112041.GF18673@twins.programming.kicks-ass.net>
 <20150623130826.GG18673@twins.programming.kicks-ass.net>
 <20150623173038.GJ3892@linux.vnet.ibm.com>
 <20150623180411.GF3644@twins.programming.kicks-ass.net>
 <20150623182626.GO3892@linux.vnet.ibm.com>
 <20150623190506.GA7731@linux.vnet.ibm.com>

On Tue, Jun 23, 2015 at 12:05:06PM -0700, Paul E. McKenney wrote:
> On Tue, Jun 23, 2015 at 11:26:26AM -0700, Paul E. McKenney wrote:
> > On Tue, Jun 23, 2015 at 08:04:11PM +0200, Peter Zijlstra wrote:
> > > On Tue, Jun 23, 2015 at 10:30:38AM -0700, Paul E. McKenney wrote:
> > > > Good, you don't need this because you can check for dynticks later.
> > > > You will need to check for offline CPUs.
> > >
> > > 	get_online_cpus()
> > > 	for_each_online_cpus() {
> > > 		...
> > > 	}
> > >
> > > is what the new code does.
> >
> > Ah, I missed that this was not deleted.
>
> But get_online_cpus() will re-introduce a deadlock.

And here is an untested patch that applies the gist of your approach,
the series of stop_one_cpu() calls, but without undoing the rest.  I
forged your Signed-off-by; please let me know if that doesn't work for
you.

There are a number of simplifications that can be made, but the basic
approach should get a good round of testing first.

And I just noticed that I forgot to get rid of try_stop_cpus().  Well,
there will probably be a test failure or two to handle, so I can take
care of that in the next version.  ;-)

							Thanx, Paul

------------------------------------------------------------------------

commit 1de96c34b39d840c5fe2689640345ed26f78b8f8
Author: Peter Zijlstra
Date:   Tue Jun 23 19:03:45 2015 -0700

    rcu: Switch synchronize_sched_expedited() to stop_one_cpu()

    synchronize_sched_expedited() currently invokes try_stop_cpus(), which
    schedules the stopper kthreads on each online non-idle CPU, and waits
    until all those kthreads are running before letting any of them stop.
    This is disastrous for real-time workloads, which get hit with a
    preemption that is as long as the longest scheduling latency on any
    CPU, including any non-realtime housekeeping CPUs.

    This commit therefore switches to using stop_one_cpu() on each CPU in
    turn.  This avoids inflicting the worst-case scheduling latency of the
    worst-case CPU onto all other CPUs, and also simplifies the code a
    little bit.

    Follow-up commits will simplify the counter-snapshotting algorithm and
    convert a number of the counters that are now protected by the new
    ->expedited_mutex to non-atomic.

    Signed-off-by: Peter Zijlstra
    [ paulmck: Kept stop_one_cpu(), dropped disabling of "guardrails". ]
    Signed-off-by: Paul E. McKenney

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 78d0a87ff354..a30971474134 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -103,6 +103,7 @@ struct rcu_state sname##_state = { \
         .orphan_nxttail = &sname##_state.orphan_nxtlist, \
         .orphan_donetail = &sname##_state.orphan_donelist, \
         .barrier_mutex = __MUTEX_INITIALIZER(sname##_state.barrier_mutex), \
+        .expedited_mutex = __MUTEX_INITIALIZER(sname##_state.expedited_mutex), \
         .name = RCU_STATE_NAME(sname), \
         .abbr = sabbr, \
 }
@@ -3357,8 +3358,6 @@ static int synchronize_sched_expedited_cpu_stop(void *data)
  */
 void synchronize_sched_expedited(void)
 {
-        cpumask_var_t cm;
-        bool cma = false;
         int cpu;
         long firstsnap, s, snap;
         int trycount = 0;
@@ -3394,28 +3393,11 @@ void synchronize_sched_expedited(void)
         }
         WARN_ON_ONCE(cpu_is_offline(raw_smp_processor_id()));

-        /* Offline CPUs, idle CPUs, and any CPU we run on are quiescent. */
-        cma = zalloc_cpumask_var(&cm, GFP_KERNEL);
-        if (cma) {
-                cpumask_copy(cm, cpu_online_mask);
-                cpumask_clear_cpu(raw_smp_processor_id(), cm);
-                for_each_cpu(cpu, cm) {
-                        struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
-
-                        if (!(atomic_add_return(0, &rdtp->dynticks) & 0x1))
-                                cpumask_clear_cpu(cpu, cm);
-                }
-                if (cpumask_weight(cm) == 0)
-                        goto all_cpus_idle;
-        }
-
         /*
          * Each pass through the following loop attempts to force a
          * context switch on each CPU.
          */
-        while (try_stop_cpus(cma ? cm : cpu_online_mask,
-                             synchronize_sched_expedited_cpu_stop,
-                             NULL) == -EAGAIN) {
+        while (!mutex_trylock(&rsp->expedited_mutex)) {
                 put_online_cpus();
                 atomic_long_inc(&rsp->expedited_tryfail);

@@ -3425,7 +3407,6 @@ void synchronize_sched_expedited(void)
                         /* ensure test happens before caller kfree */
                         smp_mb__before_atomic(); /* ^^^ */
                         atomic_long_inc(&rsp->expedited_workdone1);
-                        free_cpumask_var(cm);
                         return;
                 }

@@ -3435,7 +3416,6 @@ void synchronize_sched_expedited(void)
                 } else {
                         wait_rcu_gp(call_rcu_sched);
                         atomic_long_inc(&rsp->expedited_normal);
-                        free_cpumask_var(cm);
                         return;
                 }

@@ -3445,7 +3425,6 @@ void synchronize_sched_expedited(void)
                         /* ensure test happens before caller kfree */
                         smp_mb__before_atomic(); /* ^^^ */
                         atomic_long_inc(&rsp->expedited_workdone2);
-                        free_cpumask_var(cm);
                         return;
                 }

@@ -3460,16 +3439,23 @@ void synchronize_sched_expedited(void)
                         /* CPU hotplug operation in flight, use normal GP. */
                         wait_rcu_gp(call_rcu_sched);
                         atomic_long_inc(&rsp->expedited_normal);
-                        free_cpumask_var(cm);
                         return;
                 }
                 snap = atomic_long_read(&rsp->expedited_start);
                 smp_mb(); /* ensure read is before try_stop_cpus(). */
         }
-        atomic_long_inc(&rsp->expedited_stoppedcpus);

-all_cpus_idle:
-        free_cpumask_var(cm);
+        /* Stop each CPU that is online, non-idle, and not us. */
+        for_each_online_cpu(cpu) {
+                struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
+
+                /* Skip our CPU and any idle CPUs. */
+                if (raw_smp_processor_id() == cpu ||
+                    !(atomic_add_return(0, &rdtp->dynticks) & 0x1))
+                        continue;
+                stop_one_cpu(cpu, synchronize_sched_expedited_cpu_stop, NULL);
+        }
+        atomic_long_inc(&rsp->expedited_stoppedcpus);

         /*
          * Everyone up to our most recent fetch is covered by our grace
@@ -3488,6 +3474,7 @@ all_cpus_idle:
                 }
         } while (atomic_long_cmpxchg(&rsp->expedited_done, s, snap) != s);
         atomic_long_inc(&rsp->expedited_done_exit);
+        mutex_unlock(&rsp->expedited_mutex);

         put_online_cpus();
 }
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index de22d6d06bf9..b04ffa0dea58 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -478,6 +478,7 @@ struct rcu_state {
                                                 /*  _rcu_barrier(). */
         /* End of fields guarded by barrier_mutex. */

+        struct mutex expedited_mutex;           /* Serializes expediting. */
         atomic_long_t expedited_start;          /* Starting ticket. */
         atomic_long_t expedited_done;           /* Done ticket. */
         atomic_long_t expedited_wrap;           /* # near-wrap incidents. */
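
For anyone piecing the control flow together from the hunks above, the
core of the resulting synchronize_sched_expedited() looks roughly like
the following.  This is a condensed, untested sketch using the names
from the patch, not the literal code: the ticket/snapshot bookkeeping,
the statistics counters, and the fallbacks to a normal grace period are
elided as "...", and it will not compile outside kernel/rcu/tree.c.

        get_online_cpus();
        ...                             /* snapshot the expedited tickets */

        /* Only one expedited grace period may be in flight at a time. */
        while (!mutex_trylock(&rsp->expedited_mutex)) {
                put_online_cpus();
                ...                     /* return if a concurrent expedited
                                           GP already covers us, fall back to
                                           wait_rcu_gp(call_rcu_sched) after
                                           too many tries, else retry */
        }

        /* Force a context switch on each online, non-idle CPU but our own. */
        for_each_online_cpu(cpu) {
                struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);

                if (cpu == raw_smp_processor_id() ||
                    !(atomic_add_return(0, &rdtp->dynticks) & 0x1))
                        continue;       /* already quiescent: us, or idle */
                stop_one_cpu(cpu, synchronize_sched_expedited_cpu_stop, NULL);
        }

        ...                             /* advance ->expedited_done so that
                                           later callers can piggyback */
        mutex_unlock(&rsp->expedited_mutex);
        put_online_cpus();

The idle test is the same one the deleted cpumask pass used: an even
->dynticks value means that CPU is in dynticks-idle and thus already in
a quiescent state for RCU-sched, so no stopper needs to run there.  And
because stop_one_cpu() waits for the stopper to finish on that one CPU
before moving to the next, a CPU with a long scheduling latency delays
only this caller, rather than holding the stopper kthreads of all the
other CPUs, which is what try_stop_cpus() did.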