Date: Tue, 23 Jun 2015 19:23:44 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra
Cc: Oleg Nesterov, tj@kernel.org, mingo@redhat.com,
	linux-kernel@vger.kernel.org, der.herr@hofr.at, dave@stgolabs.net,
	riel@redhat.com, viro@ZenIV.linux.org.uk, torvalds@linux-foundation.org
Subject: Re: [RFC][PATCH 12/13] stop_machine: Remove lglock
Message-ID: <20150624022332.GA16620@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
In-Reply-To: <20150623190506.GA7731@linux.vnet.ibm.com>
References: <20150622122256.765619039@infradead.org>
 <20150622222152.GA4460@redhat.com>
 <20150623100932.GB3644@twins.programming.kicks-ass.net>
 <20150623105548.GE18673@twins.programming.kicks-ass.net>
 <20150623112041.GF18673@twins.programming.kicks-ass.net>
 <20150623130826.GG18673@twins.programming.kicks-ass.net>
 <20150623173038.GJ3892@linux.vnet.ibm.com>
 <20150623180411.GF3644@twins.programming.kicks-ass.net>
 <20150623182626.GO3892@linux.vnet.ibm.com>
 <20150623190506.GA7731@linux.vnet.ibm.com>

On Tue, Jun 23, 2015 at 12:05:06PM -0700, Paul E. McKenney wrote:
> On Tue, Jun 23, 2015 at 11:26:26AM -0700, Paul E. McKenney wrote:
> > On Tue, Jun 23, 2015 at 08:04:11PM +0200, Peter Zijlstra wrote:
> > > On Tue, Jun 23, 2015 at 10:30:38AM -0700, Paul E. McKenney wrote:
> > > > Good, you don't need this because you can check for dynticks later.
> > > > You will need to check for offline CPUs.
> > >
> > > 	get_online_cpus()
> > > 	for_each_online_cpus() {
> > > 		...
> > > 	}
> > >
> > > is what the new code does.
> >
> > Ah, I missed that this was not deleted.
>
> But get_online_cpus() will re-introduce a deadlock.

And here is an untested patch that applies the gist of your approach,
the series of stop_one_cpu() calls, but without undoing the rest.  I
forged your Signed-off-by; please let me know if that doesn't work for
you.

There are a number of simplifications that can be made, but the basic
approach should get a good round of testing first.

And I just noticed that I forgot to get rid of try_stop_cpus().  Well,
there will probably be a test failure or two to handle, so I can take
care of that in the next version.  ;-)

							Thanx, Paul

------------------------------------------------------------------------

commit 1de96c34b39d840c5fe2689640345ed26f78b8f8
Author: Peter Zijlstra
Date:   Tue Jun 23 19:03:45 2015 -0700

    rcu: Switch synchronize_sched_expedited() to stop_one_cpu()

    synchronize_sched_expedited() currently invokes try_stop_cpus(), which
    schedules the stopper kthreads on each online non-idle CPU, and waits
    until all those kthreads are running before letting any of them stop.
    This is disastrous for real-time workloads, which get hit with a
    preemption that is as long as the longest scheduling latency on any
    CPU, including any non-realtime housekeeping CPUs.

    This commit therefore switches to using stop_one_cpu() on each CPU in
    turn.  This avoids inflicting the worst-case scheduling latency of the
    worst-case CPU onto all other CPUs, and also simplifies the code a
    little bit.

    Follow-up commits will simplify the counter-snapshotting algorithm and
    convert a number of the counters that are now protected by the new
    ->expedited_mutex to non-atomic.

    Signed-off-by: Peter Zijlstra
    [ paulmck: Kept stop_one_cpu(), dropped disabling of "guardrails". ]
    Signed-off-by: Paul E. McKenney

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 78d0a87ff354..a30971474134 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -103,6 +103,7 @@ struct rcu_state sname##_state = { \
         .orphan_nxttail = &sname##_state.orphan_nxtlist, \
         .orphan_donetail = &sname##_state.orphan_donelist, \
         .barrier_mutex = __MUTEX_INITIALIZER(sname##_state.barrier_mutex), \
+        .expedited_mutex = __MUTEX_INITIALIZER(sname##_state.expedited_mutex), \
         .name = RCU_STATE_NAME(sname), \
         .abbr = sabbr, \
 }
@@ -3357,8 +3358,6 @@ static int synchronize_sched_expedited_cpu_stop(void *data)
  */
 void synchronize_sched_expedited(void)
 {
-        cpumask_var_t cm;
-        bool cma = false;
         int cpu;
         long firstsnap, s, snap;
         int trycount = 0;
@@ -3394,28 +3393,11 @@ void synchronize_sched_expedited(void)
         }
         WARN_ON_ONCE(cpu_is_offline(raw_smp_processor_id()));

-        /* Offline CPUs, idle CPUs, and any CPU we run on are quiescent. */
-        cma = zalloc_cpumask_var(&cm, GFP_KERNEL);
-        if (cma) {
-                cpumask_copy(cm, cpu_online_mask);
-                cpumask_clear_cpu(raw_smp_processor_id(), cm);
-                for_each_cpu(cpu, cm) {
-                        struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
-
-                        if (!(atomic_add_return(0, &rdtp->dynticks) & 0x1))
-                                cpumask_clear_cpu(cpu, cm);
-                }
-                if (cpumask_weight(cm) == 0)
-                        goto all_cpus_idle;
-        }
-
         /*
          * Each pass through the following loop attempts to force a
          * context switch on each CPU.
          */
-        while (try_stop_cpus(cma ? cm : cpu_online_mask,
-                             synchronize_sched_expedited_cpu_stop,
-                             NULL) == -EAGAIN) {
+        while (!mutex_trylock(&rsp->expedited_mutex)) {
                 put_online_cpus();
                 atomic_long_inc(&rsp->expedited_tryfail);

@@ -3425,7 +3407,6 @@ void synchronize_sched_expedited(void)
                         /* ensure test happens before caller kfree */
                         smp_mb__before_atomic(); /* ^^^ */
                         atomic_long_inc(&rsp->expedited_workdone1);
-                        free_cpumask_var(cm);
                         return;
                 }

@@ -3435,7 +3416,6 @@ void synchronize_sched_expedited(void)
                 } else {
                         wait_rcu_gp(call_rcu_sched);
                         atomic_long_inc(&rsp->expedited_normal);
-                        free_cpumask_var(cm);
                         return;
                 }

@@ -3445,7 +3425,6 @@ void synchronize_sched_expedited(void)
                         /* ensure test happens before caller kfree */
                         smp_mb__before_atomic(); /* ^^^ */
                         atomic_long_inc(&rsp->expedited_workdone2);
-                        free_cpumask_var(cm);
                         return;
                 }

@@ -3460,16 +3439,23 @@ void synchronize_sched_expedited(void)
                         /* CPU hotplug operation in flight, use normal GP. */
                         wait_rcu_gp(call_rcu_sched);
                         atomic_long_inc(&rsp->expedited_normal);
-                        free_cpumask_var(cm);
                         return;
                 }
                 snap = atomic_long_read(&rsp->expedited_start);
                 smp_mb(); /* ensure read is before try_stop_cpus(). */
         }
-        atomic_long_inc(&rsp->expedited_stoppedcpus);

-all_cpus_idle:
-        free_cpumask_var(cm);
+        /* Stop each CPU that is online, non-idle, and not us. */
+        for_each_online_cpu(cpu) {
+                struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
+
+                /* Skip our CPU and any idle CPUs. */
+                if (raw_smp_processor_id() == cpu ||
+                    !(atomic_add_return(0, &rdtp->dynticks) & 0x1))
+                        continue;
+                stop_one_cpu(cpu, synchronize_sched_expedited_cpu_stop, NULL);
+        }
+        atomic_long_inc(&rsp->expedited_stoppedcpus);

         /*
          * Everyone up to our most recent fetch is covered by our grace
@@ -3488,6 +3474,7 @@ all_cpus_idle:
                 }
         } while (atomic_long_cmpxchg(&rsp->expedited_done, s, snap) != s);
         atomic_long_inc(&rsp->expedited_done_exit);
+        mutex_unlock(&rsp->expedited_mutex);

         put_online_cpus();
 }
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index de22d6d06bf9..b04ffa0dea58 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -478,6 +478,7 @@ struct rcu_state {
                                                 /*  _rcu_barrier(). */
         /* End of fields guarded by barrier_mutex. */

+        struct mutex expedited_mutex;           /* Serializes expediting. */
         atomic_long_t expedited_start;          /* Starting ticket. */
         atomic_long_t expedited_done;           /* Done ticket. */
         atomic_long_t expedited_wrap;           /* # near-wrap incidents. */
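
For anyone piecing the control flow together from the hunks above, the
core of the resulting synchronize_sched_expedited() looks roughly like
the following.  This is a condensed, untested sketch using the names
from the patch, not the literal code: the ticket/snapshot bookkeeping,
the statistics counters, and the fallbacks to a normal grace period are
elided as "...", and it will not compile outside kernel/rcu/tree.c.

        get_online_cpus();
        ...                             /* snapshot the expedited tickets */

        /* Only one expedited grace period may be in flight at a time. */
        while (!mutex_trylock(&rsp->expedited_mutex)) {
                put_online_cpus();
                ...                     /* return if a concurrent expedited
                                           GP already covers us, fall back to
                                           wait_rcu_gp(call_rcu_sched) after
                                           too many tries, else retry */
        }

        /* Force a context switch on each online, non-idle CPU but our own. */
        for_each_online_cpu(cpu) {
                struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);

                if (cpu == raw_smp_processor_id() ||
                    !(atomic_add_return(0, &rdtp->dynticks) & 0x1))
                        continue;       /* already quiescent: us, or idle */
                stop_one_cpu(cpu, synchronize_sched_expedited_cpu_stop, NULL);
        }

        ...                             /* advance ->expedited_done so that
                                           later callers can piggyback */
        mutex_unlock(&rsp->expedited_mutex);
        put_online_cpus();

The idle test is the same one the deleted cpumask pass used: an even
->dynticks value means that CPU is in dynticks-idle and thus already in
a quiescent state for RCU-sched, so no stopper needs to run there.  And
because stop_one_cpu() waits for the stopper to finish on that one CPU
before moving to the next, a CPU with a long scheduling latency delays
only this caller, rather than holding the stopper kthreads of all the
other CPUs, which is what try_stop_cpus() did.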