Date: Wed, 24 Jun 2015 07:03:11 -0700
From: "Paul E. McKenney"
To: Ingo Molnar
Cc: Peter Zijlstra, Oleg Nesterov, tj@kernel.org, mingo@redhat.com,
    linux-kernel@vger.kernel.org, der.herr@hofr.at, dave@stgolabs.net,
    riel@redhat.com, viro@ZenIV.linux.org.uk, torvalds@linux-foundation.org
Subject: Re: [RFC][PATCH 12/13] stop_machine: Remove lglock
Message-ID: <20150624140243.GC3892@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20150623105548.GE18673@twins.programming.kicks-ass.net>
    <20150623112041.GF18673@twins.programming.kicks-ass.net>
    <20150623130826.GG18673@twins.programming.kicks-ass.net>
    <20150623173038.GJ3892@linux.vnet.ibm.com>
    <20150623180411.GF3644@twins.programming.kicks-ass.net>
    <20150623182626.GO3892@linux.vnet.ibm.com>
    <20150624073503.GH3644@twins.programming.kicks-ass.net>
    <20150624084248.GA27873@gmail.com>
    <20150624133859.GA3892@linux.vnet.ibm.com>
    <20150624134337.GA10662@gmail.com>
In-Reply-To: <20150624134337.GA10662@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jun 24, 2015 at 03:43:37PM +0200, Ingo Molnar wrote:
>
> * Paul E. McKenney wrote:
>
> > On Wed, Jun 24, 2015 at 10:42:48AM +0200, Ingo Molnar wrote:
> > >
> > > * Peter Zijlstra wrote:
> > >
> > > > On Tue, Jun 23, 2015 at 11:26:26AM -0700, Paul E. McKenney wrote:
> > > > > >
> > > > > > I really think you're making that expedited nonsense far too accessible.
> > > > >
> > > > > This has nothing to do with accessibility and everything to do with
> > > > > robustness. And with me not becoming the triage center for too many non-RCU
> > > > > bugs.
> > > >
> > > > But by making it so you're rewarding abuse instead of flagging it :-(
> > >
> > > Btw., being a 'triage center' is the bane of APIs that are overly successful,
> > > so we should take that burden with pride! :-)
> >
> > I will gladly accept that compliment.
> >
> > And the burden. But, lazy as I am, I intend to automate it. ;-)
>
> lol :)
>
> > > Lockdep (and the scheduler APIs as well) frequently got into such situations as
> > > well, and we mostly solved it by being more informative with debug splats.
> > >
> > > I don't think a kernel API should (ever!) stay artificially silent, just for fear
> > > of flagging too many problems in other code.
> >
> > I agree, as attested by RCU CPU stall warnings, lockdep-RCU, sparse-based
> > RCU checks, and the object-debug-based checks for double call_rcu().
> > That said, in all of these cases, including your example of lockdep,
> > the diagnostic is a debug splat rather than a mutex-contention meltdown.
> > And it is the mutex-contention meltdown that I will continue making
> > synchronize_sched_expedited() avoid.
> >
> > But given the change from bulk try_stop_cpus() to either stop_one_cpu() or
> > IPIs, it would not be hard to splat if a given CPU didn't come back fast
> > enough. The latency tracer would of course provide better information,
> > but synchronize_sched_expedited() could do a coarse-grained job with
> > less setup required.
> >
> > My first guess for the timeout would be something like 500 milliseconds.
> > Thoughts?
>
> So I'd start with 5,000 milliseconds and observe the results first ...

Sounds good, especially when I recall that the default RCU CPU stall
warning timeout is 21,000 milliseconds...  ;-)

							Thanx, Paul
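
For readers following the thread, the coarse-grained check discussed above (splat if a given CPU does not come back within a timeout) might look roughly like the userspace sketch below. It is only an illustration under stated assumptions, not code from the patch series: the names `EXP_STALL_TIMEOUT_MS`, `struct exp_wait`, and `exp_wait_check()` are hypothetical, time is passed in as plain milliseconds, and the "splat" is a single `fprintf()`. An in-kernel version would presumably use jiffies-based timekeeping and a warning macro instead. The 5,000 ms value is the starting point suggested in the thread.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical threshold: the 5,000 ms starting point suggested in the thread. */
#define EXP_STALL_TIMEOUT_MS 5000u

/* Hypothetical record of one outstanding per-CPU expedited request. */
struct exp_wait {
	int cpu;            /* CPU the request was sent to */
	uint64_t start_ms;  /* time at which the request was sent */
	bool done;          /* set once the CPU has responded */
	bool warned;        /* ensures at most one warning per wait */
};

/*
 * Return true (and print one diagnostic) the first time the wait is found
 * to have exceeded the timeout. Completed waits and waits that have already
 * been reported never warn again.
 */
bool exp_wait_check(struct exp_wait *w, uint64_t now_ms)
{
	uint64_t waited_ms;

	if (w->done || w->warned)
		return false;

	waited_ms = now_ms - w->start_ms;
	if (waited_ms < EXP_STALL_TIMEOUT_MS)
		return false;

	w->warned = true;
	fprintf(stderr,
		"expedited wait: CPU %d has not responded after %llu ms\n",
		w->cpu, (unsigned long long)waited_ms);
	return true;
}
```

A waiter would call `exp_wait_check()` periodically while polling for the CPU's response. Warning only once per wait keeps the diagnostic a debug splat rather than a flood, which matches the distinction drawn earlier in the thread.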