Date: Wed, 28 Mar 2012 15:38:37 +0200
From: Frederic Weisbecker
To: Gilad Ben-Yossef
Cc: Christoph Lameter, LKML, linaro-sched-sig@lists.linaro.org,
    Alessio Igor Bogani, Andrew Morton, Avi Kivity, Chris Metcalf,
    Daniel Lezcano, Geoff Levand, Ingo Molnar, Max Krasnyansky,
    "Paul E. McKenney", Peter Zijlstra, Stephen Hemminger, Steven Rostedt,
    Sven-Thorsten Dietrich, Thomas Gleixner, Zen Lin
Subject: Re: [PATCH 11/32] nohz/cpuset: Don't turn off the tick if rcu needs it
Message-ID: <20120328133835.GF17189@somewhere.redhat.com>
References: <1332338318-5958-1-git-send-email-fweisbec@gmail.com>
    <1332338318-5958-13-git-send-email-fweisbec@gmail.com>
    <20120328123912.GE17189@somewhere.redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 28, 2012 at 02:57:44PM +0200, Gilad Ben-Yossef wrote:
> On Wed, Mar 28, 2012 at 2:39 PM, Frederic Weisbecker wrote:
> > On Tue, Mar 27, 2012 at 05:21:34PM +0200, Gilad Ben-Yossef wrote:
> >> On Thu, Mar 22, 2012 at 6:18 PM, Christoph Lameter wrote:
> >> > On Thu, 22 Mar 2012, Gilad Ben-Yossef wrote:
> >> >
> >> >> > Is there any way for userspace to know that the tick is not off yet due to
> >> >> > this? It would make sense for us to have a busy loop in user space that
> >> >> > waits until the OS has completed all processing if that avoids future
> >> >> > latencies for the application.
> >> >> >
> >> >>
> >> >> I previously suggested having the user register to receive a signal
> >> >> when the tick is turned off. Since the tick is always turned off the
> >> >> user task is the current task by design, *I think* you can simply
> >> >> mark the signal pending when you turn the tick off.
> >> >
> >> > Ok that sounds good. You would define a new signal for this?
> >> >
> >>
> >> My gut instinct is to let the process register with a specific signal
> >> (probably in the RT range) it wants to receive when the tick goes off
> >> and/or on.
> >
> > Note the signal itself could trigger an event that could restart the tick.
> > Calling call_rcu() is sufficient for that. We can probably optimize that
> > one day by assigning another CPU to handle the callbacks of a tickless
> > CPU but for now...
> >
> >>
> >> > So we would start up the application. App will do all prep work (memory
> >> > allocation, device setup etc etc) and then wait for the signal to be
> >> > received. After that it would enter the low latency processing phase.
> >> >
> >> > Could we also get a signal if something disrupts the peace and switches
> >> > the timer interrupt on again?
> >> >
> >>
> >> I think you'll have to since once you have the tick turned off there
> >> is no guarantee that it won't get turned on by a timer scheduling a
> >> task or an IPI.
> >
> > The problem with this scheme is that if the task is running with the
> > guarantee that nothing is going to disturb it (it assumes so when it
> > is notified that the timer is stopped), can it seriously recover from
> > the fact the timer has been restarted once it gets notified about it?
>
> Recovery in this context involves a programmer/system architect looking
> into what made the tick start and making sure that won't happen the next
> time around.
>
> I know it's not quite what you had in mind, but it works :-)

So this is about fixing bugs. Tracing may fit better for that.

> >
> > I have a hard time imagining that. It's like an RT task running a
> > critical part that suddenly receives a notification from the kernel that
> > says "what's up dude? hey by the way you're not real time anymore" :)
> > How are we recovering from that?
>
> The point is that it is the difference between a QA report that says:
>
> "Performance dropped below acceptable level for 10 ms at some point
> during the test run"
>
> and
>
> "We got an indication that the kernel resumed the tick on us, so the test
> was stopped and here is the stack trace for all the tasks running,
> plus the logs".

That's about post-run analysis, which sounds like a job for tracing.

> >
> > Maybe instead of focusing on these notifications, we should try hard to
> > shut down the tick before we reach userspace: delegate RCU work
> > to another CPU, avoid needless IPIs, avoid needless timer list timers, etc...
> > Fix those things one by one such that we can configure things to the point we
> > get closer to a guarantee of CPU isolation.
> >
> > Does that sound reasonable?
>
> It does to me :-)
>
> Gilad
>
> --
> Gilad Ben-Yossef
> Chief Coffee Drinker
> gilad@benyossef.com
> Israel Cell: +972-52-8260388
> US Cell: +1-973-8260388
> http://benyossef.com
>
> "If you take a class in large-scale robotics, can you end up in a
> situation where the homework eats your dog?"
>  -- Jean-Baptiste Queru