Date: Tue, 30 Aug 2011 16:06:53 +0200
From: Frederic Weisbecker
To: Gilad Ben-Yossef
Cc: LKML, Andrew Morton, Anton Blanchard, Avi Kivity, Ingo Molnar,
	Lai Jiangshan, Paul E. McKenney, Paul Menage, Peter Zijlstra,
	Stephen Hemminger, Thomas Gleixner, Tim Pepper
Subject: Re: [RFC PATCH 00/32] Nohz cpusets (was: Nohz Tasks)
Message-ID: <20110830140648.GL9748@somewhere.redhat.com>
References: <1313423549-27093-1-git-send-email-fweisbec@gmail.com>

On Wed, Aug 24, 2011 at 05:41:05PM +0300, Gilad Ben-Yossef wrote:
> Hi,
>
> On Mon, Aug 15, 2011 at 6:51 PM, Frederic Weisbecker wrote:
> >
> > For those who want to play:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
> >        nohz/cpuset-v1
>
> You caught me in a playful mood, so I took it for a spin... :-)
>
> I know this is far from being production ready, but I hope you'll find
> the feedback useful.
>
> First, a short description of my testing setup is in order, I believe:
>
> I've set up a small x86 VM with 4 CPUs running your git tree and a
> minimal buildroot system. I've created 2 cpusets, sys and nohz, then
> assigned every task I could to the sys cpuset and set adaptive_nohz
> on the nohz set.
>
> To make double sure I have no task on my nohz cpuset CPU, I've booted
> the system with the isolcpus command line option isolating the same
> CPU I've assigned to the nohz set. This shouldn't be needed of course,
> but just in case.

Ah, I haven't tested with isolcpus, especially as it's headed toward
removal.

> I then ran a silly program I've written that basically eats CPU cycles
> (https://github.com/gby/cpueat), assigned it to the nohz set and
> monitored the number of interrupts using /proc/interrupts.
>
> Now, for the things I've noticed -
>
> 1. Before I turn adaptive_nohz to 1, when no task is running on the
> nohz cpuset CPU, the tick is indeed idle (regular nohz case) and very
> few function call IPIs are seen. However, when I turn adaptive_nohz to
> 1 (but still with no task running on the CPU), the tick remains idle,
> but I get an IPI function call interrupt at almost the rate the tick
> would have had.

Yeah, I believe this is due to RCU trying to wake up our nohz CPU. I
need to have a deeper look there.

> 2. When I run my little cpueat program on the nohz CPU, the tick does
> not actually go off. Instead it ticks away as usual. I know it is the
> only eligible task to run, since as soon as I kill it the tick turns
> off (regular nohz mode again). I've tinkered around and found out that
> what stops the tick from going away is the check for rcu_pending() in
> cpuset_nohz_can_stop_tick(). It seems to always be true. When I
> removed that check experimentally and repeated the test, the tick
> indeed stops with my cpueat task running.
>
> Of course, I don't suggest this is the sane thing to do - I just
> wondered if that is what stopped the tick from going away, and it
> seems that it is.

Are you sure the tick never goes off? But yeah, maybe there is something
that constantly requires RCU grace periods to complete on your system.
I should drop the rcu_pending() check as long as we want to stop the
tick from userspace, because there we are off the RCU state machine.
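
Just to make the discussion concrete, the check we are talking about has
roughly this shape (a simplified sketch only, not the actual code in the
nohz/cpuset-v1 branch; the other conditions are elided):

static bool cpuset_nohz_can_stop_tick(void)
{
	/*
	 * The check Gilad removed experimentally: as long as RCU reports
	 * pending work for this CPU, the tick is never allowed to stop.
	 */
	if (rcu_pending(smp_processor_id()))
		return false;

	/* ... other conditions (e.g. nr_running > 1) checked here ... */

	return true;
}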
> 3. My little cpueat program tries to fork a child process after 100k
> iterations of some CPU bound loop. It usually takes a few seconds to
> happen. The idea is to make sure that the tick resumes when
> nr_running > 1. In my case, I got a kernel panic. Since it happened
> with some debug code I added and with the aforementioned experimental
> removal of the rcu_pending() check, I'm assuming for now it's all my
> fault, but I will look into verifying it further and will send panic
> logs if they prove useful.

I got some panics too but haven't seen any for some time. I've made a
lot of changes since then though, so I thought the condition to trigger
it had just gone away. IIRC, it was a lock inversion between the rq lock
and some other lock. Very nice condition for a cool lockup ;)
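
For anyone who wants to try the same thing, here is a rough approximation
of the cpueat test described above (the real program is at
https://github.com/gby/cpueat; the names and loop counts below are only
illustrative):

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

#define ITERATIONS 100000UL

int main(void)
{
	volatile unsigned long sink = 0;
	unsigned long i, j;
	pid_t pid;

	/* CPU-bound phase: the only runnable task, so the tick may stop. */
	for (i = 0; i < ITERATIONS; i++)
		for (j = 0; j < 100000; j++)
			sink += j;

	/* Fork so nr_running goes above 1: the tick should resume. */
	pid = fork();
	if (pid < 0) {
		perror("fork");
		return 1;
	}
	if (pid == 0) {
		for (;;)	/* child keeps eating CPU */
			sink++;
	}

	waitpid(pid, NULL, 0);
	return 0;
}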