Date: Thu, 16 Jul 2009 18:39:48 +1000
From: Anton Blanchard
To: Bharata B Rao
Cc: KOSAKI Motohiro, Ingo Molnar, Balbir Singh, mingo@redhat.com,
	hpa@zytor.com, linux-kernel@vger.kernel.org, a.p.zijlstra@chello.nl,
	schwidefsky@de.ibm.com, balajirrao@gmail.com, dhaval@linux.vnet.ibm.com,
	tglx@linutronix.de, kamezawa.hiroyu@jp.fujitsu.com,
	linux-tip-commits@vger.kernel.org
Subject: Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
Message-ID: <20090716083948.GA2950@kryten>
In-Reply-To: <20090716081010.GB3134@in.ibm.com>
References: <20090512102412.GG6351@balbir.in.ibm.com>
	<20090512102939.GB11714@elte.hu>
	<20090512193656.D647.A69D9226@jp.fujitsu.com>
	<20090716081010.GB3134@in.ibm.com>

Hi,

> On ppc64, calling jiffies_to_cputime() from sched_init() is too early
> because jiffies_to_cputime() needs tb_ticks_per_sec, which gets
> initialized only later in time_init(). Because of this I see that
> cpuacct_batch will always be zero, effectively negating what this patch
> is trying to do.
>
> As explained by you earlier, we too are finding the default batch value
> to be too low for ppc64 with VIRT_CPU_ACCOUNTING turned on. Hence I
> guess if this patch is taken in (of course with the above issue fixed),
> it will benefit ppc64 also.

I created this patch earlier today when I hit the problem. Thoughts?

Anton
--

When CONFIG_VIRT_CPU_ACCOUNTING is enabled we can call cpuacct_update_stats
with values much larger than percpu_counter_batch. This means the call to
percpu_counter_add will always add to the global count, which is protected
by a spinlock.

Since reading the CPU accounting cgroup counters is not performance
critical, we can use a maximum-size batch of INT_MAX on the update side and
use percpu_counter_sum on the read side, which adds up all the percpu
counters.

With this patch an 8 core POWER6 with CONFIG_VIRT_CPU_ACCOUNTING and
CONFIG_CGROUP_CPUACCT shows an improvement in aggregate context switch rate
from 397k/sec to 3.9M/sec, a 10x improvement.

Signed-off-by: Anton Blanchard
---

Index: linux.trees.git/kernel/sched.c
===================================================================
--- linux.trees.git.orig/kernel/sched.c	2009-07-16 10:11:02.000000000 +1000
+++ linux.trees.git/kernel/sched.c	2009-07-16 10:16:41.000000000 +1000
@@ -10551,7 +10551,7 @@
 	int i;
 
 	for (i = 0; i < CPUACCT_STAT_NSTATS; i++) {
-		s64 val = percpu_counter_read(&ca->cpustat[i]);
+		s64 val = percpu_counter_sum(&ca->cpustat[i]);
 		val = cputime64_to_clock_t(val);
 		cb->fill(cb, cpuacct_stat_desc[i], val);
 	}
@@ -10621,7 +10621,7 @@
 	ca = task_ca(tsk);
 
 	do {
-		percpu_counter_add(&ca->cpustat[idx], val);
+		__percpu_counter_add(&ca->cpustat[idx], val, INT_MAX);
 		ca = ca->parent;
 	} while (ca);
 	rcu_read_unlock();
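
To make the batching rule the patch relies on concrete, here is a rough
user-space sketch; it is not the kernel implementation, and the toy_* names
are invented for illustration. The idea is that an add only folds into the
shared, lock-protected global count once a CPU's local delta reaches the
batch, so with a batch of INT_MAX updates stay per-CPU and only a
percpu_counter_sum-style read, which walks every CPU's delta, sees the real
total.

#include <limits.h>
#include <stdint.h>
#include <stdio.h>

#define NR_CPUS 8

/* Toy model of a percpu_counter: one global count plus a per-CPU delta. */
struct toy_percpu_counter {
	int64_t count;             /* "spinlock-protected" global count */
	int32_t delta[NR_CPUS];    /* per-CPU counts, cheap to update   */
};

/*
 * Mirrors the batching rule: only fold into the global count (the slow,
 * contended path) once this CPU's local delta crosses +/- batch.
 */
static void toy_counter_add(struct toy_percpu_counter *c, int cpu,
			    int64_t amount, int32_t batch)
{
	int64_t d = c->delta[cpu] + amount;

	if (d >= batch || d <= -batch) {
		c->count += d;          /* slow path: global update  */
		c->delta[cpu] = 0;
	} else {
		c->delta[cpu] = d;      /* fast path: stays per-CPU  */
	}
}

/* Cheap, approximate read: ignores per-CPU deltas (like percpu_counter_read). */
static int64_t toy_counter_read(struct toy_percpu_counter *c)
{
	return c->count;
}

/* Accurate but slower read: adds every CPU's delta (like percpu_counter_sum). */
static int64_t toy_counter_sum(struct toy_percpu_counter *c)
{
	int64_t sum = c->count;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += c->delta[cpu];
	return sum;
}

int main(void)
{
	struct toy_percpu_counter small = { 0 }, big = { 0 };

	/* Each update is ~1,000,000 cputime units, far above a default-sized
	 * batch, so with the small batch every add takes the slow path. */
	for (int i = 0; i < 1000; i++)
		toy_counter_add(&small, i % NR_CPUS, 1000000, 32 * NR_CPUS);

	/* With an INT_MAX batch the adds stay per-CPU, and only the
	 * summing read sees the full value. */
	for (int i = 0; i < 1000; i++)
		toy_counter_add(&big, i % NR_CPUS, 1000000, INT_MAX);

	printf("small batch:   read=%lld sum=%lld\n",
	       (long long)toy_counter_read(&small),
	       (long long)toy_counter_sum(&small));
	printf("INT_MAX batch: read=%lld sum=%lld\n",
	       (long long)toy_counter_read(&big),
	       (long long)toy_counter_sum(&big));
	return 0;
}

Built with any C99 compiler, the small-batch counter ends up with read equal
to sum (every add hit the global count and its lock), while the INT_MAX-batch
counter reports read of 0 but the correct sum, which is why the read side in
the patch above switches from percpu_counter_read to percpu_counter_sum.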