All of lore.kernel.org
 help / color / mirror / Atom feed
From: Anton Blanchard <anton@samba.org>
To: Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>,
	mingo@redhat.com, hpa@zytor.com, linux-kernel@vger.kernel.org,
	tglx@linutronix.de, mingo@elte.hu,
	linux-tip-commits@vger.kernel.org
Subject: [PATCH 2/2] sched: cpuacct: Use bigger percpu counter batch values for stats counters
Date: Thu, 28 Jan 2010 23:52:02 +1100	[thread overview]
Message-ID: <20100128125202.GG2996@kryten> (raw)
In-Reply-To: <20100128124854.GF2996@kryten>


When CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT are enabled we can
call cpuacct_update_stats with values much larger than percpu_counter_batch.
This means the call to percpu_counter_add will always add to the global count
which is protected by a spinlock and we end up with a global spinlock in
the scheduler.

Based on an idea by KOSAKI Motohiro, this patch scales the batch value by
cputime_one_jiffy such that we have the same batch limit as we would
if CONFIG_VIRT_CPU_ACCOUNTING was disabled. His patch did this once at boot
but that initialisation happened too early on PowerPC (before time_init)
and it was never updated at runtime as a result of a hotplug cpu add/remove.

This patch instead scales percpu_counter_batch by cputime_one_jiffy at
runtime, which keeps the batch correct even after cpu hotplug operations.
We cap it at INT_MAX in case of overflow.

For architectures that do not support CONFIG_VIRT_CPU_ACCOUNTING,
cputime_one_jiffy is the constant 1 and gcc is smart enough to
optimise min(s32 percpu_counter_batch, INT_MAX) to just percpu_counter_batch
at least on x86 and PowerPC. So there is no need to add an #ifdef.

On a 64 thread PowerPC box with CONFIG_VIRT_CPU_ACCOUNTING and 
CONFIG_CGROUP_CPUACCT enabled, a context switch microbenchmark is 234x faster
and almost matches a CONFIG_CGROUP_CPUACCT disabled kernel:

CONFIG_CGROUP_CPUACCT disabled:         16906698 ctx switches/sec
CONFIG_CGROUP_CPUACCT enabled:             61720 ctx switches/sec
CONFIG_CGROUP_CPUACCT + patch:          16663217 ctx switches/sec

Tested with:

wget http://ozlabs.org/~anton/junkcode/context_switch.c
make context_switch
for i in `seq 0 63`; do taskset -c $i ./context_switch & done
vmstat 1

Signed-off-by: Anton Blanchard <anton@samba.org>
---

v2: Added a comment and fixed the UP build.

Index: linux-cpumask/kernel/sched.c
===================================================================
--- linux-cpumask.orig/kernel/sched.c	2010-01-22 18:23:43.377212514 +1100
+++ linux-cpumask/kernel/sched.c	2010-01-28 23:24:02.677233753 +1100
@@ -10885,12 +10885,30 @@ static void cpuacct_charge(struct task_s
 }
 
 /*
+ * When CONFIG_VIRT_CPU_ACCOUNTING is enabled one jiffy can be very large
+ * in cputime_t units. As a result, cpuacct_update_stats calls
+ * percpu_counter_add with values large enough to always overflow the
+ * per cpu batch limit causing bad SMP scalability.
+ *
+ * To fix this we scale percpu_counter_batch by cputime_one_jiffy so we
+ * batch the same amount of time with CONFIG_VIRT_CPU_ACCOUNTING disabled
+ * and enabled. We cap it at INT_MAX which is the largest allowed batch value.
+ */
+#ifdef CONFIG_SMP
+#define CPUACCT_BATCH	\
+	min_t(long, percpu_counter_batch * cputime_one_jiffy, INT_MAX)
+#else
+#define CPUACCT_BATCH	0
+#endif
+
+/*
  * Charge the system/user time to the task's accounting group.
  */
 static void cpuacct_update_stats(struct task_struct *tsk,
 		enum cpuacct_stat_index idx, cputime_t val)
 {
 	struct cpuacct *ca;
+	int batch = CPUACCT_BATCH;
 
 	if (unlikely(!cpuacct_subsys.active))
 		return;
@@ -10899,7 +10917,7 @@ static void cpuacct_update_stats(struct 
 	ca = task_ca(tsk);
 
 	do {
-		percpu_counter_add(&ca->cpustat[idx], val);
+		__percpu_counter_add(&ca->cpustat[idx], val, batch);
 		ca = ca->parent;
 	} while (ca);
 	rcu_read_unlock();

  reply	other threads:[~2010-01-28 12:56 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-05-07  9:59 [PATCH v3] cpuacct: VIRT_CPU_ACCOUNTING don't prevent percpu cputime count KOSAKI Motohiro
2009-05-07 10:50 ` Ingo Molnar
2009-05-09  9:26   ` KOSAKI Motohiro
2009-05-09  9:38     ` Peter Zijlstra
2009-05-09  9:52       ` KOSAKI Motohiro
2009-05-09 10:14         ` KOSAKI Motohiro
2009-05-11 12:33           ` [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters tip-bot for KOSAKI Motohiro
2009-05-11 12:36           ` tip-bot for KOSAKI Motohiro
2009-05-11 15:07             ` Ingo Molnar
2009-05-12 10:01               ` KOSAKI Motohiro
2009-05-12 10:02                 ` Peter Zijlstra
2009-05-12 10:06                   ` KOSAKI Motohiro
2009-05-12 10:22                     ` KOSAKI Motohiro
2009-05-12 10:27                       ` Ingo Molnar
2009-05-12 10:42                         ` KOSAKI Motohiro
2009-05-12 10:10                 ` Balbir Singh
2009-05-12 10:13                   ` KOSAKI Motohiro
2009-05-12 10:24                     ` Balbir Singh
2009-05-12 10:29                       ` Ingo Molnar
2009-05-12 10:44                         ` KOSAKI Motohiro
2009-05-12 13:28                           ` Ingo Molnar
2009-05-12 13:40                             ` KOSAKI Motohiro
2009-05-12 13:42                               ` Ingo Molnar
2009-07-16  8:10                           ` Bharata B Rao
2009-07-16  8:39                             ` Anton Blanchard
2009-08-20  5:10                               ` Anton Blanchard
2009-08-20  5:16                                 ` KAMEZAWA Hiroyuki
2009-08-20  5:26                                 ` Balbir Singh
2009-08-20  5:52                                   ` Peter Zijlstra
2009-08-20  6:05                                     ` Anton Blanchard
2009-08-20  6:10                                       ` KAMEZAWA Hiroyuki
2009-08-20  6:24                                         ` Anton Blanchard
2009-08-20  8:41                                           ` [PATCH] better align percpu counter (Was " KAMEZAWA Hiroyuki
2009-08-20 10:04                                             ` Ingo Molnar
2009-08-21  2:20                                               ` KAMEZAWA Hiroyuki
2009-08-21 11:29                                                 ` Ingo Molnar
2009-08-21  3:56                                             ` KAMEZAWA Hiroyuki
2009-08-20 11:46                                           ` Balbir Singh
2009-08-21  6:21                                           ` KAMEZAWA Hiroyuki
2010-01-18  4:41                                 ` [PATCH] " Anton Blanchard
2010-01-18  4:55                                   ` KOSAKI Motohiro
2010-01-18  7:51                                   ` Peter Zijlstra
2010-01-18  8:21                                     ` Anton Blanchard
2010-01-18  8:34                                       ` Peter Zijlstra
2010-01-18  8:35                                   ` Balbir Singh
2010-01-18  9:42                                   ` Martin Schwidefsky
2010-01-18  9:55                                     ` Anton Blanchard
2010-01-18 11:00                                       ` Balbir Singh
2010-01-21 15:25                                       ` Balbir Singh
2010-01-21 15:36                                         ` Peter Zijlstra
2010-01-25 23:14                                   ` Andrew Morton
2010-01-25 23:44                                     ` KOSAKI Motohiro
2010-01-26  6:17                                     ` Balbir Singh
2010-01-26  6:35                                       ` Andrew Morton
2010-01-26  8:07                                         ` Anton Blanchard
2010-01-27 13:15                                   ` [tip:sched/urgent] " tip-bot for Anton Blanchard
2010-01-27 13:34                                     ` Balbir Singh
2010-01-27 21:22                                       ` Andrew Morton
2010-01-27 21:43                                         ` Peter Zijlstra
2010-01-28 12:48                                           ` [PATCH 1/2] percpu_counter: Make __percpu_counter_add an inline function on UP Anton Blanchard
2010-01-28 12:52                                             ` Anton Blanchard [this message]
2010-01-28  5:21                                         ` [tip:sched/urgent] sched: cpuacct: Use bigger percpu counter batch values for stats counters Balbir Singh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100128125202.GG2996@kryten \
    --to=anton@samba.org \
    --cc=akpm@linux-foundation.org \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-tip-commits@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.