* [PATCH v3] cpuacct: VIRT_CPU_ACCOUNTING don't prevent percpu cputime count
@ 2009-05-07 9:59 KOSAKI Motohiro
2009-05-07 10:50 ` Ingo Molnar
0 siblings, 1 reply; 62+ messages in thread
From: KOSAKI Motohiro @ 2009-05-07 9:59 UTC (permalink / raw)
To: LKML, Bharata B Rao, Balaji Rao, Dhaval Giani, KAMEZAWA Hiroyuki,
Peter Zijlstra, Balbir Singh, Ingo Molnar, Martin Schwidefsky
Cc: kosaki.motohiro
Changelog:
since v2
- revert using percpu_counter_sum()
since v1
- use percpu_counter_sum() instead of percpu_counter_read()
-------------------------------------
Subject: [PATCH v3] cpuacct: VIRT_CPU_ACCOUNTING don't prevent percpu cputime count
cpuacct_update_stats() is called on every tick update, and it uses a percpu_counter
to avoid performance degradation.
For archs which define VIRT_CPU_ACCOUNTING, every tick would result
in >1000 units of cputime updates, and since this is much greater
than percpu_counter_batch, we end up taking the spinlock on every tick.
This patch changes the batch rule: any CPU can now store
"jiffies_to_cputime(percpu_counter_batch)" cputime in its per-cpu cache.
This means there is no behavior change when VIRT_CPU_ACCOUNTING=n.
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Balaji Rao <balajirrao@gmail.com>
Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
kernel/sched.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
Index: b/kernel/sched.c
===================================================================
--- a/kernel/sched.c 2009-04-30 11:37:47.000000000 +0900
+++ b/kernel/sched.c 2009-05-07 16:34:00.000000000 +0900
@@ -10221,6 +10221,7 @@ struct cpuacct {
};
struct cgroup_subsys cpuacct_subsys;
+static s32 cpuacct_batch;
/* return cpu accounting group corresponding to this container */
static inline struct cpuacct *cgroup_ca(struct cgroup *cgrp)
@@ -10250,6 +10251,9 @@ static struct cgroup_subsys_state *cpuac
if (!ca->cpuusage)
goto out_free_ca;
+ if (!cpuacct_batch)
+ cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
+
for (i = 0; i < CPUACCT_STAT_NSTATS; i++)
if (percpu_counter_init(&ca->cpustat[i], 0))
goto out_free_counters;
@@ -10446,7 +10450,7 @@ static void cpuacct_update_stats(struct
ca = task_ca(tsk);
do {
- percpu_counter_add(&ca->cpustat[idx], val);
+ __percpu_counter_add(&ca->cpustat[idx], val, cpuacct_batch);
ca = ca->parent;
} while (ca);
rcu_read_unlock();
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3] cpuacct: VIRT_CPU_ACCOUNTING don't prevent percpu cputime count
2009-05-07 9:59 [PATCH v3] cpuacct: VIRT_CPU_ACCOUNTING don't prevent percpu cputime count KOSAKI Motohiro
@ 2009-05-07 10:50 ` Ingo Molnar
2009-05-09 9:26 ` KOSAKI Motohiro
0 siblings, 1 reply; 62+ messages in thread
From: Ingo Molnar @ 2009-05-07 10:50 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: LKML, Bharata B Rao, Balaji Rao, Dhaval Giani, KAMEZAWA Hiroyuki,
Peter Zijlstra, Balbir Singh, Martin Schwidefsky
* KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
>
> Changelog:
> since v2
> - revert using percpu_counter_sum()
>
> since v1
> - use percpu_counter_sum() instead of percpu_counter_read()
>
>
> -------------------------------------
> Subject: [PATCH v3] cpuacct: VIRT_CPU_ACCOUNTING don't prevent percpu cputime count
>
> cpuacct_update_stats() is called on every tick update, and it uses a percpu_counter
> to avoid performance degradation.
>
> For archs which define VIRT_CPU_ACCOUNTING, every tick would result
> in >1000 units of cputime updates, and since this is much greater
> than percpu_counter_batch, we end up taking the spinlock on every tick.
>
> This patch changes the batch rule: any CPU can now store
> "jiffies_to_cputime(percpu_counter_batch)" cputime in its per-cpu cache.
> This means there is no behavior change when VIRT_CPU_ACCOUNTING=n.
>
> Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
> Cc: Balaji Rao <balajirrao@gmail.com>
> Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
> kernel/sched.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> Index: b/kernel/sched.c
> ===================================================================
> --- a/kernel/sched.c 2009-04-30 11:37:47.000000000 +0900
> +++ b/kernel/sched.c 2009-05-07 16:34:00.000000000 +0900
> @@ -10221,6 +10221,7 @@ struct cpuacct {
> };
>
> struct cgroup_subsys cpuacct_subsys;
> +static s32 cpuacct_batch;
Should be __read_mostly, I guess.
>
> /* return cpu accounting group corresponding to this container */
> static inline struct cpuacct *cgroup_ca(struct cgroup *cgrp)
> @@ -10250,6 +10251,9 @@ static struct cgroup_subsys_state *cpuac
> if (!ca->cpuusage)
> goto out_free_ca;
>
> + if (!cpuacct_batch)
> + cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
So this is a once per boot condition? Why not initialize it in
sched_init() or so? (instead of incurring this ugly check all the
time)
Ingo
* Re: [PATCH v3] cpuacct: VIRT_CPU_ACCOUNTING don't prevent percpu cputime count
2009-05-07 10:50 ` Ingo Molnar
@ 2009-05-09 9:26 ` KOSAKI Motohiro
2009-05-09 9:38 ` Peter Zijlstra
0 siblings, 1 reply; 62+ messages in thread
From: KOSAKI Motohiro @ 2009-05-09 9:26 UTC (permalink / raw)
To: Ingo Molnar
Cc: kosaki.motohiro, LKML, Bharata B Rao, Balaji Rao, Dhaval Giani,
KAMEZAWA Hiroyuki, Peter Zijlstra, Balbir Singh,
Martin Schwidefsky
>
> * KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
>
> >
> > Changelog:
> > since v2
> > - revert using percpu_counter_sum()
> >
> > since v1
> > - use percpu_counter_sum() instead of percpu_counter_read()
> >
> >
> > -------------------------------------
> > Subject: [PATCH v3] cpuacct: VIRT_CPU_ACCOUNTING don't prevent percpu cputime count
> >
> > cpuacct_update_stats() is called on every tick update, and it uses a percpu_counter
> > to avoid performance degradation.
> >
> > For archs which define VIRT_CPU_ACCOUNTING, every tick would result
> > in >1000 units of cputime updates, and since this is much greater
> > than percpu_counter_batch, we end up taking the spinlock on every tick.
> >
> > This patch changes the batch rule: any CPU can now store
> > "jiffies_to_cputime(percpu_counter_batch)" cputime in its per-cpu cache.
> > This means there is no behavior change when VIRT_CPU_ACCOUNTING=n.
> >
> > Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > Cc: Balaji Rao <balajirrao@gmail.com>
> > Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
> > Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> > Cc: Ingo Molnar <mingo@elte.hu>
> > Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > ---
> > kernel/sched.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > Index: b/kernel/sched.c
> > ===================================================================
> > --- a/kernel/sched.c 2009-04-30 11:37:47.000000000 +0900
> > +++ b/kernel/sched.c 2009-05-07 16:34:00.000000000 +0900
> > @@ -10221,6 +10221,7 @@ struct cpuacct {
> > };
> >
> > struct cgroup_subsys cpuacct_subsys;
> > +static s32 cpuacct_batch;
>
> Should be __read_mostly, I guess.
>
> >
> > /* return cpu accounting group corresponding to this container */
> > static inline struct cpuacct *cgroup_ca(struct cgroup *cgrp)
> > @@ -10250,6 +10251,9 @@ static struct cgroup_subsys_state *cpuac
> > if (!ca->cpuusage)
> > goto out_free_ca;
> >
> > + if (!cpuacct_batch)
> > + cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
>
> So this is a once per boot condition? Why not initialize it in
> sched_init() or so? (instead of incurring this ugly check all the
> time)
Thanks! fixed.
Changelog:
since V3
- rewrite patch description (thanks Bharata!)
- append read_mostly to cpuacct_batch
- cpuacct_batch is initialized by sched_init()
since v2
- revert using percpu_counter_sum()
since v1
- use percpu_counter_sum() instead of percpu_counter_read()
---------------------------------------------------------
Subject: [PATCH v4] cpuacct: Use bigger percpu counter batch values for stats counters on archs that have VIRT_CPU_ACCOUNTING=y
percpu counters used to accumulate statistics in cpuacct controller use
the default batch value [max(2*nr_cpus, 32)] which can be too small for
archs that define VIRT_CPU_ACCOUNTING. In such archs, a tick could result in
cputime updates in the range of thousands. As a result, cpuacct_update_stats()
would end up acquiring the percpu counter spinlock on every tick which
is not good for performance.
Let those architectures have a bigger batch threshold so that the percpu counter
spinlock isn't taken on every tick. This change doesn't affect the archs which
don't define VIRT_CPU_ACCOUNTING; they continue to use the default
percpu counter batch value.
Cc: Balaji Rao <balajirrao@gmail.com>
Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
kernel/sched.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
Index: b/kernel/sched.c
===================================================================
--- a/kernel/sched.c 2009-05-09 16:48:08.000000000 +0900
+++ b/kernel/sched.c 2009-05-09 18:16:00.000000000 +0900
@@ -824,8 +824,12 @@ static struct file_operations sched_feat
.release = single_release,
};
+static __read_mostly s32 cpuacct_batch;
+
static __init int sched_init_debug(void)
{
+ cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
+
debugfs_create_file("sched_features", 0644, NULL, NULL,
&sched_feat_fops);
@@ -10457,7 +10461,8 @@ static void cpuacct_update_stats(struct
ca = task_ca(tsk);
do {
- percpu_counter_add(&ca->cpustat[idx], val);
+ __percpu_counter_add(&ca->cpustat[idx], val, cpuacct_batch);
+
ca = ca->parent;
} while (ca);
rcu_read_unlock();
* Re: [PATCH v3] cpuacct: VIRT_CPU_ACCOUNTING don't prevent percpu cputime count
2009-05-09 9:26 ` KOSAKI Motohiro
@ 2009-05-09 9:38 ` Peter Zijlstra
2009-05-09 9:52 ` KOSAKI Motohiro
0 siblings, 1 reply; 62+ messages in thread
From: Peter Zijlstra @ 2009-05-09 9:38 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Ingo Molnar, LKML, Bharata B Rao, Balaji Rao, Dhaval Giani,
KAMEZAWA Hiroyuki, Balbir Singh, Martin Schwidefsky
On Sat, 2009-05-09 at 18:26 +0900, KOSAKI Motohiro wrote:
>
> Index: b/kernel/sched.c
> ===================================================================
> --- a/kernel/sched.c 2009-05-09 16:48:08.000000000 +0900
> +++ b/kernel/sched.c 2009-05-09 18:16:00.000000000 +0900
> @@ -824,8 +824,12 @@ static struct file_operations sched_feat
> .release = single_release,
> };
>
> +static __read_mostly s32 cpuacct_batch;
> +
> static __init int sched_init_debug(void)
> {
> + cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
> +
> debugfs_create_file("sched_features", 0644, NULL, NULL,
> &sched_feat_fops);
I'm pretty sure you can select cpu accounting without SCHED_DEBUG...
> @@ -10457,7 +10461,8 @@ static void cpuacct_update_stats(struct
> ca = task_ca(tsk);
>
> do {
> - percpu_counter_add(&ca->cpustat[idx], val);
> + __percpu_counter_add(&ca->cpustat[idx], val, cpuacct_batch);
> +
> ca = ca->parent;
> } while (ca);
> rcu_read_unlock();
>
* Re: [PATCH v3] cpuacct: VIRT_CPU_ACCOUNTING don't prevent percpu cputime count
2009-05-09 9:38 ` Peter Zijlstra
@ 2009-05-09 9:52 ` KOSAKI Motohiro
2009-05-09 10:14 ` KOSAKI Motohiro
0 siblings, 1 reply; 62+ messages in thread
From: KOSAKI Motohiro @ 2009-05-09 9:52 UTC (permalink / raw)
To: Peter Zijlstra
Cc: kosaki.motohiro, Ingo Molnar, LKML, Bharata B Rao, Balaji Rao,
Dhaval Giani, KAMEZAWA Hiroyuki, Balbir Singh,
Martin Schwidefsky
> On Sat, 2009-05-09 at 18:26 +0900, KOSAKI Motohiro wrote:
> >
> > Index: b/kernel/sched.c
> > ===================================================================
> > --- a/kernel/sched.c 2009-05-09 16:48:08.000000000 +0900
> > +++ b/kernel/sched.c 2009-05-09 18:16:00.000000000 +0900
> > @@ -824,8 +824,12 @@ static struct file_operations sched_feat
> > .release = single_release,
> > };
> >
> > +static __read_mostly s32 cpuacct_batch;
> > +
> > static __init int sched_init_debug(void)
> > {
> > + cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
> > +
> > debugfs_create_file("sched_features", 0644, NULL, NULL,
> > &sched_feat_fops);
>
> I'm pretty sure you can select cpu accounting without SCHED_DEBUG...
Grr, yes, that was idiotic ;)
I'll fix it soon.
* Re: [PATCH v3] cpuacct: VIRT_CPU_ACCOUNTING don't prevent percpu cputime count
2009-05-09 9:52 ` KOSAKI Motohiro
@ 2009-05-09 10:14 ` KOSAKI Motohiro
2009-05-11 12:33 ` [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters tip-bot for KOSAKI Motohiro
2009-05-11 12:36 ` tip-bot for KOSAKI Motohiro
0 siblings, 2 replies; 62+ messages in thread
From: KOSAKI Motohiro @ 2009-05-09 10:14 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: kosaki.motohiro, Peter Zijlstra, Ingo Molnar, LKML,
Bharata B Rao, Balaji Rao, Dhaval Giani, KAMEZAWA Hiroyuki,
Balbir Singh, Martin Schwidefsky
> > On Sat, 2009-05-09 at 18:26 +0900, KOSAKI Motohiro wrote:
> > >
> > > Index: b/kernel/sched.c
> > > ===================================================================
> > > --- a/kernel/sched.c 2009-05-09 16:48:08.000000000 +0900
> > > +++ b/kernel/sched.c 2009-05-09 18:16:00.000000000 +0900
> > > @@ -824,8 +824,12 @@ static struct file_operations sched_feat
> > > .release = single_release,
> > > };
> > >
> > > +static __read_mostly s32 cpuacct_batch;
> > > +
> > > static __init int sched_init_debug(void)
> > > {
> > > + cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
> > > +
> > > debugfs_create_file("sched_features", 0644, NULL, NULL,
> > > &sched_feat_fops);
> >
> > I'm pretty sure you can select cpu accounting without SCHED_DEBUG...
>
> Grr, yes, that was idiotic ;)
> I'll fix it soon.
Done.
Changelog:
since V4
- move cpuacct_batch initialization into sched_init()
since V3
- rewrite patch description (thanks Bharata!)
- append read_mostly to cpuacct_batch
- cpuacct_batch is initialized by sched_init_debug()
since v2
- revert using percpu_counter_sum()
since v1
- use percpu_counter_sum() instead of percpu_counter_read()
---------------------------------------------------------
Subject: [PATCH v5] cpuacct: Use bigger percpu counter batch values for stats counters on archs that have VIRT_CPU_ACCOUNTING=y
percpu counters used to accumulate statistics in cpuacct controller use
the default batch value [max(2*nr_cpus, 32)] which can be too small for
archs that define VIRT_CPU_ACCOUNTING. In such archs, a tick could result in
cputime updates in the range of thousands. As a result, cpuacct_update_stats()
would end up acquiring the percpu counter spinlock on every tick which
is not good for performance.
Let those architectures have a bigger batch threshold so that the percpu counter
spinlock isn't taken on every tick. This change doesn't affect the archs which
don't define VIRT_CPU_ACCOUNTING; they continue to use the default
percpu counter batch value.
Cc: Balaji Rao <balajirrao@gmail.com>
Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
kernel/sched.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
Index: b/kernel/sched.c
===================================================================
--- a/kernel/sched.c 2009-05-09 16:48:08.000000000 +0900
+++ b/kernel/sched.c 2009-05-09 18:57:35.000000000 +0900
@@ -870,6 +870,8 @@ static __read_mostly int scheduler_runni
*/
int sysctl_sched_rt_runtime = 950000;
+static __read_mostly s32 cpuacct_batch;
+
static inline u64 global_rt_period(void)
{
return (u64)sysctl_sched_rt_period * NSEC_PER_USEC;
@@ -9284,6 +9286,8 @@ void __init sched_init(void)
perf_counter_init();
+ cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
+
scheduler_running = 1;
}
@@ -10457,7 +10461,8 @@ static void cpuacct_update_stats(struct
ca = task_ca(tsk);
do {
- percpu_counter_add(&ca->cpustat[idx], val);
+ __percpu_counter_add(&ca->cpustat[idx], val, cpuacct_batch);
+
ca = ca->parent;
} while (ca);
rcu_read_unlock();
* [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-05-09 10:14 ` KOSAKI Motohiro
@ 2009-05-11 12:33 ` tip-bot for KOSAKI Motohiro
2009-05-11 12:36 ` tip-bot for KOSAKI Motohiro
1 sibling, 0 replies; 62+ messages in thread
From: tip-bot for KOSAKI Motohiro @ 2009-05-11 12:33 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, a.p.zijlstra, schwidefsky, dhaval,
balajirrao, balbir, bharata, tglx, kamezawa.hiroyu,
kosaki.motohiro, mingo
Commit-ID: 0719318fea31d54d13ed8ead7f4a277038bd75a2
Gitweb: http://git.kernel.org/tip/0719318fea31d54d13ed8ead7f4a277038bd75a2
Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
AuthorDate: Sat, 9 May 2009 19:14:58 +0900
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 11 May 2009 14:21:32 +0200
sched: cpuacct: Use bigger percpu counter batch values for stats counters
percpu counters used to accumulate statistics in cpuacct controller use
the default batch value [max(2*nr_cpus, 32)] which can be too small for
archs that define VIRT_CPU_ACCOUNTING. In such archs, a tick could result in
cputime updates in the range of thousands. As a result, cpuacct_update_stats()
would end up acquiring the percpu counter spinlock on every tick which
is not good for performance.
Let those architectures have a bigger batch threshold so that the percpu counter
spinlock isn't taken on every tick. This change doesn't affect the archs which
don't define VIRT_CPU_ACCOUNTING; they continue to use the default
percpu counter batch value.
v5:
- move cpuacct_batch initialization into sched_init()
v4:
- rewrite patch description (thanks Bharata!)
- append read_mostly to cpuacct_batch
- cpuacct_batch is initialized by sched_init_debug()
v3:
- revert using percpu_counter_sum()
v2:
- use percpu_counter_sum() instead of percpu_counter_read()
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Balaji Rao <balajirrao@gmail.com>
Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
LKML-Reference: <20090509191430.3AD5.A69D9226@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/sched.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index 8908d19..beadb82 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -872,6 +872,8 @@ static __read_mostly int scheduler_running;
*/
int sysctl_sched_rt_runtime = 950000;
+static __read_mostly s32 cpuacct_batch;
+
static inline u64 global_rt_period(void)
{
return (u64)sysctl_sched_rt_period * NSEC_PER_USEC;
@@ -9181,6 +9183,8 @@ void __init sched_init(void)
alloc_bootmem_cpumask_var(&cpu_isolated_map);
#endif /* SMP */
+ cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
+
scheduler_running = 1;
}
@@ -10354,7 +10358,8 @@ static void cpuacct_update_stats(struct task_struct *tsk,
ca = task_ca(tsk);
do {
- percpu_counter_add(&ca->cpustat[idx], val);
+ __percpu_counter_add(&ca->cpustat[idx], val, cpuacct_batch);
+
ca = ca->parent;
} while (ca);
rcu_read_unlock();
* [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-05-09 10:14 ` KOSAKI Motohiro
2009-05-11 12:33 ` [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters tip-bot for KOSAKI Motohiro
@ 2009-05-11 12:36 ` tip-bot for KOSAKI Motohiro
2009-05-11 15:07 ` Ingo Molnar
1 sibling, 1 reply; 62+ messages in thread
From: tip-bot for KOSAKI Motohiro @ 2009-05-11 12:36 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, a.p.zijlstra, schwidefsky, dhaval,
balajirrao, balbir, bharata, tglx, kamezawa.hiroyu,
kosaki.motohiro, mingo
Commit-ID: 50fbed3bb72cbe38a665512771a91a96d028de46
Gitweb: http://git.kernel.org/tip/50fbed3bb72cbe38a665512771a91a96d028de46
Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
AuthorDate: Sat, 9 May 2009 19:14:58 +0900
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 11 May 2009 14:32:57 +0200
sched: cpuacct: Use bigger percpu counter batch values for stats counters
percpu counters used to accumulate statistics in cpuacct controller use
the default batch value [max(2*nr_cpus, 32)] which can be too small for
archs that define VIRT_CPU_ACCOUNTING. In such archs, a tick could result in
cputime updates in the range of thousands. As a result, cpuacct_update_stats()
would end up acquiring the percpu counter spinlock on every tick which
is not good for performance.
Let those architectures have a bigger batch threshold so that the percpu counter
spinlock isn't taken on every tick. This change doesn't affect the archs which
don't define VIRT_CPU_ACCOUNTING; they continue to use the default
percpu counter batch value.
v5:
- move cpuacct_batch initialization into sched_init()
v4:
- rewrite patch description (thanks Bharata!)
- append read_mostly to cpuacct_batch
- cpuacct_batch is initialized by sched_init_debug()
v3:
- revert using percpu_counter_sum()
v2:
- use percpu_counter_sum() instead of percpu_counter_read()
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Balaji Rao <balajirrao@gmail.com>
Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
LKML-Reference: <20090509191430.3AD5.A69D9226@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/sched.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index 8908d19..873bc3b 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -872,6 +872,8 @@ static __read_mostly int scheduler_running;
*/
int sysctl_sched_rt_runtime = 950000;
+static __read_mostly s32 cpuacct_batch;
+
static inline u64 global_rt_period(void)
{
return (u64)sysctl_sched_rt_period * NSEC_PER_USEC;
@@ -9171,6 +9173,8 @@ void __init sched_init(void)
*/
current->sched_class = &fair_sched_class;
+ cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
+
/* Allocate the nohz_cpu_mask if CONFIG_CPUMASK_OFFSTACK */
alloc_bootmem_cpumask_var(&nohz_cpu_mask);
#ifdef CONFIG_SMP
@@ -10354,7 +10358,8 @@ static void cpuacct_update_stats(struct task_struct *tsk,
ca = task_ca(tsk);
do {
- percpu_counter_add(&ca->cpustat[idx], val);
+ __percpu_counter_add(&ca->cpustat[idx], val, cpuacct_batch);
+
ca = ca->parent;
} while (ca);
rcu_read_unlock();
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-05-11 12:36 ` tip-bot for KOSAKI Motohiro
@ 2009-05-11 15:07 ` Ingo Molnar
2009-05-12 10:01 ` KOSAKI Motohiro
0 siblings, 1 reply; 62+ messages in thread
From: Ingo Molnar @ 2009-05-11 15:07 UTC (permalink / raw)
To: mingo, hpa, linux-kernel, a.p.zijlstra, schwidefsky, balajirrao,
dhaval, balbir, bharata, tglx, kosaki.motohiro, kamezawa.hiroyu
Cc: linux-tip-commits
* tip-bot for KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> Commit-ID: 50fbed3bb72cbe38a665512771a91a96d028de46
> Gitweb: http://git.kernel.org/tip/50fbed3bb72cbe38a665512771a91a96d028de46
> Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> AuthorDate: Sat, 9 May 2009 19:14:58 +0900
> Committer: Ingo Molnar <mingo@elte.hu>
> CommitDate: Mon, 11 May 2009 14:32:57 +0200
>
> sched: cpuacct: Use bigger percpu counter batch values for stats counters
-tip testing found a build failure with this patch:
kernel/sched.c:9278: error: ‘percpu_counter_batch’ undeclared (first use in this function)
kernel/sched.c:9278: error: (Each undeclared identifier is reported only once
kernel/sched.c:9278: error: for each function it appears in.)
Ingo
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-05-11 15:07 ` Ingo Molnar
@ 2009-05-12 10:01 ` KOSAKI Motohiro
2009-05-12 10:02 ` Peter Zijlstra
2009-05-12 10:10 ` Balbir Singh
0 siblings, 2 replies; 62+ messages in thread
From: KOSAKI Motohiro @ 2009-05-12 10:01 UTC (permalink / raw)
To: Ingo Molnar
Cc: kosaki.motohiro, mingo, hpa, linux-kernel, a.p.zijlstra,
schwidefsky, balajirrao, dhaval, balbir, bharata, tglx,
kamezawa.hiroyu, linux-tip-commits
>
> * tip-bot for KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
>
> > Commit-ID: 50fbed3bb72cbe38a665512771a91a96d028de46
> > Gitweb: http://git.kernel.org/tip/50fbed3bb72cbe38a665512771a91a96d028de46
> > Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > AuthorDate: Sat, 9 May 2009 19:14:58 +0900
> > Committer: Ingo Molnar <mingo@elte.hu>
> > CommitDate: Mon, 11 May 2009 14:32:57 +0200
> >
> > sched: cpuacct: Use bigger percpu counter batch values for stats counters
>
> -tip testing found a build failure with this patch:
>
> kernel/sched.c:9278: error: ‘percpu_counter_batch’ undeclared (first use in this function)
> kernel/sched.c:9278: error: (Each undeclared identifier is reported only once
> kernel/sched.c:9278: error: for each function it appears in.)
Oh, my fault again. Good catch!
Changelog:
since V5
- fix build error when UP
since V4
- move cpuacct_batch initialization into sched_init()
since V3
- rewrite patch description (thanks Bharata!)
- append read_mostly to cpuacct_batch
- cpuacct_batch is initialized by sched_init()
since v2
- revert using percpu_counter_sum()
since v1
- use percpu_counter_sum() instead of percpu_counter_read()
---------------------------------------------------------
Subject: [PATCH v6] cpuacct: Use bigger percpu counter batch values for stats counters on archs that have VIRT_CPU_ACCOUNTING=y
percpu counters used to accumulate statistics in cpuacct controller use
the default batch value [max(2*nr_cpus, 32)] which can be too small for
archs that define VIRT_CPU_ACCOUNTING. In such archs, a tick could result in
cputime updates in the range of thousands. As a result, cpuacct_update_stats()
would end up acquiring the percpu counter spinlock on every tick which
is not good for performance.
Let those architectures have a bigger batch threshold so that the percpu counter
spinlock isn't taken on every tick. This change doesn't affect the archs which
don't define VIRT_CPU_ACCOUNTING; they continue to use the default
percpu counter batch value.
Cc: Balaji Rao <balajirrao@gmail.com>
Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
kernel/sched.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
Index: b/kernel/sched.c
===================================================================
--- a/kernel/sched.c 2009-05-12 13:12:59.000000000 +0900
+++ b/kernel/sched.c 2009-05-12 13:28:53.000000000 +0900
@@ -870,6 +870,8 @@ static __read_mostly int scheduler_runni
*/
int sysctl_sched_rt_runtime = 950000;
+static __read_mostly s32 cpuacct_batch;
+
static inline u64 global_rt_period(void)
{
return (u64)sysctl_sched_rt_period * NSEC_PER_USEC;
@@ -9284,6 +9286,10 @@ void __init sched_init(void)
perf_counter_init();
+#ifdef CONFIGCONFIG_SMP
+ cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
+#endif
+
scheduler_running = 1;
}
@@ -10457,7 +10463,8 @@ static void cpuacct_update_stats(struct
ca = task_ca(tsk);
do {
- percpu_counter_add(&ca->cpustat[idx], val);
+ __percpu_counter_add(&ca->cpustat[idx], val, cpuacct_batch);
+
ca = ca->parent;
} while (ca);
rcu_read_unlock();
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-05-12 10:01 ` KOSAKI Motohiro
@ 2009-05-12 10:02 ` Peter Zijlstra
2009-05-12 10:06 ` KOSAKI Motohiro
2009-05-12 10:10 ` Balbir Singh
1 sibling, 1 reply; 62+ messages in thread
From: Peter Zijlstra @ 2009-05-12 10:02 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Ingo Molnar, mingo, hpa, linux-kernel, schwidefsky, balajirrao,
dhaval, balbir, bharata, tglx, kamezawa.hiroyu,
linux-tip-commits
On Tue, 2009-05-12 at 19:01 +0900, KOSAKI Motohiro wrote:
> +#ifdef CONFIGCONFIG_SMP
typo ?
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-05-12 10:02 ` Peter Zijlstra
@ 2009-05-12 10:06 ` KOSAKI Motohiro
2009-05-12 10:22 ` KOSAKI Motohiro
0 siblings, 1 reply; 62+ messages in thread
From: KOSAKI Motohiro @ 2009-05-12 10:06 UTC (permalink / raw)
To: Peter Zijlstra
Cc: kosaki.motohiro, Ingo Molnar, mingo, hpa, linux-kernel,
schwidefsky, balajirrao, dhaval, balbir, bharata, tglx,
kamezawa.hiroyu, linux-tip-commits
> On Tue, 2009-05-12 at 19:01 +0900, KOSAKI Motohiro wrote:
> > +#ifdef CONFIGCONFIG_SMP
>
> typo ?
yes, this is my memorably stupid week ;)
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-05-12 10:01 ` KOSAKI Motohiro
2009-05-12 10:02 ` Peter Zijlstra
@ 2009-05-12 10:10 ` Balbir Singh
2009-05-12 10:13 ` KOSAKI Motohiro
1 sibling, 1 reply; 62+ messages in thread
From: Balbir Singh @ 2009-05-12 10:10 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Ingo Molnar, mingo, hpa, linux-kernel, a.p.zijlstra, schwidefsky,
balajirrao, dhaval, bharata, tglx, kamezawa.hiroyu,
linux-tip-commits
* KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> [2009-05-12 19:01:25]:
> >
> > * tip-bot for KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> >
> > > Commit-ID: 50fbed3bb72cbe38a665512771a91a96d028de46
> > > Gitweb: http://git.kernel.org/tip/50fbed3bb72cbe38a665512771a91a96d028de46
> > > Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > AuthorDate: Sat, 9 May 2009 19:14:58 +0900
> > > Committer: Ingo Molnar <mingo@elte.hu>
> > > CommitDate: Mon, 11 May 2009 14:32:57 +0200
> > >
> > > sched: cpuacct: Use bigger percpu counter batch values for stats counters
> >
> > -tip testing found a build failure with this patch:
> >
> > kernel/sched.c:9278: error: 'percpu_counter_batch' undeclared (first use in this function)
> > kernel/sched.c:9278: error: (Each undeclared identifier is reported only once
> > kernel/sched.c:9278: error: for each function it appears in.)
>
> Oh, my fault again. Good catch!
>
>
>
> Changelog:
> since V5
> - fix build error when UP
>
> since V4
> - move cpuacct_batch initialization into sched_init()
>
> since V3
> - rewrite patch description (thanks Bharata!)
> - append read_mostly to cpuacct_batch
> - cpuacct_batch is initialized by sched_init()
>
> since v2
> - revert using percpu_counter_sum()
>
> since v1
> - use percpu_counter_sum() instead of percpu_counter_read()
>
>
> ---------------------------------------------------------
> Subject: [PATCH v6] cpuacct: Use bigger percpu counter batch values for stats counters on archs that have VIRT_CPU_ACCOUNTING=y
>
> percpu counters used to accumulate statistics in cpuacct controller use
> the default batch value [max(2*nr_cpus, 32)] which can be too small for
> archs that define VIRT_CPU_ACCOUNTING. In such archs, a tick could result in
> cputime updates in the range of thousands. As a result, cpuacct_update_stats()
> would end up acquiring the percpu counter spinlock on every tick which
> is not good for performance.
>
> Let those architectures have a bigger batch threshold so that the percpu counter
> spinlock isn't taken on every tick. This change doesn't affect the archs which
> don't define VIRT_CPU_ACCOUNTING; they continue to have the default
> percpu counter batch value.
>
> Cc: Balaji Rao <balajirrao@gmail.com>
> Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
> kernel/sched.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> Index: b/kernel/sched.c
> ===================================================================
> --- a/kernel/sched.c 2009-05-12 13:12:59.000000000 +0900
> +++ b/kernel/sched.c 2009-05-12 13:28:53.000000000 +0900
> @@ -870,6 +870,8 @@ static __read_mostly int scheduler_runni
> */
> int sysctl_sched_rt_runtime = 950000;
>
> +static __read_mostly s32 cpuacct_batch;
> +
> static inline u64 global_rt_period(void)
> {
> return (u64)sysctl_sched_rt_period * NSEC_PER_USEC;
> @@ -9284,6 +9286,10 @@ void __init sched_init(void)
>
> perf_counter_init();
>
> +#ifdef CONFIGCONFIG_SMP
> + cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
> +#endif
Slow down and compile patches before sending them out.. please. That
is a basic expectation if you expect it to be merged.
--
Balbir
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-05-12 10:10 ` Balbir Singh
@ 2009-05-12 10:13 ` KOSAKI Motohiro
2009-05-12 10:24 ` Balbir Singh
0 siblings, 1 reply; 62+ messages in thread
From: KOSAKI Motohiro @ 2009-05-12 10:13 UTC (permalink / raw)
To: balbir
Cc: kosaki.motohiro, Ingo Molnar, mingo, hpa, linux-kernel,
a.p.zijlstra, schwidefsky, balajirrao, dhaval, bharata, tglx,
kamezawa.hiroyu, linux-tip-commits
> > +#ifdef CONFIGCONFIG_SMP
> > + cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
> > +#endif
>
> Slow down and compile patches before sending them out.. please. That
> is a basic expectation if you expect it to be merged.
Unfortunately, this mistake passed testing successfully ;)
That's because cpuacct_batch=0 works even on SMP.
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-05-12 10:06 ` KOSAKI Motohiro
@ 2009-05-12 10:22 ` KOSAKI Motohiro
2009-05-12 10:27 ` Ingo Molnar
0 siblings, 1 reply; 62+ messages in thread
From: KOSAKI Motohiro @ 2009-05-12 10:22 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: kosaki.motohiro, Peter Zijlstra, Ingo Molnar, mingo, hpa,
linux-kernel, schwidefsky, balajirrao, dhaval, balbir, bharata,
tglx, kamezawa.hiroyu, linux-tip-commits
> > On Tue, 2009-05-12 at 19:01 +0900, KOSAKI Motohiro wrote:
> > > +#ifdef CONFIGCONFIG_SMP
> >
> > typo ?
>
> yes, this week is a memorably stupid one for me ;)
ok, the assembly listing indicates the current patch has no typo ;)
ffffffff814ae920: 48 c7 40 30 10 eb 2a movq $0xffffffff812aeb10,0x30(%rax)
ffffffff814ae927: 81
ffffffff814ae928: c7 05 3e 37 fe ff 01 movl $0x1,-0x1c8c2(%rip) # ffffffff81492070 <scheduler_running>
ffffffff814ae92f: 00 00 00
ffffffff814ae932: 8b 05 28 52 fe ff mov -0x1add8(%rip),%eax # ffffffff81493b60 <percpu_counter_batch>
ffffffff814ae938: 89 05 36 37 fe ff mov %eax,-0x1c8ca(%rip) # ffffffff81492074 <cpuacct_batch>
ffffffff814ae93e: 41 5c pop %r12
---------------------------------------------------------
Subject: [PATCH] cpuacct: Use bigger percpu counter batch values for stats counters on archs that have VIRT_CPU_ACCOUNTING=y
percpu counters used to accumulate statistics in cpuacct controller use
the default batch value [max(2*nr_cpus, 32)] which can be too small for
archs that define VIRT_CPU_ACCOUNTING. In such archs, a tick could result in
cputime updates in the range of thousands. As a result, cpuacct_update_stats()
would end up acquiring the percpu counter spinlock on every tick which
is not good for performance.
Let those architectures have a bigger batch threshold so that the percpu counter
spinlock isn't taken on every tick. This change doesn't affect the archs which
don't define VIRT_CPU_ACCOUNTING; they continue to have the default
percpu counter batch value.
Cc: Balaji Rao <balajirrao@gmail.com>
Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
kernel/sched.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
Index: b/kernel/sched.c
===================================================================
--- a/kernel/sched.c 2009-05-12 13:12:59.000000000 +0900
+++ b/kernel/sched.c 2009-05-12 19:04:49.000000000 +0900
@@ -870,6 +870,8 @@ static __read_mostly int scheduler_runni
*/
int sysctl_sched_rt_runtime = 950000;
+static __read_mostly s32 cpuacct_batch;
+
static inline u64 global_rt_period(void)
{
return (u64)sysctl_sched_rt_period * NSEC_PER_USEC;
@@ -9284,6 +9286,10 @@ void __init sched_init(void)
perf_counter_init();
+#ifdef CONFIG_SMP
+ cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
+#endif
+
scheduler_running = 1;
}
@@ -10457,7 +10463,8 @@ static void cpuacct_update_stats(struct
ca = task_ca(tsk);
do {
- percpu_counter_add(&ca->cpustat[idx], val);
+ __percpu_counter_add(&ca->cpustat[idx], val, cpuacct_batch);
+
ca = ca->parent;
} while (ca);
rcu_read_unlock();
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-05-12 10:13 ` KOSAKI Motohiro
@ 2009-05-12 10:24 ` Balbir Singh
2009-05-12 10:29 ` Ingo Molnar
0 siblings, 1 reply; 62+ messages in thread
From: Balbir Singh @ 2009-05-12 10:24 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Ingo Molnar, mingo, hpa, linux-kernel, a.p.zijlstra, schwidefsky,
balajirrao, dhaval, bharata, tglx, kamezawa.hiroyu,
linux-tip-commits
* KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> [2009-05-12 19:13:42]:
> > > +#ifdef CONFIGCONFIG_SMP
> > > + cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
> > > +#endif
> >
> > Slow down and compile patches before sending them out.. please. That
> > is a basic expectation if you expect it to be merged.
>
> Unfortunately, this mistake passed testing successfully ;)
> That's because cpuacct_batch=0 works even on SMP.
>
OK, BTW, using an #ifdef right in the middle of a function makes the
code harder to read; can't we use an inline function to abstract out
SMP?
--
Balbir
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-05-12 10:22 ` KOSAKI Motohiro
@ 2009-05-12 10:27 ` Ingo Molnar
2009-05-12 10:42 ` KOSAKI Motohiro
0 siblings, 1 reply; 62+ messages in thread
From: Ingo Molnar @ 2009-05-12 10:27 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Peter Zijlstra, mingo, hpa, linux-kernel, schwidefsky,
balajirrao, dhaval, balbir, bharata, tglx, kamezawa.hiroyu,
linux-tip-commits
* KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> > > On Tue, 2009-05-12 at 19:01 +0900, KOSAKI Motohiro wrote:
> > > > +#ifdef CONFIGCONFIG_SMP
> > >
> > > typo ?
> >
> > yes, this week is a memorably stupid one for me ;)
>
> ok, the assembly listing indicates the current patch has no typo ;)
>
> ffffffff814ae920: 48 c7 40 30 10 eb 2a movq $0xffffffff812aeb10,0x30(%rax)
> ffffffff814ae927: 81
> ffffffff814ae928: c7 05 3e 37 fe ff 01 movl $0x1,-0x1c8c2(%rip) # ffffffff81492070 <scheduler_running>
> ffffffff814ae92f: 00 00 00
> ffffffff814ae932: 8b 05 28 52 fe ff mov -0x1add8(%rip),%eax # ffffffff81493b60 <percpu_counter_batch>
> ffffffff814ae938: 89 05 36 37 fe ff mov %eax,-0x1c8ca(%rip) # ffffffff81492074 <cpuacct_batch>
> ffffffff814ae93e: 41 5c pop %r12
>
>
>
> ---------------------------------------------------------
> Subject: [PATCH] cpuacct: Use bigger percpu counter batch values for stats counters on archs that have VIRT_CPU_ACCOUNTING=y
>
> percpu counters used to accumulate statistics in cpuacct controller use
> the default batch value [max(2*nr_cpus, 32)] which can be too small for
> archs that define VIRT_CPU_ACCOUNTING. In such archs, a tick could result in
> cputime updates in the range of thousands. As a result, cpuacct_update_stats()
> would end up acquiring the percpu counter spinlock on every tick which
> is not good for performance.
>
> Let those architectures have a bigger batch threshold so that the percpu counter
> spinlock isn't taken on every tick. This change doesn't affect the archs which
> don't define VIRT_CPU_ACCOUNTING; they continue to have the default
> percpu counter batch value.
>
> Cc: Balaji Rao <balajirrao@gmail.com>
> Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
> kernel/sched.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
Please reuse the commit log formatting fixes i did when i applied
your patch. (you should still have that commit notification email)
Ingo
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-05-12 10:24 ` Balbir Singh
@ 2009-05-12 10:29 ` Ingo Molnar
2009-05-12 10:44 ` KOSAKI Motohiro
0 siblings, 1 reply; 62+ messages in thread
From: Ingo Molnar @ 2009-05-12 10:29 UTC (permalink / raw)
To: Balbir Singh
Cc: KOSAKI Motohiro, mingo, hpa, linux-kernel, a.p.zijlstra,
schwidefsky, balajirrao, dhaval, bharata, tglx, kamezawa.hiroyu,
linux-tip-commits
* Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> * KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> [2009-05-12 19:13:42]:
>
> > > > +#ifdef CONFIGCONFIG_SMP
> > > > + cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
> > > > +#endif
> > >
> > > Slow down and compile patches before sending them out.. please. That
> > > is a basic expectation if you expect it to be merged.
> >
> > Unfortunately, this mistake passed testing successfully ;)
> > That's because cpuacct_batch=0 works even on SMP.
> >
>
> OK, BTW, using an #ifdef right in the middle of a function makes
> the code harder to read, can't we use an inline function to
> abstract out SMP?
or rather, to make cpuacct_batch have a sane value on UP too. (1?
0?)
Ingo
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-05-12 10:27 ` Ingo Molnar
@ 2009-05-12 10:42 ` KOSAKI Motohiro
0 siblings, 0 replies; 62+ messages in thread
From: KOSAKI Motohiro @ 2009-05-12 10:42 UTC (permalink / raw)
To: Ingo Molnar
Cc: kosaki.motohiro, Peter Zijlstra, mingo, hpa, linux-kernel,
schwidefsky, balajirrao, dhaval, balbir, bharata, tglx,
kamezawa.hiroyu, linux-tip-commits
> > Cc: Balaji Rao <balajirrao@gmail.com>
> > Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
> > Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> > Cc: Ingo Molnar <mingo@elte.hu>
> > Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > ---
> > kernel/sched.c | 9 ++++++++-
> > 1 file changed, 8 insertions(+), 1 deletion(-)
>
> Please reuse the commit log formatting fixes i did when i applied
> your patch. (you should still have that commit notification email)
Oh, I didn't notice this change. I'll resend it. thanks
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-05-12 10:29 ` Ingo Molnar
@ 2009-05-12 10:44 ` KOSAKI Motohiro
2009-05-12 13:28 ` Ingo Molnar
2009-07-16 8:10 ` Bharata B Rao
0 siblings, 2 replies; 62+ messages in thread
From: KOSAKI Motohiro @ 2009-05-12 10:44 UTC (permalink / raw)
To: Ingo Molnar
Cc: kosaki.motohiro, Balbir Singh, mingo, hpa, linux-kernel,
a.p.zijlstra, schwidefsky, balajirrao, dhaval, bharata, tglx,
kamezawa.hiroyu, linux-tip-commits
>
> * Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
> > * KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> [2009-05-12 19:13:42]:
> >
> > > > > +#ifdef CONFIGCONFIG_SMP
> > > > > + cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
> > > > > +#endif
> > > >
> > > > Slow down and compile patches before sending them out.. please. That
> > > > is a basic expectation if you expect it to be merged.
> > >
> > > Unfortunately, this mistake passed testing successfully ;)
> > > That's because cpuacct_batch=0 works even on SMP.
> > >
> >
> > OK, BTW, using an #ifdef right in the middle of a function makes
> > the code harder to read, can't we use an inline function to
> > abstract out SMP?
>
> or rather, to make cpuacct_batch have a sane value on UP too. (1?
> 0?)
umm..
I've reviewed my patch again.
but sched_init() already has multiple #ifdef SMP. Thus I don't think
this cosmetic change improves readability much.
------------------------------------
Subject: [PATCH] sched: cpuacct: Use bigger percpu counter batch values for stats counters
percpu counters used to accumulate statistics in cpuacct controller use
the default batch value [max(2*nr_cpus, 32)] which can be too small for
archs that define VIRT_CPU_ACCOUNTING. In such archs, a tick could result in
cputime updates in the range of thousands. As a result, cpuacct_update_stats()
would end up acquiring the percpu counter spinlock on every tick which
is not good for performance.
Let those architectures have a bigger batch threshold so that the percpu counter
spinlock isn't taken on every tick. This change doesn't affect the archs which
don't define VIRT_CPU_ACCOUNTING; they continue to have the default
percpu counter batch value.
v7:
- fix typo and changelog
v6:
- fix build error when UP
v5:
- move cpuacct_batch initialization into sched_init()
v4:
- rewrite patch description (thanks Bharata!)
- append read_mostly to cpuacct_batch
- cpuacct_batch is initialized by sched_init_debug()
v3:
- revert using percpu_counter_sum()
v2:
- use percpu_counter_sum() instead of percpu_counter_read()
Cc: Balaji Rao <balajirrao@gmail.com>
Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
kernel/sched.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
Index: b/kernel/sched.c
===================================================================
--- a/kernel/sched.c 2009-05-12 13:12:59.000000000 +0900
+++ b/kernel/sched.c 2009-05-12 19:04:49.000000000 +0900
@@ -870,6 +870,8 @@ static __read_mostly int scheduler_runni
*/
int sysctl_sched_rt_runtime = 950000;
+static __read_mostly s32 cpuacct_batch;
+
static inline u64 global_rt_period(void)
{
return (u64)sysctl_sched_rt_period * NSEC_PER_USEC;
@@ -9284,6 +9286,10 @@ void __init sched_init(void)
perf_counter_init();
+#ifdef CONFIG_SMP
+ cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
+#endif
+
scheduler_running = 1;
}
@@ -10457,7 +10463,8 @@ static void cpuacct_update_stats(struct
ca = task_ca(tsk);
do {
- percpu_counter_add(&ca->cpustat[idx], val);
+ __percpu_counter_add(&ca->cpustat[idx], val, cpuacct_batch);
+
ca = ca->parent;
} while (ca);
rcu_read_unlock();
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-05-12 10:44 ` KOSAKI Motohiro
@ 2009-05-12 13:28 ` Ingo Molnar
2009-05-12 13:40 ` KOSAKI Motohiro
2009-07-16 8:10 ` Bharata B Rao
1 sibling, 1 reply; 62+ messages in thread
From: Ingo Molnar @ 2009-05-12 13:28 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Balbir Singh, mingo, hpa, linux-kernel, a.p.zijlstra,
schwidefsky, balajirrao, dhaval, bharata, tglx, kamezawa.hiroyu,
linux-tip-commits
* KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> > or rather, to make cpuacct_batch have a sane value on UP too.
> > (1? 0?)
>
> umm..
>
> I've reviewed my patch again.
> but sched_init() already has multiple #ifdef SMP. [...]
Patches are welcome to remove more of them.
> [...] Thus I don't think this cosmetic change improves readability
> much.
an avoidable #ifdef should always be avoided.
Ingo
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-05-12 13:28 ` Ingo Molnar
@ 2009-05-12 13:40 ` KOSAKI Motohiro
2009-05-12 13:42 ` Ingo Molnar
0 siblings, 1 reply; 62+ messages in thread
From: KOSAKI Motohiro @ 2009-05-12 13:40 UTC (permalink / raw)
To: Ingo Molnar
Cc: Balbir Singh, mingo, hpa, linux-kernel, a.p.zijlstra,
schwidefsky, balajirrao, dhaval, bharata, tglx, kamezawa.hiroyu,
linux-tip-commits
2009/5/12 Ingo Molnar <mingo@elte.hu>:
>
> * KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
>
>> > or rather, to make cpuacct_batch have a sane value on UP too.
>> > (1? 0?)
>>
>> umm..
>>
>> I've reviewed my patch again.
>> but sched_init() already has multiple #ifdef SMP. [...]
>
> Patches are welcome to remove more of them.
>
>> [...] Thus I don't think this cosmetic change improves readability
>> much.
>
> an avoidable #ifdef should always be avoided.
ok. I'll fix it tomorrow.
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-05-12 13:40 ` KOSAKI Motohiro
@ 2009-05-12 13:42 ` Ingo Molnar
0 siblings, 0 replies; 62+ messages in thread
From: Ingo Molnar @ 2009-05-12 13:42 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Balbir Singh, mingo, hpa, linux-kernel, a.p.zijlstra,
schwidefsky, balajirrao, dhaval, bharata, tglx, kamezawa.hiroyu,
linux-tip-commits
* KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> 2009/5/12 Ingo Molnar <mingo@elte.hu>:
> >
> > * KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> >
> >> > or rather, to make cpuacct_batch have a sane value on UP too.
> >> > (1? 0?)
> >>
> >> umm..
> >>
> >> I've reviewed my patch again.
> >> but sched_init() already has multiple #ifdef SMP. [...]
> >
> > Patches are welcome to remove more of them.
> >
> >> [...] Thus I don't think this cosmetic change improves readability
> >> much.
> >
> > an avoidable #ifdef should always be avoided.
>
> ok. I'll fix it tomorrow.
With the caveat that you should do it only if i'm right and if it
results in better code in general.
Ingo
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-05-12 10:44 ` KOSAKI Motohiro
2009-05-12 13:28 ` Ingo Molnar
@ 2009-07-16 8:10 ` Bharata B Rao
2009-07-16 8:39 ` Anton Blanchard
1 sibling, 1 reply; 62+ messages in thread
From: Bharata B Rao @ 2009-07-16 8:10 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Ingo Molnar, Balbir Singh, mingo, hpa, linux-kernel,
a.p.zijlstra, schwidefsky, balajirrao, dhaval, tglx,
kamezawa.hiroyu, linux-tip-commits, Anton Blanchard
On Tue, May 12, 2009 at 07:44:37PM +0900, KOSAKI Motohiro wrote:
> ------------------------------------
> Subject: [PATCH] sched: cpuacct: Use bigger percpu counter batch values for stats counters
>
> percpu counters used to accumulate statistics in cpuacct controller use
> the default batch value [max(2*nr_cpus, 32)] which can be too small for
> archs that define VIRT_CPU_ACCOUNTING. In such archs, a tick could result in
> cputime updates in the range of thousands. As a result, cpuacct_update_stats()
> would end up acquiring the percpu counter spinlock on every tick which
> is not good for performance.
>
> Let those architectures have a bigger batch threshold so that the percpu counter
> spinlock isn't taken on every tick. This change doesn't affect the archs which
> don't define VIRT_CPU_ACCOUNTING; they continue to have the default
> percpu counter batch value.
>
> v7:
> - fix typo and changelog
>
> v6:
> - fix build error when UP
>
> v5:
> - move cpuacct_batch initialization into sched_init()
>
> v4:
> - rewrite patch description (thanks Bharata!)
> - append read_mostly to cpuacct_batch
> - cpuacct_batch is initialized by sched_init_debug()
>
> v3:
> - revert using percpu_counter_sum()
>
> v2:
> - use percpu_counter_sum() instead of percpu_counter_read()
>
> Cc: Balaji Rao <balajirrao@gmail.com>
> Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
> kernel/sched.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> Index: b/kernel/sched.c
> ===================================================================
> --- a/kernel/sched.c 2009-05-12 13:12:59.000000000 +0900
> +++ b/kernel/sched.c 2009-05-12 19:04:49.000000000 +0900
> @@ -870,6 +870,8 @@ static __read_mostly int scheduler_runni
> */
> int sysctl_sched_rt_runtime = 950000;
>
> +static __read_mostly s32 cpuacct_batch;
> +
> static inline u64 global_rt_period(void)
> {
> return (u64)sysctl_sched_rt_period * NSEC_PER_USEC;
> @@ -9284,6 +9286,10 @@ void __init sched_init(void)
>
> perf_counter_init();
>
> +#ifdef CONFIG_SMP
> + cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
> +#endif
On ppc64, calling jiffies_to_cputime() from sched_init() is too early because
jiffies_to_cputime() needs tb_ticks_per_sec which gets initialized only
later in time_init(). Because of this I see that cpuacct_batch will always
be zero, effectively negating what this patch is trying to do.
As explained by you earlier, we too are finding the default batch value to
be too low for ppc64 with VIRT_CPU_ACCOUNTING turned on. Hence I guess
if this patch is taken in (of course, with the above issue fixed), it will
benefit ppc64 also.
Regards,
Bharata.
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-07-16 8:10 ` Bharata B Rao
@ 2009-07-16 8:39 ` Anton Blanchard
2009-08-20 5:10 ` Anton Blanchard
0 siblings, 1 reply; 62+ messages in thread
From: Anton Blanchard @ 2009-07-16 8:39 UTC (permalink / raw)
To: Bharata B Rao
Cc: KOSAKI Motohiro, Ingo Molnar, Balbir Singh, mingo, hpa,
linux-kernel, a.p.zijlstra, schwidefsky, balajirrao, dhaval,
tglx, kamezawa.hiroyu, linux-tip-commits
Hi,
> On ppc64, calling jiffies_to_cputime() from sched_init() is too early because
> jiffies_to_cputime() needs tb_ticks_per_sec which gets initialized only
> later in time_init(). Because of this I see that cpuacct_batch will always
> be zero, effectively negating what this patch is trying to do.
>
> As explained by you earlier, we too are finding the default batch value to
> be too low for ppc64 with VIRT_CPU_ACCOUNTING turned on. Hence I guess
> if this patch is taken in (of course, with the above issue fixed), it will
> benefit ppc64 also.
I created this patch earlier today when I hit the problem. Thoughts?
Anton
--
When CONFIG_VIRT_CPU_ACCOUNTING is enabled we can call cpuacct_update_stats
with values much larger than percpu_counter_batch. This means the
call to percpu_counter_add will always add to the global count which is
protected by a spinlock.
Since reading of the CPU accounting cgroup counters is not performance
critical, we can use a maximum size batch of INT_MAX and use
percpu_counter_sum on the read side which will add all the percpu
counters.
With this patch, an 8 core POWER6 with CONFIG_VIRT_CPU_ACCOUNTING and
CONFIG_CGROUP_CPUACCT shows the aggregate context switch rate improving from
397k/sec to 3.9M/sec, a 10x improvement.
Signed-off-by: Anton Blanchard <anton@samba.org>
---
Index: linux.trees.git/kernel/sched.c
===================================================================
--- linux.trees.git.orig/kernel/sched.c 2009-07-16 10:11:02.000000000 +1000
+++ linux.trees.git/kernel/sched.c 2009-07-16 10:16:41.000000000 +1000
@@ -10551,7 +10551,7 @@
int i;
for (i = 0; i < CPUACCT_STAT_NSTATS; i++) {
- s64 val = percpu_counter_read(&ca->cpustat[i]);
+ s64 val = percpu_counter_sum(&ca->cpustat[i]);
val = cputime64_to_clock_t(val);
cb->fill(cb, cpuacct_stat_desc[i], val);
}
@@ -10621,7 +10621,7 @@
ca = task_ca(tsk);
do {
- percpu_counter_add(&ca->cpustat[idx], val);
+ __percpu_counter_add(&ca->cpustat[idx], val, INT_MAX);
ca = ca->parent;
} while (ca);
rcu_read_unlock();
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-07-16 8:39 ` Anton Blanchard
@ 2009-08-20 5:10 ` Anton Blanchard
2009-08-20 5:16 ` KAMEZAWA Hiroyuki
` (2 more replies)
0 siblings, 3 replies; 62+ messages in thread
From: Anton Blanchard @ 2009-08-20 5:10 UTC (permalink / raw)
To: Bharata B Rao
Cc: KOSAKI Motohiro, Ingo Molnar, Balbir Singh, mingo, hpa,
linux-kernel, a.p.zijlstra, schwidefsky, balajirrao, dhaval,
tglx, kamezawa.hiroyu, akpm
Hi,
Looks like this issue is still present. I tested on a 32 core box and
the patch improved the maximum context switch rate from 76k/sec to 9.5M/sec.
That's over 100x faster, or 50x per line of code. That's got to be some sort of
record :)
Any chance we can get a fix in for 2.6.31? Don't make me find an even bigger
box so I can break the 200x mark :)
Anton
> --
>
> When CONFIG_VIRT_CPU_ACCOUNTING is enabled we can call cpuacct_update_stats
> with values much larger than percpu_counter_batch. This means the
> call to percpu_counter_add will always add to the global count which is
> protected by a spinlock.
>
> Since reading of the CPU accounting cgroup counters is not performance
> critical, we can use a maximum size batch of INT_MAX and use
> percpu_counter_sum on the read side which will add all the percpu
> counters.
>
> With this patch an 8 core POWER6 with CONFIG_VIRT_CPU_ACCOUNTING and
> CONFIG_CGROUP_CPUACCT shows an improvement in aggregate context switch rate of
> 397k/sec to 3.9M/sec, a 10x improvement.
>
> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---
>
> Index: linux.trees.git/kernel/sched.c
> ===================================================================
> --- linux.trees.git.orig/kernel/sched.c 2009-07-16 10:11:02.000000000 +1000
> +++ linux.trees.git/kernel/sched.c 2009-07-16 10:16:41.000000000 +1000
> @@ -10551,7 +10551,7 @@
> int i;
>
> for (i = 0; i < CPUACCT_STAT_NSTATS; i++) {
> - s64 val = percpu_counter_read(&ca->cpustat[i]);
> + s64 val = percpu_counter_sum(&ca->cpustat[i]);
> val = cputime64_to_clock_t(val);
> cb->fill(cb, cpuacct_stat_desc[i], val);
> }
> @@ -10621,7 +10621,7 @@
> ca = task_ca(tsk);
>
> do {
> - percpu_counter_add(&ca->cpustat[idx], val);
> + __percpu_counter_add(&ca->cpustat[idx], val, INT_MAX);
> ca = ca->parent;
> } while (ca);
> rcu_read_unlock();
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-08-20 5:10 ` Anton Blanchard
@ 2009-08-20 5:16 ` KAMEZAWA Hiroyuki
2009-08-20 5:26 ` Balbir Singh
2010-01-18 4:41 ` [PATCH] " Anton Blanchard
2 siblings, 0 replies; 62+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-08-20 5:16 UTC (permalink / raw)
To: Anton Blanchard
Cc: Bharata B Rao, KOSAKI Motohiro, Ingo Molnar, Balbir Singh, mingo,
hpa, linux-kernel, a.p.zijlstra, schwidefsky, balajirrao, dhaval,
tglx, akpm
On Thu, 20 Aug 2009 15:10:38 +1000
Anton Blanchard <anton@samba.org> wrote:
>
> Hi,
>
> Looks like this issue is still present. I tested on a 32 core box and
> the patch improved maximum context switch rate from from 76k/sec to 9.5M/sec.
> Thats over 100x faster, or 50x per line of code. That's got to be some sort of
> record :)
>
> Any chance we can get a fix in for 2.6.31? Don't make me find an even bigger
> box so I can break the 200x mark :)
>
I like the patch itself; I think it is a correct fix.
Hmm, but IIRC there was a discussion along the lines of "what happens
if this counter is read once per 1ms?" or similar.
What is the read overhead on a big system? Can you show it?
Thanks,
-Kame
> Anton
>
> > --
> >
> > When CONFIG_VIRT_CPU_ACCOUNTING is enabled we can call cpuacct_update_stats
> > with values much larger than percpu_counter_batch. This means the
> > call to percpu_counter_add will always add to the global count which is
> > protected by a spinlock.
> >
> > Since reading of the CPU accounting cgroup counters is not performance
> > critical, we can use a maximum size batch of INT_MAX and use
> > percpu_counter_sum on the read side which will add all the percpu
> > counters.
> >
> > With this patch an 8 core POWER6 with CONFIG_VIRT_CPU_ACCOUNTING and
> > CONFIG_CGROUP_CPUACCT shows an improvement in aggregate context switch rate of
> > 397k/sec to 3.9M/sec, a 10x improvement.
> >
> > Signed-off-by: Anton Blanchard <anton@samba.org>
> > ---
> >
> > Index: linux.trees.git/kernel/sched.c
> > ===================================================================
> > --- linux.trees.git.orig/kernel/sched.c 2009-07-16 10:11:02.000000000 +1000
> > +++ linux.trees.git/kernel/sched.c 2009-07-16 10:16:41.000000000 +1000
> > @@ -10551,7 +10551,7 @@
> > int i;
> >
> > for (i = 0; i < CPUACCT_STAT_NSTATS; i++) {
> > - s64 val = percpu_counter_read(&ca->cpustat[i]);
> > + s64 val = percpu_counter_sum(&ca->cpustat[i]);
> > val = cputime64_to_clock_t(val);
> > cb->fill(cb, cpuacct_stat_desc[i], val);
> > }
> > @@ -10621,7 +10621,7 @@
> > ca = task_ca(tsk);
> >
> > do {
> > - percpu_counter_add(&ca->cpustat[idx], val);
> > + __percpu_counter_add(&ca->cpustat[idx], val, INT_MAX);
> > ca = ca->parent;
> > } while (ca);
> > rcu_read_unlock();
>
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-08-20 5:10 ` Anton Blanchard
2009-08-20 5:16 ` KAMEZAWA Hiroyuki
@ 2009-08-20 5:26 ` Balbir Singh
2009-08-20 5:52 ` Peter Zijlstra
2010-01-18 4:41 ` [PATCH] " Anton Blanchard
2 siblings, 1 reply; 62+ messages in thread
From: Balbir Singh @ 2009-08-20 5:26 UTC (permalink / raw)
To: Anton Blanchard
Cc: Bharata B Rao, KOSAKI Motohiro, Ingo Molnar, mingo, hpa,
linux-kernel, a.p.zijlstra, schwidefsky, balajirrao, dhaval,
tglx, kamezawa.hiroyu, akpm
* Anton Blanchard <anton@samba.org> [2009-08-20 15:10:38]:
>
> Hi,
>
> Looks like this issue is still present. I tested on a 32 core box and
> the patch improved the maximum context switch rate from 76k/sec to 9.5M/sec.
> That's over 100x faster, or 50x per line of code. That's got to be some sort of
> record :)
>
> Any chance we can get a fix in for 2.6.31? Don't make me find an even bigger
> box so I can break the 200x mark :)
>
> Anton
>
> > --
> >
> > When CONFIG_VIRT_CPU_ACCOUNTING is enabled we can call cpuacct_update_stats
> > with values much larger than percpu_counter_batch. This means the
> > call to percpu_counter_add will always add to the global count which is
> > protected by a spinlock.
> >
> > Since reading of the CPU accounting cgroup counters is not performance
> > critical, we can use a maximum size batch of INT_MAX and use
> > percpu_counter_sum on the read side which will add all the percpu
> > counters.
> >
> > With this patch an 8 core POWER6 with CONFIG_VIRT_CPU_ACCOUNTING and
> > CONFIG_CGROUP_CPUACCT shows an improvement in aggregate context switch rate
> > from 397k/sec to 3.9M/sec, a 10x improvement.
> >
Looks good overall, but why not keep the batch size conditional on
CONFIG_VIRT_CPU_ACCOUNTING? I'd still like to stick with
percpu_counter_read() on the read side, because my concern is that a
bad user-space application can read cpuacct.stat and bring the kernel
to its knees.
> > Signed-off-by: Anton Blanchard <anton@samba.org>
> > ---
> >
> > Index: linux.trees.git/kernel/sched.c
> > ===================================================================
> > --- linux.trees.git.orig/kernel/sched.c 2009-07-16 10:11:02.000000000 +1000
> > +++ linux.trees.git/kernel/sched.c 2009-07-16 10:16:41.000000000 +1000
> > @@ -10551,7 +10551,7 @@
> > int i;
> >
> > for (i = 0; i < CPUACCT_STAT_NSTATS; i++) {
> > - s64 val = percpu_counter_read(&ca->cpustat[i]);
> > + s64 val = percpu_counter_sum(&ca->cpustat[i]);
> > val = cputime64_to_clock_t(val);
> > cb->fill(cb, cpuacct_stat_desc[i], val);
> > }
> > @@ -10621,7 +10621,7 @@
> > ca = task_ca(tsk);
> >
> > do {
> > - percpu_counter_add(&ca->cpustat[idx], val);
> > + __percpu_counter_add(&ca->cpustat[idx], val, INT_MAX);
> > ca = ca->parent;
> > } while (ca);
> > rcu_read_unlock();
--
Balbir
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-08-20 5:26 ` Balbir Singh
@ 2009-08-20 5:52 ` Peter Zijlstra
2009-08-20 6:05 ` Anton Blanchard
0 siblings, 1 reply; 62+ messages in thread
From: Peter Zijlstra @ 2009-08-20 5:52 UTC (permalink / raw)
To: balbir
Cc: Anton Blanchard, Bharata B Rao, KOSAKI Motohiro, Ingo Molnar,
mingo, hpa, linux-kernel, schwidefsky, balajirrao, dhaval, tglx,
kamezawa.hiroyu, akpm
On Thu, 2009-08-20 at 10:56 +0530, Balbir Singh wrote:
> Looks good overall, but why not keep the batch size conditional on
> CONFIG_VIRT_CPU_ACCOUNTING? I'd still like to stick with
> percpu_counter_read() on the read side, because my concern is that a
> bad user-space application can read cpuacct.stat and bring the kernel
> to its knees.
Agreed, the first hunk is dangerous. On the second hunk, I'm not sure
about the INT_MAX thing: that's a 4s gap per cpu, which might be a tad
much.
> > > Signed-off-by: Anton Blanchard <anton@samba.org>
> > > ---
> > >
> > > Index: linux.trees.git/kernel/sched.c
> > > ===================================================================
> > > --- linux.trees.git.orig/kernel/sched.c 2009-07-16 10:11:02.000000000 +1000
> > > +++ linux.trees.git/kernel/sched.c 2009-07-16 10:16:41.000000000 +1000
> > > @@ -10551,7 +10551,7 @@
> > > int i;
> > >
> > > for (i = 0; i < CPUACCT_STAT_NSTATS; i++) {
> > > - s64 val = percpu_counter_read(&ca->cpustat[i]);
> > > + s64 val = percpu_counter_sum(&ca->cpustat[i]);
> > > val = cputime64_to_clock_t(val);
> > > cb->fill(cb, cpuacct_stat_desc[i], val);
> > > }
> > > @@ -10621,7 +10621,7 @@
> > > ca = task_ca(tsk);
> > >
> > > do {
> > > - percpu_counter_add(&ca->cpustat[idx], val);
> > > + __percpu_counter_add(&ca->cpustat[idx], val, INT_MAX);
> > > ca = ca->parent;
> > > } while (ca);
> > > rcu_read_unlock();
>
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-08-20 5:52 ` Peter Zijlstra
@ 2009-08-20 6:05 ` Anton Blanchard
2009-08-20 6:10 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 62+ messages in thread
From: Anton Blanchard @ 2009-08-20 6:05 UTC (permalink / raw)
To: Peter Zijlstra
Cc: balbir, Bharata B Rao, KOSAKI Motohiro, Ingo Molnar, mingo, hpa,
linux-kernel, schwidefsky, balajirrao, dhaval, tglx,
kamezawa.hiroyu, akpm
Hi,
> Agreed, the first hunk is dangerous, on the second hunk, I'm not sure
> about the INT_MAX thing, that's a 4s gap per cpu, that might be a tad
> much.
Yeah, I somehow missed the fact that we read it every timer tick. I was
hoping for an easy fix, but I guess we just need to set a reasonable batch
size at runtime.
Anton
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-08-20 6:05 ` Anton Blanchard
@ 2009-08-20 6:10 ` KAMEZAWA Hiroyuki
2009-08-20 6:24 ` Anton Blanchard
0 siblings, 1 reply; 62+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-08-20 6:10 UTC (permalink / raw)
To: Anton Blanchard
Cc: Peter Zijlstra, balbir, Bharata B Rao, KOSAKI Motohiro,
Ingo Molnar, mingo, hpa, linux-kernel, schwidefsky, balajirrao,
dhaval, tglx, akpm
On Thu, 20 Aug 2009 16:05:28 +1000
Anton Blanchard <anton@samba.org> wrote:
>
> Hi,
>
> > Agreed, the first hunk is dangerous, on the second hunk, I'm not sure
> > about the INT_MAX thing, that's a 4s gap per cpu, that might be a tad
> > much.
>
> Yeah I somehow missed the fact we read it every timer tick. I was hoping
> for an easy fix but I guess we just need to set a reasonable batch size
> at runtime.
>
Could you share the context-switch test program?
I'd like to play with it to find out what I can do about the percpu counter.
Thanks,
-Kame
> Anton
>
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-08-20 6:10 ` KAMEZAWA Hiroyuki
@ 2009-08-20 6:24 ` Anton Blanchard
2009-08-20 8:41 ` [PATCH] better align percpu counter (Was " KAMEZAWA Hiroyuki
` (2 more replies)
0 siblings, 3 replies; 62+ messages in thread
From: Anton Blanchard @ 2009-08-20 6:24 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Peter Zijlstra, balbir, Bharata B Rao, KOSAKI Motohiro,
Ingo Molnar, mingo, hpa, linux-kernel, schwidefsky, balajirrao,
dhaval, tglx, akpm
Hi,
> Could you share the context-switch test program?
> I'd like to play with it to find out what I can do about the percpu counter.
Sure:
http://ozlabs.org/~anton/junkcode/context_switch.c
Very simple, just run it once per core:
for i in `seq 0 31`
do
taskset -c $i ./context_switch &
done
Then look at the context switch rates in vmstat.
Anton
* [PATCH] better align percpu counter (Was Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-08-20 6:24 ` Anton Blanchard
@ 2009-08-20 8:41 ` KAMEZAWA Hiroyuki
2009-08-20 10:04 ` Ingo Molnar
2009-08-21 3:56 ` KAMEZAWA Hiroyuki
2009-08-20 11:46 ` Balbir Singh
2009-08-21 6:21 ` KAMEZAWA Hiroyuki
2 siblings, 2 replies; 62+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-08-20 8:41 UTC (permalink / raw)
To: Anton Blanchard
Cc: Peter Zijlstra, balbir, Bharata B Rao, KOSAKI Motohiro,
Ingo Molnar, mingo, hpa, linux-kernel, schwidefsky, balajirrao,
dhaval, tglx, akpm
On Thu, 20 Aug 2009 16:24:51 +1000
Anton Blanchard <anton@samba.org> wrote:
>
> Hi,
>
> > Could you share the context-switch test program?
> > I'd like to play with it to find out what I can do about the percpu counter.
>
> Sure:
>
> http://ozlabs.org/~anton/junkcode/context_switch.c
>
> Very simple, just run it once per core:
>
> for i in `seq 0 31`
> do
> taskset -c $i ./context_switch &
> done
>
> Then look at the context switch rates in vmstat.
>
Thank you for the test program.
Before adjusting the batch counter (I think you should modify it),
could you try this?
I only have an 8-cpu (2-socket) host, but it works well.
(But... my host is x86-64 and does not have VIRT_CPU_ACCOUNTING.)
with your program
before patch.
cpuacct off : 414000-416000 ctsw per sec.
cpuacct on : 401000-404000 ctsw per sec.
after patch
cpuacct on : 412000-413000 ctsw per sec.
Maybe I should check the cache-miss rate ;)
==
It's bad to place the pointer to the array of per-cpu data on the
same cache line as the spinlock. This patch rearranges percpu_counter's
fields across cache lines to reduce false sharing.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
include/linux/percpu_counter.h | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
Index: linux-2.6.31-rc6/include/linux/percpu_counter.h
===================================================================
--- linux-2.6.31-rc6.orig/include/linux/percpu_counter.h 2009-08-20 12:09:27.000000000 +0900
+++ linux-2.6.31-rc6/include/linux/percpu_counter.h 2009-08-20 17:31:13.000000000 +0900
@@ -14,14 +14,24 @@
#include <linux/types.h>
#ifdef CONFIG_SMP
+struct __percpu_counter_padding {
+ char x[0];
+} ____cacheline_internodealigned_in_smp;
+#define CACHELINE_PADDING(name) struct __percpu_counter_padding name
struct percpu_counter {
+ /*
+ * This pointer is persistent and accessed firstly.
+ * Then, should not be purged by locking in other cpus.
+ */
+ s32 *counters;
+ CACHELINE_PADDING(pad);
spinlock_t lock;
s64 count;
#ifdef CONFIG_HOTPLUG_CPU
+ /* rarely accessed field */
struct list_head list; /* All percpu_counters are on a list */
#endif
- s32 *counters;
};
extern int percpu_counter_batch;
* Re: [PATCH] better align percpu counter (Was Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-08-20 8:41 ` [PATCH] better align percpu counter (Was " KAMEZAWA Hiroyuki
@ 2009-08-20 10:04 ` Ingo Molnar
2009-08-21 2:20 ` KAMEZAWA Hiroyuki
2009-08-21 3:56 ` KAMEZAWA Hiroyuki
1 sibling, 1 reply; 62+ messages in thread
From: Ingo Molnar @ 2009-08-20 10:04 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Anton Blanchard, Peter Zijlstra, balbir, Bharata B Rao,
KOSAKI Motohiro, mingo, hpa, linux-kernel, schwidefsky,
balajirrao, dhaval, tglx, akpm
* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Thu, 20 Aug 2009 16:24:51 +1000
> Anton Blanchard <anton@samba.org> wrote:
>
> >
> > Hi,
> >
> > > Could you share the context-switch test program?
> > > I'd like to play with it to find out what I can do about the percpu counter.
> >
> > Sure:
> >
> > http://ozlabs.org/~anton/junkcode/context_switch.c
> >
> > Very simple, just run it once per core:
> >
> > for i in `seq 0 31`
> > do
> > taskset -c $i ./context_switch &
> > done
> >
> > Then look at the context switch rates in vmstat.
> >
> Thank you for test program.
>
> Before adjusting batch counter (I think you should modify it),
> Could you try this ?
>
> I only have 8cpu(2socket) host but works well.
> (But...my host is x86-64 and has not virt-cpu-accouting.)
>
> with your program
> before patch.
> cpuacct off : 414000-416000 ctsw per sec.
> cpuacct on : 401000-404000 ctsw per sec.
>
> after patch
> cpuacct on : 412000-413000 ctsw per sec.
>
> Maybe I should check the cache-miss rate ;)
Btw., in latest upstream you can do that via:
cd tools/perf/
make -j install
perf stat --repeat 5 -- taskset -c 1 ./context_switch
there will be cache-miss and other stats, error bars so
you can compare the before/after better, etc.
Ingo
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-08-20 6:24 ` Anton Blanchard
2009-08-20 8:41 ` [PATCH] better align percpu counter (Was " KAMEZAWA Hiroyuki
@ 2009-08-20 11:46 ` Balbir Singh
2009-08-21 6:21 ` KAMEZAWA Hiroyuki
2 siblings, 0 replies; 62+ messages in thread
From: Balbir Singh @ 2009-08-20 11:46 UTC (permalink / raw)
To: Anton Blanchard
Cc: KAMEZAWA Hiroyuki, Peter Zijlstra, Bharata B Rao,
KOSAKI Motohiro, Ingo Molnar, mingo, hpa, linux-kernel,
schwidefsky, balajirrao, dhaval, tglx, akpm
Hi, Anton,
Do you have any data on what level of batching is good for
VIRT_CPU_ACCOUNTING? If it is going to be as high as INT_MAX, then
your patch seems to be the best way to go forward. We'll need to herd
away bad users who read cpuacct.stat frequently enough to create a
problem.
Balbir Singh.
* Re: [PATCH] better align percpu counter (Was Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-08-20 10:04 ` Ingo Molnar
@ 2009-08-21 2:20 ` KAMEZAWA Hiroyuki
2009-08-21 11:29 ` Ingo Molnar
0 siblings, 1 reply; 62+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-08-21 2:20 UTC (permalink / raw)
To: Ingo Molnar
Cc: Anton Blanchard, Peter Zijlstra, balbir, Bharata B Rao,
KOSAKI Motohiro, mingo, hpa, linux-kernel, schwidefsky,
balajirrao, dhaval, tglx, akpm
On Thu, 20 Aug 2009 12:04:03 +0200
Ingo Molnar <mingo@elte.hu> wrote:
> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > with your program
> > before patch.
> > cpuacct off : 414000-416000 ctsw per sec.
> > cpuacct on : 401000-404000 ctsw per sec.
> >
> > after patch
> > cpuacct on : 412000-413000 ctsw per sec.
> >
> > Maybe I should check the cache-miss rate ;)
>
> Btw., in latest upstream you can do that via:
>
> cd tools/perf/
> make -j install
>
> perf stat --repeat 5 -- taskset -c 1 ./context_switch
>
Tried it (on an 8-cpu/2-socket host). It seems cache misses decrease.
But IPC...?
==
/root/bin/perf stat --repeat 5 -a -e cycles,instructions,cache-misses,L1-dcache-load-misses,L1-dcache-store-misses -- ./ctxt_sw.sh
[Before] patch
Performance counter stats for './ctxt_sw.sh' (5 runs):
1511260148530 cycles ( +- 0.025% ) (scaled from 63.49% )
470690642181 instructions # 0.311 IPC ( +- 0.098% )(scaled from 79.49%)
1210051728 cache-misses ( +- 0.629% ) (scaled from 79.00% )
3202978828 L1-dcache-load-misses ( +- 1.118% ) (scaled from 78.00% )
1803963907 L1-dcache-store-misses ( +- 0.728% ) (scaled from 42.99% )
60.161941918 seconds time elapsed ( +- 0.029% )
[After] patch
Performance counter stats for './ctxt_sw.sh' (5 runs):
1511961867506 cycles ( +- 0.018% ) (scaled from 71.50%)
448724406149 instructions # 0.297 IPC ( +- 0.133% ) (scaled from 75.49%)
1184548041 cache-misses ( +- 0.136% ) (scaled from 75.50%)
3086357048 L1-dcache-load-misses ( +- 0.822% ) (scaled from 77.50%)
1708375493 L1-dcache-store-misses ( +- 0.328% ) (scaled from 47.00%)
60.179814774 seconds time elapsed ( +- 0.052% )
Thanks,
-Kame
* Re: [PATCH] better align percpu counter (Was Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-08-20 8:41 ` [PATCH] better align percpu counter (Was " KAMEZAWA Hiroyuki
2009-08-20 10:04 ` Ingo Molnar
@ 2009-08-21 3:56 ` KAMEZAWA Hiroyuki
1 sibling, 0 replies; 62+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-08-21 3:56 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Anton Blanchard, Peter Zijlstra, balbir, Bharata B Rao,
KOSAKI Motohiro, Ingo Molnar, mingo, hpa, linux-kernel,
schwidefsky, balajirrao, dhaval, tglx, akpm
On Thu, 20 Aug 2009 17:41:23 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> include/linux/percpu_counter.h | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> Index: linux-2.6.31-rc6/include/linux/percpu_counter.h
> ===================================================================
> --- linux-2.6.31-rc6.orig/include/linux/percpu_counter.h 2009-08-20 12:09:27.000000000 +0900
> +++ linux-2.6.31-rc6/include/linux/percpu_counter.h 2009-08-20 17:31:13.000000000 +0900
> @@ -14,14 +14,24 @@
> #include <linux/types.h>
>
> #ifdef CONFIG_SMP
> +struct __percpu_counter_padding {
> + char x[0];
> +} ____cacheline_internodealigned_in_smp;
The above alignment is too big... please ignore this patch :(
-Kame
> +#define CACHELINE_PADDING(name) struct __percpu_counter_padding name
>
> struct percpu_counter {
> + /*
> + * This pointer is persistent and accessed firstly.
> + * Then, should not be purged by locking in other cpus.
> + */
> + s32 *counters;
> + CACHELINE_PADDING(pad);
> spinlock_t lock;
> s64 count;
> #ifdef CONFIG_HOTPLUG_CPU
> + /* rarely accessed field */
> struct list_head list; /* All percpu_counters are on a list */
> #endif
> - s32 *counters;
> };
>
> extern int percpu_counter_batch;
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
* Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-08-20 6:24 ` Anton Blanchard
2009-08-20 8:41 ` [PATCH] better align percpu counter (Was " KAMEZAWA Hiroyuki
2009-08-20 11:46 ` Balbir Singh
@ 2009-08-21 6:21 ` KAMEZAWA Hiroyuki
2 siblings, 0 replies; 62+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-08-21 6:21 UTC (permalink / raw)
To: Anton Blanchard
Cc: Peter Zijlstra, balbir, Bharata B Rao, KOSAKI Motohiro,
Ingo Molnar, mingo, hpa, linux-kernel, schwidefsky, balajirrao,
dhaval, tglx, akpm
On Thu, 20 Aug 2009 16:24:51 +1000
Anton Blanchard <anton@samba.org> wrote:
>
> Hi,
>
> > Could you share the context-switch test program?
> > I'd like to play with it to find out what I can do about the percpu counter.
>
> Sure:
>
> http://ozlabs.org/~anton/junkcode/context_switch.c
>
> Very simple, just run it once per core:
>
> for i in `seq 0 31`
> do
> taskset -c $i ./context_switch &
> done
>
> Then look at the context switch rates in vmstat.
>
OK, I reproduced it on an ia64 box (ia64 supports VIRT_CPU_ACCOUNTING).
On 8 cpus:
-- config=on--
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
11 0 0 7185152 66048 284416 0 0 0 0 5918 12354 0 100 0 0 0
--config=off--
11 0 0 7194496 66304 283392 0 0 0 20 8023 1424022 32 68 0 0 0
Wow, yes, a 100x regression.
I'll dig into this more.
IIUC, if CONFIG_VIRT_CPU_ACCOUNTING=on, a process's stime/utime is recorded
in nanoseconds. Then batch=32 is too small. How about once per msec?
Thanks,
-Kame
* Re: [PATCH] better align percpu counter (Was Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-08-21 2:20 ` KAMEZAWA Hiroyuki
@ 2009-08-21 11:29 ` Ingo Molnar
0 siblings, 0 replies; 62+ messages in thread
From: Ingo Molnar @ 2009-08-21 11:29 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Anton Blanchard, Peter Zijlstra, balbir, Bharata B Rao,
KOSAKI Motohiro, mingo, hpa, linux-kernel, schwidefsky,
balajirrao, dhaval, tglx, akpm
* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Thu, 20 Aug 2009 12:04:03 +0200
> Ingo Molnar <mingo@elte.hu> wrote:
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > > with your program
> > > before patch.
> > > cpuacct off : 414000-416000 ctsw per sec.
> > > cpuacct on : 401000-404000 ctsw per sec.
> > >
> > > after patch
> > > cpuacct on : 412000-413000 ctsw per sec.
> > >
> > > Maybe I should check the cache-miss rate ;)
> >
> > Btw., in latest upstream you can do that via:
> >
> > cd tools/perf/
> > make -j install
> >
> > perf stat --repeat 5 -- taskset -c 1 ./context_switch
> >
>
> Tried it (on an 8-cpu/2-socket host). It seems cache misses decrease.
> But IPC...?
All the numbers have gone down - about the same amount of cycles but
fewer instructions executed, and fewer cache-misses. That's good.
The Instructions Per Cycle metric got worse because cycles stayed
constant. One thing is that you have triggered counter-over-commit
(the 'scaled from' messages) - this means that more counters are
used than the hardware has space for - so we round-robin schedule
them.
If you want to get to the bottom of that, to get the most precise
result try something like:
perf stat --repeat 5 -a -e \
cycles,instructions,L1-dcache-load-misses,L1-dcache-store-misses \
-- ./ctxt_sw.sh
( this is almost the same as the command line you used, but without
the 'cache-misses' counter. Your CPU should be able to
simultaneously activate all these counters and they should count
100% of the events. )
Ingo
* [PATCH] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2009-08-20 5:10 ` Anton Blanchard
2009-08-20 5:16 ` KAMEZAWA Hiroyuki
2009-08-20 5:26 ` Balbir Singh
@ 2010-01-18 4:41 ` Anton Blanchard
2010-01-18 4:55 ` KOSAKI Motohiro
` (5 more replies)
2 siblings, 6 replies; 62+ messages in thread
From: Anton Blanchard @ 2010-01-18 4:41 UTC (permalink / raw)
To: Bharata B Rao
Cc: KOSAKI Motohiro, Ingo Molnar, Balbir Singh, mingo, hpa,
linux-kernel, a.p.zijlstra, schwidefsky, balajirrao, dhaval,
tglx, kamezawa.hiroyu, akpm, Tony Luck, Fenghua Yu,
Heiko Carstens, linux390
Hi,
Another try at this percpu_counter batch issue with CONFIG_VIRT_CPU_ACCOUNTING
and CONFIG_CGROUP_CPUACCT enabled. Thoughts?
--
When CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT are enabled we can
call cpuacct_update_stats with values much larger than percpu_counter_batch.
This means the call to percpu_counter_add will always add to the global count
which is protected by a spinlock and we end up with a global spinlock in
the scheduler.
Based on an idea by KOSAKI Motohiro, this patch scales the batch value by
cputime_one_jiffy such that we have the same batch limit as we would
if CONFIG_VIRT_CPU_ACCOUNTING was disabled. His patch did this once at boot
but that initialisation happened too early on PowerPC (before time_init)
and it was never updated at runtime as a result of a hotplug cpu add/remove.
This patch instead scales percpu_counter_batch by cputime_one_jiffy at
runtime, which keeps the batch correct even after cpu hotplug operations.
We cap it at INT_MAX in case of overflow.
For architectures that do not support CONFIG_VIRT_CPU_ACCOUNTING,
cputime_one_jiffy is the constant 1 and gcc is smart enough to
optimise min(s32 percpu_counter_batch, INT_MAX) to just percpu_counter_batch
at least on x86 and PowerPC. So there is no need to add an #ifdef.
On a 64 thread PowerPC box with CONFIG_VIRT_CPU_ACCOUNTING and
CONFIG_CGROUP_CPUACCT enabled, a context switch microbenchmark is 234x faster
and almost matches a CONFIG_CGROUP_CPUACCT disabled kernel:
CONFIG_CGROUP_CPUACCT disabled: 16906698 ctx switches/sec
CONFIG_CGROUP_CPUACCT enabled: 61720 ctx switches/sec
CONFIG_CGROUP_CPUACCT + patch: 16663217 ctx switches/sec
Tested with:
wget http://ozlabs.org/~anton/junkcode/context_switch.c
make context_switch
for i in `seq 0 63`; do taskset -c $i ./context_switch & done
vmstat 1
Signed-off-by: Anton Blanchard <anton@samba.org>
---
Note: ccing ia64 and s390 who have not yet added code to statically
initialise cputime_one_jiffy at boot.
See commit a42548a18866e87092db93b771e6c5b060d78401 (cputime: Optimize
jiffies_to_cputime(1)) for details. Adding this would help optimise not only
this patch but many other areas of the scheduler when
CONFIG_VIRT_CPU_ACCOUNTING is enabled.
Index: linux.trees.git/kernel/sched.c
===================================================================
--- linux.trees.git.orig/kernel/sched.c 2010-01-18 14:27:12.000000000 +1100
+++ linux.trees.git/kernel/sched.c 2010-01-18 15:21:59.000000000 +1100
@@ -10894,6 +10894,7 @@ static void cpuacct_update_stats(struct
enum cpuacct_stat_index idx, cputime_t val)
{
struct cpuacct *ca;
+ int batch;
if (unlikely(!cpuacct_subsys.active))
return;
@@ -10901,8 +10902,9 @@ static void cpuacct_update_stats(struct
rcu_read_lock();
ca = task_ca(tsk);
+ batch = min_t(long, percpu_counter_batch * cputime_one_jiffy, INT_MAX);
do {
- percpu_counter_add(&ca->cpustat[idx], val);
+ __percpu_counter_add(&ca->cpustat[idx], val, batch);
ca = ca->parent;
} while (ca);
rcu_read_unlock();
* Re: [PATCH] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2010-01-18 4:41 ` [PATCH] " Anton Blanchard
@ 2010-01-18 4:55 ` KOSAKI Motohiro
2010-01-18 7:51 ` Peter Zijlstra
` (4 subsequent siblings)
5 siblings, 0 replies; 62+ messages in thread
From: KOSAKI Motohiro @ 2010-01-18 4:55 UTC (permalink / raw)
To: Anton Blanchard
Cc: kosaki.motohiro, Bharata B Rao, Ingo Molnar, Balbir Singh, mingo,
hpa, linux-kernel, a.p.zijlstra, schwidefsky, balajirrao, dhaval,
tglx, kamezawa.hiroyu, akpm, Tony Luck, Fenghua Yu,
Heiko Carstens, linux390
>
> Hi,
>
> Another try at this percpu_counter batch issue with CONFIG_VIRT_CPU_ACCOUNTING
> and CONFIG_CGROUP_CPUACCT enabled. Thoughts?
>
> --
>
> When CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT are enabled we can
> call cpuacct_update_stats with values much larger than percpu_counter_batch.
> This means the call to percpu_counter_add will always add to the global count
> which is protected by a spinlock and we end up with a global spinlock in
> the scheduler.
>
> Based on an idea by KOSAKI Motohiro, this patch scales the batch value by
> cputime_one_jiffy such that we have the same batch limit as we would
> if CONFIG_VIRT_CPU_ACCOUNTING was disabled. His patch did this once at boot
> but that initialisation happened too early on PowerPC (before time_init)
> and it was never updated at runtime as a result of a hotplug cpu add/remove.
>
> This patch instead scales percpu_counter_batch by cputime_one_jiffy at
> runtime, which keeps the batch correct even after cpu hotplug operations.
> We cap it at INT_MAX in case of overflow.
>
> For architectures that do not support CONFIG_VIRT_CPU_ACCOUNTING,
> cputime_one_jiffy is the constant 1 and gcc is smart enough to
> optimise min(s32 percpu_counter_batch, INT_MAX) to just percpu_counter_batch
> at least on x86 and PowerPC. So there is no need to add an #ifdef.
>
> On a 64 thread PowerPC box with CONFIG_VIRT_CPU_ACCOUNTING and
> CONFIG_CGROUP_CPUACCT enabled, a context switch microbenchmark is 234x faster
> and almost matches a CONFIG_CGROUP_CPUACCT disabled kernel:
>
> CONFIG_CGROUP_CPUACCT disabled: 16906698 ctx switches/sec
> CONFIG_CGROUP_CPUACCT enabled: 61720 ctx switches/sec
> CONFIG_CGROUP_CPUACCT + patch: 16663217 ctx switches/sec
>
> Tested with:
>
> wget http://ozlabs.org/~anton/junkcode/context_switch.c
> make context_switch
> for i in `seq 0 63`; do taskset -c $i ./context_switch & done
> vmstat 1
>
> Signed-off-by: Anton Blanchard <anton@samba.org>
Looks fine to me. Thanks Anton!
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
>
> Note: ccing ia64 and s390 who have not yet added code to statically
> initialise cputime_one_jiffy at boot.
> See commit a42548a18866e87092db93b771e6c5b060d78401 (cputime: Optimize
> jiffies_to_cputime(1)) for details. Adding this would help optimise not only
> this patch but many other areas of the scheduler when
> CONFIG_VIRT_CPU_ACCOUNTING is enabled.
>
> Index: linux.trees.git/kernel/sched.c
> ===================================================================
> --- linux.trees.git.orig/kernel/sched.c 2010-01-18 14:27:12.000000000 +1100
> +++ linux.trees.git/kernel/sched.c 2010-01-18 15:21:59.000000000 +1100
> @@ -10894,6 +10894,7 @@ static void cpuacct_update_stats(struct
> enum cpuacct_stat_index idx, cputime_t val)
> {
> struct cpuacct *ca;
> + int batch;
>
> if (unlikely(!cpuacct_subsys.active))
> return;
> @@ -10901,8 +10902,9 @@ static void cpuacct_update_stats(struct
> rcu_read_lock();
> ca = task_ca(tsk);
>
> + batch = min_t(long, percpu_counter_batch * cputime_one_jiffy, INT_MAX);
> do {
> - percpu_counter_add(&ca->cpustat[idx], val);
> + __percpu_counter_add(&ca->cpustat[idx], val, batch);
> ca = ca->parent;
> } while (ca);
> rcu_read_unlock();
* Re: [PATCH] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2010-01-18 4:41 ` [PATCH] " Anton Blanchard
2010-01-18 4:55 ` KOSAKI Motohiro
@ 2010-01-18 7:51 ` Peter Zijlstra
2010-01-18 8:21 ` Anton Blanchard
2010-01-18 8:35 ` Balbir Singh
` (3 subsequent siblings)
5 siblings, 1 reply; 62+ messages in thread
From: Peter Zijlstra @ 2010-01-18 7:51 UTC (permalink / raw)
To: Anton Blanchard
Cc: Bharata B Rao, KOSAKI Motohiro, Ingo Molnar, Balbir Singh, mingo,
hpa, linux-kernel, schwidefsky, balajirrao, dhaval, tglx,
kamezawa.hiroyu, akpm, Tony Luck, Fenghua Yu, Heiko Carstens,
linux390
On Mon, 2010-01-18 at 15:41 +1100, Anton Blanchard wrote:
> Hi,
>
> Another try at this percpu_counter batch issue with CONFIG_VIRT_CPU_ACCOUNTING
> and CONFIG_CGROUP_CPUACCT enabled. Thoughts?
Seems like a good idea, but isn't that batch number rather static? If
so, computing it in some init path would save that multiply on the
actual accounting path.
>
> Index: linux.trees.git/kernel/sched.c
> ===================================================================
> --- linux.trees.git.orig/kernel/sched.c 2010-01-18 14:27:12.000000000 +1100
> +++ linux.trees.git/kernel/sched.c 2010-01-18 15:21:59.000000000 +1100
> @@ -10894,6 +10894,7 @@ static void cpuacct_update_stats(struct
> enum cpuacct_stat_index idx, cputime_t val)
> {
> struct cpuacct *ca;
> + int batch;
>
> if (unlikely(!cpuacct_subsys.active))
> return;
> @@ -10901,8 +10902,9 @@ static void cpuacct_update_stats(struct
> rcu_read_lock();
> ca = task_ca(tsk);
>
> + batch = min_t(long, percpu_counter_batch * cputime_one_jiffy, INT_MAX);
> do {
> - percpu_counter_add(&ca->cpustat[idx], val);
> + __percpu_counter_add(&ca->cpustat[idx], val, batch);
> ca = ca->parent;
> } while (ca);
> rcu_read_unlock();
* Re: [PATCH] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2010-01-18 7:51 ` Peter Zijlstra
@ 2010-01-18 8:21 ` Anton Blanchard
2010-01-18 8:34 ` Peter Zijlstra
0 siblings, 1 reply; 62+ messages in thread
From: Anton Blanchard @ 2010-01-18 8:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Bharata B Rao, KOSAKI Motohiro, Ingo Molnar, Balbir Singh, mingo,
hpa, linux-kernel, schwidefsky, balajirrao, dhaval, tglx,
kamezawa.hiroyu, akpm, Tony Luck, Fenghua Yu, Heiko Carstens,
linux390
Hi Peter,
> Seems like a good idea, but isn't that batch number rather static? If
> so, computing it in some init path would save that multiply on the
> actual accounting path.
It is mostly static, but percpu_counter_batch does change with hotplug
operations. Adding a hotplug notifier felt like a lot of work, but I was
worried we'd have issues if we didn't scale with hotplug add operations.
Anton
* Re: [PATCH] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2010-01-18 8:21 ` Anton Blanchard
@ 2010-01-18 8:34 ` Peter Zijlstra
0 siblings, 0 replies; 62+ messages in thread
From: Peter Zijlstra @ 2010-01-18 8:34 UTC (permalink / raw)
To: Anton Blanchard
Cc: Bharata B Rao, KOSAKI Motohiro, Ingo Molnar, Balbir Singh, mingo,
hpa, linux-kernel, schwidefsky, balajirrao, dhaval, tglx,
kamezawa.hiroyu, akpm, Tony Luck, Fenghua Yu, Heiko Carstens,
linux390
On Mon, 2010-01-18 at 19:21 +1100, Anton Blanchard wrote:
> Hi Peter,
>
> > Seems like a good idea, but isn't that batch number rather static? If
> > so, computing it in some init path would save that multiply on the
> > actual accounting path.
>
> It is mostly static, but percpu_counter_batch does change with hotplug
> operations. Adding a hotplug notifier felt like a lot of work but I was
> worried we have issues if we didn't scale with hotplug add operations.
OK, trading a notifier for a mult might be worth it, we'll see if
someone complains ;-)
Let's go with this for now.
* Re: [PATCH] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2010-01-18 4:41 ` [PATCH] " Anton Blanchard
2010-01-18 4:55 ` KOSAKI Motohiro
2010-01-18 7:51 ` Peter Zijlstra
@ 2010-01-18 8:35 ` Balbir Singh
2010-01-18 9:42 ` Martin Schwidefsky
` (2 subsequent siblings)
5 siblings, 0 replies; 62+ messages in thread
From: Balbir Singh @ 2010-01-18 8:35 UTC (permalink / raw)
To: Anton Blanchard
Cc: Bharata B Rao, KOSAKI Motohiro, Ingo Molnar, mingo, hpa,
linux-kernel, a.p.zijlstra, schwidefsky, balajirrao, dhaval,
tglx, kamezawa.hiroyu, akpm, Tony Luck, Fenghua Yu,
Heiko Carstens, linux390
On Monday 18 January 2010 10:11 AM, Anton Blanchard wrote:
>
> Hi,
>
> Another try at this percpu_counter batch issue with CONFIG_VIRT_CPU_ACCOUNTING
> and CONFIG_CGROUP_CPUACCT enabled. Thoughts?
>
> --
>
> When CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT are enabled we can
> call cpuacct_update_stats with values much larger than percpu_counter_batch.
> This means the call to percpu_counter_add will always add to the global count
> which is protected by a spinlock and we end up with a global spinlock in
> the scheduler.
>
> Based on an idea by KOSAKI Motohiro, this patch scales the batch value by
> cputime_one_jiffy such that we have the same batch limit as we would
> if CONFIG_VIRT_CPU_ACCOUNTING was disabled. His patch did this once at boot
> but that initialisation happened too early on PowerPC (before time_init)
> and it was never updated at runtime as a result of a hotplug cpu add/remove.
>
> This patch instead scales percpu_counter_batch by cputime_one_jiffy at
> runtime, which keeps the batch correct even after cpu hotplug operations.
> We cap it at INT_MAX in case of overflow.
>
> For architectures that do not support CONFIG_VIRT_CPU_ACCOUNTING,
> cputime_one_jiffy is the constant 1 and gcc is smart enough to
> optimise min(s32 percpu_counter_batch, INT_MAX) to just percpu_counter_batch
> at least on x86 and PowerPC. So there is no need to add an #ifdef.
>
> On a 64 thread PowerPC box with CONFIG_VIRT_CPU_ACCOUNTING and
> CONFIG_CGROUP_CPUACCT enabled, a context switch microbenchmark is 234x faster
> and almost matches a CONFIG_CGROUP_CPUACCT disabled kernel:
>
> CONFIG_CGROUP_CPUACCT disabled: 16906698 ctx switches/sec
> CONFIG_CGROUP_CPUACCT enabled: 61720 ctx switches/sec
> CONFIG_CGROUP_CPUACCT + patch: 16663217 ctx switches/sec
>
> Tested with:
>
> wget http://ozlabs.org/~anton/junkcode/context_switch.c
> make context_switch
> for i in `seq 0 63`; do taskset -c $i ./context_switch & done
> vmstat 1
>
> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---
>
> Note: ccing ia64 and s390 who have not yet added code to statically
> initialise cputime_one_jiffy at boot.
> See a42548a18866e87092db93b771e6c5b060d78401 (cputime: Optimize
> jiffies_to_cputime(1) for details). Adding this would help optimise not only
> this patch but many other areas of the scheduler when
> CONFIG_VIRT_CPU_ACCOUNTING is enabled.
>
> Index: linux.trees.git/kernel/sched.c
> ===================================================================
> --- linux.trees.git.orig/kernel/sched.c 2010-01-18 14:27:12.000000000 +1100
> +++ linux.trees.git/kernel/sched.c 2010-01-18 15:21:59.000000000 +1100
> @@ -10894,6 +10894,7 @@ static void cpuacct_update_stats(struct
> enum cpuacct_stat_index idx, cputime_t val)
> {
> struct cpuacct *ca;
> + int batch;
>
> if (unlikely(!cpuacct_subsys.active))
> return;
> @@ -10901,8 +10902,9 @@ static void cpuacct_update_stats(struct
> rcu_read_lock();
> ca = task_ca(tsk);
>
> + batch = min_t(long, percpu_counter_batch * cputime_one_jiffy, INT_MAX);
> do {
> - percpu_counter_add(&ca->cpustat[idx], val);
> + __percpu_counter_add(&ca->cpustat[idx], val, batch);
> ca = ca->parent;
> } while (ca);
> rcu_read_unlock();
Looks good to me, but I'll test it as well and report back. I think we
might need to look at the call side where we do the percpu_counter_read().
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Balbir
* Re: [PATCH] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2010-01-18 4:41 ` [PATCH] " Anton Blanchard
` (2 preceding siblings ...)
2010-01-18 8:35 ` Balbir Singh
@ 2010-01-18 9:42 ` Martin Schwidefsky
2010-01-18 9:55 ` Anton Blanchard
2010-01-25 23:14 ` Andrew Morton
2010-01-27 13:15 ` [tip:sched/urgent] " tip-bot for Anton Blanchard
5 siblings, 1 reply; 62+ messages in thread
From: Martin Schwidefsky @ 2010-01-18 9:42 UTC (permalink / raw)
To: Anton Blanchard
Cc: Bharata B Rao, KOSAKI Motohiro, Ingo Molnar, Balbir Singh, mingo,
hpa, linux-kernel, a.p.zijlstra, balajirrao, dhaval, tglx,
kamezawa.hiroyu, akpm, Tony Luck, Fenghua Yu, Heiko Carstens,
linux390
Hi Anton,
On Mon, 18 Jan 2010 15:41:42 +1100
Anton Blanchard <anton@samba.org> wrote:
> Note: ccing ia64 and s390 who have not yet added code to statically
> initialise cputime_one_jiffy at boot.
> See a42548a18866e87092db93b771e6c5b060d78401 (cputime: Optimize
> jiffies_to_cputime(1) for details). Adding this would help optimise not only
> this patch but many other areas of the scheduler when
> CONFIG_VIRT_CPU_ACCOUNTING is enabled.
For s390 the jiffies_to_cputime is a compile time constant. No need to
initialize it at runtime, no?
> Index: linux.trees.git/kernel/sched.c
> ===================================================================
> --- linux.trees.git.orig/kernel/sched.c 2010-01-18 14:27:12.000000000 +1100
> +++ linux.trees.git/kernel/sched.c 2010-01-18 15:21:59.000000000 +1100
> @@ -10894,6 +10894,7 @@ static void cpuacct_update_stats(struct
> enum cpuacct_stat_index idx, cputime_t val)
> {
> struct cpuacct *ca;
> + int batch;
>
> if (unlikely(!cpuacct_subsys.active))
> return;
> @@ -10901,8 +10902,9 @@ static void cpuacct_update_stats(struct
> rcu_read_lock();
> ca = task_ca(tsk);
>
> + batch = min_t(long, percpu_counter_batch * cputime_one_jiffy, INT_MAX);
> do {
> - percpu_counter_add(&ca->cpustat[idx], val);
> + __percpu_counter_add(&ca->cpustat[idx], val, batch);
> ca = ca->parent;
> } while (ca);
> rcu_read_unlock();
The patch itself trades some accuracy (larger cpu accounting values
stored per-cpu) against runtime overhead (the spinlock taken to transfer
the value to the global counter in __percpu_counter_add). Did you
calculate how big the loss in accuracy is?
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
* Re: [PATCH] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2010-01-18 9:42 ` Martin Schwidefsky
@ 2010-01-18 9:55 ` Anton Blanchard
2010-01-18 11:00 ` Balbir Singh
2010-01-21 15:25 ` Balbir Singh
0 siblings, 2 replies; 62+ messages in thread
From: Anton Blanchard @ 2010-01-18 9:55 UTC (permalink / raw)
To: Martin Schwidefsky
Cc: Bharata B Rao, KOSAKI Motohiro, Ingo Molnar, Balbir Singh, mingo,
hpa, linux-kernel, a.p.zijlstra, balajirrao, dhaval, tglx,
kamezawa.hiroyu, akpm, Tony Luck, Fenghua Yu, Heiko Carstens,
linux390
Hi Martin,
> For s390 the jiffies_to_cputime is a compile time constant. No need to
> initialize it at runtime, no?
Indeed it is, I didn't look closely enough. Same with ia64 so no work to
do on either arch :)
> The patch itself trades some accuracy (larger cpu accounting value that
> are stored per-cpu) against runtime overhead (spinlock to transfer the
> value to the global variable in __percpu_counter_add). Did you
> calculate how big the loss in accuracy is?
The idea is we are already batching percpu_counter_batch jiffies, so
with CONFIG_VIRT_CPU_ACCOUNTING we batch the equivalent amount in
cputime.
Anton
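The equivalence argument above can be checked with a little arithmetic. This sketch (illustrative numbers, not kernel code) expresses the worst-case per-cpu error in jiffies and shows it is the same with and without VIRT_CPU_ACCOUNTING:

```c
/* Worst-case per-cpu drift before a flush, measured in jiffies.
 * __percpu_counter_add() flushes to the global count once the per-cpu
 * delta exceeds the batch, so the error is at most ~batch cputime units,
 * i.e. batch / cputime_one_jiffy jiffies. */
long long drift_in_jiffies(long long batch, long long cputime_one_jiffy)
{
	return batch / cputime_one_jiffy;
}
```

With VIRT_CPU_ACCOUNTING=n the batch is 32 and one jiffy is 1 cputime unit; with it enabled both the batch and the per-jiffy increment scale by the same factor, so the bound in jiffies is unchanged.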
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2010-01-18 9:55 ` Anton Blanchard
@ 2010-01-18 11:00 ` Balbir Singh
2010-01-21 15:25 ` Balbir Singh
1 sibling, 0 replies; 62+ messages in thread
From: Balbir Singh @ 2010-01-18 11:00 UTC (permalink / raw)
To: Anton Blanchard
Cc: Martin Schwidefsky, Bharata B Rao, KOSAKI Motohiro, Ingo Molnar,
mingo, hpa, linux-kernel, a.p.zijlstra, balajirrao, dhaval, tglx,
kamezawa.hiroyu, akpm, Tony Luck, Fenghua Yu, Heiko Carstens,
linux390
On Monday 18 January 2010 03:25 PM, Anton Blanchard wrote:
>
> Hi Martin,
>
>> For s390 the jiffies_to_cputime is a compile time constant. No need to
>> initialize it at runtime, no?
>
> Indeed it is, I didn't look closely enough. Same with ia64 so no work to
> do on either arch :)
>
>> The patch itself trades some accuracy (larger cpu accounting value that
>> are stored per-cpu) against runtime overhead (spinlock to transfer the
>> value to the global variable in __percpu_counter_add). Did you
>> calculate how big the loss in accuracy is?
>
> The idea is we are already batching percpu_counter_batch jiffies, so
> with CONFIG_VIRT_CPU_ACCOUNTING we batch the equivalent amount in
> cputime.
>
This patch worked well for me.
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Balbir Singh.
* Re: [PATCH] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2010-01-18 9:55 ` Anton Blanchard
2010-01-18 11:00 ` Balbir Singh
@ 2010-01-21 15:25 ` Balbir Singh
2010-01-21 15:36 ` Peter Zijlstra
1 sibling, 1 reply; 62+ messages in thread
From: Balbir Singh @ 2010-01-21 15:25 UTC (permalink / raw)
To: Anton Blanchard
Cc: Martin Schwidefsky, Bharata B Rao, KOSAKI Motohiro, Ingo Molnar,
mingo, hpa, linux-kernel, a.p.zijlstra, balajirrao, dhaval, tglx,
kamezawa.hiroyu, akpm, Tony Luck, Fenghua Yu, Heiko Carstens,
linux390
On Monday 18 January 2010 03:25 PM, Anton Blanchard wrote:
>
> Hi Martin,
>
>> For s390 the jiffies_to_cputime is a compile time constant. No need to
>> initialize it at runtime, no?
>
> Indeed it is, I didn't look closely enough. Same with ia64 so no work to
> do on either arch :)
>
>> The patch itself trades some accuracy (larger cpu accounting value that
>> are stored per-cpu) against runtime overhead (spinlock to transfer the
>> value to the global variable in __percpu_counter_add). Did you
>> calculate how big the loss in accuracy is?
>
> The idea is we are already batching percpu_counter_batch jiffies, so
> with CONFIG_VIRT_CPU_ACCOUNTING we batch the equivalent amount in
> cputime.
>
Hi, Peter, Ingo
Could we please pick up the patch for -tip?
--
Three Cheers,
Balbir Singh
* Re: [PATCH] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2010-01-21 15:25 ` Balbir Singh
@ 2010-01-21 15:36 ` Peter Zijlstra
0 siblings, 0 replies; 62+ messages in thread
From: Peter Zijlstra @ 2010-01-21 15:36 UTC (permalink / raw)
To: balbir
Cc: Anton Blanchard, Martin Schwidefsky, Bharata B Rao,
KOSAKI Motohiro, Ingo Molnar, mingo, hpa, linux-kernel,
balajirrao, dhaval, tglx, kamezawa.hiroyu, akpm, Tony Luck,
Fenghua Yu, Heiko Carstens, linux390
On Thu, 2010-01-21 at 20:55 +0530, Balbir Singh wrote:
> Could we please pick up the patch for -tip?
OK, will queue it up if Ingo doesn't beat me to it.
* Re: [PATCH] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2010-01-18 4:41 ` [PATCH] " Anton Blanchard
` (3 preceding siblings ...)
2010-01-18 9:42 ` Martin Schwidefsky
@ 2010-01-25 23:14 ` Andrew Morton
2010-01-25 23:44 ` KOSAKI Motohiro
2010-01-26 6:17 ` Balbir Singh
2010-01-27 13:15 ` [tip:sched/urgent] " tip-bot for Anton Blanchard
5 siblings, 2 replies; 62+ messages in thread
From: Andrew Morton @ 2010-01-25 23:14 UTC (permalink / raw)
To: Anton Blanchard
Cc: Bharata B Rao, KOSAKI Motohiro, Ingo Molnar, Balbir Singh, mingo,
hpa, linux-kernel, a.p.zijlstra, schwidefsky, balajirrao, dhaval,
tglx, kamezawa.hiroyu, Tony Luck, Fenghua Yu, Heiko Carstens,
linux390
On Mon, 18 Jan 2010 15:41:42 +1100
Anton Blanchard <anton@samba.org> wrote:
> When CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT are enabled we can
> call cpuacct_update_stats with values much larger than percpu_counter_batch.
> This means the call to percpu_counter_add will always add to the global count
> which is protected by a spinlock and we end up with a global spinlock in
> the scheduler.
When one looks at the end result:
: static void cpuacct_update_stats(struct task_struct *tsk,
: enum cpuacct_stat_index idx, cputime_t val)
: {
: struct cpuacct *ca;
: int batch;
:
: if (unlikely(!cpuacct_subsys.active))
: return;
:
: rcu_read_lock();
: ca = task_ca(tsk);
:
: batch = min_t(long, percpu_counter_batch * cputime_one_jiffy, INT_MAX);
: do {
: __percpu_counter_add(&ca->cpustat[idx], val, batch);
: ca = ca->parent;
: } while (ca);
: rcu_read_unlock();
: }
the code (which used to be quite obvious) becomes pretty unobvious. In
fact it looks quite wrong.
Shouldn't there be a comment there explaining wtf is going on?
* Re: [PATCH] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2010-01-25 23:14 ` Andrew Morton
@ 2010-01-25 23:44 ` KOSAKI Motohiro
2010-01-26 6:17 ` Balbir Singh
1 sibling, 0 replies; 62+ messages in thread
From: KOSAKI Motohiro @ 2010-01-25 23:44 UTC (permalink / raw)
To: Andrew Morton
Cc: kosaki.motohiro, Anton Blanchard, Bharata B Rao, Ingo Molnar,
Balbir Singh, mingo, hpa, linux-kernel, a.p.zijlstra,
schwidefsky, balajirrao, dhaval, tglx, kamezawa.hiroyu,
Tony Luck, Fenghua Yu, Heiko Carstens, linux390
> On Mon, 18 Jan 2010 15:41:42 +1100
> Anton Blanchard <anton@samba.org> wrote:
>
> > When CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT are enabled we can
> > call cpuacct_update_stats with values much larger than percpu_counter_batch.
> > This means the call to percpu_counter_add will always add to the global count
> > which is protected by a spinlock and we end up with a global spinlock in
> > the scheduler.
>
> When one looks at the end result:
We batch about 32 jiffies' worth of cputime both with and without
CONFIG_VIRT_CPU_ACCOUNTING. So the end user may see the stats lag by a
few jiffies at most.
* Re: [PATCH] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2010-01-25 23:14 ` Andrew Morton
2010-01-25 23:44 ` KOSAKI Motohiro
@ 2010-01-26 6:17 ` Balbir Singh
2010-01-26 6:35 ` Andrew Morton
1 sibling, 1 reply; 62+ messages in thread
From: Balbir Singh @ 2010-01-26 6:17 UTC (permalink / raw)
To: Andrew Morton
Cc: Anton Blanchard, Bharata B Rao, KOSAKI Motohiro, Ingo Molnar,
mingo, hpa, linux-kernel, a.p.zijlstra, schwidefsky, balajirrao,
dhaval, tglx, kamezawa.hiroyu, Tony Luck, Fenghua Yu,
Heiko Carstens, linux390
On Tuesday 26 January 2010 04:44 AM, Andrew Morton wrote:
> On Mon, 18 Jan 2010 15:41:42 +1100
> Anton Blanchard <anton@samba.org> wrote:
>
>> When CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT are enabled we can
>> call cpuacct_update_stats with values much larger than percpu_counter_batch.
>> This means the call to percpu_counter_add will always add to the global count
>> which is protected by a spinlock and we end up with a global spinlock in
>> the scheduler.
>
> When one looks at the end result:
>
> : static void cpuacct_update_stats(struct task_struct *tsk,
> : enum cpuacct_stat_index idx, cputime_t val)
> : {
> : struct cpuacct *ca;
> : int batch;
> :
> : if (unlikely(!cpuacct_subsys.active))
> : return;
> :
> : rcu_read_lock();
> : ca = task_ca(tsk);
> :
> : batch = min_t(long, percpu_counter_batch * cputime_one_jiffy, INT_MAX);
> : do {
> : __percpu_counter_add(&ca->cpustat[idx], val, batch);
> : ca = ca->parent;
> : } while (ca);
> : rcu_read_unlock();
> : }
>
> the code (which used to be quite obvious) becomes pretty unobvious. In
> fact it looks quite wrong.
>
> Shouldn't there be a comment there explaining wtf is going on?
Andrew,
I guess a lot of the changelog and comments are in the email history,
but your point on the comment is valid. Why does it look quite wrong to you?
cputime_one_jiffy tells us how many cputime_t units we accumulate in one
jiffy. If virtual accounting is enabled this number is quite large;
otherwise it is 1. So the batch works out to 32 on systems without
virtual accounting, and to 32 * cputime_one_jiffy on systems with it,
which means the per-cpu counters sync up with the global count roughly
once every 32 jiffies, assuming cpuacct_update_stats is called once per
jiffy on non-virtual machines.
If the above comment, pleases you I'll polish it and send it across.
Anton, could you please confirm that what I've said above is correct?
--
Three Cheers,
Balbir Singh
* Re: [PATCH] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2010-01-26 6:17 ` Balbir Singh
@ 2010-01-26 6:35 ` Andrew Morton
2010-01-26 8:07 ` Anton Blanchard
0 siblings, 1 reply; 62+ messages in thread
From: Andrew Morton @ 2010-01-26 6:35 UTC (permalink / raw)
To: balbir
Cc: Anton Blanchard, Bharata B Rao, KOSAKI Motohiro, Ingo Molnar,
mingo, hpa, linux-kernel, a.p.zijlstra, schwidefsky, balajirrao,
dhaval, tglx, kamezawa.hiroyu, Tony Luck, Fenghua Yu,
Heiko Carstens, linux390
On Tue, 26 Jan 2010 11:47:15 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> On Tuesday 26 January 2010 04:44 AM, Andrew Morton wrote:
> > On Mon, 18 Jan 2010 15:41:42 +1100
> > Anton Blanchard <anton@samba.org> wrote:
> >
> >> When CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT are enabled we can
> >> call cpuacct_update_stats with values much larger than percpu_counter_batch.
> >> This means the call to percpu_counter_add will always add to the global count
> >> which is protected by a spinlock and we end up with a global spinlock in
> >> the scheduler.
> >
> > When one looks at the end result:
> >
> > : static void cpuacct_update_stats(struct task_struct *tsk,
> > : enum cpuacct_stat_index idx, cputime_t val)
> > : {
> > : struct cpuacct *ca;
> > : int batch;
> > :
> > : if (unlikely(!cpuacct_subsys.active))
> > : return;
> > :
> > : rcu_read_lock();
> > : ca = task_ca(tsk);
> > :
> > : batch = min_t(long, percpu_counter_batch * cputime_one_jiffy, INT_MAX);
> > : do {
> > : __percpu_counter_add(&ca->cpustat[idx], val, batch);
> > : ca = ca->parent;
> > : } while (ca);
> > : rcu_read_unlock();
> > : }
> >
> > the code (which used to be quite obvious) becomes pretty unobvious. In
> > fact it looks quite wrong.
> >
> > Shouldn't there be a comment there explaining wtf is going on?
>
> Andrew,
>
> I guess a lot of the changelog and comments are in the email history,
Not a very useful location for it!
> Why does it look quite wrong to you?
Because it computes the correct value and then if it's larger than
INT_MAX, it inexplicably assigns INT_MAX to it, giving a wrong result!
Does that code actually work, btw? percpu_counter_batch has type `int'
and cputime_one_jiffy has type `int' so their product has type `int'.
So by the time min_t performs its comparison, the upper 32 bits of the
product are already lost.
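Andrew's type concern can be reproduced in isolation. In this user-space sketch, unsigned arithmetic stands in for the wraparound so the demo itself avoids undefined behaviour; the narrow product wraps before any cap can apply, while widening first keeps the cap meaningful:

```c
#include <limits.h>

/* What happens if both factors are int: the product wraps before the
 * min_t() cap ever sees it. Unsigned multiply models the wrap. */
static int batch_narrow(int batch, int one_jiffy)
{
	int product = (int)((unsigned)batch * (unsigned)one_jiffy);
	return product < INT_MAX ? product : INT_MAX;   /* cap is too late */
}

/* Widening before the multiply keeps the INT_MAX cap effective. */
static int batch_wide(int batch, long long one_jiffy)
{
	long long product = (long long)batch * one_jiffy;
	return product > INT_MAX ? INT_MAX : (int)product;
}
```

As Anton's reply notes, the patch survives because cputime_one_jiffy is 64-bit on the affected architectures, so the real multiply is already wide.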
* Re: [PATCH] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2010-01-26 6:35 ` Andrew Morton
@ 2010-01-26 8:07 ` Anton Blanchard
0 siblings, 0 replies; 62+ messages in thread
From: Anton Blanchard @ 2010-01-26 8:07 UTC (permalink / raw)
To: Andrew Morton
Cc: balbir, Bharata B Rao, KOSAKI Motohiro, Ingo Molnar, mingo, hpa,
linux-kernel, a.p.zijlstra, schwidefsky, balajirrao, dhaval,
tglx, kamezawa.hiroyu, Tony Luck, Fenghua Yu, Heiko Carstens,
linux390
Hi Andrew,
> > I guess a lot of the changelog and comments are in the email history,
>
> Not a very useful location for it!
Good point, I'll work on a useful comment.
> > Why does it look quite wrong to you?
>
> Because it computes the correct value and then if it's larger than
> INT_MAX, it inexplicably assigns INT_MAX to it, giving a wrong result!
>
> Does that code actually work, btw? percpu_counter_batch has type `int'
> and cputime_one_jiffy has type `int' so their product has type `int'.
> So by the time min_t performs its comparison, the upper 32 bits of the
> product are already lost.
On ppc64, s390 and ia64 cputime_one_jiffy is 64bit and I want to prevent us
creating too large a batch value:
void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
If we pass in something bigger than INT_MAX we could end up with 0 after
truncation which will turn the percpu counter into a single spinlock global
counter.
Anton
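The truncation hazard Anton describes is easy to illustrate: the batch parameter is s32, so a 64-bit value whose low 32 bits are zero would arrive as 0 and make every update take the spinlock. A minimal sketch, with int32_t standing in for s32:

```c
#include <stdint.h>

/* Models passing a too-large batch into __percpu_counter_add()'s
 * s32 batch parameter (implementation-defined conversion; on the
 * relevant ABIs it simply keeps the low 32 bits). */
static int32_t as_s32_batch(long long batch)
{
	return (int32_t)batch;
}
```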
* [tip:sched/urgent] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2010-01-18 4:41 ` [PATCH] " Anton Blanchard
` (4 preceding siblings ...)
2010-01-25 23:14 ` Andrew Morton
@ 2010-01-27 13:15 ` tip-bot for Anton Blanchard
2010-01-27 13:34 ` Balbir Singh
5 siblings, 1 reply; 62+ messages in thread
From: tip-bot for Anton Blanchard @ 2010-01-27 13:15 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, anton, hpa, mingo, a.p.zijlstra, tglx, mingo
Commit-ID: 43f85eab1411905afe5db510fbf9841b516e7e6a
Gitweb: http://git.kernel.org/tip/43f85eab1411905afe5db510fbf9841b516e7e6a
Author: Anton Blanchard <anton@samba.org>
AuthorDate: Mon, 18 Jan 2010 15:41:42 +1100
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 27 Jan 2010 08:34:38 +0100
sched: cpuacct: Use bigger percpu counter batch values for stats counters
When CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT are enabled we
can call cpuacct_update_stats with values much larger than
percpu_counter_batch. This means the call to percpu_counter_add will
always add to the global count which is protected by a spinlock and we
end up with a global spinlock in the scheduler.
Based on an idea by KOSAKI Motohiro, this patch scales the batch value by
cputime_one_jiffy such that we have the same batch limit as we would if
CONFIG_VIRT_CPU_ACCOUNTING was disabled. His patch did this once at boot
but that initialisation happened too early on PowerPC (before time_init)
and it was never updated at runtime as a result of a hotplug cpu
add/remove.
This patch instead scales percpu_counter_batch by cputime_one_jiffy at
runtime, which keeps the batch correct even after cpu hotplug operations.
We cap it at INT_MAX in case of overflow.
For architectures that do not support CONFIG_VIRT_CPU_ACCOUNTING,
cputime_one_jiffy is the constant 1 and gcc is smart enough to optimise
min(s32 percpu_counter_batch, INT_MAX) to just percpu_counter_batch at
least on x86 and PowerPC. So there is no need to add an #ifdef.
On a 64 thread PowerPC box with CONFIG_VIRT_CPU_ACCOUNTING and
CONFIG_CGROUP_CPUACCT enabled, a context switch microbenchmark is 234x
faster and almost matches a CONFIG_CGROUP_CPUACCT disabled kernel:
CONFIG_CGROUP_CPUACCT disabled: 16906698 ctx switches/sec
CONFIG_CGROUP_CPUACCT enabled: 61720 ctx switches/sec
CONFIG_CGROUP_CPUACCT + patch: 16663217 ctx switches/sec
Tested with:
wget http://ozlabs.org/~anton/junkcode/context_switch.c
make context_switch
for i in `seq 0 63`; do taskset -c $i ./context_switch & done
vmstat 1
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100118044142.GS12666@kryten>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/sched.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index 3a8fb30..8f94138 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -10906,6 +10906,7 @@ static void cpuacct_update_stats(struct task_struct *tsk,
enum cpuacct_stat_index idx, cputime_t val)
{
struct cpuacct *ca;
+ int batch;
if (unlikely(!cpuacct_subsys.active))
return;
@@ -10913,8 +10914,9 @@ static void cpuacct_update_stats(struct task_struct *tsk,
rcu_read_lock();
ca = task_ca(tsk);
+ batch = min_t(long, percpu_counter_batch * cputime_one_jiffy, INT_MAX);
do {
- percpu_counter_add(&ca->cpustat[idx], val);
+ __percpu_counter_add(&ca->cpustat[idx], val, batch);
ca = ca->parent;
} while (ca);
rcu_read_unlock();
* Re: [tip:sched/urgent] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2010-01-27 13:15 ` [tip:sched/urgent] " tip-bot for Anton Blanchard
@ 2010-01-27 13:34 ` Balbir Singh
2010-01-27 21:22 ` Andrew Morton
0 siblings, 1 reply; 62+ messages in thread
From: Balbir Singh @ 2010-01-27 13:34 UTC (permalink / raw)
To: mingo, hpa, anton, linux-kernel, a.p.zijlstra, tglx, mingo, akpm
Cc: linux-tip-commits
On Wed, Jan 27, 2010 at 6:45 PM, tip-bot for Anton Blanchard
<anton@samba.org> wrote:
> Commit-ID: 43f85eab1411905afe5db510fbf9841b516e7e6a
> Gitweb: http://git.kernel.org/tip/43f85eab1411905afe5db510fbf9841b516e7e6a
> Author: Anton Blanchard <anton@samba.org>
> AuthorDate: Mon, 18 Jan 2010 15:41:42 +1100
> Committer: Ingo Molnar <mingo@elte.hu>
> CommitDate: Wed, 27 Jan 2010 08:34:38 +0100
>
> sched: cpuacct: Use bigger percpu counter batch values for stats counters
>
> When CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT are enabled we
> can call cpuacct_update_stats with values much larger than
> percpu_counter_batch. This means the call to percpu_counter_add will
> always add to the global count which is protected by a spinlock and we
> end up with a global spinlock in the scheduler.
>
> Based on an idea by KOSAKI Motohiro, this patch scales the batch value by
> cputime_one_jiffy such that we have the same batch limit as we would if
> CONFIG_VIRT_CPU_ACCOUNTING was disabled. His patch did this once at boot
> but that initialisation happened too early on PowerPC (before time_init)
> and it was never updated at runtime as a result of a hotplug cpu
> add/remove.
>
> This patch instead scales percpu_counter_batch by cputime_one_jiffy at
> runtime, which keeps the batch correct even after cpu hotplug operations.
> We cap it at INT_MAX in case of overflow.
>
> For architectures that do not support CONFIG_VIRT_CPU_ACCOUNTING,
> cputime_one_jiffy is the constant 1 and gcc is smart enough to optimise
> min(s32 percpu_counter_batch, INT_MAX) to just percpu_counter_batch at
> least on x86 and PowerPC. So there is no need to add an #ifdef.
>
> On a 64 thread PowerPC box with CONFIG_VIRT_CPU_ACCOUNTING and
> CONFIG_CGROUP_CPUACCT enabled, a context switch microbenchmark is 234x
> faster and almost matches a CONFIG_CGROUP_CPUACCT disabled kernel:
>
> CONFIG_CGROUP_CPUACCT disabled: 16906698 ctx switches/sec
> CONFIG_CGROUP_CPUACCT enabled: 61720 ctx switches/sec
> CONFIG_CGROUP_CPUACCT + patch: 16663217 ctx switches/sec
>
> Tested with:
>
> wget http://ozlabs.org/~anton/junkcode/context_switch.c
> make context_switch
> for i in `seq 0 63`; do taskset -c $i ./context_switch & done
> vmstat 1
>
> Signed-off-by: Anton Blanchard <anton@samba.org>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> LKML-Reference: <20100118044142.GS12666@kryten>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
> kernel/sched.c | 4 +++-
> 1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 3a8fb30..8f94138 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -10906,6 +10906,7 @@ static void cpuacct_update_stats(struct task_struct *tsk,
> enum cpuacct_stat_index idx, cputime_t val)
> {
> struct cpuacct *ca;
> + int batch;
>
> if (unlikely(!cpuacct_subsys.active))
> return;
> @@ -10913,8 +10914,9 @@ static void cpuacct_update_stats(struct task_struct *tsk,
> rcu_read_lock();
> ca = task_ca(tsk);
>
> + batch = min_t(long, percpu_counter_batch * cputime_one_jiffy, INT_MAX);
> do {
> - percpu_counter_add(&ca->cpustat[idx], val);
> + __percpu_counter_add(&ca->cpustat[idx], val, batch);
> ca = ca->parent;
> } while (ca);
> rcu_read_unlock();
IIRC, Andrew picked up this patch as well and applied some checkpatch
fixes too..
Balbir
* Re: [tip:sched/urgent] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2010-01-27 13:34 ` Balbir Singh
@ 2010-01-27 21:22 ` Andrew Morton
2010-01-27 21:43 ` Peter Zijlstra
2010-01-28 5:21 ` [tip:sched/urgent] " Balbir Singh
0 siblings, 2 replies; 62+ messages in thread
From: Andrew Morton @ 2010-01-27 21:22 UTC (permalink / raw)
To: Balbir Singh
Cc: mingo, hpa, anton, linux-kernel, a.p.zijlstra, tglx, mingo,
linux-tip-commits
On Wed, 27 Jan 2010 19:04:55 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> On Wed, Jan 27, 2010 at 6:45 PM, tip-bot for Anton Blanchard
> <anton@samba.org> wrote:
> > Commit-ID: __43f85eb1411905afe5db510fbf9841b516e7e6a
> > Gitweb: __ __ http://git.kernel.org/tip/43f85eab1411905afe5db510fbf9841b516e7e6a
> > Author: __ __ Anton Blanchard <anton@samba.org>
> > AuthorDate: Mon, 18 Jan 2010 15:41:42 +1100
> > Committer: __Ingo Molnar <mingo@elte.hu>
> > CommitDate: Wed, 27 Jan 2010 08:34:38 +0100
> >
> > sched: cpuacct: Use bigger percpu counter batch values for stats counters
> >
> > When CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT are enabled we
> > can call cpuacct_update_stats with values much larger than
> > percpu_counter_batch. This means the call to percpu_counter_add will
> > always add to the global count which is protected by a spinlock and we
> > end up with a global spinlock in the scheduler.
> >
> > Based on an idea by KOSAKI Motohiro, this patch scales the batch value by
> > cputime_one_jiffy such that we have the same batch limit as we would if
> > CONFIG_VIRT_CPU_ACCOUNTING was disabled. His patch did this once at boot
> > but that initialisation happened too early on PowerPC (before time_init)
> > and it was never updated at runtime as a result of a hotplug cpu
> > add/remove.
> >
> > This patch instead scales percpu_counter_batch by cputime_one_jiffy at
> > runtime, which keeps the batch correct even after cpu hotplug operations.
> > We cap it at INT_MAX in case of overflow.
> >
> > For architectures that do not support CONFIG_VIRT_CPU_ACCOUNTING,
> > cputime_one_jiffy is the constant 1 and gcc is smart enough to optimise
> > min(s32 percpu_counter_batch, INT_MAX) to just percpu_counter_batch at
> > least on x86 and PowerPC. So there is no need to add an #ifdef.
> >
> > On a 64 thread PowerPC box with CONFIG_VIRT_CPU_ACCOUNTING and
> > CONFIG_CGROUP_CPUACCT enabled, a context switch microbenchmark is 234x
> > faster and almost matches a CONFIG_CGROUP_CPUACCT disabled kernel:
> >
> > CONFIG_CGROUP_CPUACCT disabled: __ __ __ __ 16906698 ctx switches/sec
> > CONFIG_CGROUP_CPUACCT enabled: __ __ __ __ __ __ 61720 ctx switches/sec
> > CONFIG_CGROUP_CPUACCT + patch: __ __ __ __ __16663217 ctx switches/sec
> >
> > Tested with:
> >
> >   wget http://ozlabs.org/~anton/junkcode/context_switch.c
> >   make context_switch
> >   for i in `seq 0 63`; do taskset -c $i ./context_switch & done
> >   vmstat 1
> >
> > Signed-off-by: Anton Blanchard <anton@samba.org>
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > LKML-Reference: <20100118044142.GS12666@kryten>
> > Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > ---
> >  kernel/sched.c |    4 +++-
> >  1 files changed, 3 insertions(+), 1 deletions(-)
> >
> > diff --git a/kernel/sched.c b/kernel/sched.c
> > index 3a8fb30..8f94138 100644
> > --- a/kernel/sched.c
> > +++ b/kernel/sched.c
> > @@ -10906,6 +10906,7 @@ static void cpuacct_update_stats(struct task_struct *tsk,
> >  		enum cpuacct_stat_index idx, cputime_t val)
> >  {
> >  	struct cpuacct *ca;
> > +	int batch;
> >
> >  	if (unlikely(!cpuacct_subsys.active))
> >  		return;
> > @@ -10913,8 +10914,9 @@ static void cpuacct_update_stats(struct task_struct *tsk,
> >  	rcu_read_lock();
> >  	ca = task_ca(tsk);
> >
> > +	batch = min_t(long, percpu_counter_batch * cputime_one_jiffy, INT_MAX);
> >  	do {
> > -		percpu_counter_add(&ca->cpustat[idx], val);
> > +		__percpu_counter_add(&ca->cpustat[idx], val, batch);
> >  		ca = ca->parent;
> >  	} while (ca);
> >  	rcu_read_unlock();
^^ your email client inexplicably fills emails with 0xa0
> IIRC, Andrew picked up this patch as well and applied some checkpatch
> fixes too..
No, I have no changes.
Last I heard, Anton was "working on a useful comment" and will be
redoing the patch.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [tip:sched/urgent] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2010-01-27 21:22 ` Andrew Morton
@ 2010-01-27 21:43 ` Peter Zijlstra
2010-01-28 12:48 ` [PATCH 1/2] percpu_counter: Make __percpu_counter_add an inline function on UP Anton Blanchard
2010-01-28 5:21 ` [tip:sched/urgent] " Balbir Singh
1 sibling, 1 reply; 62+ messages in thread
From: Peter Zijlstra @ 2010-01-27 21:43 UTC (permalink / raw)
To: Andrew Morton
Cc: Balbir Singh, mingo, hpa, anton, linux-kernel, tglx, mingo,
linux-tip-commits
On Wed, 2010-01-27 at 13:22 -0800, Andrew Morton wrote:
> Last I heard, Anton was "working on a useful comment" and will be
> redoing the patch.
If Anton is working on a new version, would Anton then also care to post
one that compiles on UP? :-)
* Re: [tip:sched/urgent] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2010-01-27 21:22 ` Andrew Morton
2010-01-27 21:43 ` Peter Zijlstra
@ 2010-01-28 5:21 ` Balbir Singh
1 sibling, 0 replies; 62+ messages in thread
From: Balbir Singh @ 2010-01-28 5:21 UTC (permalink / raw)
To: Andrew Morton
Cc: mingo, hpa, anton, linux-kernel, a.p.zijlstra, tglx, mingo,
linux-tip-commits
>> > __ __ __ __ __ __ __ __ca = ca->parent;
>> > __ __ __ __} while (ca);
>> > __ __ __ __rcu_read_unlock();
>
> ^^ your email client inexplicably fills emails with 0xa0
>
I am going to be changing my email client soon, back to the good old
text client.
>> IIRC, Andrew picked up this patch as well and applied some checkpatch
>> fixes too..
>
> No, I have no changes.
>
Sorry, I was confused, the changes you had were for getdelays.c
> Last I heard, Anton was "working on a useful comment" and will be
> redoing the patch.
Yep, let's wait on him.
Three Cheers,
Balbir
* [PATCH 1/2] percpu_counter: Make __percpu_counter_add an inline function on UP
2010-01-27 21:43 ` Peter Zijlstra
@ 2010-01-28 12:48 ` Anton Blanchard
2010-01-28 12:52 ` [PATCH 2/2] sched: cpuacct: Use bigger percpu counter batch values for stats counters Anton Blanchard
0 siblings, 1 reply; 62+ messages in thread
From: Anton Blanchard @ 2010-01-28 12:48 UTC (permalink / raw)
To: Peter Zijlstra, Andrew Morton
Cc: Balbir Singh, mingo, hpa, linux-kernel, tglx, mingo, linux-tip-commits
Even though batch isn't used on UP, we may want to pass one in to keep the
SMP and UP code paths similar. Convert __percpu_counter_add to an inline
function so we won't get unused-variable warnings if we do.
Signed-off-by: Anton Blanchard <anton@samba.org>
---
diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index a7684a5..794662b 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -98,9 +98,6 @@ static inline void percpu_counter_set(struct percpu_counter *fbc, s64 amount)
 	fbc->count = amount;
 }
 
-#define __percpu_counter_add(fbc, amount, batch) \
-	percpu_counter_add(fbc, amount)
-
 static inline void
 percpu_counter_add(struct percpu_counter *fbc, s64 amount)
 {
@@ -109,6 +106,12 @@ percpu_counter_add(struct percpu_counter *fbc, s64 amount)
 	preempt_enable();
 }
 
+static inline void
+__percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
+{
+	percpu_counter_add(fbc, amount);
+}
+
 static inline s64 percpu_counter_read(struct percpu_counter *fbc)
 {
 	return fbc->count;
* [PATCH 2/2] sched: cpuacct: Use bigger percpu counter batch values for stats counters
2010-01-28 12:48 ` [PATCH 1/2] percpu_counter: Make __percpu_counter_add an inline function on UP Anton Blanchard
@ 2010-01-28 12:52 ` Anton Blanchard
0 siblings, 0 replies; 62+ messages in thread
From: Anton Blanchard @ 2010-01-28 12:52 UTC (permalink / raw)
To: Peter Zijlstra, Andrew Morton
Cc: Balbir Singh, mingo, hpa, linux-kernel, tglx, mingo, linux-tip-commits
When CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT are enabled we can
call cpuacct_update_stats with values much larger than percpu_counter_batch.
This means the call to percpu_counter_add will always add to the global count
which is protected by a spinlock and we end up with a global spinlock in
the scheduler.
Based on an idea by KOSAKI Motohiro, this patch scales the batch value by
cputime_one_jiffy such that we have the same batch limit as we would
if CONFIG_VIRT_CPU_ACCOUNTING was disabled. His patch did this once at boot
but that initialisation happened too early on PowerPC (before time_init)
and it was never updated at runtime as a result of a hotplug cpu add/remove.
This patch instead scales percpu_counter_batch by cputime_one_jiffy at
runtime, which keeps the batch correct even after cpu hotplug operations.
We cap it at INT_MAX in case of overflow.
For architectures that do not support CONFIG_VIRT_CPU_ACCOUNTING,
cputime_one_jiffy is the constant 1 and gcc is smart enough to
optimise min(s32 percpu_counter_batch, INT_MAX) to just percpu_counter_batch
at least on x86 and PowerPC. So there is no need to add an #ifdef.
On a 64 thread PowerPC box with CONFIG_VIRT_CPU_ACCOUNTING and
CONFIG_CGROUP_CPUACCT enabled, a context switch microbenchmark is 234x faster
and almost matches a CONFIG_CGROUP_CPUACCT disabled kernel:
CONFIG_CGROUP_CPUACCT disabled: 16906698 ctx switches/sec
CONFIG_CGROUP_CPUACCT enabled: 61720 ctx switches/sec
CONFIG_CGROUP_CPUACCT + patch: 16663217 ctx switches/sec
Tested with:
wget http://ozlabs.org/~anton/junkcode/context_switch.c
make context_switch
for i in `seq 0 63`; do taskset -c $i ./context_switch & done
vmstat 1
Signed-off-by: Anton Blanchard <anton@samba.org>
---
v2: Added a comment and fixed the UP build.
Index: linux-cpumask/kernel/sched.c
===================================================================
--- linux-cpumask.orig/kernel/sched.c 2010-01-22 18:23:43.377212514 +1100
+++ linux-cpumask/kernel/sched.c 2010-01-28 23:24:02.677233753 +1100
@@ -10885,12 +10885,30 @@ static void cpuacct_charge(struct task_s
 }
 
 /*
+ * When CONFIG_VIRT_CPU_ACCOUNTING is enabled one jiffy can be very large
+ * in cputime_t units. As a result, cpuacct_update_stats calls
+ * percpu_counter_add with values large enough to always overflow the
+ * per cpu batch limit causing bad SMP scalability.
+ *
+ * To fix this we scale percpu_counter_batch by cputime_one_jiffy so we
+ * batch the same amount of time with CONFIG_VIRT_CPU_ACCOUNTING disabled
+ * and enabled. We cap it at INT_MAX which is the largest allowed batch value.
+ */
+#ifdef CONFIG_SMP
+#define CPUACCT_BATCH	\
+	min_t(long, percpu_counter_batch * cputime_one_jiffy, INT_MAX)
+#else
+#define CPUACCT_BATCH	0
+#endif
+
+/*
  * Charge the system/user time to the task's accounting group.
  */
 static void cpuacct_update_stats(struct task_struct *tsk,
 		enum cpuacct_stat_index idx, cputime_t val)
 {
 	struct cpuacct *ca;
+	int batch = CPUACCT_BATCH;
 
 	if (unlikely(!cpuacct_subsys.active))
 		return;
@@ -10899,7 +10917,7 @@ static void cpuacct_update_stats(struct
 	ca = task_ca(tsk);
 
 	do {
-		percpu_counter_add(&ca->cpustat[idx], val);
+		__percpu_counter_add(&ca->cpustat[idx], val, batch);
 		ca = ca->parent;
 	} while (ca);
 	rcu_read_unlock();
end of thread, other threads:[~2010-01-28 12:56 UTC | newest]
Thread overview: 62+ messages
-- links below jump to the message on this page --
2009-05-07 9:59 [PATCH v3] cpuacct: VIRT_CPU_ACCOUNTING don't prevent percpu cputime count KOSAKI Motohiro
2009-05-07 10:50 ` Ingo Molnar
2009-05-09 9:26 ` KOSAKI Motohiro
2009-05-09 9:38 ` Peter Zijlstra
2009-05-09 9:52 ` KOSAKI Motohiro
2009-05-09 10:14 ` KOSAKI Motohiro
2009-05-11 12:33 ` [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters tip-bot for KOSAKI Motohiro
2009-05-11 12:36 ` tip-bot for KOSAKI Motohiro
2009-05-11 15:07 ` Ingo Molnar
2009-05-12 10:01 ` KOSAKI Motohiro
2009-05-12 10:02 ` Peter Zijlstra
2009-05-12 10:06 ` KOSAKI Motohiro
2009-05-12 10:22 ` KOSAKI Motohiro
2009-05-12 10:27 ` Ingo Molnar
2009-05-12 10:42 ` KOSAKI Motohiro
2009-05-12 10:10 ` Balbir Singh
2009-05-12 10:13 ` KOSAKI Motohiro
2009-05-12 10:24 ` Balbir Singh
2009-05-12 10:29 ` Ingo Molnar
2009-05-12 10:44 ` KOSAKI Motohiro
2009-05-12 13:28 ` Ingo Molnar
2009-05-12 13:40 ` KOSAKI Motohiro
2009-05-12 13:42 ` Ingo Molnar
2009-07-16 8:10 ` Bharata B Rao
2009-07-16 8:39 ` Anton Blanchard
2009-08-20 5:10 ` Anton Blanchard
2009-08-20 5:16 ` KAMEZAWA Hiroyuki
2009-08-20 5:26 ` Balbir Singh
2009-08-20 5:52 ` Peter Zijlstra
2009-08-20 6:05 ` Anton Blanchard
2009-08-20 6:10 ` KAMEZAWA Hiroyuki
2009-08-20 6:24 ` Anton Blanchard
2009-08-20 8:41 ` [PATCH] better align percpu counter (Was " KAMEZAWA Hiroyuki
2009-08-20 10:04 ` Ingo Molnar
2009-08-21 2:20 ` KAMEZAWA Hiroyuki
2009-08-21 11:29 ` Ingo Molnar
2009-08-21 3:56 ` KAMEZAWA Hiroyuki
2009-08-20 11:46 ` Balbir Singh
2009-08-21 6:21 ` KAMEZAWA Hiroyuki
2010-01-18 4:41 ` [PATCH] " Anton Blanchard
2010-01-18 4:55 ` KOSAKI Motohiro
2010-01-18 7:51 ` Peter Zijlstra
2010-01-18 8:21 ` Anton Blanchard
2010-01-18 8:34 ` Peter Zijlstra
2010-01-18 8:35 ` Balbir Singh
2010-01-18 9:42 ` Martin Schwidefsky
2010-01-18 9:55 ` Anton Blanchard
2010-01-18 11:00 ` Balbir Singh
2010-01-21 15:25 ` Balbir Singh
2010-01-21 15:36 ` Peter Zijlstra
2010-01-25 23:14 ` Andrew Morton
2010-01-25 23:44 ` KOSAKI Motohiro
2010-01-26 6:17 ` Balbir Singh
2010-01-26 6:35 ` Andrew Morton
2010-01-26 8:07 ` Anton Blanchard
2010-01-27 13:15 ` [tip:sched/urgent] " tip-bot for Anton Blanchard
2010-01-27 13:34 ` Balbir Singh
2010-01-27 21:22 ` Andrew Morton
2010-01-27 21:43 ` Peter Zijlstra
2010-01-28 12:48 ` [PATCH 1/2] percpu_counter: Make __percpu_counter_add an inline function on UP Anton Blanchard
2010-01-28 12:52 ` [PATCH 2/2] sched: cpuacct: Use bigger percpu counter batch values for stats counters Anton Blanchard
2010-01-28 5:21 ` [tip:sched/urgent] " Balbir Singh