* [PATCH 0/5] [RFC] Trivial scheduler related Android patches
@ 2010-11-20  2:08 John Stultz
  2010-11-20  2:08 ` [PATCH 1/5] sched: Enable might_sleep before initializing drivers John Stultz
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: John Stultz @ 2010-11-20  2:08 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Arve Hjønnevåg, Ingo Molnar,
	Peter Zijlstra, Dima Zavin, Erik Gilling, Mike Chan

So after all the heat that was generated in the various Android
discussions, I took a look at the Android git tree, and
while there are a fair number of large and controversial 
infrastructure changes, there are also a number of small fixes 
that apply easily against Linus' git tree.

So after cherry picking these 50-some small patches out of the
android tree, I organized them into topic branches, and over
the next few weeks, I hope to send them out to lkml and topic
maintainers for comments.

Now, I'm not proposing that these changes be merged as-is. It
may very well be that, unknown to me, android developers have
already tried to submit these patches and they have been rejected
for good reason. Or some patches may very well be necessary hacks
to get things shipping while deeper fixes are being worked on. If
that is the case, let me know and forgive me for the noise.

But since it seemed many of these small changes had been obscured
by the debate over the larger infrastructure changes, I wanted 
to bring them forward so that possibly good fixes were not missed
in the controversy.

Maintainers: If you do find any of these patches distasteful,
that's fine, I'll be happy to drop them from my tree for now.
I really don't want to stir up another huge mail thread over these 
small patches, but I'd appreciate it if you'd consider them as a
bug report illustrating an issue or a desired feature, and suggest 
what you see as a reasonable way to accomplish the desired
functionality presented in the patch.

The following patches are just the scheduler related trivial patches
from the Android tree. You can find this as well as my other trivial
Android topic branches here:
http://git.linaro.org/gitweb?p=people/jstultz/linux.git;a=summary

thanks
-john

Cc: Arve Hjønnevåg <arve@android.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Dima Zavin <dima@android.com>
CC: Erik Gilling <konkers@android.com>
CC: Mike Chan <mike@android.com>


Arve Hjønnevåg (1):
  sched: Enable might_sleep before initializing drivers.

Dima Zavin (1):
  sched: use the old min_vruntime when normalizing on dequeue

Erik Gilling (1):
  sched: make task dump print all 15 chars of proc comm

Mike Chan (2):
  scheduler: cpuacct: Enable platform hooks to track cpuusage for CPU
    frequencies
  scheduler: cpuacct: Enable platform callbacks for cpuacct power
    tracking

 Documentation/cgroups/cpuacct.txt |    7 +++
 include/linux/cpuacct.h           |   43 +++++++++++++++++++
 kernel/sched.c                    |   84 ++++++++++++++++++++++++++++++++++++-
 kernel/sched_fair.c               |    6 ++-
 4 files changed, 137 insertions(+), 3 deletions(-)
 create mode 100644 include/linux/cpuacct.h

-- 
1.7.3.2.146.gca209



* [PATCH 1/5] sched: Enable might_sleep before initializing drivers.
  2010-11-20  2:08 [PATCH 0/5] [RFC] Trivial scheduler related Android patches John Stultz
@ 2010-11-20  2:08 ` John Stultz
  2010-11-20 10:42   ` Peter Zijlstra
  2010-11-20  2:08 ` [PATCH 2/5] sched: make task dump print all 15 chars of proc comm John Stultz
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: John Stultz @ 2010-11-20  2:08 UTC (permalink / raw)
  To: lkml; +Cc: Arve Hjønnevåg, Ingo Molnar, Peter Zijlstra, John Stultz

From: Arve Hjønnevåg <arve@android.com>

This allows detection of init bugs in built-in drivers.

CC: Ingo Molnar <mingo@elte.hu>
CC: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/sched.c |   13 ++++++++++++-
 1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index aa14a56..0b58415 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -8104,13 +8104,24 @@ static inline int preempt_count_equals(int preempt_offset)
 	return (nested == PREEMPT_INATOMIC_BASE + preempt_offset);
 }
 
+static int __might_sleep_init_called;
+int __init __might_sleep_init(void)
+{
+	__might_sleep_init_called = 1;
+	return 0;
+}
+early_initcall(__might_sleep_init);
+
 void __might_sleep(const char *file, int line, int preempt_offset)
 {
 #ifdef in_atomic
 	static unsigned long prev_jiffy;	/* ratelimiting */
 
 	if ((preempt_count_equals(preempt_offset) && !irqs_disabled()) ||
-	    system_state != SYSTEM_RUNNING || oops_in_progress)
+	    oops_in_progress)
+		return;
+	if (system_state != SYSTEM_RUNNING &&
+	    (!__might_sleep_init_called || system_state != SYSTEM_BOOTING))
 		return;
 	if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
 		return;
-- 
1.7.3.2.146.gca209
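
For readers following the system_state logic in the hunk above, here is a
minimal user-space sketch of the gating condition before and after the
patch. It is not kernel code: the enum is a stand-in for the kernel's
system_state values, and skip_old()/skip_new() are hypothetical helper
names used only for illustration. init_called models the
__might_sleep_init_called flag that the early_initcall sets.

```c
#include <stdbool.h>

/* Stand-ins for the kernel's system_state values (illustrative only). */
enum system_states { SYSTEM_BOOTING, SYSTEM_RUNNING, SYSTEM_HALT };

/* Before the patch: the check is skipped in every state except RUNNING. */
static inline bool skip_old(enum system_states state)
{
	return state != SYSTEM_RUNNING;
}

/*
 * After the patch: the check is also performed while BOOTING, but only
 * once the early initcall has set the flag (modelled by init_called).
 */
static inline bool skip_new(enum system_states state, bool init_called)
{
	return state != SYSTEM_RUNNING &&
	       (!init_called || state != SYSTEM_BOOTING);
}
```

With this model, skip_new(SYSTEM_BOOTING, true) is false, so a sleeping
call made from a built-in driver's init path would now be reported, while
very early boot (flag not yet set) and shutdown states are still skipped.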



* [PATCH 2/5] sched: make task dump print all 15 chars of proc comm
  2010-11-20  2:08 [PATCH 0/5] [RFC] Trivial scheduler related Android patches John Stultz
  2010-11-20  2:08 ` [PATCH 1/5] sched: Enable might_sleep before initializing drivers John Stultz
@ 2010-11-20  2:08 ` John Stultz
  2010-11-23 10:21   ` [tip:sched/core] sched: Make " tip-bot for Erik Gilling
  2010-11-20  2:08 ` [PATCH 3/5] scheduler: cpuacct: Enable platform hooks to track cpuusage for CPU frequencies John Stultz
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: John Stultz @ 2010-11-20  2:08 UTC (permalink / raw)
  To: lkml; +Cc: Erik Gilling, Ingo Molnar, Peter Zijlstra, John Stultz

From: Erik Gilling <konkers@android.com>

CC: Ingo Molnar <mingo@elte.hu>
CC: Peter Zijlstra <peterz@infradead.org>
Change-Id: I1a5c9676baa06c9f9b4424bbcab01b9b2fbfcd99
Signed-off-by: Erik Gilling <konkers@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/sched.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 0b58415..c99bbb2 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5390,7 +5390,7 @@ void sched_show_task(struct task_struct *p)
 	unsigned state;
 
 	state = p->state ? __ffs(p->state) + 1 : 0;
-	printk(KERN_INFO "%-13.13s %c", p->comm,
+	printk(KERN_INFO "%-15.15s %c", p->comm,
 		state < sizeof(stat_nam) - 1 ? stat_nam[state] : '?');
 #if BITS_PER_LONG == 32
 	if (state == TASK_RUNNING)
-- 
1.7.3.2.146.gca209
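
The effect of the format change can be reproduced in user space, since
printk's "%-N.Ns" behaves like standard printf here: the field is
left-justified, padded to N characters, and truncated to N characters.
The sketch below assumes the usual TASK_COMM_LEN of 16 (15 characters plus
the terminating NUL); format_comm() is a hypothetical helper, not a kernel
function.

```c
#include <stdio.h>
#include <string.h>

#define TASK_COMM_LEN 16	/* 15 visible characters plus NUL */

/*
 * Format a task name with the given field width, left-justified,
 * padded and truncated to exactly `width` characters.
 */
static inline int format_comm(char *buf, size_t len, int width,
			      const char *comm)
{
	return snprintf(buf, len, "%-*.*s", width, width, comm);
}
```

With a width of 13, a 15-character comm loses its last two characters;
with a width of 15, the full name fits and shorter names are padded so the
following columns stay aligned.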



* [PATCH 3/5] scheduler: cpuacct: Enable platform hooks to track cpuusage for CPU frequencies
  2010-11-20  2:08 [PATCH 0/5] [RFC] Trivial scheduler related Android patches John Stultz
  2010-11-20  2:08 ` [PATCH 1/5] sched: Enable might_sleep before initializing drivers John Stultz
  2010-11-20  2:08 ` [PATCH 2/5] sched: make task dump print all 15 chars of proc comm John Stultz
@ 2010-11-20  2:08 ` John Stultz
  2010-11-20 10:48   ` Peter Zijlstra
  2010-11-20  2:08 ` [PATCH 4/5] scheduler: cpuacct: Enable platform callbacks for cpuacct power tracking John Stultz
  2010-11-20  2:08 ` [PATCH 5/5] sched: use the old min_vruntime when normalizing on dequeue John Stultz
  4 siblings, 1 reply; 16+ messages in thread
From: John Stultz @ 2010-11-20  2:08 UTC (permalink / raw)
  To: lkml; +Cc: Mike Chan, Ingo Molnar, Peter Zijlstra, John Stultz

From: Mike Chan <mike@android.com>

Introduce new platform callback hooks for cpuacct for tracking CPU frequencies

Not all platforms / architectures have a set CPU_FREQ_TABLE defined
for CPU transition speeds. In order to track time spent at various
CPU frequencies, we enable platform callbacks from cpuacct for this accounting.

Architectures that support overclock boosting, or don't have pre-defined
frequency tables can implement their own bucketing system that makes sense
given their cpufreq scaling abilities.

New file:
cpuacct.cpufreq reports the CPU time (in nanoseconds) spent at each CPU
frequency.

CC: Ingo Molnar <mingo@elte.hu>
CC: Peter Zijlstra <peterz@infradead.org>
Change-Id: I10a80b3162e6fff3a8a2f74dd6bb37e88b12ba96
Signed-off-by: Mike Chan <mike@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 Documentation/cgroups/cpuacct.txt |    4 +++
 include/linux/cpuacct.h           |   41 +++++++++++++++++++++++++++++++
 kernel/sched.c                    |   49 +++++++++++++++++++++++++++++++++++++
 3 files changed, 94 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/cpuacct.h

diff --git a/Documentation/cgroups/cpuacct.txt b/Documentation/cgroups/cpuacct.txt
index 8b93094..600d2d0 100644
--- a/Documentation/cgroups/cpuacct.txt
+++ b/Documentation/cgroups/cpuacct.txt
@@ -40,6 +40,10 @@ system: Time spent by tasks of the cgroup in kernel mode.
 
 user and system are in USER_HZ unit.
 
+cpuacct.cpufreq file gives CPU time (in nanoseconds) spent at each CPU
+frequency. Platform hooks must be implemented inorder to properly track
+time at each CPU frequency.
+
 cpuacct controller uses percpu_counter interface to collect user and
 system times. This has two side effects:
 
diff --git a/include/linux/cpuacct.h b/include/linux/cpuacct.h
new file mode 100644
index 0000000..560df02
--- /dev/null
+++ b/include/linux/cpuacct.h
@@ -0,0 +1,41 @@
+/* include/linux/cpuacct.h
+ *
+ * Copyright (C) 2010 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _CPUACCT_H_
+#define _CPUACCT_H_
+
+#include <linux/cgroup.h>
+
+#ifdef CONFIG_CGROUP_CPUACCT
+
+/*
+ * Platform specific CPU frequency hooks for cpuacct. These functions are
+ * called from the scheduler.
+ */
+struct cpuacct_charge_calls {
+	/*
+	 * Platforms can take advantage of this data and use
+	 * per-cpu allocations if necessary.
+	 */
+	void (*init) (void **cpuacct_data);
+	void (*charge) (void *cpuacct_data,  u64 cputime, unsigned int cpu);
+	void (*show) (void *cpuacct_data, struct cgroup_map_cb *cb);
+};
+
+int cpuacct_charge_register(struct cpuacct_charge_calls *fn);
+
+#endif /* CONFIG_CGROUP_CPUACCT */
+
+#endif // _CPUACCT_H_
diff --git a/kernel/sched.c b/kernel/sched.c
index c99bbb2..35055fc 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -72,6 +72,7 @@
 #include <linux/ctype.h>
 #include <linux/ftrace.h>
 #include <linux/slab.h>
+#include <linux/cpuacct.h>
 
 #include <asm/tlb.h>
 #include <asm/irq_regs.h>
@@ -9082,8 +9083,30 @@ struct cpuacct {
 	u64 __percpu *cpuusage;
 	struct percpu_counter cpustat[CPUACCT_STAT_NSTATS];
 	struct cpuacct *parent;
+	struct cpuacct_charge_calls *cpufreq_fn;
+	void *cpuacct_data;
 };
 
+static struct cpuacct *cpuacct_root;
+
+/* Default calls for cpufreq accounting */
+static struct cpuacct_charge_calls *cpuacct_cpufreq;
+int cpuacct_charge_register(struct cpuacct_charge_calls *fn)
+{
+	cpuacct_cpufreq = fn;
+
+	/*
+	 * Root node is created before platform can register callbacks,
+	 * initialize here.
+	 */
+	if (cpuacct_root && fn) {
+		cpuacct_root->cpufreq_fn = fn;
+		if (fn->init)
+			fn->init(&cpuacct_root->cpuacct_data);
+	}
+	return 0;
+}
+
 struct cgroup_subsys cpuacct_subsys;
 
 /* return cpu accounting group corresponding to this container */
@@ -9118,8 +9141,16 @@ static struct cgroup_subsys_state *cpuacct_create(
 		if (percpu_counter_init(&ca->cpustat[i], 0))
 			goto out_free_counters;
 
+	ca->cpufreq_fn = cpuacct_cpufreq;
+
+	/* If available, have platform code initialize cpu frequency table */
+	if (ca->cpufreq_fn && ca->cpufreq_fn->init)
+		ca->cpufreq_fn->init(&ca->cpuacct_data);
+
 	if (cgrp->parent)
 		ca->parent = cgroup_ca(cgrp->parent);
+	else
+		cpuacct_root = ca;
 
 	return &ca->css;
 
@@ -9247,6 +9278,16 @@ static int cpuacct_stats_show(struct cgroup *cgrp, struct cftype *cft,
 	return 0;
 }
 
+static int cpuacct_cpufreq_show(struct cgroup *cgrp, struct cftype *cft,
+		struct cgroup_map_cb *cb)
+{
+	struct cpuacct *ca = cgroup_ca(cgrp);
+	if (ca->cpufreq_fn && ca->cpufreq_fn->show)
+		ca->cpufreq_fn->show(ca->cpuacct_data, cb);
+
+	return 0;
+}
+
 static struct cftype files[] = {
 	{
 		.name = "usage",
@@ -9261,6 +9302,10 @@ static struct cftype files[] = {
 		.name = "stat",
 		.read_map = cpuacct_stats_show,
 	},
+	{
+		.name =  "cpufreq",
+		.read_map = cpuacct_cpufreq_show,
+	},
 };
 
 static int cpuacct_populate(struct cgroup_subsys *ss, struct cgroup *cgrp)
@@ -9290,6 +9335,10 @@ static void cpuacct_charge(struct task_struct *tsk, u64 cputime)
 	for (; ca; ca = ca->parent) {
 		u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu);
 		*cpuusage += cputime;
+
+		/* Call back into platform code to account for CPU speeds */
+		if (ca->cpufreq_fn && ca->cpufreq_fn->charge)
+			ca->cpufreq_fn->charge(ca->cpuacct_data, cputime, cpu);
 	}
 
 	rcu_read_unlock();
-- 
1.7.3.2.146.gca209
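
To make the callback contract more concrete, here is a user-space sketch
of how a platform might fill in the init and charge hooks with its own
bucketing scheme. Everything prefixed demo_ or suffixed _DEMO is
hypothetical, the struct is a cut-down stand-in for the one in the patch
(the show hook is omitted), and uint64_t replaces the kernel's u64. It is
only meant to illustrate the data flow, not how any real platform does it.

```c
#include <stdint.h>
#include <stdlib.h>

#define NR_CPUS_DEMO  2	/* hypothetical CPU count */
#define NR_FREQS_DEMO 3	/* hypothetical number of frequency buckets */

/* Cut-down user-space stand-in for struct cpuacct_charge_calls. */
struct cpuacct_charge_calls {
	void (*init)(void **cpuacct_data);
	void (*charge)(void *cpuacct_data, uint64_t cputime, unsigned int cpu);
};

/* Hypothetical platform state: the bucket each CPU currently runs in. */
static unsigned int demo_cur_freq_idx[NR_CPUS_DEMO];

/* Per-group data: nanoseconds accumulated in each frequency bucket. */
struct demo_freq_time {
	uint64_t ns[NR_FREQS_DEMO];
};

/* Allocate zeroed per-group accounting data. */
static void demo_init(void **cpuacct_data)
{
	*cpuacct_data = calloc(1, sizeof(struct demo_freq_time));
}

/* Add cputime to the bucket the given CPU is currently running in. */
static void demo_charge(void *cpuacct_data, uint64_t cputime, unsigned int cpu)
{
	struct demo_freq_time *t = cpuacct_data;

	if (!t || cpu >= NR_CPUS_DEMO)
		return;
	if (demo_cur_freq_idx[cpu] < NR_FREQS_DEMO)
		t->ns[demo_cur_freq_idx[cpu]] += cputime;
}

static struct cpuacct_charge_calls demo_calls = {
	.init   = demo_init,
	.charge = demo_charge,
};
```

In this model the platform's cpufreq code would update
demo_cur_freq_idx[] on each frequency transition, and the scheduler-side
charge call simply adds the elapsed time to whichever bucket is current.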



* [PATCH 4/5] scheduler: cpuacct: Enable platform callbacks for cpuacct power tracking
  2010-11-20  2:08 [PATCH 0/5] [RFC] Trivial scheduler related Android patches John Stultz
                   ` (2 preceding siblings ...)
  2010-11-20  2:08 ` [PATCH 3/5] scheduler: cpuacct: Enable platform hooks to track cpuusage for CPU frequencies John Stultz
@ 2010-11-20  2:08 ` John Stultz
  2010-11-20  2:08 ` [PATCH 5/5] sched: use the old min_vruntime when normalizing on dequeue John Stultz
  4 siblings, 0 replies; 16+ messages in thread
From: John Stultz @ 2010-11-20  2:08 UTC (permalink / raw)
  To: lkml; +Cc: Mike Chan, Ingo Molnar, Peter Zijlstra, John Stultz

From: Mike Chan <mike@android.com>

Platforms must register a CPU power function that returns the power
consumed in milliWatt seconds.

CC: Ingo Molnar <mingo@elte.hu>
CC: Peter Zijlstra <peterz@infradead.org>
Change-Id: I1caa0335e316c352eee3b1ddf326fcd4942bcbe8
Signed-off-by: Mike Chan <mike@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 Documentation/cgroups/cpuacct.txt |    3 +++
 include/linux/cpuacct.h           |    4 +++-
 kernel/sched.c                    |   24 ++++++++++++++++++++++--
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/Documentation/cgroups/cpuacct.txt b/Documentation/cgroups/cpuacct.txt
index 600d2d0..84e471b 100644
--- a/Documentation/cgroups/cpuacct.txt
+++ b/Documentation/cgroups/cpuacct.txt
@@ -44,6 +44,9 @@ cpuacct.cpufreq file gives CPU time (in nanoseconds) spent at each CPU
 frequency. Platform hooks must be implemented inorder to properly track
 time at each CPU frequency.
 
+cpuacct.power file gives CPU power consumed (in milliWatt seconds). Platform
+must provide and implement power callback functions.
+
 cpuacct controller uses percpu_counter interface to collect user and
 system times. This has two side effects:
 
diff --git a/include/linux/cpuacct.h b/include/linux/cpuacct.h
index 560df02..8f68e73 100644
--- a/include/linux/cpuacct.h
+++ b/include/linux/cpuacct.h
@@ -31,7 +31,9 @@ struct cpuacct_charge_calls {
 	 */
 	void (*init) (void **cpuacct_data);
 	void (*charge) (void *cpuacct_data,  u64 cputime, unsigned int cpu);
-	void (*show) (void *cpuacct_data, struct cgroup_map_cb *cb);
+	void (*cpufreq_show) (void *cpuacct_data, struct cgroup_map_cb *cb);
+	/* Returns power consumed in milliWatt seconds */
+	u64 (*power_usage) (void *cpuacct_data);
 };
 
 int cpuacct_charge_register(struct cpuacct_charge_calls *fn);
diff --git a/kernel/sched.c b/kernel/sched.c
index 35055fc..270d34a 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -9282,12 +9282,28 @@ static int cpuacct_cpufreq_show(struct cgroup *cgrp, struct cftype *cft,
 		struct cgroup_map_cb *cb)
 {
 	struct cpuacct *ca = cgroup_ca(cgrp);
-	if (ca->cpufreq_fn && ca->cpufreq_fn->show)
-		ca->cpufreq_fn->show(ca->cpuacct_data, cb);
+	if (ca->cpufreq_fn && ca->cpufreq_fn->cpufreq_show)
+		ca->cpufreq_fn->cpufreq_show(ca->cpuacct_data, cb);
 
 	return 0;
 }
 
+/* return total cpu power usage (milliWatt second) of a group */
+static u64 cpuacct_powerusage_read(struct cgroup *cgrp, struct cftype *cft)
+{
+	int i;
+	struct cpuacct *ca = cgroup_ca(cgrp);
+	u64 totalpower = 0;
+
+	if (ca->cpufreq_fn && ca->cpufreq_fn->power_usage)
+		for_each_present_cpu(i) {
+			totalpower += ca->cpufreq_fn->power_usage(
+					ca->cpuacct_data);
+		}
+
+	return totalpower;
+}
+
 static struct cftype files[] = {
 	{
 		.name = "usage",
@@ -9306,6 +9322,10 @@ static struct cftype files[] = {
 		.name =  "cpufreq",
 		.read_map = cpuacct_cpufreq_show,
 	},
+	{
+		.name = "power",
+		.read_u64 = cpuacct_powerusage_read
+	},
 };
 
 static int cpuacct_populate(struct cgroup_subsys *ss, struct cgroup *cgrp)
-- 
1.7.3.2.146.gca209



* [PATCH 5/5] sched: use the old min_vruntime when normalizing on dequeue
  2010-11-20  2:08 [PATCH 0/5] [RFC] Trivial scheduler related Android patches John Stultz
                   ` (3 preceding siblings ...)
  2010-11-20  2:08 ` [PATCH 4/5] scheduler: cpuacct: Enable platform callbacks for cpuacct power tracking John Stultz
@ 2010-11-20  2:08 ` John Stultz
  2010-11-20 10:55   ` Peter Zijlstra
  4 siblings, 1 reply; 16+ messages in thread
From: John Stultz @ 2010-11-20  2:08 UTC (permalink / raw)
  To: lkml
  Cc: Dima Zavin, Ingo Molnar, Peter Zijlstra,
	Arve Hjønnevåg, John Stultz

From: Dima Zavin <dima@android.com>

After pulling the thread off the run-queue during a cgroup change,
the cfs_rq.min_vruntime gets recalculated. The dequeued thread's vruntime
then gets normalized to this new value. This can then lead to the thread
getting an unfair boost in the new group if the vruntime of the next
task in the old run-queue was way further ahead.

CC: Ingo Molnar <mingo@elte.hu>
CC: Peter Zijlstra <peterz@infradead.org>
Cc: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Dima Zavin <dima@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/sched_fair.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index f4f6a83..72f19ad 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -802,6 +802,8 @@ static void clear_buddies(struct cfs_rq *cfs_rq, struct sched_entity *se)
 static void
 dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 {
+	u64 min_vruntime;
+
 	/*
 	 * Update run-time statistics of the 'current'.
 	 */
@@ -826,6 +828,8 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	if (se != cfs_rq->curr)
 		__dequeue_entity(cfs_rq, se);
 	account_entity_dequeue(cfs_rq, se);
+
+	min_vruntime = cfs_rq->min_vruntime;
 	update_min_vruntime(cfs_rq);
 
 	/*
@@ -834,7 +838,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 * movement in our normalized position.
 	 */
 	if (!(flags & DEQUEUE_SLEEP))
-		se->vruntime -= cfs_rq->min_vruntime;
+		se->vruntime -= min_vruntime;
 }
 
 /*
-- 
1.7.3.2.146.gca209
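
A small worked example may help show the boost described in the changelog.
The sketch below is a simplified user-space model, not the kernel
implementation: new_min_vruntime() approximates update_min_vruntime() as
"never go backwards, follow the leftmost remaining entity", and
migrate_vruntime() models normalizing on dequeue followed by re-basing on
enqueue in the destination queue. Both helper names are hypothetical.

```c
#include <stdint.h>

/*
 * Simplified model of update_min_vruntime(): min_vruntime is monotonic
 * and tracks the smallest vruntime still on the queue.
 */
static inline uint64_t new_min_vruntime(uint64_t old_min,
					uint64_t leftmost_remaining)
{
	return (int64_t)(leftmost_remaining - old_min) > 0 ?
		leftmost_remaining : old_min;
}

/*
 * Normalize against the source queue's min_vruntime on dequeue, then
 * add the destination queue's min_vruntime on enqueue. Unsigned
 * wraparound in the intermediate result is intended.
 */
static inline uint64_t migrate_vruntime(uint64_t se_vruntime,
					uint64_t src_min, uint64_t dst_min)
{
	return se_vruntime - src_min + dst_min;
}
```

Take a departing task at vruntime 1000 on a queue whose min_vruntime is
1000, with the next task at 5000, and a destination queue at 20000.
Normalizing against the recalculated value (5000) places the task at
16000, which is 4000 behind the destination's min_vruntime; normalizing
against the old value (1000) places it at 20000, level with the queue.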



* Re: [PATCH 1/5] sched: Enable might_sleep before initializing drivers.
  2010-11-20  2:08 ` [PATCH 1/5] sched: Enable might_sleep before initializing drivers John Stultz
@ 2010-11-20 10:42   ` Peter Zijlstra
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Zijlstra @ 2010-11-20 10:42 UTC (permalink / raw)
  To: John Stultz; +Cc: lkml, Arve Hjønnevåg, Ingo Molnar

On Fri, 2010-11-19 at 18:08 -0800, John Stultz wrote:
> From: Arve Hjønnevåg <arve@android.com>
> 
> This allows detection of init bugs in built-in drivers.
> 
> CC: Ingo Molnar <mingo@elte.hu>
> CC: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Arve Hjønnevåg <arve@android.com>
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  kernel/sched.c |   13 ++++++++++++-
>  1 files changed, 12 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/sched.c b/kernel/sched.c
> index aa14a56..0b58415 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -8104,13 +8104,24 @@ static inline int preempt_count_equals(int preempt_offset)
>  	return (nested == PREEMPT_INATOMIC_BASE + preempt_offset);
>  }
>  
> +static int __might_sleep_init_called;
> +int __init __might_sleep_init(void)
> +{
> +	__might_sleep_init_called = 1;
> +	return 0;
> +}
> +early_initcall(__might_sleep_init);
> +
>  void __might_sleep(const char *file, int line, int preempt_offset)
>  {
>  #ifdef in_atomic
>  	static unsigned long prev_jiffy;	/* ratelimiting */
>  
>  	if ((preempt_count_equals(preempt_offset) && !irqs_disabled()) ||
> -	    system_state != SYSTEM_RUNNING || oops_in_progress)
> +	    oops_in_progress)
> +		return;
> +	if (system_state != SYSTEM_RUNNING &&
> +	    (!__might_sleep_init_called || system_state != SYSTEM_BOOTING))
>  		return;
>  	if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
>  		return;

Remind me, why isn't scheduler_running good enough?


* Re: [PATCH 3/5] scheduler: cpuacct: Enable platform hooks to track cpuusage for CPU frequencies
  2010-11-20  2:08 ` [PATCH 3/5] scheduler: cpuacct: Enable platform hooks to track cpuusage for CPU frequencies John Stultz
@ 2010-11-20 10:48   ` Peter Zijlstra
  2010-11-22  5:51     ` Florian Mickler
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2010-11-20 10:48 UTC (permalink / raw)
  To: John Stultz; +Cc: lkml, Mike Chan, Ingo Molnar

On Fri, 2010-11-19 at 18:08 -0800, John Stultz wrote:
> From: Mike Chan <mike@android.com>
> 
> Introduce new platform callback hooks for cpuacct for tracking CPU frequencies
> 
> Not all platforms / architectures have a set CPU_FREQ_TABLE defined
> for CPU transition speeds. In order to track time spent at various
> CPU frequencies, we enable platform callbacks from cpuacct for this accounting.
> 
> Architectures that support overclock boosting, or don't have pre-defined
> frequency tables can implement their own bucketing system that makes sense
> given their cpufreq scaling abilities.
> 
> New file:
> cpuacct.cpufreq reports the CPU time (in nanoseconds) spent at each CPU
> frequency.

I utterly detest all such accounting crap.. it adds ABI constraints, it
adds runtime overhead, etc..

Can't you get the same information by using the various perf bits? If
you trace the cpufreq changes you can compute the time spent in each
power state, if you additionally trace the sched_switch you can compute
it for each task.
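
The post-processing suggested here can be sketched in a few lines of
user-space C. This is only an illustration of the arithmetic: struct
freq_event is a hypothetical decoded record (timestamp plus an index into
a caller-defined frequency table), not the layout of any real perf or
tracepoint payload, and time_in_state() is a made-up helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded frequency-change record (hypothetical layout). */
struct freq_event {
	uint64_t ts_ns;		/* timestamp of the change */
	unsigned int idx;	/* index of the new frequency */
};

/*
 * Accumulate the time spent at each frequency index.
 * ev must be sorted by timestamp; the last interval ends at end_ns.
 * out must hold nr_freqs entries and is zeroed here.
 */
static inline void time_in_state(const struct freq_event *ev, size_t n,
				 uint64_t end_ns, uint64_t *out,
				 size_t nr_freqs)
{
	size_t i;

	for (i = 0; i < nr_freqs; i++)
		out[i] = 0;

	for (i = 0; i < n; i++) {
		/* Each state lasts until the next change, or end_ns. */
		uint64_t next = (i + 1 < n) ? ev[i + 1].ts_ns : end_ns;

		if (ev[i].idx < nr_freqs && next > ev[i].ts_ns)
			out[ev[i].idx] += next - ev[i].ts_ns;
	}
}
```

Splitting the result per task would additionally require intersecting
these intervals with the run intervals derived from sched_switch events,
which is the part that needs a user-space consumer.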




* Re: [PATCH 5/5] sched: use the old min_vruntime when normalizing on dequeue
  2010-11-20  2:08 ` [PATCH 5/5] sched: use the old min_vruntime when normalizing on dequeue John Stultz
@ 2010-11-20 10:55   ` Peter Zijlstra
  2010-11-20 12:33     ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2010-11-20 10:55 UTC (permalink / raw)
  To: John Stultz; +Cc: lkml, Dima Zavin, Ingo Molnar, Arve Hjønnevåg

On Fri, 2010-11-19 at 18:08 -0800, John Stultz wrote:
> From: Dima Zavin <dima@android.com>
> 
> After pulling the thread off the run-queue during a cgroup change,
> the cfs_rq.min_vruntime gets recalculated. The dequeued thread's vruntime
> then gets normalized to this new value. This can then lead to the thread
> getting an unfair boost in the new group if the vruntime of the next
> task in the old run-queue was way further ahead.
> 
> CC: Ingo Molnar <mingo@elte.hu>
> CC: Peter Zijlstra <peterz@infradead.org>
> Cc: Arve Hjønnevåg <arve@android.com>
> Signed-off-by: Dima Zavin <dima@android.com>
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  kernel/sched_fair.c |    6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index f4f6a83..72f19ad 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -802,6 +802,8 @@ static void clear_buddies(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  static void
>  dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>  {
> +	u64 min_vruntime;
> +
>  	/*
>  	 * Update run-time statistics of the 'current'.
>  	 */
> @@ -826,6 +828,8 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>  	if (se != cfs_rq->curr)
>  		__dequeue_entity(cfs_rq, se);
>  	account_entity_dequeue(cfs_rq, se);
> +
> +	min_vruntime = cfs_rq->min_vruntime;
>  	update_min_vruntime(cfs_rq);
>  
>  	/*
> @@ -834,7 +838,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>  	 * movement in our normalized position.
>  	 */
>  	if (!(flags & DEQUEUE_SLEEP))
> -		se->vruntime -= cfs_rq->min_vruntime;
> +		se->vruntime -= min_vruntime;
>  }

Right, so assuming the reasoning is right (my brain still needs to wake
up) the patch is weird; why not simply move the code block up and avoid
the whole extra variable, like so?

---
 kernel/sched_fair.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index d35f464..dfa28ef 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1003,8 +1003,6 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	se->on_rq = 0;
 	update_cfs_load(cfs_rq, 0);
 	account_entity_dequeue(cfs_rq, se);
-	update_min_vruntime(cfs_rq);
-	update_cfs_shares(cfs_rq, 0);
 
 	/*
 	 * Normalize the entity after updating the min_vruntime because the
@@ -1013,6 +1011,9 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 */
 	if (!(flags & DEQUEUE_SLEEP))
 		se->vruntime -= cfs_rq->min_vruntime;
+
+	update_min_vruntime(cfs_rq);
+	update_cfs_shares(cfs_rq, 0);
 }
 
 /*



* Re: [PATCH 5/5] sched: use the old min_vruntime when normalizing on dequeue
  2010-11-20 10:55   ` Peter Zijlstra
@ 2010-11-20 12:33     ` Peter Zijlstra
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Zijlstra @ 2010-11-20 12:33 UTC (permalink / raw)
  To: John Stultz; +Cc: lkml, Dima Zavin, Ingo Molnar, Arve Hjønnevåg

On Sat, 2010-11-20 at 11:55 +0100, Peter Zijlstra wrote:
> Right, so assuming the reasoning is right (my brain still needs to wake
> up) the patch is weird; why not simply move the code block up and avoid
> the whole extra variable, like so?

Also, clearly that comment needs addressing..


* Re: [PATCH 3/5] scheduler: cpuacct: Enable platform hooks to track cpuusage for CPU frequencies
  2010-11-20 10:48   ` Peter Zijlstra
@ 2010-11-22  5:51     ` Florian Mickler
  2010-11-22 10:43       ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Mickler @ 2010-11-22  5:51 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: John Stultz, lkml, Mike Chan, Ingo Molnar

On Sat, 20 Nov 2010 11:48:24 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, 2010-11-19 at 18:08 -0800, John Stultz wrote:
> > From: Mike Chan <mike@android.com>
> > 
> > Introduce new platform callback hooks for cpuacct for tracking CPU frequencies
> > 
> > Not all platforms / architectures have a set CPU_FREQ_TABLE defined
> > for CPU transition speeds. In order to track time spent at various
> > CPU frequencies, we enable platform callbacks from cpuacct for this accounting.
> > 
> > Architectures that support overclock boosting, or don't have pre-defined
> > frequency tables can implement their own bucketing system that makes sense
> > given their cpufreq scaling abilities.
> > 
> > New file:
> > cpuacct.cpufreq reports the CPU time (in nanoseconds) spent at each CPU
> > frequency.
> 
> I utterly detest all such accounting crap.. it adds ABI constraints, it
> adds runtime overhead, etc..
> 
> Can't you get the same information by using the various perf bits? If
> you trace the cpufreq changes you can compute the time spent in each
> power state, if you additionally trace the sched_switch you can compute
> it for each task.
> 
> 
This is probably used for "on-site" debugging of production systems.

I.e. when someone sends them a problem report using a
bugreport tool, they gather all useful information they can get on the
system because they only have one-way communication with their bug
reporters.

Do the perf bits work for such a usecase? If I guess correctly, the
perf bits need a userspace part that computes what would be in the
cpuacct.cpufreq file? 

Regards,
Flo


* Re: [PATCH 3/5] scheduler: cpuacct: Enable platform hooks to track cpuusage for CPU frequencies
  2010-11-22  5:51     ` Florian Mickler
@ 2010-11-22 10:43       ` Peter Zijlstra
  2010-11-22 12:23         ` Florian Mickler
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2010-11-22 10:43 UTC (permalink / raw)
  To: Florian Mickler; +Cc: John Stultz, lkml, Mike Chan, Ingo Molnar

On Mon, 2010-11-22 at 06:51 +0100, Florian Mickler wrote:
> On Sat, 20 Nov 2010 11:48:24 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Fri, 2010-11-19 at 18:08 -0800, John Stultz wrote:
> > > From: Mike Chan <mike@android.com>
> > > 
> > > Introduce new platform callback hooks for cpuacct for tracking CPU frequencies
> > > 
> > > Not all platforms / architectures have a set CPU_FREQ_TABLE defined
> > > for CPU transition speeds. In order to track time spent at various
> > > CPU frequencies, we enable platform callbacks from cpuacct for this accounting.
> > > 
> > > Architectures that support overclock boosting, or don't have pre-defined
> > > frequency tables can implement their own bucketing system that makes sense
> > > given their cpufreq scaling abilities.
> > > 
> > > New file:
> > > cpuacct.cpufreq reports the CPU time (in nanoseconds) spent at each CPU
> > > frequency.
> > 
> > I utterly detest all such accounting crap.. it adds ABI constraints, it
> > adds runtime overhead, etc..
> > 
> > Can't you get the same information by using the various perf bits? If
> > you trace the cpufreq changes you can compute the time spent in each
> > power state, if you additionally trace the sched_switch you can compute
> > it for each task.
> > 
> > 
> This is probably used for "on-site" debugging of production systems.

Dude, it's from the _android_ tree... it's cpufreq crud.. it must be some
crack induced power management scheme.




* Re: [PATCH 3/5] scheduler: cpuacct: Enable platform hooks to track cpuusage for CPU frequencies
  2010-11-22 10:43       ` Peter Zijlstra
@ 2010-11-22 12:23         ` Florian Mickler
  2010-11-23  2:05           ` Mike Chan
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Mickler @ 2010-11-22 12:23 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: John Stultz, lkml, Mike Chan, Ingo Molnar

On Mon, 22 Nov 2010 11:43:59 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, 2010-11-22 at 06:51 +0100, Florian Mickler wrote:
> > On Sat, 20 Nov 2010 11:48:24 +0100
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > On Fri, 2010-11-19 at 18:08 -0800, John Stultz wrote:
> > > > From: Mike Chan <mike@android.com>
> > > > 
> > > > Introduce new platform callback hooks for cpuacct for tracking CPU frequencies
> > > > 
> > > > Not all platforms / architectures have a set CPU_FREQ_TABLE defined
> > > > for CPU transition speeds. In order to track time spent at various
> > > > CPU frequencies, we enable platform callbacks from cpuacct for this accounting.
> > > > 
> > > > Architectures that support overclock boosting, or don't have pre-defined
> > > > frequency tables can implement their own bucketing system that makes sense
> > > > given their cpufreq scaling abilities.
> > > > 
> > > > New file:
> > > > cpuacct.cpufreq reports the CPU time (in nanoseconds) spent at each CPU
> > > > frequency.
> > > 
> > > I utterly detest all such accounting crap.. it adds ABI constraints, it
> > > adds runtime overhead, etc..
> > > 
> > > Can't you get the same information by using the various perf bits? If
> > > you trace the cpufreq changes you can compute the time spent in each
> > > power state, if you additionally trace the sched_switch you can compute
> > > it for each task.
> > > 
> > > 
> > This is probably used for "on-site" debugging of production systems.
> 
> Dude, it's from the _android_ tree... it's cpufreq crud.. it must be some
> crack induced power management scheme.
> 
> 

:) 

What I wanted to get at was that they probably need these stats
aggregated somewhere neat and tidy and cannot compute them on the fly
by recording massive amounts of data...

I wonder why they didn't put this in the
idle-driver.  I don't know. 

Regards,
Flo


* Re: [PATCH 3/5] scheduler: cpuacct: Enable platform hooks to track cpuusage for CPU frequencies
  2010-11-22 12:23         ` Florian Mickler
@ 2010-11-23  2:05           ` Mike Chan
  2010-11-23 11:35             ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Chan @ 2010-11-23  2:05 UTC (permalink / raw)
  To: Florian Mickler; +Cc: Peter Zijlstra, John Stultz, lkml, Ingo Molnar

On Mon, Nov 22, 2010 at 4:23 AM, Florian Mickler <florian@mickler.org> wrote:
> On Mon, 22 Nov 2010 11:43:59 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
>
>> On Mon, 2010-11-22 at 06:51 +0100, Florian Mickler wrote:
>> > On Sat, 20 Nov 2010 11:48:24 +0100
>> > Peter Zijlstra <peterz@infradead.org> wrote:
>> >
>> > > On Fri, 2010-11-19 at 18:08 -0800, John Stultz wrote:
>> > > > From: Mike Chan <mike@android.com>
>> > > >
>> > > > Introduce new platform callback hooks for cpuacct for tracking CPU frequencies
>> > > >
>> > > > Not all platforms / architectures have a set CPU_FREQ_TABLE defined
>> > > > for CPU transition speeds. In order to track time spent at various
>> > > > CPU frequencies, we enable platform callbacks from cpuacct for this accounting.
>> > > >
>> > > > Architectures that support overclock boosting, or don't have pre-defined
>> > > > frequency tables can implement their own bucketing system that makes sense
>> > > > given their cpufreq scaling abilities.
>> > > >
>> > > > New file:
>> > > > cpuacct.cpufreq reports the CPU time (in nanoseconds) spent at each CPU
>> > > > frequency.
>> > >
>> > > I utterly detest all such accounting crap.. it adds ABI constraints, it
>> > > adds runtime overhead, etc..
>> > >
>> > > Can't you get the same information by using the various perf bits? If
>> > > you trace the cpufreq changes you can compute the time spent in each
>> > > power state, if you additionally trace the sched_switch you can compute
>> > > it for each task.
>> > >
>> > >
>> > This is probably used for "on-site" debugging of production systems.
>>
>> Dude, it's from the _android_ tree... it's cpufreq crud.. it must be some
>> crack induced power management scheme.
>>
>>
>
> :)
>
> What I wanted to get at was that they probably need these stats
> aggregated somewhere neat and tidy and cannot compute them on the fly
> by recording massive amounts of data...
>
> I wonder why they didn't put this in the
> idle-driver.  I don't know.
>

This is useful for tracking cpu power per c-group. We split each
android application into its own c-group and track, for each one, which
cpu speeds were used and how long the cpu spent at each speed. Peter,
we've actually discussed this before:
http://lkml.org/lkml/2010/5/6/301

These patches were discussed with Paul Menage and Balbir Singh back in
April, as well as on the lkml and cpufreq mailing lists. They may or
may not be useful for mainline, but I assume anyone wanting to track
power specifically for c-groups would be interested. I'm open to
different implementations that can help achieve cpu power tracking
per-cgroup if this particular implementation is controversial, or if
you just want to help make Android's kernel better.
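
As a rough illustration of the bucketing idea described in the patch
(platform-defined frequency buckets, with CPU time in nanoseconds
accumulated per c-group), here is a minimal user-space sketch in C. It
is not the interface from the patch: cgroup_freq_acct, freq_to_bucket_fn,
example_bucket() and acct_charge() are hypothetical names, and the bucket
boundaries are made up.

```c
#include <assert.h>
#include <stdint.h>

#define NR_BUCKETS 3	/* hypothetical number of frequency buckets */

/* Hypothetical platform hook: map a frequency in kHz to a bucket index. */
typedef unsigned int (*freq_to_bucket_fn)(unsigned int khz);

/* Per-c-group accounting state: one nanosecond counter per bucket. */
struct cgroup_freq_acct {
	freq_to_bucket_fn to_bucket;
	uint64_t ns[NR_BUCKETS];
};

/* Example bucketing for a platform without a fixed frequency table. */
static unsigned int example_bucket(unsigned int khz)
{
	if (khz < 500000)
		return 0;	/* below 500 MHz */
	if (khz < 1000000)
		return 1;	/* 500 MHz up to, but not including, 1 GHz */
	return 2;		/* 1 GHz and above, e.g. boosted speeds */
}

/* Charge delta_ns of CPU time to the bucket matching the current speed. */
static void acct_charge(struct cgroup_freq_acct *ca, unsigned int khz,
			uint64_t delta_ns)
{
	unsigned int b = ca->to_bucket(khz);

	assert(b < NR_BUCKETS);
	ca->ns[b] += delta_ns;
}
```

Each application's c-group would own one such structure, and reading
the counters back gives the time spent in each bucket for that group.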

-- Mike

> Regards,
> Flo
>
>

* [tip:sched/core] sched: Make task dump print all 15 chars of proc comm
  2010-11-20  2:08 ` [PATCH 2/5] sched: make task dump print all 15 chars of proc comm John Stultz
@ 2010-11-23 10:21   ` tip-bot for Erik Gilling
  0 siblings, 0 replies; 16+ messages in thread
From: tip-bot for Erik Gilling @ 2010-11-23 10:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, john.stultz, hpa, mingo, a.p.zijlstra, konkers,
	tglx, mingo

Commit-ID:  28d0686cf7b14e30243096bd874d3f80591ed392
Gitweb:     http://git.kernel.org/tip/28d0686cf7b14e30243096bd874d3f80591ed392
Author:     Erik Gilling <konkers@android.com>
AuthorDate: Fri, 19 Nov 2010 18:08:51 -0800
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Tue, 23 Nov 2010 10:29:07 +0100

sched: Make task dump print all 15 chars of proc comm

Signed-off-by: Erik Gilling <konkers@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1290218934-8544-3-git-send-email-john.stultz@linaro.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 550cf3a..324afce 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5249,7 +5249,7 @@ void sched_show_task(struct task_struct *p)
 	unsigned state;
 
 	state = p->state ? __ffs(p->state) + 1 : 0;
-	printk(KERN_INFO "%-13.13s %c", p->comm,
+	printk(KERN_INFO "%-15.15s %c", p->comm,
 		state < sizeof(stat_nam) - 1 ? stat_nam[state] : '?');
 #if BITS_PER_LONG == 32
 	if (state == TASK_RUNNING)
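
The effect of the one-line change can be checked in user space, since
snprintf() handles the width and precision of "%-13.13s" and "%-15.15s"
the same way. The sketch below is only an illustration: format_comm()
is a hypothetical helper, the state character is fixed to 'S', and
COMM_LEN mirrors the kernel's TASK_COMM_LEN of 16 (15 characters plus
the terminating NUL).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mirrors TASK_COMM_LEN: 15 visible characters plus the NUL. */
#define COMM_LEN 16

/*
 * Format a task name followed by a state character, using a field that
 * is left-justified, padded to `width` and truncated to `width`.
 * "%-*.*s" with width 13 behaves like "%-13.13s", and likewise for 15.
 */
static int format_comm(char *out, size_t outsz, const char *comm, int width)
{
	assert(width > 0 && width < COMM_LEN);
	return snprintf(out, outsz, "%-*.*s %c", width, width, comm, 'S');
}
```

With a 15-character name, a width of 13 drops the last two characters,
while a width of 15 prints the whole name; shorter names are padded
with spaces so the columns still line up.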

* Re: [PATCH 3/5] scheduler: cpuacct: Enable platform hooks to track cpuusage for CPU frequencies
  2010-11-23  2:05           ` Mike Chan
@ 2010-11-23 11:35             ` Peter Zijlstra
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Zijlstra @ 2010-11-23 11:35 UTC (permalink / raw)
  To: Mike Chan
  Cc: Florian Mickler, John Stultz, lkml, Ingo Molnar, Stephane Eranian

On Mon, 2010-11-22 at 18:05 -0800, Mike Chan wrote:

> This is useful for tracking cpu power per c-group. We split each
> android application into its own c-group and track, for each one, which
> cpu speeds were used and how long the cpu spent at each speed. Peter,
> we've actually discussed this before:
> http://lkml.org/lkml/2010/5/6/301
> 
> These patches were discussed with Paul Menage and Balbir Singh back in
> April, as well as on the lkml and cpufreq mailing lists. They may or
> may not be useful for mainline, but I assume anyone wanting to track
> power specifically for c-groups would be interested. I'm open to
> different implementations that can help achieve cpu power tracking
> per-cgroup if this particular implementation is controversial, or if
> you just want to help make Android's kernel better.

Right, so Stephane is working on perf-cgroup bits (I saw he recently
posted another version, which I guess I ought to look at soonish).

With that it would be rather simple to use perf to track per-cgroup
power state.
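
The approach suggested earlier in the thread (trace the cpufreq changes
plus sched_switch, then compute the time per power state and per task)
can be illustrated with a minimal user-space sketch in C. It is not
perf's API and not code from this thread: the trace_event layout, the
frequency and task indices, and account_trace() are hypothetical, and
the sketch assumes a single CPU with events already sorted by timestamp.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_TASKS 8	/* hypothetical limits for the sketch */
#define MAX_FREQS 4

enum ev_type { EV_FREQ_CHANGE, EV_SCHED_SWITCH };

struct trace_event {
	uint64_t ts_ns;		/* event timestamp in nanoseconds */
	enum ev_type type;
	unsigned int value;	/* new frequency index, or next task index */
};

/* ns[task][freq] is the time the task ran at that frequency. */
struct freq_stats {
	uint64_t ns[MAX_TASKS][MAX_FREQS];
};

/*
 * Walk the event stream once. The interval between two consecutive
 * events is charged to the task and frequency that were current during
 * that interval; then the event updates the current task or frequency.
 * `task` and `freq` give the state at the time of the first event.
 */
static void account_trace(const struct trace_event *ev, size_t n,
			  unsigned int task, unsigned int freq,
			  struct freq_stats *st)
{
	uint64_t last_ts;
	size_t i;

	assert(task < MAX_TASKS && freq < MAX_FREQS);
	memset(st, 0, sizeof(*st));
	if (n == 0)
		return;

	last_ts = ev[0].ts_ns;
	for (i = 0; i < n; i++) {
		assert(ev[i].ts_ns >= last_ts);	/* input must be sorted */
		st->ns[task][freq] += ev[i].ts_ns - last_ts;
		last_ts = ev[i].ts_ns;

		if (ev[i].type == EV_FREQ_CHANGE) {
			assert(ev[i].value < MAX_FREQS);
			freq = ev[i].value;
		} else {
			assert(ev[i].value < MAX_TASKS);
			task = ev[i].value;
		}
	}
}
```

Per-cgroup totals would follow by summing the rows of the tasks that
belong to one cgroup. Time before the first event and after the last
one is not charged to anyone in this sketch.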

Thread overview: 16+ messages
2010-11-20  2:08 [PATCH 0/5] [RFC] Trivial scheduler related Android patches John Stultz
2010-11-20  2:08 ` [PATCH 1/5] sched: Enable might_sleep before initializing drivers John Stultz
2010-11-20 10:42   ` Peter Zijlstra
2010-11-20  2:08 ` [PATCH 2/5] sched: make task dump print all 15 chars of proc comm John Stultz
2010-11-23 10:21   ` [tip:sched/core] sched: Make " tip-bot for Erik Gilling
2010-11-20  2:08 ` [PATCH 3/5] scheduler: cpuacct: Enable platform hooks to track cpuusage for CPU frequencies John Stultz
2010-11-20 10:48   ` Peter Zijlstra
2010-11-22  5:51     ` Florian Mickler
2010-11-22 10:43       ` Peter Zijlstra
2010-11-22 12:23         ` Florian Mickler
2010-11-23  2:05           ` Mike Chan
2010-11-23 11:35             ` Peter Zijlstra
2010-11-20  2:08 ` [PATCH 4/5] scheduler: cpuacct: Enable platform callbacks for cpuacct power tracking John Stultz
2010-11-20  2:08 ` [PATCH 5/5] sched: use the old min_vruntime when normalizing on dequeue John Stultz
2010-11-20 10:55   ` Peter Zijlstra
2010-11-20 12:33     ` Peter Zijlstra
