linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 00/10] sched: Task placement for heterogeneous MP systems
@ 2012-09-21 18:32 morten.rasmussen
  2012-09-21 18:32 ` [RFC PATCH 01/10] sched: entity load-tracking load_avg_ratio morten.rasmussen
                   ` (9 more replies)
  0 siblings, 10 replies; 27+ messages in thread
From: morten.rasmussen @ 2012-09-21 18:32 UTC (permalink / raw)
  To: paulmck, pjt, peterz, suresh.b.siddha
  Cc: morten.rasmussen, linaro-sched-sig, linaro-dev, linux-kernel

From: Morten Rasmussen <morten.rasmussen@arm.com>

Hi Paul, Paul, Peter, Suresh, linaro-sched-sig, and LKML,

As a follow-up to my Linux Plumbers Conference talk about my experiments with
scheduling on heterogeneous systems, I'm posting a proof-of-concept patch set
with my modifications. The intention behind the modifications is to tweak
scheduling behaviour to only use fast (and power-hungry) cores when it is
necessary and also to improve performance consistency. Without the
modifications it is more or less random where tasks are scheduled, and so is
the execution time.

I'm seeing good improvements in performance consistency for web browsing on
Android using Bbench <http://www.gem5.org/Bbench> on the ARM big.LITTLE TC2
chip, which has two fast cores (Cortex-A15) and three power-efficient cores
(Cortex-A7). The total execution time numbers below are for Android's
SurfaceFlinger process, which is key for page rendering performance. The
average execution time is lower with the patches enabled and the standard
deviation is much smaller. Similar improvements can be seen for the
Android.Browser and WebViewCoreThread processes.

Total execution time statistics based on 50 runs.

SurfaceFlinger  SMP kernel [s]  HMP modifications [s]
------------------------------------------------------
Average         14.617          11.012
St. Dev.         4.577           0.902
10% Pctl.        9.343          10.783
90% Pctl.       18.743          11.695

Unfortunately, I cannot share power-efficiency numbers at this stage.

This patch set introduces proof-of-concept scheduler modifications which
attempt to improve scheduling decisions on heterogeneous multi-processor
systems (HMP) such as ARM big.LITTLE systems. The patch set relies on the
entity load-tracking re-work patch set by Paul Turner:

<https://lkml.org/lkml/2012/8/23/267>

The modifications attempt to migrate tasks between cores with different
compute capacity depending on the tracked load and priority. The aim is
to only use fast cores for tasks which really need the extra performance,
and thereby reduce power consumption by running everything else on the
slow cores.
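
As a rough, illustrative sketch (not part of the patches themselves), the
placement rule implemented by hmp_up_migration()/hmp_down_migration() in
patch 02 boils down to the following, using the default thresholds of this
series. The helper below and its on_fastest/on_slowest parameters are
hypothetical, for illustration only:

/*
 * Condensed sketch of the placement policy. 'load' is the unweighted
 * task load (se->avg.load_avg_ratio, range 0..1023); on_fastest and
 * on_slowest indicate whether the task's current cpu is in the fastest
 * or slowest hmp_domain. Returns 1 for "migrate up", -1 for "migrate
 * down" and 0 for "stay put".
 */
static int hmp_placement_sketch(unsigned int load,
				int on_fastest, int on_slowest)
{
	if (!on_fastest && load > 512)		/* hmp_up_threshold */
		return 1;
	if (!on_slowest && load < 256)		/* hmp_down_threshold */
		return -1;
	return 0;
}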

The patch introduces hmp_domains to represent the different types of cores
that are available on the given platform. Multiple (>2) hmp_domains are
supported but not tested. hmp_domains must be set up by platform code and
the patch set includes patches for ARM platforms using device-tree.

The patches intentionally try to avoid modifying the existing code paths
as much as possible. The aim is to experiment with HMP scheduling and get
the overall policy right before integrating it properly with the existing
load-balancer.

Morten

Morten Rasmussen (10):
  sched: entity load-tracking load_avg_ratio
  sched: Task placement for heterogeneous systems based on task
    load-tracking
  sched: Forced task migration on heterogeneous systems
  sched: Introduce priority-based task migration filter
  ARM: Add HMP scheduling support for ARM architecture
  ARM: sched: Use device-tree to provide fast/slow CPU list for HMP
  ARM: sched: Setup SCHED_HMP domains
  sched: Add ftrace events for entity load-tracking
  sched: Add HMP task migration ftrace event
  sched: SCHED_HMP multi-domain task migration control

 arch/arm/Kconfig                |   46 +++++
 arch/arm/include/asm/topology.h |   32 +++
 arch/arm/kernel/topology.c      |   91 ++++++++
 include/linux/sched.h           |   11 +
 include/trace/events/sched.h    |  153 ++++++++++++++
 kernel/sched/core.c             |    4 +
 kernel/sched/fair.c             |  434 ++++++++++++++++++++++++++++++++++++++-
 kernel/sched/sched.h            |    9 +
 8 files changed, 779 insertions(+), 1 deletion(-)

-- 
1.7.9.5




* [RFC PATCH 01/10] sched: entity load-tracking load_avg_ratio
  2012-09-21 18:32 [RFC PATCH 00/10] sched: Task placement for heterogeneous MP systems morten.rasmussen
@ 2012-09-21 18:32 ` morten.rasmussen
  2012-09-21 18:32 ` [RFC PATCH 02/10] sched: Task placement for heterogeneous systems based on task load-tracking morten.rasmussen
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 27+ messages in thread
From: morten.rasmussen @ 2012-09-21 18:32 UTC (permalink / raw)
  To: paulmck, pjt, peterz, suresh.b.siddha
  Cc: morten.rasmussen, linaro-sched-sig, linaro-dev, linux-kernel

From: Morten Rasmussen <morten.rasmussen@arm.com>

This patch adds load_avg_ratio to each task. The load_avg_ratio is a
variant of load_avg_contrib which is not scaled by the task priority. It
is calculated like this:

runnable_avg_sum * NICE_0_LOAD / (runnable_avg_period + 1).
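
For example, assuming the usual NICE_0_LOAD value of 1024: a task that has
been runnable for roughly half of its tracked history has
runnable_avg_sum ~= runnable_avg_period / 2, giving a load_avg_ratio of about
512 regardless of its nice value, while a task that is runnable nearly all of
the time approaches 1023.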

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 include/linux/sched.h |    1 +
 kernel/sched/fair.c   |    3 +++
 2 files changed, 4 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4dc4990..81e4e82 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1151,6 +1151,7 @@ struct sched_avg {
 	u64 last_runnable_update;
 	s64 decay_count;
 	unsigned long load_avg_contrib;
+	unsigned long load_avg_ratio;
 	u32 usage_avg_sum;
 };
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 095d86c..3e17dd5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1192,6 +1192,9 @@ static inline void __update_task_entity_contrib(struct sched_entity *se)
 	contrib = se->avg.runnable_avg_sum * scale_load_down(se->load.weight);
 	contrib /= (se->avg.runnable_avg_period + 1);
 	se->avg.load_avg_contrib = scale_load(contrib);
+	contrib = se->avg.runnable_avg_sum * scale_load_down(NICE_0_LOAD);
+	contrib /= (se->avg.runnable_avg_period + 1);
+	se->avg.load_avg_ratio = scale_load(contrib);
 }
 
 /* Compute the current contribution to load_avg by se, return any delta */
-- 
1.7.9.5




* [RFC PATCH 02/10] sched: Task placement for heterogeneous systems based on task load-tracking
  2012-09-21 18:32 [RFC PATCH 00/10] sched: Task placement for heterogeneous MP systems morten.rasmussen
  2012-09-21 18:32 ` [RFC PATCH 01/10] sched: entity load-tracking load_avg_ratio morten.rasmussen
@ 2012-09-21 18:32 ` morten.rasmussen
  2012-10-04  6:02   ` Viresh Kumar
  2012-09-21 18:32 ` [RFC PATCH 03/10] sched: Forced task migration on heterogeneous systems morten.rasmussen
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: morten.rasmussen @ 2012-09-21 18:32 UTC (permalink / raw)
  To: paulmck, pjt, peterz, suresh.b.siddha
  Cc: morten.rasmussen, linaro-sched-sig, linaro-dev, linux-kernel

From: Morten Rasmussen <morten.rasmussen@arm.com>

This patch introduces the basic SCHED_HMP infrastructure. Each class of
cpus is represented by a hmp_domain and tasks will only be moved between
these domains when their load profiles suggest it is beneficial.

SCHED_HMP relies heavily on the task load-tracking introduced in Paul
Turner's fair group scheduling patch set:

<https://lkml.org/lkml/2012/8/23/267>

SCHED_HMP requires that the platform implements arch_get_hmp_domains()
which should set up the platform specific list of hmp_domains. It is
also assumed that the platform disables SD_LOAD_BALANCE for the
appropriate sched_domains.
Task placement takes place every time a task is to be inserted into
a runqueue based on its load history. The task placement decision is
based on load thresholds.

There are no restrictions on the number of hmp_domains, however,
multiple (>2) have not been tested and the up/down migration policy is
rather simple.

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 arch/arm/Kconfig      |   17 +++++
 include/linux/sched.h |    6 ++
 kernel/sched/fair.c   |  168 +++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h  |    6 ++
 4 files changed, 197 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index f4a5d58..5b09684 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1554,6 +1554,23 @@ config SCHED_SMT
 	  MultiThreading at a cost of slightly increased overhead in some
 	  places. If unsure say N here.
 
+config DISABLE_CPU_SCHED_DOMAIN_BALANCE
+	bool "(EXPERIMENTAL) Disable CPU level scheduler load-balancing"
+	help
+	  Disables scheduler load-balancing at CPU sched domain level.
+
+config SCHED_HMP
+	bool "(EXPERIMENTAL) Heterogenous multiprocessor scheduling"
+	depends on DISABLE_CPU_SCHED_DOMAIN_BALANCE && SCHED_MC && FAIR_GROUP_SCHED && !SCHED_AUTOGROUP
+	help
+	  Experimental scheduler optimizations for heterogeneous platforms.
+	  Attempts to introspectively select task affinity to optimize power
+	  and performance. Basic support for multiple (>2) cpu types is in place,
+	  but it has only been tested with two types of cpus.
+	  There is currently no support for migration of task groups, hence
+	  !SCHED_AUTOGROUP. Furthermore, normal load-balancing must be disabled
+	  between cpus of different type (DISABLE_CPU_SCHED_DOMAIN_BALANCE).
+
 config HAVE_ARM_SCU
 	bool
 	help
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 81e4e82..df971a3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1039,6 +1039,12 @@ unsigned long default_scale_smt_power(struct sched_domain *sd, int cpu);
 
 bool cpus_share_cache(int this_cpu, int that_cpu);
 
+#ifdef CONFIG_SCHED_HMP
+struct hmp_domain {
+	struct cpumask cpus;
+	struct list_head hmp_domains;
+};
+#endif /* CONFIG_SCHED_HMP */
 #else /* CONFIG_SMP */
 
 struct sched_domain_attr;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3e17dd5..d80de46 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3077,6 +3077,125 @@ static int select_idle_sibling(struct task_struct *p, int target)
 	return target;
 }
 
+#ifdef CONFIG_SCHED_HMP
+/*
+ * Heterogenous multiprocessor (HMP) optimizations
+ *
+ * The cpu types are distinguished using a list of hmp_domains
+ * which each represent one cpu type using a cpumask.
+ * The list is assumed ordered by compute capacity with the
+ * fastest domain first.
+ */
+DEFINE_PER_CPU(struct hmp_domain *, hmp_cpu_domain);
+
+extern void __init arch_get_hmp_domains(struct list_head *hmp_domains_list);
+
+/* Setup hmp_domains */
+static int __init hmp_cpu_mask_setup(void)
+{
+	char buf[64];
+	struct hmp_domain *domain;
+	struct list_head *pos;
+	int dc, cpu;
+
+	pr_debug("Initializing HMP scheduler:\n");
+
+	/* Initialize hmp_domains using platform code */
+	arch_get_hmp_domains(&hmp_domains);
+	if (list_empty(&hmp_domains)) {
+		pr_debug("HMP domain list is empty!\n");
+		return 0;
+	}
+
+	/* Print hmp_domains */
+	dc = 0;
+	list_for_each(pos, &hmp_domains) {
+		domain = list_entry(pos, struct hmp_domain, hmp_domains);
+		cpulist_scnprintf(buf, 64, &domain->cpus);
+		pr_debug("  HMP domain %d: %s\n", dc, buf);
+
+		for_each_cpu_mask(cpu, domain->cpus) {
+			per_cpu(hmp_cpu_domain, cpu) = domain;
+		}
+		dc++;
+	}
+
+	return 1;
+}
+
+/*
+ * Migration thresholds should be in the range [0..1023]
+ * hmp_up_threshold: min. load required for migrating tasks to a faster cpu
+ * hmp_down_threshold: max. load allowed for tasks migrating to a slower cpu
+ * The default values (512, 256) offer good responsiveness, but may need
+ * tweaking suit particular needs.
+ */
+unsigned int hmp_up_threshold = 512;
+unsigned int hmp_down_threshold = 256;
+
+static unsigned int hmp_up_migration(int cpu, struct sched_entity *se);
+static unsigned int hmp_down_migration(int cpu, struct sched_entity *se);
+
+/* Check if cpu is in fastest hmp_domain */
+static inline unsigned int hmp_cpu_is_fastest(int cpu)
+{
+	struct list_head *pos;
+
+	pos = &hmp_cpu_domain(cpu)->hmp_domains;
+	return pos == hmp_domains.next;
+}
+
+/* Check if cpu is in slowest hmp_domain */
+static inline unsigned int hmp_cpu_is_slowest(int cpu)
+{
+	struct list_head *pos;
+
+	pos = &hmp_cpu_domain(cpu)->hmp_domains;
+	return list_is_last(pos, &hmp_domains);
+}
+
+/* Next (slower) hmp_domain relative to cpu */
+static inline struct hmp_domain *hmp_slower_domain(int cpu)
+{
+	struct list_head *pos;
+
+	pos = &hmp_cpu_domain(cpu)->hmp_domains;
+	return list_entry(pos->next, struct hmp_domain, hmp_domains);
+}
+
+/* Previous (faster) hmp_domain relative to cpu */
+static inline struct hmp_domain *hmp_faster_domain(int cpu)
+{
+	struct list_head *pos;
+
+	pos = &hmp_cpu_domain(cpu)->hmp_domains;
+	return list_entry(pos->prev, struct hmp_domain, hmp_domains);
+}
+
+/*
+ * Selects a cpu in previous (faster) hmp_domain
+ * Note that cpumask_any_and() returns the first cpu in the cpumask
+ */
+static inline unsigned int hmp_select_faster_cpu(struct task_struct *tsk,
+							int cpu)
+{
+	return cpumask_any_and(&hmp_faster_domain(cpu)->cpus,
+				tsk_cpus_allowed(tsk));
+}
+
+/*
+ * Selects a cpu in next (slower) hmp_domain
+ * Note that cpumask_any_and() returns the first cpu in the cpumask
+ */
+static inline unsigned int hmp_select_slower_cpu(struct task_struct *tsk,
+							int cpu)
+{
+	return cpumask_any_and(&hmp_slower_domain(cpu)->cpus,
+				tsk_cpus_allowed(tsk));
+}
+
+#endif /* CONFIG_SCHED_HMP */
+
 /*
  * sched_balance_self: balance the current task (running on cpu) in domains
  * that have the 'flag' flag set. In practice, this is SD_BALANCE_FORK and
@@ -3203,6 +3322,16 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
 unlock:
 	rcu_read_unlock();
 
+#ifdef CONFIG_SCHED_HMP
+	if (hmp_up_migration(prev_cpu, &p->se))
+		return hmp_select_faster_cpu(p, prev_cpu);
+	if (hmp_down_migration(prev_cpu, &p->se))
+		return hmp_select_slower_cpu(p, prev_cpu);
+	/* Make sure that the task stays in its previous hmp domain */
+	if (!cpumask_test_cpu(new_cpu, &hmp_cpu_domain(prev_cpu)->cpus))
+		return prev_cpu;
+#endif
+
 	return new_cpu;
 }
 
@@ -5354,6 +5483,41 @@ need_kick:
 static void nohz_idle_balance(int this_cpu, enum cpu_idle_type idle) { }
 #endif
 
+#ifdef CONFIG_SCHED_HMP
+/* Check if task should migrate to a faster cpu */
+static unsigned int hmp_up_migration(int cpu, struct sched_entity *se)
+{
+	struct task_struct *p = task_of(se);
+
+	if (hmp_cpu_is_fastest(cpu))
+		return 0;
+
+	if (cpumask_intersects(&hmp_faster_domain(cpu)->cpus,
+					tsk_cpus_allowed(p))
+		&& se->avg.load_avg_ratio > hmp_up_threshold) {
+		return 1;
+	}
+	return 0;
+}
+
+/* Check if task should migrate to a slower cpu */
+static unsigned int hmp_down_migration(int cpu, struct sched_entity *se)
+{
+	struct task_struct *p = task_of(se);
+
+	if (hmp_cpu_is_slowest(cpu))
+		return 0;
+
+	if (cpumask_intersects(&hmp_slower_domain(cpu)->cpus,
+					tsk_cpus_allowed(p))
+		&& se->avg.load_avg_ratio < hmp_down_threshold) {
+		return 1;
+	}
+	return 0;
+}
+
+#endif /* CONFIG_SCHED_HMP */
+
 /*
  * run_rebalance_domains is triggered when needed from the scheduler tick.
  * Also triggered for nohz idle balancing (with nohz_balancing_kick set).
@@ -5861,6 +6025,10 @@ __init void init_sched_fair_class(void)
 	zalloc_cpumask_var(&nohz.idle_cpus_mask, GFP_NOWAIT);
 	cpu_notifier(sched_ilb_notifier, 0);
 #endif
+
+#ifdef CONFIG_SCHED_HMP
+	hmp_cpu_mask_setup();
+#endif
 #endif /* SMP */
 
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 81135f9..4990d9e 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -547,6 +547,12 @@ DECLARE_PER_CPU(int, sd_llc_id);
 
 extern int group_balance_cpu(struct sched_group *sg);
 
+#ifdef CONFIG_SCHED_HMP
+static LIST_HEAD(hmp_domains);
+DECLARE_PER_CPU(struct hmp_domain *, hmp_cpu_domain);
+#define hmp_cpu_domain(cpu)	(per_cpu(hmp_cpu_domain, (cpu)))
+#endif /* CONFIG_SCHED_HMP */
+
 #endif /* CONFIG_SMP */
 
 #include "stats.h"
-- 
1.7.9.5




* [RFC PATCH 03/10] sched: Forced task migration on heterogeneous systems
  2012-09-21 18:32 [RFC PATCH 00/10] sched: Task placement for heterogeneous MP systems morten.rasmussen
  2012-09-21 18:32 ` [RFC PATCH 01/10] sched: entity load-tracking load_avg_ratio morten.rasmussen
  2012-09-21 18:32 ` [RFC PATCH 02/10] sched: Task placement for heterogeneous systems based on task load-tracking morten.rasmussen
@ 2012-09-21 18:32 ` morten.rasmussen
  2012-10-04  6:18   ` Viresh Kumar
  2012-09-21 18:32 ` [RFC PATCH 04/10] sched: Introduce priority-based task migration filter morten.rasmussen
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: morten.rasmussen @ 2012-09-21 18:32 UTC (permalink / raw)
  To: paulmck, pjt, peterz, suresh.b.siddha
  Cc: morten.rasmussen, linaro-sched-sig, linaro-dev, linux-kernel

From: Morten Rasmussen <morten.rasmussen@arm.com>

This patch introduces forced task migration for moving suitable
currently running tasks between hmp_domains. Task behaviour is likely
to change over time. Tasks running in a less capable hmp_domain may
change to become more demanding and should therefore be migrated up.
They are unlikely to go through the select_task_rq_fair() path anytime
soon and therefore need special attention.

This patch introduces a periodic check (on the scheduler tick) of the
currently running task on all runqueues and sets up a forced migration
using stop_one_cpu_nowait() if the task needs to be migrated.

Ideally, this should not be implemented by polling all runqueues.

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 kernel/sched/fair.c  |  196 +++++++++++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/sched.h |    3 +
 2 files changed, 198 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d80de46..490f1f0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3744,7 +3744,6 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	 * 1) task is cache cold, or
 	 * 2) too many balance attempts have failed.
 	 */
-
 	tsk_cache_hot = task_hot(p, env->src_rq->clock_task, env->sd);
 	if (!tsk_cache_hot ||
 		env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
@@ -5516,6 +5515,199 @@ static unsigned int hmp_down_migration(int cpu, struct sched_entity *se)
 	return 0;
 }
 
+/*
+ * hmp_can_migrate_task - may task p from runqueue rq be migrated to this_cpu?
+ * Ideally this function should be merged with can_migrate_task() to avoid
+ * redundant code.
+ */
+static int hmp_can_migrate_task(struct task_struct *p, struct lb_env *env)
+{
+	int tsk_cache_hot = 0;
+
+	/*
+	 * We do not migrate tasks that are:
+	 * 1) running (obviously), or
+	 * 2) cannot be migrated to this CPU due to cpus_allowed
+	 */
+	if (!cpumask_test_cpu(env->dst_cpu, tsk_cpus_allowed(p))) {
+		schedstat_inc(p, se.statistics.nr_failed_migrations_affine);
+		return 0;
+	}
+	env->flags &= ~LBF_ALL_PINNED;
+
+	if (task_running(env->src_rq, p)) {
+		schedstat_inc(p, se.statistics.nr_failed_migrations_running);
+		return 0;
+	}
+
+	/*
+	 * Aggressive migration if:
+	 * 1) task is cache cold, or
+	 * 2) too many balance attempts have failed.
+	 */
+
+	tsk_cache_hot = task_hot(p, env->src_rq->clock_task, env->sd);
+	if (!tsk_cache_hot ||
+		env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
+#ifdef CONFIG_SCHEDSTATS
+		if (tsk_cache_hot) {
+			schedstat_inc(env->sd, lb_hot_gained[env->idle]);
+			schedstat_inc(p, se.statistics.nr_forced_migrations);
+		}
+#endif
+		return 1;
+	}
+
+	return 1;
+}
+
+/*
+ * move_specific_task tries to move a specific task.
+ * Returns 1 if successful and 0 otherwise.
+ * Called with both runqueues locked.
+ */
+static int move_specific_task(struct lb_env *env, struct task_struct *pm)
+{
+	struct task_struct *p, *n;
+
+	list_for_each_entry_safe(p, n, &env->src_rq->cfs_tasks, se.group_node) {
+		if (throttled_lb_pair(task_group(p), env->src_rq->cpu,
+					env->dst_cpu))
+			continue;
+
+		if (!hmp_can_migrate_task(p, env))
+			continue;
+		/* Check if we found the right task */
+		if (p != pm)
+			continue;
+
+		move_task(p, env);
+		/*
+		 * Right now, this is only the third place move_task()
+		 * is called, so we can safely collect move_task()
+		 * stats here rather than inside move_task().
+		 */
+		schedstat_inc(env->sd, lb_gained[env->idle]);
+		return 1;
+	}
+	return 0;
+}
+
+/*
+ * hmp_active_task_migration_cpu_stop is run by cpu stopper and used to
+ * migrate a specific task from one runqueue to another.
+ * hmp_force_up_migration uses this to push a currently running task
+ * off a runqueue.
+ * Based on active_load_balance_cpu_stop and can potentially be merged.
+ */
+static int hmp_active_task_migration_cpu_stop(void *data)
+{
+	struct rq *busiest_rq = data;
+	struct task_struct *p = busiest_rq->migrate_task;
+	int busiest_cpu = cpu_of(busiest_rq);
+	int target_cpu = busiest_rq->push_cpu;
+	struct rq *target_rq = cpu_rq(target_cpu);
+	struct sched_domain *sd;
+
+	raw_spin_lock_irq(&busiest_rq->lock);
+	/* make sure the requested cpu hasn't gone down in the meantime */
+	if (unlikely(busiest_cpu != smp_processor_id() ||
+		!busiest_rq->active_balance)) {
+		goto out_unlock;
+	}
+	/* Is there any task to move? */
+	if (busiest_rq->nr_running <= 1)
+		goto out_unlock;
+	/* Task has migrated meanwhile, abort forced migration */
+	if (task_rq(p) != busiest_rq)
+		goto out_unlock;
+	/*
+	 * This condition is "impossible", if it occurs
+	 * we need to fix it. Originally reported by
+	 * Bjorn Helgaas on a 128-cpu setup.
+	 */
+	BUG_ON(busiest_rq == target_rq);
+
+	/* move a task from busiest_rq to target_rq */
+	double_lock_balance(busiest_rq, target_rq);
+
+	/* Search for an sd spanning us and the target CPU. */
+	rcu_read_lock();
+	for_each_domain(target_cpu, sd) {
+		if (cpumask_test_cpu(busiest_cpu, sched_domain_span(sd)))
+			break;
+	}
+
+	if (likely(sd)) {
+		struct lb_env env = {
+			.sd		= sd,
+			.dst_cpu	= target_cpu,
+			.dst_rq		= target_rq,
+			.src_cpu	= busiest_rq->cpu,
+			.src_rq		= busiest_rq,
+			.idle		= CPU_IDLE,
+		};
+
+		schedstat_inc(sd, alb_count);
+
+		if (move_specific_task(&env, p))
+			schedstat_inc(sd, alb_pushed);
+		else
+			schedstat_inc(sd, alb_failed);
+	}
+	rcu_read_unlock();
+	double_unlock_balance(busiest_rq, target_rq);
+out_unlock:
+	busiest_rq->active_balance = 0;
+	raw_spin_unlock_irq(&busiest_rq->lock);
+	return 0;
+}
+
+static DEFINE_SPINLOCK(hmp_force_migration);
+
+/*
+ * hmp_force_up_migration checks runqueues for tasks that need to
+ * be actively migrated to a faster cpu.
+ */
+static void hmp_force_up_migration(int this_cpu)
+{
+	int cpu;
+	struct sched_entity *curr;
+	struct rq *target;
+	unsigned long flags;
+	unsigned int force;
+	struct task_struct *p;
+
+	if (!spin_trylock(&hmp_force_migration))
+		return;
+	for_each_online_cpu(cpu) {
+		force = 0;
+		target = cpu_rq(cpu);
+		raw_spin_lock_irqsave(&target->lock, flags);
+		curr = target->cfs.curr;
+		if (!curr || !entity_is_task(curr)) {
+			raw_spin_unlock_irqrestore(&target->lock, flags);
+			continue;
+		}
+		p = task_of(curr);
+		if (hmp_up_migration(cpu, curr)) {
+			if (!target->active_balance) {
+				target->active_balance = 1;
+				target->push_cpu = hmp_select_faster_cpu(p, cpu);
+				target->migrate_task = p;
+				force = 1;
+			}
+		}
+		raw_spin_unlock_irqrestore(&target->lock, flags);
+		if (force)
+			stop_one_cpu_nowait(cpu_of(target),
+				hmp_active_task_migration_cpu_stop,
+				target, &target->active_balance_work);
+	}
+	spin_unlock(&hmp_force_migration);
+}
+#else
+static void hmp_force_up_migration(int this_cpu) { }
 #endif /* CONFIG_SCHED_HMP */
 
 /*
@@ -5529,6 +5721,8 @@ static void run_rebalance_domains(struct softirq_action *h)
 	enum cpu_idle_type idle = this_rq->idle_balance ?
 						CPU_IDLE : CPU_NOT_IDLE;
 
+	hmp_force_up_migration(this_cpu);
+
 	rebalance_domains(this_cpu, idle);
 
 	/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 4990d9e..92858e9 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -425,6 +425,9 @@ struct rq {
 	int active_balance;
 	int push_cpu;
 	struct cpu_stop_work active_balance_work;
+#ifdef CONFIG_SCHED_HMP
+	struct task_struct *migrate_task;
+#endif
 	/* cpu of this runqueue: */
 	int cpu;
 	int online;
-- 
1.7.9.5




* [RFC PATCH 04/10] sched: Introduce priority-based task migration filter
  2012-09-21 18:32 [RFC PATCH 00/10] sched: Task placement for heterogeneous MP systems morten.rasmussen
                   ` (2 preceding siblings ...)
  2012-09-21 18:32 ` [RFC PATCH 03/10] sched: Forced task migration on heterogeneous systems morten.rasmussen
@ 2012-09-21 18:32 ` morten.rasmussen
  2012-10-04  4:37   ` Viresh Kumar
  2012-10-04  6:27   ` Viresh Kumar
  2012-09-21 18:32 ` [RFC PATCH 05/10] ARM: Add HMP scheduling support for ARM architecture morten.rasmussen
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 27+ messages in thread
From: morten.rasmussen @ 2012-09-21 18:32 UTC (permalink / raw)
  To: paulmck, pjt, peterz, suresh.b.siddha
  Cc: morten.rasmussen, linaro-sched-sig, linaro-dev, linux-kernel

From: Morten Rasmussen <morten.rasmussen@arm.com>

Introduces a priority threshold which prevents low priority tasks
from migrating to faster hmp_domains (cpus). This is useful for
user-space software which assigns lower task priorities to background
tasks.
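
As a worked example: with the default CONFIG_SCHED_HMP_PRIO_FILTER_VAL of 5,
hmp_up_prio becomes NICE_TO_PRIO(5) = 120 + 5 = 125, so a task with a nice
value of 5 or higher (p->prio >= 125) is never up-migrated and is pushed
towards the slower cpus on its next placement decision, regardless of its
tracked load.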

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 arch/arm/Kconfig    |   13 +++++++++++++
 kernel/sched/fair.c |   15 +++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 5b09684..05de193 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1571,6 +1571,19 @@ config SCHED_HMP
 	  !SCHED_AUTOGROUP. Furthermore, normal load-balancing must be disabled
 	  between cpus of different type (DISABLE_CPU_SCHED_DOMAIN_BALANCE).
 
+config SCHED_HMP_PRIO_FILTER
+	bool "(EXPERIMENTAL) Filter HMP migrations by task priority"
+	depends on SCHED_HMP
+	help
+	  Enables task priority based HMP migration filter. Any task with
+	  a NICE value above the threshold will always be on low-power cpus
+	  with less compute capacity.
+
+config SCHED_HMP_PRIO_FILTER_VAL
+	int "NICE priority threshold"
+	default 5
+	depends on SCHED_HMP_PRIO_FILTER
+
 config HAVE_ARM_SCU
 	bool
 	help
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 490f1f0..8f0f3b9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3129,9 +3129,12 @@ static int __init hmp_cpu_mask_setup(void)
  * hmp_down_threshold: max. load allowed for tasks migrating to a slower cpu
  * The default values (512, 256) offer good responsiveness, but may need
  * tweaking suit particular needs.
+ *
+ * hmp_up_prio: Only up migrate task with high priority (<hmp_up_prio)
  */
 unsigned int hmp_up_threshold = 512;
 unsigned int hmp_down_threshold = 256;
+unsigned int hmp_up_prio = NICE_TO_PRIO(CONFIG_SCHED_HMP_PRIO_FILTER_VAL);
 
 static unsigned int hmp_up_migration(int cpu, struct sched_entity *se);
 static unsigned int hmp_down_migration(int cpu, struct sched_entity *se);
@@ -5491,6 +5494,12 @@ static unsigned int hmp_up_migration(int cpu, struct sched_entity *se)
 	if (hmp_cpu_is_fastest(cpu))
 		return 0;
 
+#ifdef CONFIG_SCHED_HMP_PRIO_FILTER
+	/* Filter by task priority */
+	if (p->prio >= hmp_up_prio)
+		return 0;
+#endif
+
 	if (cpumask_intersects(&hmp_faster_domain(cpu)->cpus,
 					tsk_cpus_allowed(p))
 		&& se->avg.load_avg_ratio > hmp_up_threshold) {
@@ -5507,6 +5516,12 @@ static unsigned int hmp_down_migration(int cpu, struct sched_entity *se)
 	if (hmp_cpu_is_slowest(cpu))
 		return 0;
 
+#ifdef CONFIG_SCHED_HMP_PRIO_FILTER
+	/* Filter by task priority */
+	if (p->prio >= hmp_up_prio)
+		return 1;
+#endif
+
 	if (cpumask_intersects(&hmp_slower_domain(cpu)->cpus,
 					tsk_cpus_allowed(p))
 		&& se->avg.load_avg_ratio < hmp_down_threshold) {
-- 
1.7.9.5




* [RFC PATCH 05/10] ARM: Add HMP scheduling support for ARM architecture
  2012-09-21 18:32 [RFC PATCH 00/10] sched: Task placement for heterogeneous MP systems morten.rasmussen
                   ` (3 preceding siblings ...)
  2012-09-21 18:32 ` [RFC PATCH 04/10] sched: Introduce priority-based task migration filter morten.rasmussen
@ 2012-09-21 18:32 ` morten.rasmussen
  2012-09-21 18:32 ` [RFC PATCH 06/10] ARM: sched: Use device-tree to provide fast/slow CPU list for HMP morten.rasmussen
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 27+ messages in thread
From: morten.rasmussen @ 2012-09-21 18:32 UTC (permalink / raw)
  To: paulmck, pjt, peterz, suresh.b.siddha
  Cc: morten.rasmussen, linaro-sched-sig, linaro-dev, linux-kernel

From: Morten Rasmussen <morten.rasmussen@arm.com>

Adds Kconfig entries to enable HMP scheduling on ARM platforms.
Currently, it disables CPU level sched_domain load-balancing in order
to simplify things. This needs fixing in a later revision. HMP
scheduling will do the load-balancing at this level instead.

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 arch/arm/Kconfig                |   14 ++++++++++++++
 arch/arm/include/asm/topology.h |   32 ++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 05de193..cb80846 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1584,6 +1584,20 @@ config SCHED_HMP_PRIO_FILTER_VAL
 	default 5
 	depends on SCHED_HMP_PRIO_FILTER
 
+config HMP_FAST_CPU_MASK
+	string "HMP scheduler fast CPU mask"
+	depends on SCHED_HMP
+	help
+          Specify the cpuids of the fast CPUs in the system as a list string,
+	  e.g. cpuid 0+1 should be specified as 0-1.
+
+config HMP_SLOW_CPU_MASK
+	string "HMP scheduler slow CPU mask"
+	depends on SCHED_HMP
+	help
+	  Specify the cpuids of the slow CPUs in the system as a list string,
+	  e.g. cpuid 0+1 should be specified as 0-1.
+
 config HAVE_ARM_SCU
 	bool
 	help
diff --git a/arch/arm/include/asm/topology.h b/arch/arm/include/asm/topology.h
index 58b8b84..13a03de 100644
--- a/arch/arm/include/asm/topology.h
+++ b/arch/arm/include/asm/topology.h
@@ -27,6 +27,38 @@ void init_cpu_topology(void);
 void store_cpu_topology(unsigned int cpuid);
 const struct cpumask *cpu_coregroup_mask(int cpu);
 
+#ifdef CONFIG_DISABLE_CPU_SCHED_DOMAIN_BALANCE
+/* Common values for CPUs */
+#ifndef SD_CPU_INIT
+#define SD_CPU_INIT (struct sched_domain) {				\
+	.min_interval		= 1,					\
+	.max_interval		= 4,					\
+	.busy_factor		= 64,					\
+	.imbalance_pct		= 125,					\
+	.cache_nice_tries	= 1,					\
+	.busy_idx		= 2,					\
+	.idle_idx		= 1,					\
+	.newidle_idx		= 0,					\
+	.wake_idx		= 0,					\
+	.forkexec_idx		= 0,					\
+									\
+	.flags			= 0*SD_LOAD_BALANCE			\
+				| 1*SD_BALANCE_NEWIDLE			\
+				| 1*SD_BALANCE_EXEC			\
+				| 1*SD_BALANCE_FORK			\
+				| 0*SD_BALANCE_WAKE			\
+				| 1*SD_WAKE_AFFINE			\
+				| 0*SD_PREFER_LOCAL			\
+				| 0*SD_SHARE_CPUPOWER			\
+				| 0*SD_SHARE_PKG_RESOURCES		\
+				| 0*SD_SERIALIZE			\
+				,					\
+	.last_balance		 = jiffies,				\
+	.balance_interval	= 1,					\
+}
+#endif
+#endif /* CONFIG_DISABLE_CPU_SCHED_DOMAIN_BALANCE */
+
 #else
 
 static inline void init_cpu_topology(void) { }
-- 
1.7.9.5




* [RFC PATCH 06/10] ARM: sched: Use device-tree to provide fast/slow CPU list for HMP
  2012-09-21 18:32 [RFC PATCH 00/10] sched: Task placement for heterogeneous MP systems morten.rasmussen
                   ` (4 preceding siblings ...)
  2012-09-21 18:32 ` [RFC PATCH 05/10] ARM: Add HMP scheduling support for ARM architecture morten.rasmussen
@ 2012-09-21 18:32 ` morten.rasmussen
  2012-10-04  6:49   ` Viresh Kumar
  2012-10-10 11:04   ` Morten Rasmussen
  2012-09-21 18:32 ` [RFC PATCH 07/10] ARM: sched: Setup SCHED_HMP domains morten.rasmussen
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 27+ messages in thread
From: morten.rasmussen @ 2012-09-21 18:32 UTC (permalink / raw)
  To: paulmck, pjt, peterz, suresh.b.siddha
  Cc: morten.rasmussen, linaro-sched-sig, linaro-dev, linux-kernel

From: Morten Rasmussen <morten.rasmussen@arm.com>

We can't rely on Kconfig options to set the fast and slow CPU lists for
HMP scheduling if we want a single kernel binary to support multiple
devices with different CPU topology. E.g. TC2 (ARM's Test-Chip-2
big.LITTLE system), Fast Models, or even non big.LITTLE devices.

This patch adds the function arch_get_fast_and_slow_cpus() to generate
the lists at run-time by parsing the CPU nodes in device-tree; it
assumes slow cores are A7s and everything else is fast. The function
still supports the old Kconfig options as this is useful for testing the
HMP scheduler on devices without big.LITTLE.

This patch is a reuse of a patch by Jon Medhurst <tixy@linaro.org> with a
few bits left out.

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 arch/arm/Kconfig           |    4 ++-
 arch/arm/kernel/topology.c |   69 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 72 insertions(+), 1 deletion(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index cb80846..f1271bc 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1588,13 +1588,15 @@ config HMP_FAST_CPU_MASK
 	string "HMP scheduler fast CPU mask"
 	depends on SCHED_HMP
 	help
-          Specify the cpuids of the fast CPUs in the system as a list string,
+          Leave empty to use device tree information.
+	  Specify the cpuids of the fast CPUs in the system as a list string,
 	  e.g. cpuid 0+1 should be specified as 0-1.
 
 config HMP_SLOW_CPU_MASK
 	string "HMP scheduler slow CPU mask"
 	depends on SCHED_HMP
 	help
+	  Leave empty to use device tree information.
 	  Specify the cpuids of the slow CPUs in the system as a list string,
 	  e.g. cpuid 0+1 should be specified as 0-1.
 
diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 26c12c6..7682e12 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -317,6 +317,75 @@ void store_cpu_topology(unsigned int cpuid)
 		cpu_topology[cpuid].socket_id, mpidr);
 }
 
+
+#ifdef CONFIG_SCHED_HMP
+
+static const char * const little_cores[] = {
+	"arm,cortex-a7",
+	NULL,
+};
+
+static bool is_little_cpu(struct device_node *cn)
+{
+	const char * const *lc;
+	for (lc = little_cores; *lc; lc++)
+		if (of_device_is_compatible(cn, *lc))
+			return true;
+	return false;
+}
+
+void __init arch_get_fast_and_slow_cpus(struct cpumask *fast,
+					struct cpumask *slow)
+{
+	struct device_node *cn = NULL;
+	int cpu = 0;
+
+	cpumask_clear(fast);
+	cpumask_clear(slow);
+
+	/*
+	 * Use the config options if they are given. This helps testing
+	 * HMP scheduling on systems without a big.LITTLE architecture.
+	 */
+	if (strlen(CONFIG_HMP_FAST_CPU_MASK) && strlen(CONFIG_HMP_SLOW_CPU_MASK)) {
+		if (cpulist_parse(CONFIG_HMP_FAST_CPU_MASK, fast))
+			WARN(1, "Failed to parse HMP fast cpu mask!\n");
+		if (cpulist_parse(CONFIG_HMP_SLOW_CPU_MASK, slow))
+			WARN(1, "Failed to parse HMP slow cpu mask!\n");
+		return;
+	}
+
+	/*
+	 * Else, parse device tree for little cores.
+	 */
+	while ((cn = of_find_node_by_type(cn, "cpu"))) {
+
+		if (cpu >= num_possible_cpus())
+			break;
+
+		if (is_little_cpu(cn))
+			cpumask_set_cpu(cpu, slow);
+		else
+			cpumask_set_cpu(cpu, fast);
+
+		cpu++;
+	}
+
+	if (!cpumask_empty(fast) && !cpumask_empty(slow))
+		return;
+
+	/*
+	 * We didn't find both big and little cores so let's call all cores
+	 * fast as this will keep the system running, with all cores being
+	 * treated equal.
+	 */
+	cpumask_setall(fast);
+	cpumask_clear(slow);
+}
+
+#endif /* CONFIG_SCHED_HMP */
+
+
 /*
  * init_cpu_topology is called at boot when only one cpu is running
  * which prevent simultaneous write access to cpu_topology array
-- 
1.7.9.5




* [RFC PATCH 07/10] ARM: sched: Setup SCHED_HMP domains
  2012-09-21 18:32 [RFC PATCH 00/10] sched: Task placement for heterogeneous MP systems morten.rasmussen
                   ` (5 preceding siblings ...)
  2012-09-21 18:32 ` [RFC PATCH 06/10] ARM: sched: Use device-tree to provide fast/slow CPU list for HMP morten.rasmussen
@ 2012-09-21 18:32 ` morten.rasmussen
  2012-10-04  6:58   ` Viresh Kumar
  2012-09-21 18:32 ` [RFC PATCH 08/10] sched: Add ftrace events for entity load-tracking morten.rasmussen
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: morten.rasmussen @ 2012-09-21 18:32 UTC (permalink / raw)
  To: paulmck, pjt, peterz, suresh.b.siddha
  Cc: morten.rasmussen, linaro-sched-sig, linaro-dev, linux-kernel

From: Morten Rasmussen <morten.rasmussen@arm.com>

SCHED_HMP requires the different cpu types to be represented by an
ordered list of hmp_domains. Each hmp_domain represents all cpus of
a particular type using a cpumask.

The list is platform specific and therefore must be generated by
platform code by implementing arch_get_hmp_domains().

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 arch/arm/kernel/topology.c |   22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 7682e12..ec8ad5c 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -383,6 +383,28 @@ void __init arch_get_fast_and_slow_cpus(struct cpumask *fast,
 	cpumask_clear(slow);
 }
 
+void __init arch_get_hmp_domains(struct list_head *hmp_domains_list)
+{
+	struct cpumask hmp_fast_cpu_mask;
+	struct cpumask hmp_slow_cpu_mask;
+	struct hmp_domain *domain;
+
+	arch_get_fast_and_slow_cpus(&hmp_fast_cpu_mask, &hmp_slow_cpu_mask);
+
+	/*
+	 * Initialize hmp_domains
+	 * Must be ordered with respect to compute capacity.
+	 * Fastest domain at head of list.
+	 */
+	domain = (struct hmp_domain *)
+		kmalloc(sizeof(struct hmp_domain), GFP_KERNEL);
+	cpumask_copy(&domain->cpus, &hmp_slow_cpu_mask);
+	list_add(&domain->hmp_domains, hmp_domains_list);
+	domain = (struct hmp_domain *)
+		kmalloc(sizeof(struct hmp_domain), GFP_KERNEL);
+	cpumask_copy(&domain->cpus, &hmp_fast_cpu_mask);
+	list_add(&domain->hmp_domains, hmp_domains_list);
+}
 #endif /* CONFIG_SCHED_HMP */
 
 
-- 
1.7.9.5




* [RFC PATCH 08/10] sched: Add ftrace events for entity load-tracking
  2012-09-21 18:32 [RFC PATCH 00/10] sched: Task placement for heterogeneous MP systems morten.rasmussen
                   ` (6 preceding siblings ...)
  2012-09-21 18:32 ` [RFC PATCH 07/10] ARM: sched: Setup SCHED_HMP domains morten.rasmussen
@ 2012-09-21 18:32 ` morten.rasmussen
  2012-09-21 18:32 ` [RFC PATCH 09/10] sched: Add HMP task migration ftrace event morten.rasmussen
  2012-09-21 18:32 ` [RFC PATCH 10/10] sched: SCHED_HMP multi-domain task migration control morten.rasmussen
  9 siblings, 0 replies; 27+ messages in thread
From: morten.rasmussen @ 2012-09-21 18:32 UTC (permalink / raw)
  To: paulmck, pjt, peterz, suresh.b.siddha
  Cc: morten.rasmussen, linaro-sched-sig, linaro-dev, linux-kernel

From: Morten Rasmussen <morten.rasmussen@arm.com>

Adds ftrace events for key variables related to the entity
load-tracking to help debug scheduler behaviour. Allows tracing
of load contribution and runqueue residency ratio for both entities
and runqueues as well as entity CPU usage ratio.
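
These events can be enabled at run time through the standard ftrace
interface, e.g. (assuming debugfs is mounted at /sys/kernel/debug) by
writing 1 to
/sys/kernel/debug/tracing/events/sched/sched_task_load_contrib/enable,
or all of them at once via .../events/sched/enable.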

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 include/trace/events/sched.h |  125 ++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/fair.c          |    7 +++
 2 files changed, 132 insertions(+)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 5a8671e..847eb76 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -430,6 +430,131 @@ TRACE_EVENT(sched_pi_setprio,
 			__entry->oldprio, __entry->newprio)
 );
 
+/*
+ * Tracepoint for showing tracked load contribution.
+ */
+TRACE_EVENT(sched_task_load_contrib,
+
+	TP_PROTO(struct task_struct *tsk, unsigned long load_contrib),
+
+	TP_ARGS(tsk, load_contrib),
+
+	TP_STRUCT__entry(
+		__array(char, comm, TASK_COMM_LEN)
+		__field(pid_t, pid)
+		__field(unsigned long, load_contrib)
+	),
+
+	TP_fast_assign(
+		memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
+		__entry->pid            = tsk->pid;
+		__entry->load_contrib   = load_contrib;
+	),
+
+	TP_printk("comm=%s pid=%d load_contrib=%lu",
+			__entry->comm, __entry->pid,
+			__entry->load_contrib)
+);
+
+/*
+ * Tracepoint for showing tracked task runnable ratio [0..1023].
+ */
+TRACE_EVENT(sched_task_runnable_ratio,
+
+	TP_PROTO(struct task_struct *tsk, unsigned long ratio),
+
+	TP_ARGS(tsk, ratio),
+
+	TP_STRUCT__entry(
+		__array(char, comm, TASK_COMM_LEN)
+		__field(pid_t, pid)
+		__field(unsigned long, ratio)
+	),
+
+	TP_fast_assign(
+		memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
+		__entry->pid   = tsk->pid;
+		__entry->ratio = ratio;
+	),
+
+	TP_printk("comm=%s pid=%d ratio=%lu",
+			__entry->comm, __entry->pid,
+			__entry->ratio)
+);
+
+/*
+ * Tracepoint for showing tracked rq runnable ratio [0..1023].
+ */
+TRACE_EVENT(sched_rq_runnable_ratio,
+
+	TP_PROTO(int cpu, unsigned long ratio),
+
+	TP_ARGS(cpu, ratio),
+
+	TP_STRUCT__entry(
+		__field(int, cpu)
+		__field(unsigned long, ratio)
+	),
+
+	TP_fast_assign(
+		__entry->cpu   = cpu;
+		__entry->ratio = ratio;
+	),
+
+	TP_printk("cpu=%d ratio=%lu",
+			__entry->cpu,
+			__entry->ratio)
+);
+
+/*
+ * Tracepoint for showing tracked rq runnable load.
+ */
+TRACE_EVENT(sched_rq_runnable_load,
+
+	TP_PROTO(int cpu, u64 load),
+
+	TP_ARGS(cpu, load),
+
+	TP_STRUCT__entry(
+		__field(int, cpu)
+		__field(u64, load)
+	),
+
+	TP_fast_assign(
+		__entry->cpu  = cpu;
+		__entry->load = load;
+	),
+
+	TP_printk("cpu=%d load=%llu",
+			__entry->cpu,
+			__entry->load)
+);
+
+/*
+ * Tracepoint for showing tracked task cpu usage ratio [0..1023].
+ */
+TRACE_EVENT(sched_task_usage_ratio,
+
+	TP_PROTO(struct task_struct *tsk, unsigned long ratio),
+
+	TP_ARGS(tsk, ratio),
+
+	TP_STRUCT__entry(
+		__array(char, comm, TASK_COMM_LEN)
+		__field(pid_t, pid)
+		__field(unsigned long, ratio)
+	),
+
+	TP_fast_assign(
+		memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
+		__entry->pid   = tsk->pid;
+		__entry->ratio = ratio;
+	),
+
+	TP_printk("comm=%s pid=%d ratio=%lu",
+			__entry->comm, __entry->pid,
+			__entry->ratio)
+);
 #endif /* _TRACE_SCHED_H */
 
 /* This part must be outside protection */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8f0f3b9..0be53be 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1192,9 +1192,11 @@ static inline void __update_task_entity_contrib(struct sched_entity *se)
 	contrib = se->avg.runnable_avg_sum * scale_load_down(se->load.weight);
 	contrib /= (se->avg.runnable_avg_period + 1);
 	se->avg.load_avg_contrib = scale_load(contrib);
+	trace_sched_task_load_contrib(task_of(se), se->avg.load_avg_contrib);
 	contrib = se->avg.runnable_avg_sum * scale_load_down(NICE_0_LOAD);
 	contrib /= (se->avg.runnable_avg_period + 1);
 	se->avg.load_avg_ratio = scale_load(contrib);
+	trace_sched_task_runnable_ratio(task_of(se), se->avg.load_avg_ratio);
 }
 
 /* Compute the current contribution to load_avg by se, return any delta */
@@ -1286,9 +1288,14 @@ static void update_cfs_rq_blocked_load(struct cfs_rq *cfs_rq, int force_update)
 
 static inline void update_rq_runnable_avg(struct rq *rq, int runnable)
 {
+	u32 contrib;
 	__update_entity_runnable_avg(rq->clock_task, &rq->avg, runnable,
 				     runnable);
 	__update_tg_runnable_avg(&rq->avg, &rq->cfs);
+	contrib = rq->avg.runnable_avg_sum * scale_load_down(1024);
+	contrib /= (rq->avg.runnable_avg_period + 1);
+	trace_sched_rq_runnable_ratio(cpu_of(rq), scale_load(contrib));
+	trace_sched_rq_runnable_load(cpu_of(rq), rq->cfs.runnable_load_avg);
 }
 
 /* Add the load generated by se into cfs_rq's child load-average */
-- 
1.7.9.5




* [RFC PATCH 09/10] sched: Add HMP task migration ftrace event
  2012-09-21 18:32 [RFC PATCH 00/10] sched: Task placement for heterogeneous MP systems morten.rasmussen
                   ` (7 preceding siblings ...)
  2012-09-21 18:32 ` [RFC PATCH 08/10] sched: Add ftrace events for entity load-tracking morten.rasmussen
@ 2012-09-21 18:32 ` morten.rasmussen
  2012-09-21 18:32 ` [RFC PATCH 10/10] sched: SCHED_HMP multi-domain task migration control morten.rasmussen
  9 siblings, 0 replies; 27+ messages in thread
From: morten.rasmussen @ 2012-09-21 18:32 UTC (permalink / raw)
  To: paulmck, pjt, peterz, suresh.b.siddha
  Cc: morten.rasmussen, linaro-sched-sig, linaro-dev, linux-kernel

From: Morten Rasmussen <morten.rasmussen@arm.com>

Adds ftrace event for tracing task migrations using HMP
optimized scheduling.

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 include/trace/events/sched.h |   28 ++++++++++++++++++++++++++++
 kernel/sched/fair.c          |   15 +++++++++++----
 2 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 847eb76..501aa32 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -555,6 +555,34 @@ TRACE_EVENT(sched_task_usage_ratio,
 			__entry->comm, __entry->pid,
 			__entry->ratio)
 );
+
+/*
+ * Tracepoint for HMP (CONFIG_SCHED_HMP) task migrations.
+ */
+TRACE_EVENT(sched_hmp_migrate,
+
+	TP_PROTO(struct task_struct *tsk, int dest, int force),
+
+	TP_ARGS(tsk, dest, force),
+
+	TP_STRUCT__entry(
+		__array(char, comm, TASK_COMM_LEN)
+		__field(pid_t, pid)
+		__field(int,  dest)
+		__field(int,  force)
+	),
+
+	TP_fast_assign(
+		memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
+		__entry->pid   = tsk->pid;
+		__entry->dest  = dest;
+		__entry->force = force;
+	),
+
+	TP_printk("comm=%s pid=%d dest=%d force=%d",
+			__entry->comm, __entry->pid,
+			__entry->dest, __entry->force)
+);
 #endif /* _TRACE_SCHED_H */
 
 /* This part must be outside protection */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0be53be..811b2b9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3333,10 +3333,16 @@ unlock:
 	rcu_read_unlock();
 
 #ifdef CONFIG_SCHED_HMP
-	if (hmp_up_migration(prev_cpu, &p->se))
-		return hmp_select_faster_cpu(p, prev_cpu);
-	if (hmp_down_migration(prev_cpu, &p->se))
-		return hmp_select_slower_cpu(p, prev_cpu);
+	if (hmp_up_migration(prev_cpu, &p->se)) {
+		new_cpu = hmp_select_faster_cpu(p, prev_cpu);
+		trace_sched_hmp_migrate(p, new_cpu, 0);
+		return new_cpu;
+	}
+	if (hmp_down_migration(prev_cpu, &p->se)) {
+		new_cpu = hmp_select_slower_cpu(p, prev_cpu);
+		trace_sched_hmp_migrate(p, new_cpu, 0);
+		return new_cpu;
+	}
 	/* Make sure that the task stays in its previous hmp domain */
 	if (!cpumask_test_cpu(new_cpu, &hmp_cpu_domain(prev_cpu)->cpus))
 		return prev_cpu;
@@ -5718,6 +5724,7 @@ static void hmp_force_up_migration(int this_cpu)
 				target->push_cpu = hmp_select_faster_cpu(p, cpu);
 				target->migrate_task = p;
 				force = 1;
+				trace_sched_hmp_migrate(p, target->push_cpu, 1);
 			}
 		}
 		raw_spin_unlock_irqrestore(&target->lock, flags);
-- 
1.7.9.5




* [RFC PATCH 10/10] sched: SCHED_HMP multi-domain task migration control
  2012-09-21 18:32 [RFC PATCH 00/10] sched: Task placement for heterogeneous MP systems morten.rasmussen
                   ` (8 preceding siblings ...)
  2012-09-21 18:32 ` [RFC PATCH 09/10] sched: Add HMP task migration ftrace event morten.rasmussen
@ 2012-09-21 18:32 ` morten.rasmussen
  9 siblings, 0 replies; 27+ messages in thread
From: morten.rasmussen @ 2012-09-21 18:32 UTC (permalink / raw)
  To: paulmck, pjt, peterz, suresh.b.siddha
  Cc: morten.rasmussen, linaro-sched-sig, linaro-dev, linux-kernel

From: Morten Rasmussen <morten.rasmussen@arm.com>

We need a way to prevent tasks that are migrating up and down the
hmp_domains from migrating straight on through before the load has
adapted to the new compute capacity of the CPU on the new hmp_domain.
This patch adds a next up/down migration delay that prevents the task
from doing another migration in the same direction until the delay
has expired.
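
As a worked example: with the default hmp_next_up_threshold and
hmp_next_down_threshold of 4096 and the '>> 10' shift applied to the
nanosecond task clock (roughly a division by 1024, i.e. a conversion to
~microseconds), a task must wait approximately 4 ms after a migration before
it is allowed to migrate in the same direction again.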

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 include/linux/sched.h |    4 ++++
 kernel/sched/core.c   |    4 ++++
 kernel/sched/fair.c   |   38 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 46 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index df971a3..ca3890a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1158,6 +1158,10 @@ struct sched_avg {
 	s64 decay_count;
 	unsigned long load_avg_contrib;
 	unsigned long load_avg_ratio;
+#ifdef CONFIG_SCHED_HMP
+	u64 hmp_last_up_migration;
+	u64 hmp_last_down_migration;
+#endif
 	u32 usage_avg_sum;
 };
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 652b86b..a3b1ff6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1723,6 +1723,10 @@ static void __sched_fork(struct task_struct *p)
 #if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
 	p->se.avg.runnable_avg_period = 0;
 	p->se.avg.runnable_avg_sum = 0;
+#ifdef CONFIG_SCHED_HMP
+	p->se.avg.hmp_last_up_migration = 0;
+	p->se.avg.hmp_last_down_migration = 0;
+#endif
 #endif
 #ifdef CONFIG_SCHEDSTATS
 	memset(&p->se.statistics, 0, sizeof(p->se.statistics));
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 811b2b9..56cbda1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3138,10 +3138,14 @@ static int __init hmp_cpu_mask_setup(void)
  * tweaking suit particular needs.
  *
  * hmp_up_prio: Only up migrate task with high priority (<hmp_up_prio)
+ * hmp_next_up_threshold: Delay before next up migration (1024 ~= 1 ms)
+ * hmp_next_down_threshold: Delay before next down migration (1024 ~= 1 ms)
  */
 unsigned int hmp_up_threshold = 512;
 unsigned int hmp_down_threshold = 256;
 unsigned int hmp_up_prio = NICE_TO_PRIO(CONFIG_SCHED_HMP_PRIO_FILTER_VAL);
+unsigned int hmp_next_up_threshold = 4096;
+unsigned int hmp_next_down_threshold = 4096;
 
 static unsigned int hmp_up_migration(int cpu, struct sched_entity *se);
 static unsigned int hmp_down_migration(int cpu, struct sched_entity *se);
@@ -3204,6 +3208,21 @@ static inline unsigned int hmp_select_slower_cpu(struct task_struct *tsk,
 				tsk_cpus_allowed(tsk));
 }
 
+static inline void hmp_next_up_delay(struct sched_entity *se, int cpu)
+{
+	struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs;
+
+	se->avg.hmp_last_up_migration = cfs_rq_clock_task(cfs_rq);
+	se->avg.hmp_last_down_migration = 0;
+}
+
+static inline void hmp_next_down_delay(struct sched_entity *se, int cpu)
+{
+	struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs;
+
+	se->avg.hmp_last_down_migration = cfs_rq_clock_task(cfs_rq);
+	se->avg.hmp_last_up_migration = 0;
+}
 #endif /* CONFIG_SCHED_HMP */
 
 /*
@@ -3335,11 +3354,13 @@ unlock:
 #ifdef CONFIG_SCHED_HMP
 	if (hmp_up_migration(prev_cpu, &p->se)) {
 		new_cpu = hmp_select_faster_cpu(p, prev_cpu);
+		hmp_next_up_delay(&p->se, new_cpu);
 		trace_sched_hmp_migrate(p, new_cpu, 0);
 		return new_cpu;
 	}
 	if (hmp_down_migration(prev_cpu, &p->se)) {
 		new_cpu = hmp_select_slower_cpu(p, prev_cpu);
+		hmp_next_down_delay(&p->se, new_cpu);
 		trace_sched_hmp_migrate(p, new_cpu, 0);
 		return new_cpu;
 	}
@@ -5503,6 +5524,8 @@ static void nohz_idle_balance(int this_cpu, enum cpu_idle_type idle) { }
 static unsigned int hmp_up_migration(int cpu, struct sched_entity *se)
 {
 	struct task_struct *p = task_of(se);
+	struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs;
+	u64 now;
 
 	if (hmp_cpu_is_fastest(cpu))
 		return 0;
@@ -5513,6 +5536,12 @@ static unsigned int hmp_up_migration(int cpu, struct sched_entity *se)
 		return 0;
 #endif
 
+	/* Let the task load settle before doing another up migration */
+	now = cfs_rq_clock_task(cfs_rq);
+	if (((now - se->avg.hmp_last_up_migration) >> 10)
+					< hmp_next_up_threshold)
+		return 0;
+
 	if (cpumask_intersects(&hmp_faster_domain(cpu)->cpus,
 					tsk_cpus_allowed(p))
 		&& se->avg.load_avg_ratio > hmp_up_threshold) {
@@ -5525,6 +5554,8 @@ static unsigned int hmp_up_migration(int cpu, struct sched_entity *se)
 static unsigned int hmp_down_migration(int cpu, struct sched_entity *se)
 {
 	struct task_struct *p = task_of(se);
+	struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs;
+	u64 now;
 
 	if (hmp_cpu_is_slowest(cpu))
 		return 0;
@@ -5535,6 +5566,12 @@ static unsigned int hmp_down_migration(int cpu, struct sched_entity *se)
 		return 1;
 #endif
 
+	/* Let the task load settle before doing another down migration */
+	now = cfs_rq_clock_task(cfs_rq);
+	if (((now - se->avg.hmp_last_down_migration) >> 10)
+					< hmp_next_down_threshold)
+		return 0;
+
 	if (cpumask_intersects(&hmp_slower_domain(cpu)->cpus,
 					tsk_cpus_allowed(p))
 		&& se->avg.load_avg_ratio < hmp_down_threshold) {
@@ -5725,6 +5762,7 @@ static void hmp_force_up_migration(int this_cpu)
 				target->migrate_task = p;
 				force = 1;
 				trace_sched_hmp_migrate(p, target->push_cpu, 1);
+				hmp_next_up_delay(&p->se, target->push_cpu);
 			}
 		}
 		raw_spin_unlock_irqrestore(&target->lock, flags);
-- 
1.7.9.5




* Re: [RFC PATCH 04/10] sched: Introduce priority-based task migration filter
  2012-09-21 18:32 ` [RFC PATCH 04/10] sched: Introduce priority-based task migration filter morten.rasmussen
@ 2012-10-04  4:37   ` Viresh Kumar
  2012-10-04  6:27   ` Viresh Kumar
  1 sibling, 0 replies; 27+ messages in thread
From: Viresh Kumar @ 2012-10-04  4:37 UTC (permalink / raw)
  To: morten.rasmussen
  Cc: paulmck, pjt, peterz, suresh.b.siddha, linaro-sched-sig,
	linaro-dev, linux-kernel, Robin Randhawa, Amit Kucheria

On 22 September 2012 00:02,  <morten.rasmussen@arm.com> wrote:

Hi Morten,

I will try to review your patches in the coming days. For now, just
reporting a problem which I encountered during a routine build.

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 490f1f0..8f0f3b9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3129,9 +3129,12 @@ static int __init hmp_cpu_mask_setup(void)
>   * hmp_down_threshold: max. load allowed for tasks migrating to a slower cpu
>   * The default values (512, 256) offer good responsiveness, but may need
>   * tweaking suit particular needs.
> + *
> + * hmp_up_prio: Only up migrate task with high priority (<hmp_up_prio)
>   */
>  unsigned int hmp_up_threshold = 512;
>  unsigned int hmp_down_threshold = 256;

#ifdef CONFIG_SCHED_HMP_PRIO_FILTER

> +unsigned int hmp_up_prio = NICE_TO_PRIO(CONFIG_SCHED_HMP_PRIO_FILTER_VAL);

#endif

is required here for a successful build without CONFIG_SCHED_HMP_PRIO_FILTER_VAL.

--
viresh


* Re: [RFC PATCH 02/10] sched: Task placement for heterogeneous systems based on task load-tracking
  2012-09-21 18:32 ` [RFC PATCH 02/10] sched: Task placement for heterogeneous systems based on task load-tracking morten.rasmussen
@ 2012-10-04  6:02   ` Viresh Kumar
  2012-10-04  6:54     ` Amit Kucheria
  2012-10-09 15:56     ` Morten Rasmussen
  0 siblings, 2 replies; 27+ messages in thread
From: Viresh Kumar @ 2012-10-04  6:02 UTC (permalink / raw)
  To: morten.rasmussen
  Cc: paulmck, pjt, peterz, suresh.b.siddha, linaro-sched-sig,
	linaro-dev, linux-kernel, Amit Kucheria, Robin Randhawa,
	Arvind.Chauhan

Hi Morten,

On 22 September 2012 00:02,  <morten.rasmussen@arm.com> wrote:
> From: Morten Rasmussen <morten.rasmussen@arm.com>
>
> This patch introduces the basic SCHED_HMP infrastructure. Each class of
> cpus is represented by a hmp_domain and tasks will only be moved between
> these domains when their load profiles suggest it is beneficial.
>
> SCHED_HMP relies heavily on the task load-tracking introduced in Paul
> Turner's fair group scheduling patch set:
>
> <https://lkml.org/lkml/2012/8/23/267>
>
> SCHED_HMP requires that the platform implements arch_get_hmp_domains()
> which should set up the platform specific list of hmp_domains. It is
> also assumed that the platform disables SD_LOAD_BALANCE for the
> appropriate sched_domains.

An explanation of this requirement would be helpful here.

> Task placement takes place every time a task is to be inserted into
> a runqueue based on its load history. The task placement decision is
> based on load thresholds.
>
> There are no restrictions on the number of hmp_domains, however,
> multiple (>2) has not been tested and the up/down migration policy is
> rather simple.
>
> Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
> ---
>  arch/arm/Kconfig      |   17 +++++
>  include/linux/sched.h |    6 ++
>  kernel/sched/fair.c   |  168 +++++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/sched/sched.h  |    6 ++
>  4 files changed, 197 insertions(+)
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index f4a5d58..5b09684 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -1554,6 +1554,23 @@ config SCHED_SMT
>           MultiThreading at a cost of slightly increased overhead in some
>           places. If unsure say N here.
>
> +config DISABLE_CPU_SCHED_DOMAIN_BALANCE
> +       bool "(EXPERIMENTAL) Disable CPU level scheduler load-balancing"
> +       help
> +         Disables scheduler load-balancing at CPU sched domain level.

Shouldn't this depend on EXPERIMENTAL?

> +config SCHED_HMP
> +       bool "(EXPERIMENTAL) Heterogenous multiprocessor scheduling"

ditto.

> +       depends on DISABLE_CPU_SCHED_DOMAIN_BALANCE && SCHED_MC && FAIR_GROUP_SCHED && !SCHED_AUTOGROUP
> +       help
> +         Experimental scheduler optimizations for heterogeneous platforms.
> +         Attempts to introspectively select task affinity to optimize power
> +         and performance. Basic support for multiple (>2) cpu types is in place,
> +         but it has only been tested with two types of cpus.
> +         There is currently no support for migration of task groups, hence
> +         !SCHED_AUTOGROUP. Furthermore, normal load-balancing must be disabled
> +         between cpus of different type (DISABLE_CPU_SCHED_DOMAIN_BALANCE).
> +
>  config HAVE_ARM_SCU
>         bool
>         help
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 81e4e82..df971a3 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1039,6 +1039,12 @@ unsigned long default_scale_smt_power(struct sched_domain *sd, int cpu);
>
>  bool cpus_share_cache(int this_cpu, int that_cpu);
>
> +#ifdef CONFIG_SCHED_HMP
> +struct hmp_domain {
> +       struct cpumask cpus;
> +       struct list_head hmp_domains;

Probably need a better name here. domain_list?

> +};
> +#endif /* CONFIG_SCHED_HMP */
>  #else /* CONFIG_SMP */
>
>  struct sched_domain_attr;
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 3e17dd5..d80de46 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3077,6 +3077,125 @@ static int select_idle_sibling(struct task_struct *p, int target)
>         return target;
>  }
>
> +#ifdef CONFIG_SCHED_HMP
> +/*
> + * Heterogenous multiprocessor (HMP) optimizations
> + *
> + * The cpu types are distinguished using a list of hmp_domains
> + * which each represent one cpu type using a cpumask.
> + * The list is assumed ordered by compute capacity with the
> + * fastest domain first.
> + */
> +DEFINE_PER_CPU(struct hmp_domain *, hmp_cpu_domain);
> +
> +extern void __init arch_get_hmp_domains(struct list_head *hmp_domains_list);
> +
> +/* Setup hmp_domains */
> +static int __init hmp_cpu_mask_setup(void)

How should we interpret its return value? Can you mention what 0 and 1 mean
here?

> +{
> +       char buf[64];
> +       struct hmp_domain *domain;
> +       struct list_head *pos;
> +       int dc, cpu;
> +
> +       pr_debug("Initializing HMP scheduler:\n");
> +
> +       /* Initialize hmp_domains using platform code */
> +       arch_get_hmp_domains(&hmp_domains);
> +       if (list_empty(&hmp_domains)) {
> +               pr_debug("HMP domain list is empty!\n");
> +               return 0;
> +       }
> +
> +       /* Print hmp_domains */
> +       dc = 0;

Should be done during definition of dc.

> +       list_for_each(pos, &hmp_domains) {
> +               domain = list_entry(pos, struct hmp_domain, hmp_domains);
> +               cpulist_scnprintf(buf, 64, &domain->cpus);
> +               pr_debug("  HMP domain %d: %s\n", dc, buf);

Are the spaces before "HMP" intentional?

> +
> +               for_each_cpu_mask(cpu, domain->cpus) {
> +                       per_cpu(hmp_cpu_domain, cpu) = domain;
> +               }

Should use hmp_cpu_domain(cpu) here. Also, no need for {} around a
single-line loop body.

> +               dc++;

You aren't using it... Only for testing? Should we remove it from mainline
patchset and keep it locally?

> +       }
> +
> +       return 1;
> +}
> +
> +/*
> + * Migration thresholds should be in the range [0..1023]
> + * hmp_up_threshold: min. load required for migrating tasks to a faster cpu
> + * hmp_down_threshold: max. load allowed for tasks migrating to a slower cpu
> + * The default values (512, 256) offer good responsiveness, but may need
> + * tweaking suit particular needs.
> + */
> +unsigned int hmp_up_threshold = 512;
> +unsigned int hmp_down_threshold = 256;

For default values this is fine, but we should still allow user-preferred
values to be set via DT or CONFIG_*.

> +static unsigned int hmp_up_migration(int cpu, struct sched_entity *se);
> +static unsigned int hmp_down_migration(int cpu, struct sched_entity *se);
> +
> +/* Check if cpu is in fastest hmp_domain */
> +static inline unsigned int hmp_cpu_is_fastest(int cpu)
> +{
> +       struct list_head *pos;
> +
> +       pos = &hmp_cpu_domain(cpu)->hmp_domains;
> +       return pos == hmp_domains.next;

Better to create a list_is_first() helper for this.

> +}
> +
> +/* Check if cpu is in slowest hmp_domain */
> +static inline unsigned int hmp_cpu_is_slowest(int cpu)
> +{
> +       struct list_head *pos;
> +
> +       pos = &hmp_cpu_domain(cpu)->hmp_domains;
> +       return list_is_last(pos, &hmp_domains);
> +}
> +
> +/* Next (slower) hmp_domain relative to cpu */
> +static inline struct hmp_domain *hmp_slower_domain(int cpu)
> +{
> +       struct list_head *pos;
> +
> +       pos = &hmp_cpu_domain(cpu)->hmp_domains;
> +       return list_entry(pos->next, struct hmp_domain, hmp_domains);
> +}
> +
> +/* Previous (faster) hmp_domain relative to cpu */
> +static inline struct hmp_domain *hmp_faster_domain(int cpu)
> +{
> +       struct list_head *pos;
> +
> +       pos = &hmp_cpu_domain(cpu)->hmp_domains;
> +       return list_entry(pos->prev, struct hmp_domain, hmp_domains);
> +}

For all four routines, the first two lines of the body can be merged. If you wish :)

> +
> +/*
> + * Selects a cpu in previous (faster) hmp_domain
> + * Note that cpumask_any_and() returns the first cpu in the cpumask
> + */
> +static inline unsigned int hmp_select_faster_cpu(struct task_struct *tsk,
> +                                                       int cpu)
> +{
> +       return cpumask_any_and(&hmp_faster_domain(cpu)->cpus,
> +                               tsk_cpus_allowed(tsk));
> +}
> +
> +/*
> + * Selects a cpu in next (slower) hmp_domain
> + * Note that cpumask_any_and() returns the first cpu in the cpumask
> + */
> +static inline unsigned int hmp_select_slower_cpu(struct task_struct *tsk,
> +                                                       int cpu)
> +{
> +       return cpumask_any_and(&hmp_slower_domain(cpu)->cpus,
> +                               tsk_cpus_allowed(tsk));
> +}
> +
> +#endif /* CONFIG_SCHED_HMP */
> +
>  /*
>   * sched_balance_self: balance the current task (running on cpu) in domains
>   * that have the 'flag' flag set. In practice, this is SD_BALANCE_FORK and
> @@ -3203,6 +3322,16 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
>  unlock:
>         rcu_read_unlock();
>
> +#ifdef CONFIG_SCHED_HMP
> +       if (hmp_up_migration(prev_cpu, &p->se))
> +               return hmp_select_faster_cpu(p, prev_cpu);
> +       if (hmp_down_migration(prev_cpu, &p->se))
> +               return hmp_select_slower_cpu(p, prev_cpu);
> +       /* Make sure that the task stays in its previous hmp domain */
> +       if (!cpumask_test_cpu(new_cpu, &hmp_cpu_domain(prev_cpu)->cpus))

Why is this tested?

> +               return prev_cpu;
> +#endif
> +
>         return new_cpu;
>  }
>
> @@ -5354,6 +5483,41 @@ need_kick:
>  static void nohz_idle_balance(int this_cpu, enum cpu_idle_type idle) { }
>  #endif
>
> +#ifdef CONFIG_SCHED_HMP
> +/* Check if task should migrate to a faster cpu */
> +static unsigned int hmp_up_migration(int cpu, struct sched_entity *se)
> +{
> +       struct task_struct *p = task_of(se);
> +
> +       if (hmp_cpu_is_fastest(cpu))
> +               return 0;
> +
> +       if (cpumask_intersects(&hmp_faster_domain(cpu)->cpus,
> +                                       tsk_cpus_allowed(p))
> +               && se->avg.load_avg_ratio > hmp_up_threshold) {
> +               return 1;
> +       }

I know all these comparisons are not very costly, but I would still prefer

se->avg.load_avg_ratio > hmp_up_threshold

as the first comparison in this routine.

We should first check whether the task needs migration at all, rather than
checking whether it can migrate to other cpus.
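
A rough sketch of the reordering I have in mind (untested):

        if (se->avg.load_avg_ratio <= hmp_up_threshold)
                return 0;

        if (cpumask_intersects(&hmp_faster_domain(cpu)->cpus,
                                tsk_cpus_allowed(p)))
                return 1;

        return 0;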

> +       return 0;
> +}
> +
> +/* Check if task should migrate to a slower cpu */
> +static unsigned int hmp_down_migration(int cpu, struct sched_entity *se)
> +{
> +       struct task_struct *p = task_of(se);
> +
> +       if (hmp_cpu_is_slowest(cpu))
> +               return 0;
> +
> +       if (cpumask_intersects(&hmp_slower_domain(cpu)->cpus,
> +                                       tsk_cpus_allowed(p))
> +               && se->avg.load_avg_ratio < hmp_down_threshold) {
> +               return 1;
> +       }

same here.

> +       return 0;
> +}
> +
> +#endif /* CONFIG_SCHED_HMP */
> +
>  /*
>   * run_rebalance_domains is triggered when needed from the scheduler tick.
>   * Also triggered for nohz idle balancing (with nohz_balancing_kick set).
> @@ -5861,6 +6025,10 @@ __init void init_sched_fair_class(void)
>         zalloc_cpumask_var(&nohz.idle_cpus_mask, GFP_NOWAIT);
>         cpu_notifier(sched_ilb_notifier, 0);
>  #endif
> +
> +#ifdef CONFIG_SCHED_HMP
> +       hmp_cpu_mask_setup();

Should we check the return value? If it isn't required, should we make the
function return void instead?

> +#endif
>  #endif /* SMP */
>
>  }
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 81135f9..4990d9e 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -547,6 +547,12 @@ DECLARE_PER_CPU(int, sd_llc_id);
>
>  extern int group_balance_cpu(struct sched_group *sg);
>
> +#ifdef CONFIG_SCHED_HMP
> +static LIST_HEAD(hmp_domains);
> +DECLARE_PER_CPU(struct hmp_domain *, hmp_cpu_domain);
> +#define hmp_cpu_domain(cpu)    (per_cpu(hmp_cpu_domain, (cpu)))

can drop "()" around per_cpu().

Both the per_cpu variable and the macro to get it have the same name. Can
we try giving them better names, or at least add an "_" before the per_cpu
pointer's name?

--
viresh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 03/10] sched: Forced task migration on heterogeneous systems
  2012-09-21 18:32 ` [RFC PATCH 03/10] sched: Forced task migration on heterogeneous systems morten.rasmussen
@ 2012-10-04  6:18   ` Viresh Kumar
  0 siblings, 0 replies; 27+ messages in thread
From: Viresh Kumar @ 2012-10-04  6:18 UTC (permalink / raw)
  To: morten.rasmussen
  Cc: paulmck, pjt, peterz, suresh.b.siddha, linaro-sched-sig,
	linaro-dev, linux-kernel, Amit Kucheria, Robin Randhawa,
	Arvind.Chauhan

Minor comments here :)

On 22 September 2012 00:02,  <morten.rasmussen@arm.com> wrote:

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index d80de46..490f1f0 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3744,7 +3744,6 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>          * 1) task is cache cold, or
>          * 2) too many balance attempts have failed.
>          */
> -

:(

>         tsk_cache_hot = task_hot(p, env->src_rq->clock_task, env->sd);
>         if (!tsk_cache_hot ||
>                 env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
> @@ -5516,6 +5515,199 @@ static unsigned int hmp_down_migration(int cpu, struct sched_entity *se)
>         return 0;
>  }
>

> +static int hmp_can_migrate_task(struct task_struct *p, struct lb_env *env)
> +{

<...>

> +static int move_specific_task(struct lb_env *env, struct task_struct *pm)
> +{
> +       struct task_struct *p, *n;
> +
> +       list_for_each_entry_safe(p, n, &env->src_rq->cfs_tasks, se.group_node) {
> +       if (throttled_lb_pair(task_group(p), env->src_rq->cpu,
> +                               env->dst_cpu))
> +               continue;

Please fix the indentation of the above if statement.
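
E.g., with just the whitespace fixed (sketch; the rest of the loop body is
unchanged):

        list_for_each_entry_safe(p, n, &env->src_rq->cfs_tasks, se.group_node) {
                if (throttled_lb_pair(task_group(p), env->src_rq->cpu,
                                        env->dst_cpu))
                        continue;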

<...>

> +#else
> +static void hmp_force_up_migration(int this_cpu) { }

inline?

--
viresh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 04/10] sched: Introduce priority-based task migration filter
  2012-09-21 18:32 ` [RFC PATCH 04/10] sched: Introduce priority-based task migration filter morten.rasmussen
  2012-10-04  4:37   ` Viresh Kumar
@ 2012-10-04  6:27   ` Viresh Kumar
  2012-10-09 16:40     ` Morten Rasmussen
  1 sibling, 1 reply; 27+ messages in thread
From: Viresh Kumar @ 2012-10-04  6:27 UTC (permalink / raw)
  To: morten.rasmussen
  Cc: paulmck, pjt, peterz, suresh.b.siddha, linaro-sched-sig,
	linaro-dev, linux-kernel, Amit Kucheria, Arvind.Chauhan,
	Robin Randhawa

On 22 September 2012 00:02,  <morten.rasmussen@arm.com> wrote:

> +config SCHED_HMP_PRIO_FILTER
> +       bool "(EXPERIMENTAL) Filter HMP migrations by task priority"
> +       depends on SCHED_HMP

Should it depend on EXPERIMENTAL?

> +       help
> +         Enables task priority based HMP migration filter. Any task with
> +         a NICE value above the threshold will always be on low-power cpus
> +         with less compute capacity.
> +
> +config SCHED_HMP_PRIO_FILTER_VAL
> +       int "NICE priority threshold"
> +       default 5
> +       depends on SCHED_HMP_PRIO_FILTER
> +
>  config HAVE_ARM_SCU
>         bool
>         help
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 490f1f0..8f0f3b9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3129,9 +3129,12 @@ static int __init hmp_cpu_mask_setup(void)
>   * hmp_down_threshold: max. load allowed for tasks migrating to a slower cpu
>   * The default values (512, 256) offer good responsiveness, but may need
>   * tweaking suit particular needs.
> + *
> + * hmp_up_prio: Only up migrate task with high priority (<hmp_up_prio)
>   */
>  unsigned int hmp_up_threshold = 512;
>  unsigned int hmp_down_threshold = 256;
> +unsigned int hmp_up_prio = NICE_TO_PRIO(CONFIG_SCHED_HMP_PRIO_FILTER_VAL);
>
>  static unsigned int hmp_up_migration(int cpu, struct sched_entity *se);
>  static unsigned int hmp_down_migration(int cpu, struct sched_entity *se);
> @@ -5491,6 +5494,12 @@ static unsigned int hmp_up_migration(int cpu, struct sched_entity *se)
>         if (hmp_cpu_is_fastest(cpu))
>                 return 0;
>
> +#ifdef CONFIG_SCHED_HMP_PRIO_FILTER
> +       /* Filter by task priority */
> +       if (p->prio >= hmp_up_prio)
> +               return 0;
> +#endif
> +
>         if (cpumask_intersects(&hmp_faster_domain(cpu)->cpus,
>                                         tsk_cpus_allowed(p))
>                 && se->avg.load_avg_ratio > hmp_up_threshold) {
> @@ -5507,6 +5516,12 @@ static unsigned int hmp_down_migration(int cpu, struct sched_entity *se)
>         if (hmp_cpu_is_slowest(cpu))
>                 return 0;
>
> +#ifdef CONFIG_SCHED_HMP_PRIO_FILTER
> +       /* Filter by task priority */
> +       if (p->prio >= hmp_up_prio)
> +               return 1;
> +#endif

Even if the cpumask_intersects() check below fails?
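
A sketch of what I mean (untested), keeping the affinity check first:

        if (cpumask_intersects(&hmp_slower_domain(cpu)->cpus,
                                        tsk_cpus_allowed(p))) {
#ifdef CONFIG_SCHED_HMP_PRIO_FILTER
                /* Low-priority tasks always go down if they are allowed to */
                if (p->prio >= hmp_up_prio)
                        return 1;
#endif
                if (se->avg.load_avg_ratio < hmp_down_threshold)
                        return 1;
        }

        return 0;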

>         if (cpumask_intersects(&hmp_slower_domain(cpu)->cpus,
>                                         tsk_cpus_allowed(p))
>                 && se->avg.load_avg_ratio < hmp_down_threshold) {

--
viresh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 06/10] ARM: sched: Use device-tree to provide fast/slow CPU list for HMP
  2012-09-21 18:32 ` [RFC PATCH 06/10] ARM: sched: Use device-tree to provide fast/slow CPU list for HMP morten.rasmussen
@ 2012-10-04  6:49   ` Viresh Kumar
  2012-10-10 10:17     ` Morten Rasmussen
  2012-10-10 11:04   ` Morten Rasmussen
  1 sibling, 1 reply; 27+ messages in thread
From: Viresh Kumar @ 2012-10-04  6:49 UTC (permalink / raw)
  To: morten.rasmussen
  Cc: paulmck, pjt, peterz, suresh.b.siddha, linaro-sched-sig,
	linaro-dev, linux-kernel, Arvind.Chauhan, Amit Kucheria,
	Robin Randhawa

On 22 September 2012 00:02,  <morten.rasmussen@arm.com> wrote:
> From: Morten Rasmussen <morten.rasmussen@arm.com>
>
> We can't rely on Kconfig options to set the fast and slow CPU lists for
> HMP scheduling if we want a single kernel binary to support multiple
> devices with different CPU topology. E.g. TC2 (ARM's Test-Chip-2
> big.LITTLE system), Fast Models, or even non big.LITTLE devices.
>
> This patch adds the function arch_get_fast_and_slow_cpus() to generate
> the lists at run-time by parsing the CPU nodes in device-tree; it
> assumes slow cores are A7s and everything else is fast. The function
> still supports the old Kconfig options as this is useful for testing the
> HMP scheduler on devices without big.LITTLE.

But this code is handling this case too at the end, with the following logic:

> +       cpumask_setall(fast);
> +       cpumask_clear(slow);

Am I missing something?

> This patch is reuse of a patch by Jon Medhurst <tixy@linaro.org> with a
> few bits left out.

Then he should probably be the author of this commit? Also, a SOB is required
from him here.

> Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
> ---
>  arch/arm/Kconfig           |    4 ++-
>  arch/arm/kernel/topology.c |   69 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 72 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index cb80846..f1271bc 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -1588,13 +1588,15 @@ config HMP_FAST_CPU_MASK
>         string "HMP scheduler fast CPU mask"
>         depends on SCHED_HMP
>         help
> -          Specify the cpuids of the fast CPUs in the system as a list string,
> +          Leave empty to use device tree information.
> +         Specify the cpuids of the fast CPUs in the system as a list string,
>           e.g. cpuid 0+1 should be specified as 0-1.
>
>  config HMP_SLOW_CPU_MASK
>         string "HMP scheduler slow CPU mask"
>         depends on SCHED_HMP
>         help
> +         Leave empty to use device tree information.
>           Specify the cpuids of the slow CPUs in the system as a list string,
>           e.g. cpuid 0+1 should be specified as 0-1.
>
> diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
> index 26c12c6..7682e12 100644
> --- a/arch/arm/kernel/topology.c
> +++ b/arch/arm/kernel/topology.c
> @@ -317,6 +317,75 @@ void store_cpu_topology(unsigned int cpuid)
>                 cpu_topology[cpuid].socket_id, mpidr);
>  }
>
> +
> +#ifdef CONFIG_SCHED_HMP
> +
> +static const char * const little_cores[] = {
> +       "arm,cortex-a7",
> +       NULL,
> +};
> +
> +static bool is_little_cpu(struct device_node *cn)
> +{
> +       const char * const *lc;
> +       for (lc = little_cores; *lc; lc++)
> +               if (of_device_is_compatible(cn, *lc))
> +                       return true;
> +       return false;
> +}
> +
> +void __init arch_get_fast_and_slow_cpus(struct cpumask *fast,
> +                                       struct cpumask *slow)
> +{
> +       struct device_node *cn = NULL;
> +       int cpu = 0;
> +
> +       cpumask_clear(fast);
> +       cpumask_clear(slow);
> +
> +       /*
> +        * Use the config options if they are given. This helps testing
> +        * HMP scheduling on systems without a big.LITTLE architecture.
> +        */
> +       if (strlen(CONFIG_HMP_FAST_CPU_MASK) && strlen(CONFIG_HMP_SLOW_CPU_MASK)) {
> +               if (cpulist_parse(CONFIG_HMP_FAST_CPU_MASK, fast))
> +                       WARN(1, "Failed to parse HMP fast cpu mask!\n");
> +               if (cpulist_parse(CONFIG_HMP_SLOW_CPU_MASK, slow))
> +                       WARN(1, "Failed to parse HMP slow cpu mask!\n");
> +               return;
> +       }
> +
> +       /*
> +        * Else, parse device tree for little cores.
> +        */
> +       while ((cn = of_find_node_by_type(cn, "cpu"))) {
> +
> +               if (cpu >= num_possible_cpus())
> +                       break;
> +
> +               if (is_little_cpu(cn))
> +                       cpumask_set_cpu(cpu, slow);
> +               else
> +                       cpumask_set_cpu(cpu, fast);
> +
> +               cpu++;
> +       }
> +
> +       if (!cpumask_empty(fast) && !cpumask_empty(slow))
> +               return;
> +
> +       /*
> +        * We didn't find both big and little cores so let's call all cores
> +        * fast as this will keep the system running, with all cores being
> +        * treated equal.
> +        */
> +       cpumask_setall(fast);
> +       cpumask_clear(slow);
> +}
> +
> +#endif /* CONFIG_SCHED_HMP */

All of the above calls to of_*() routines depend on CONFIG_OF.
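
One way around that, as a rough sketch, would be to compile the DT parsing
out when CONFIG_OF isn't set and rely on the existing all-fast fallback
(the local variable declarations would need the same treatment):

#ifdef CONFIG_OF
        while ((cn = of_find_node_by_type(cn, "cpu"))) {
                if (cpu >= num_possible_cpus())
                        break;

                if (is_little_cpu(cn))
                        cpumask_set_cpu(cpu, slow);
                else
                        cpumask_set_cpu(cpu, fast);

                cpu++;
        }
#endif
        /*
         * Without CONFIG_OF the masks stay empty and the all-fast
         * fallback below takes effect.
         */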

--
viresh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 02/10] sched: Task placement for heterogeneous systems based on task load-tracking
  2012-10-04  6:02   ` Viresh Kumar
@ 2012-10-04  6:54     ` Amit Kucheria
  2012-10-09 15:56     ` Morten Rasmussen
  1 sibling, 0 replies; 27+ messages in thread
From: Amit Kucheria @ 2012-10-04  6:54 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: morten.rasmussen, paulmck, pjt, peterz, suresh.b.siddha,
	linaro-sched-sig, linaro-dev, linux-kernel, Robin Randhawa,
	Arvind.Chauhan

On Thu, Oct 4, 2012 at 11:32 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> Hi Morten,
>
> On 22 September 2012 00:02,  <morten.rasmussen@arm.com> wrote:
>> From: Morten Rasmussen <morten.rasmussen@arm.com>
>>
>> This patch introduces the basic SCHED_HMP infrastructure. Each class of
>> cpus is represented by a hmp_domain and tasks will only be moved between
>> these domains when their load profiles suggest it is beneficial.
>>
>> SCHED_HMP relies heavily on the task load-tracking introduced in Paul
>> Turners fair group scheduling patch set:
>>
>> <https://lkml.org/lkml/2012/8/23/267>
>>
>> SCHED_HMP requires that the platform implements arch_get_hmp_domains()
>> which should set up the platform specific list of hmp_domains. It is
>> also assumed that the platform disables SD_LOAD_BALANCE for the
>> appropriate sched_domains.
>
> An explanation of this requirement would be helpful here.
>
>> Tasks placement takes place every time a task is to be inserted into
>> a runqueue based on its load history. The task placement decision is
>> based on load thresholds.
>>
>> There are no restrictions on the number of hmp_domains, however,
>> multiple (>2) has not been tested and the up/down migration policy is
>> rather simple.
>>
>> Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
>> ---
>>  arch/arm/Kconfig      |   17 +++++
>>  include/linux/sched.h |    6 ++
>>  kernel/sched/fair.c   |  168 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  kernel/sched/sched.h  |    6 ++
>>  4 files changed, 197 insertions(+)
>>
>> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
>> index f4a5d58..5b09684 100644
>> --- a/arch/arm/Kconfig
>> +++ b/arch/arm/Kconfig
>> @@ -1554,6 +1554,23 @@ config SCHED_SMT
>>           MultiThreading at a cost of slightly increased overhead in some
>>           places. If unsure say N here.
>>
>> +config DISABLE_CPU_SCHED_DOMAIN_BALANCE
>> +       bool "(EXPERIMENTAL) Disable CPU level scheduler load-balancing"
>> +       help
>> +         Disables scheduler load-balancing at CPU sched domain level.
>
> Shouldn't this depend on EXPERIMENTAL?

EXPERIMENTAL might be on its way out: https://lkml.org/lkml/2012/10/2/398

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 07/10] ARM: sched: Setup SCHED_HMP domains
  2012-09-21 18:32 ` [RFC PATCH 07/10] ARM: sched: Setup SCHED_HMP domains morten.rasmussen
@ 2012-10-04  6:58   ` Viresh Kumar
  2012-10-10 13:29     ` Morten Rasmussen
  0 siblings, 1 reply; 27+ messages in thread
From: Viresh Kumar @ 2012-10-04  6:58 UTC (permalink / raw)
  To: morten.rasmussen
  Cc: paulmck, pjt, peterz, suresh.b.siddha, linaro-sched-sig,
	linaro-dev, linux-kernel, Arvind.Chauhan, Robin Randhawa,
	Amit Kucheria

On 22 September 2012 00:02,  <morten.rasmussen@arm.com> wrote:
> diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c

> +void __init arch_get_hmp_domains(struct list_head *hmp_domains_list)
> +{
> +       struct cpumask hmp_fast_cpu_mask;
> +       struct cpumask hmp_slow_cpu_mask;

Can be merged into a single line.

> +       struct hmp_domain *domain;
> +
> +       arch_get_fast_and_slow_cpus(&hmp_fast_cpu_mask, &hmp_slow_cpu_mask);
> +
> +       /*
> +        * Initialize hmp_domains
> +        * Must be ordered with respect to compute capacity.
> +        * Fastest domain at head of list.
> +        */
> +       domain = (struct hmp_domain *)
> +               kmalloc(sizeof(struct hmp_domain), GFP_KERNEL);

should be:

domain = kmalloc(sizeof(*domain), GFP_KERNEL);

> +       cpumask_copy(&domain->cpus, &hmp_slow_cpu_mask);

What if kmalloc fails?

> +       list_add(&domain->hmp_domains, hmp_domains_list);
> +       domain = (struct hmp_domain *)
> +               kmalloc(sizeof(struct hmp_domain), GFP_KERNEL);

It would be better to kmalloc only once with size 2 * sizeof(*domain).

> +       cpumask_copy(&domain->cpus, &hmp_fast_cpu_mask);
> +       list_add(&domain->hmp_domains, hmp_domains_list);

Also, it would be better to create a macro for the above two lines to remove
code redundancy.
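
Putting those points together, roughly (untested sketch, minimal error
handling):

        struct hmp_domain *domain;

        arch_get_fast_and_slow_cpus(&hmp_fast_cpu_mask, &hmp_slow_cpu_mask);

        domain = kmalloc(2 * sizeof(*domain), GFP_KERNEL);
        if (!domain)
                return;

        /* Fastest domain must end up at the head of the list. */
        cpumask_copy(&domain[0].cpus, &hmp_slow_cpu_mask);
        list_add(&domain[0].hmp_domains, hmp_domains_list);
        cpumask_copy(&domain[1].cpus, &hmp_fast_cpu_mask);
        list_add(&domain[1].hmp_domains, hmp_domains_list);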

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 02/10] sched: Task placement for heterogeneous systems based on task load-tracking
  2012-10-04  6:02   ` Viresh Kumar
  2012-10-04  6:54     ` Amit Kucheria
@ 2012-10-09 15:56     ` Morten Rasmussen
  2012-10-09 16:58       ` Viresh Kumar
  1 sibling, 1 reply; 27+ messages in thread
From: Morten Rasmussen @ 2012-10-09 15:56 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: paulmck, pjt, peterz, suresh.b.siddha, linaro-sched-sig,
	linaro-dev, linux-kernel, Amit Kucheria, Robin Randhawa,
	Arvind Chauhan

Hi Viresh,

On Thu, Oct 04, 2012 at 07:02:03AM +0100, Viresh Kumar wrote:
> Hi Morten,
> 
> On 22 September 2012 00:02,  <morten.rasmussen@arm.com> wrote:
> > From: Morten Rasmussen <morten.rasmussen@arm.com>
> >
> > This patch introduces the basic SCHED_HMP infrastructure. Each class of
> > cpus is represented by a hmp_domain and tasks will only be moved between
> > these domains when their load profiles suggest it is beneficial.
> >
> > SCHED_HMP relies heavily on the task load-tracking introduced in Paul
> > Turners fair group scheduling patch set:
> >
> > <https://lkml.org/lkml/2012/8/23/267>
> >
> > SCHED_HMP requires that the platform implements arch_get_hmp_domains()
> > which should set up the platform specific list of hmp_domains. It is
> > also assumed that the platform disables SD_LOAD_BALANCE for the
> > appropriate sched_domains.
> 
> An explanation of this requirement would be helpful here.
> 

Yes. This is to prevent the load-balancer from moving tasks between
hmp_domains. This will be done exclusively by SCHED_HMP instead to
implement a strict task migration policy and avoid changing the
load-balancer behaviour. The load-balancer will take care of
load-balancing within each hmp_domain.

> > Tasks placement takes place every time a task is to be inserted into
> > a runqueue based on its load history. The task placement decision is
> > based on load thresholds.
> >
> > There are no restrictions on the number of hmp_domains, however,
> > multiple (>2) has not been tested and the up/down migration policy is
> > rather simple.
> >
> > Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
> > ---
> >  arch/arm/Kconfig      |   17 +++++
> >  include/linux/sched.h |    6 ++
> >  kernel/sched/fair.c   |  168 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  kernel/sched/sched.h  |    6 ++
> >  4 files changed, 197 insertions(+)
> >
> > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > index f4a5d58..5b09684 100644
> > --- a/arch/arm/Kconfig
> > +++ b/arch/arm/Kconfig
> > @@ -1554,6 +1554,23 @@ config SCHED_SMT
> >           MultiThreading at a cost of slightly increased overhead in some
> >           places. If unsure say N here.
> >
> > +config DISABLE_CPU_SCHED_DOMAIN_BALANCE
> > +       bool "(EXPERIMENTAL) Disable CPU level scheduler load-balancing"
> > +       help
> > +         Disables scheduler load-balancing at CPU sched domain level.
> 
> Shouldn't this depend on EXPERIMENTAL?
> 

It should. The ongoing discussion about CONFIG_EXPERIMENTAL that Amit is
referring to hasn't come to a conclusion yet.

> > +config SCHED_HMP
> > +       bool "(EXPERIMENTAL) Heterogenous multiprocessor scheduling"
> 
> ditto.
> 
> > +       depends on DISABLE_CPU_SCHED_DOMAIN_BALANCE && SCHED_MC && FAIR_GROUP_SCHED && !SCHED_AUTOGROUP
> > +       help
> > +         Experimental scheduler optimizations for heterogeneous platforms.
> > +         Attempts to introspectively select task affinity to optimize power
> > +         and performance. Basic support for multiple (>2) cpu types is in place,
> > +         but it has only been tested with two types of cpus.
> > +         There is currently no support for migration of task groups, hence
> > +         !SCHED_AUTOGROUP. Furthermore, normal load-balancing must be disabled
> > +         between cpus of different type (DISABLE_CPU_SCHED_DOMAIN_BALANCE).
> > +
> >  config HAVE_ARM_SCU
> >         bool
> >         help
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 81e4e82..df971a3 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1039,6 +1039,12 @@ unsigned long default_scale_smt_power(struct sched_domain *sd, int cpu);
> >
> >  bool cpus_share_cache(int this_cpu, int that_cpu);
> >
> > +#ifdef CONFIG_SCHED_HMP
> > +struct hmp_domain {
> > +       struct cpumask cpus;
> > +       struct list_head hmp_domains;
> 
> Probably need a better name here. domain_list?
> 

Yes. hmp_domain_list would be better and stick with the hmp_* naming
convention.

> > +};
> > +#endif /* CONFIG_SCHED_HMP */
> >  #else /* CONFIG_SMP */
> >
> >  struct sched_domain_attr;
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 3e17dd5..d80de46 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -3077,6 +3077,125 @@ static int select_idle_sibling(struct task_struct *p, int target)
> >         return target;
> >  }
> >
> > +#ifdef CONFIG_SCHED_HMP
> > +/*
> > + * Heterogenous multiprocessor (HMP) optimizations
> > + *
> > + * The cpu types are distinguished using a list of hmp_domains
> > + * which each represent one cpu type using a cpumask.
> > + * The list is assumed ordered by compute capacity with the
> > + * fastest domain first.
> > + */
> > +DEFINE_PER_CPU(struct hmp_domain *, hmp_cpu_domain);
> > +
> > +extern void __init arch_get_hmp_domains(struct list_head *hmp_domains_list);
> > +
> > +/* Setup hmp_domains */
> > +static int __init hmp_cpu_mask_setup(void)
> 
> How should we interpret its return value? Can you mention what does 0 & 1 mean
> here?
> 

Returns 0 if domain setup failed, i.e. the domain list is empty, and 1
otherwise.

> > +{
> > +       char buf[64];
> > +       struct hmp_domain *domain;
> > +       struct list_head *pos;
> > +       int dc, cpu;
> > +
> > +       pr_debug("Initializing HMP scheduler:\n");
> > +
> > +       /* Initialize hmp_domains using platform code */
> > +       arch_get_hmp_domains(&hmp_domains);
> > +       if (list_empty(&hmp_domains)) {
> > +               pr_debug("HMP domain list is empty!\n");
> > +               return 0;
> > +       }
> > +
> > +       /* Print hmp_domains */
> > +       dc = 0;
> 
> Should be done during definition of dc.
> 
> > +       list_for_each(pos, &hmp_domains) {
> > +               domain = list_entry(pos, struct hmp_domain, hmp_domains);
> > +               cpulist_scnprintf(buf, 64, &domain->cpus);
> > +               pr_debug("  HMP domain %d: %s\n", dc, buf);
> 
> Spaces before HMP are intentional?
> 

Yes. It makes the boot log easier to read when the hmp_domain listing is
indented.

> > +
> > +               for_each_cpu_mask(cpu, domain->cpus) {
> > +                       per_cpu(hmp_cpu_domain, cpu) = domain;
> > +               }
> 
> Should use hmp_cpu_domain(cpu) here. Also no need of {} for single
> line loop.
> 
> > +               dc++;
> 
> You aren't using it... Only for testing? Should we remove it from mainline
> patchset and keep it locally?
> 

I'm using it in the pr_debug line a little earlier. It is used for
enumerating the hmp_domains.

> > +       }
> > +
> > +       return 1;
> > +}
> > +
> > +/*
> > + * Migration thresholds should be in the range [0..1023]
> > + * hmp_up_threshold: min. load required for migrating tasks to a faster cpu
> > + * hmp_down_threshold: max. load allowed for tasks migrating to a slower cpu
> > + * The default values (512, 256) offer good responsiveness, but may need
> > + * tweaking suit particular needs.
> > + */
> > +unsigned int hmp_up_threshold = 512;
> > +unsigned int hmp_down_threshold = 256;
> 
> For default values, it is fine. But still we should get user preferred
> values via DT
> or CONFIG_*.
> 

Yes, but for now getting the scheduler to do the right thing has higher
priority than proper integration with DT.

> > +static unsigned int hmp_up_migration(int cpu, struct sched_entity *se);
> > +static unsigned int hmp_down_migration(int cpu, struct sched_entity *se);
> > +
> > +/* Check if cpu is in fastest hmp_domain */
> > +static inline unsigned int hmp_cpu_is_fastest(int cpu)
> > +{
> > +       struct list_head *pos;
> > +
> > +       pos = &hmp_cpu_domain(cpu)->hmp_domains;
> > +       return pos == hmp_domains.next;
> 
> better create list_is_first() for this.
> 

I had the same thought, but I see that as a separate patch that should
be submitted separately.

> > +}
> > +
> > +/* Check if cpu is in slowest hmp_domain */
> > +static inline unsigned int hmp_cpu_is_slowest(int cpu)
> > +{
> > +       struct list_head *pos;
> > +
> > +       pos = &hmp_cpu_domain(cpu)->hmp_domains;
> > +       return list_is_last(pos, &hmp_domains);
> > +}
> > +
> > +/* Next (slower) hmp_domain relative to cpu */
> > +static inline struct hmp_domain *hmp_slower_domain(int cpu)
> > +{
> > +       struct list_head *pos;
> > +
> > +       pos = &hmp_cpu_domain(cpu)->hmp_domains;
> > +       return list_entry(pos->next, struct hmp_domain, hmp_domains);
> > +}
> > +
> > +/* Previous (faster) hmp_domain relative to cpu */
> > +static inline struct hmp_domain *hmp_faster_domain(int cpu)
> > +{
> > +       struct list_head *pos;
> > +
> > +       pos = &hmp_cpu_domain(cpu)->hmp_domains;
> > +       return list_entry(pos->prev, struct hmp_domain, hmp_domains);
> > +}
> 
> For all four routines, first two lines of body can be merged. If u wish :)
> 

I have kept these helper functions fairly generic on purpose. It might
be necessary for multi-domain platforms (>2) to modify these functions
to implement better multi-domain task migration policies. I don't know of
any such platform, so for now these functions are very simple.

> > +
> > +/*
> > + * Selects a cpu in previous (faster) hmp_domain
> > + * Note that cpumask_any_and() returns the first cpu in the cpumask
> > + */
> > +static inline unsigned int hmp_select_faster_cpu(struct task_struct *tsk,
> > +                                                       int cpu)
> > +{
> > +       return cpumask_any_and(&hmp_faster_domain(cpu)->cpus,
> > +                               tsk_cpus_allowed(tsk));
> > +}
> > +
> > +/*
> > + * Selects a cpu in next (slower) hmp_domain
> > + * Note that cpumask_any_and() returns the first cpu in the cpumask
> > + */
> > +static inline unsigned int hmp_select_slower_cpu(struct task_struct *tsk,
> > +                                                       int cpu)
> > +{
> > +       return cpumask_any_and(&hmp_slower_domain(cpu)->cpus,
> > +                               tsk_cpus_allowed(tsk));
> > +}
> > +
> > +#endif /* CONFIG_SCHED_HMP */
> > +
> >  /*
> >   * sched_balance_self: balance the current task (running on cpu) in domains
> >   * that have the 'flag' flag set. In practice, this is SD_BALANCE_FORK and
> > @@ -3203,6 +3322,16 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
> >  unlock:
> >         rcu_read_unlock();
> >
> > +#ifdef CONFIG_SCHED_HMP
> > +       if (hmp_up_migration(prev_cpu, &p->se))
> > +               return hmp_select_faster_cpu(p, prev_cpu);
> > +       if (hmp_down_migration(prev_cpu, &p->se))
> > +               return hmp_select_slower_cpu(p, prev_cpu);
> > +       /* Make sure that the task stays in its previous hmp domain */
> > +       if (!cpumask_test_cpu(new_cpu, &hmp_cpu_domain(prev_cpu)->cpus))
> 
> Why is this tested?
> 

I don't think it is needed. It is there as an extra guarantee that
select_task_rq_fair() doesn't pick a cpu outside the task's current
hmp_domain in cases where there is no up or down migration. Disabling
SD_LOAD_BALANCE for the appropriate domains should give that guarantee.
I just haven't completely convinced myself yet.

> > +               return prev_cpu;
> > +#endif
> > +
> >         return new_cpu;
> >  }
> >
> > @@ -5354,6 +5483,41 @@ need_kick:
> >  static void nohz_idle_balance(int this_cpu, enum cpu_idle_type idle) { }
> >  #endif
> >
> > +#ifdef CONFIG_SCHED_HMP
> > +/* Check if task should migrate to a faster cpu */
> > +static unsigned int hmp_up_migration(int cpu, struct sched_entity *se)
> > +{
> > +       struct task_struct *p = task_of(se);
> > +
> > +       if (hmp_cpu_is_fastest(cpu))
> > +               return 0;
> > +
> > +       if (cpumask_intersects(&hmp_faster_domain(cpu)->cpus,
> > +                                       tsk_cpus_allowed(p))
> > +               && se->avg.load_avg_ratio > hmp_up_threshold) {
> > +               return 1;
> > +       }
> 
> I know all these comparisons are not very costly, still i would prefer
> 
> se->avg.load_avg_ratio > hmp_up_threshold
> 
> as the first comparison in this routine.
> 
> We should check first, if the task needs migration or not. Rather than
> checking if it can migrate to other cpus or not.
> 

Agree.

> > +       return 0;
> > +}
> > +
> > +/* Check if task should migrate to a slower cpu */
> > +static unsigned int hmp_down_migration(int cpu, struct sched_entity *se)
> > +{
> > +       struct task_struct *p = task_of(se);
> > +
> > +       if (hmp_cpu_is_slowest(cpu))
> > +               return 0;
> > +
> > +       if (cpumask_intersects(&hmp_slower_domain(cpu)->cpus,
> > +                                       tsk_cpus_allowed(p))
> > +               && se->avg.load_avg_ratio < hmp_down_threshold) {
> > +               return 1;
> > +       }
> 
> same here.
> 

Agree.

> > +       return 0;
> > +}
> > +
> > +#endif /* CONFIG_SCHED_HMP */
> > +
> >  /*
> >   * run_rebalance_domains is triggered when needed from the scheduler tick.
> >   * Also triggered for nohz idle balancing (with nohz_balancing_kick set).
> > @@ -5861,6 +6025,10 @@ __init void init_sched_fair_class(void)
> >         zalloc_cpumask_var(&nohz.idle_cpus_mask, GFP_NOWAIT);
> >         cpu_notifier(sched_ilb_notifier, 0);
> >  #endif
> > +
> > +#ifdef CONFIG_SCHED_HMP
> > +       hmp_cpu_mask_setup();
> 
> Should we check the return value? If not required then should we
> make fn() declaration return void?
> 

It can be changed to void if we don't add any error handling anyway.

> > +#endif
> >  #endif /* SMP */
> >
> >  }
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index 81135f9..4990d9e 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -547,6 +547,12 @@ DECLARE_PER_CPU(int, sd_llc_id);
> >
> >  extern int group_balance_cpu(struct sched_group *sg);
> >
> > +#ifdef CONFIG_SCHED_HMP
> > +static LIST_HEAD(hmp_domains);
> > +DECLARE_PER_CPU(struct hmp_domain *, hmp_cpu_domain);
> > +#define hmp_cpu_domain(cpu)    (per_cpu(hmp_cpu_domain, (cpu)))
> 
> can drop "()" around per_cpu().
> 
> Both, per_cpu variable and macro to get it, have the same name. Can
> we try giving them better names. Or atleast add an "_" before per_cpu
> pointers name?
> 

Yes.

> --
> viresh
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 04/10] sched: Introduce priority-based task migration filter
  2012-10-04  6:27   ` Viresh Kumar
@ 2012-10-09 16:40     ` Morten Rasmussen
  2012-10-24  2:32       ` li guang
  0 siblings, 1 reply; 27+ messages in thread
From: Morten Rasmussen @ 2012-10-09 16:40 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: paulmck, pjt, peterz, suresh.b.siddha, linaro-sched-sig,
	linaro-dev, linux-kernel, Amit Kucheria, Arvind Chauhan,
	Robin Randhawa

On Thu, Oct 04, 2012 at 07:27:00AM +0100, Viresh Kumar wrote:
> On 22 September 2012 00:02,  <morten.rasmussen@arm.com> wrote:
> 
> > +config SCHED_HMP_PRIO_FILTER
> > +       bool "(EXPERIMENTAL) Filter HMP migrations by task priority"
> > +       depends on SCHED_HMP
> 
> Should it depend on EXPERIMENTAL?
> 
> > +       help
> > +         Enables task priority based HMP migration filter. Any task with
> > +         a NICE value above the threshold will always be on low-power cpus
> > +         with less compute capacity.
> > +
> > +config SCHED_HMP_PRIO_FILTER_VAL
> > +       int "NICE priority threshold"
> > +       default 5
> > +       depends on SCHED_HMP_PRIO_FILTER
> > +
> >  config HAVE_ARM_SCU
> >         bool
> >         help
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 490f1f0..8f0f3b9 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -3129,9 +3129,12 @@ static int __init hmp_cpu_mask_setup(void)
> >   * hmp_down_threshold: max. load allowed for tasks migrating to a slower cpu
> >   * The default values (512, 256) offer good responsiveness, but may need
> >   * tweaking suit particular needs.
> > + *
> > + * hmp_up_prio: Only up migrate task with high priority (<hmp_up_prio)
> >   */
> >  unsigned int hmp_up_threshold = 512;
> >  unsigned int hmp_down_threshold = 256;
> > +unsigned int hmp_up_prio = NICE_TO_PRIO(CONFIG_SCHED_HMP_PRIO_FILTER_VAL);
> >
> >  static unsigned int hmp_up_migration(int cpu, struct sched_entity *se);
> >  static unsigned int hmp_down_migration(int cpu, struct sched_entity *se);
> > @@ -5491,6 +5494,12 @@ static unsigned int hmp_up_migration(int cpu, struct sched_entity *se)
> >         if (hmp_cpu_is_fastest(cpu))
> >                 return 0;
> >
> > +#ifdef CONFIG_SCHED_HMP_PRIO_FILTER
> > +       /* Filter by task priority */
> > +       if (p->prio >= hmp_up_prio)
> > +               return 0;
> > +#endif
> > +
> >         if (cpumask_intersects(&hmp_faster_domain(cpu)->cpus,
> >                                         tsk_cpus_allowed(p))
> >                 && se->avg.load_avg_ratio > hmp_up_threshold) {
> > @@ -5507,6 +5516,12 @@ static unsigned int hmp_down_migration(int cpu, struct sched_entity *se)
> >         if (hmp_cpu_is_slowest(cpu))
> >                 return 0;
> >
> > +#ifdef CONFIG_SCHED_HMP_PRIO_FILTER
> > +       /* Filter by task priority */
> > +       if (p->prio >= hmp_up_prio)
> > +               return 1;
> > +#endif
> 
> Even if below cpumask_intersects() fails?
> 

No. Good catch :)

> >         if (cpumask_intersects(&hmp_slower_domain(cpu)->cpus,
> >                                         tsk_cpus_allowed(p))
> >                 && se->avg.load_avg_ratio < hmp_down_threshold) {
> 
> --
> viresh
> 

Thanks,
Morten


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 02/10] sched: Task placement for heterogeneous systems based on task load-tracking
  2012-10-09 15:56     ` Morten Rasmussen
@ 2012-10-09 16:58       ` Viresh Kumar
  0 siblings, 0 replies; 27+ messages in thread
From: Viresh Kumar @ 2012-10-09 16:58 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: paulmck, pjt, peterz, suresh.b.siddha, linaro-sched-sig,
	linaro-dev, linux-kernel, Amit Kucheria, Robin Randhawa,
	Arvind Chauhan

On 9 October 2012 21:26, Morten Rasmussen <Morten.Rasmussen@arm.com> wrote:
> On Thu, Oct 04, 2012 at 07:02:03AM +0100, Viresh Kumar wrote:
>> On 22 September 2012 00:02,  <morten.rasmussen@arm.com> wrote:

>> > SCHED_HMP requires that the platform implements arch_get_hmp_domains()
>> > which should set up the platform specific list of hmp_domains. It is
>> > also assumed that the platform disables SD_LOAD_BALANCE for the
>> > appropriate sched_domains.
>>
>> An explanation of this requirement would be helpful here.
>
> Yes. This is to prevent the load-balancer from moving tasks between
> hmp_domains. This will be done exclusively by SCHED_HMP instead to
> implement a strict task migration policy and avoid changing the
> load-balancer behaviour. The load-balancer will take care of
> load-balacing within each hmp_domain.

Honestly speaking, I understand this point now; earlier it wasn't clear
to me :)

What would be ideal is to put this information in the comment just before
we re-define other SCHED_*** domains where we disable balancing.
And keep it in the commit log too.

>> > +struct hmp_domain {
>> > +       struct cpumask cpus;
>> > +       struct list_head hmp_domains;
>>
>> Probably need a better name here. domain_list?
>
> Yes. hmp_domain_list would be better and stick with the hmp_* naming
> convention.

IMHO hmp_ would be better for global names, but names of variables
enclosed within another hmp_*** data type don't actually need hmp_**,
as this is implicit.

i.e.
struct hmp_domain {
       struct cpumask cpus;
       struct list_head domain_list;
}

would be better than
       struct list_head hmp_domain_list;

as the parent structure already has hmp_***. So whatever is inside the
struct is obviously hmp-specific.

>> > +/* Setup hmp_domains */
>> > +static int __init hmp_cpu_mask_setup(void)
>>
>> How should we interpret its return value? Can you mention what does 0 & 1 mean
>> here?
>>
>
> Returns 0 if domain setup failed, i.e. the domain list is empty, and 1
> otherwise.

Helpful. Please mention this in the function comment in your next revision.
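
E.g. something along the lines of:

/*
 * Setup hmp_domains.
 * Returns 0 if the domain setup failed (empty hmp_domain list), 1 otherwise.
 */
static int __init hmp_cpu_mask_setup(void)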

>> > +{
>> > +       char buf[64];
>> > +       struct hmp_domain *domain;
>> > +       struct list_head *pos;
>> > +       int dc, cpu;

>> > +       /* Print hmp_domains */
>> > +       dc = 0;
>>
>> Should be done during definition of dc.

You missed this ??

>> > +               for_each_cpu_mask(cpu, domain->cpus) {
>> > +                       per_cpu(hmp_cpu_domain, cpu) = domain;
>> > +               }
>>
>> Should use hmp_cpu_domain(cpu) here. Also no need of {} for single
>> line loop.

??

>> > +               dc++;
>>
>> You aren't using it... Only for testing? Should we remove it from mainline
>> patchset and keep it locally?
>>
>
> I'm using it in the pr_debug line a little earlier. It is used for
> enumerating the hmp_domains.

My mistake :(

>> > +/* Check if cpu is in fastest hmp_domain */
>> > +static inline unsigned int hmp_cpu_is_fastest(int cpu)
>> > +{
>> > +       struct list_head *pos;
>> > +
>> > +       pos = &hmp_cpu_domain(cpu)->hmp_domains;
>> > +       return pos == hmp_domains.next;
>>
>> better create list_is_first() for this.
>
> I had the same thought, but I see that as a separate patch that should
> be submitted separately.

Correct. So better to send it now, so that it is included before you send
your next version. :)
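
For reference, a minimal sketch of such a helper (assuming it would sit next
to list_is_last() in include/linux/list.h):

/**
 * list_is_first - tests whether @list is the first entry in list @head
 * @list: the entry to test
 * @head: the head of the list
 */
static inline int list_is_first(const struct list_head *list,
				const struct list_head *head)
{
	return list->prev == head;
}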

--
viresh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 06/10] ARM: sched: Use device-tree to provide fast/slow CPU list for HMP
  2012-10-04  6:49   ` Viresh Kumar
@ 2012-10-10 10:17     ` Morten Rasmussen
  2012-10-10 10:33       ` Viresh Kumar
  0 siblings, 1 reply; 27+ messages in thread
From: Morten Rasmussen @ 2012-10-10 10:17 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: paulmck, pjt, peterz, suresh.b.siddha, linaro-sched-sig,
	linaro-dev, linux-kernel, Arvind Chauhan, Amit Kucheria,
	Robin Randhawa

On Thu, Oct 04, 2012 at 07:49:32AM +0100, Viresh Kumar wrote:
> On 22 September 2012 00:02,  <morten.rasmussen@arm.com> wrote:
> > From: Morten Rasmussen <morten.rasmussen@arm.com>
> >
> > We can't rely on Kconfig options to set the fast and slow CPU lists for
> > HMP scheduling if we want a single kernel binary to support multiple
> > devices with different CPU topology. E.g. TC2 (ARM's Test-Chip-2
> > big.LITTLE system), Fast Models, or even non big.LITTLE devices.
> >
> > This patch adds the function arch_get_fast_and_slow_cpus() to generate
> > the lists at run-time by parsing the CPU nodes in device-tree; it
> > assumes slow cores are A7s and everything else is fast. The function
> > still supports the old Kconfig options as this is useful for testing the
> > HMP scheduler on devices without big.LITTLE.
> 
> But this code is handling this case too at the end, with following logic:
> 
> > +       cpumask_setall(fast);
> > +       cpumask_clear(slow);
> 
> Am i missing something?
> 

The HMP setup can be defined using Kconfig or DT. If both fail, it will
set all cpus to be fast cpus and effectively disable SCHED_HMP. The
Kconfig option is kept to allow testing of alternative HMP setups
without having to change the DT, or use DT at all, which might be handy
for non-ARM platforms. I hope that answers your question.

> > This patch is reuse of a patch by Jon Medhurst <tixy@linaro.org> with a
> > few bits left out.
> 
> Then probably he must be the author of this commit? Also a SOB is required
> from him here.
> 

I don't know what the correct procedure is for this sort of partial
patch reuse. Since I didn't know better, I adopted Tixy's own reference
style that he used in one of his patches which is an extension of a
previous patch by me. I will of course fix it to follow normal procedure
if there is one.

> > Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
> > ---
> >  arch/arm/Kconfig           |    4 ++-
> >  arch/arm/kernel/topology.c |   69 ++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 72 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > index cb80846..f1271bc 100644
> > --- a/arch/arm/Kconfig
> > +++ b/arch/arm/Kconfig
> > @@ -1588,13 +1588,15 @@ config HMP_FAST_CPU_MASK
> >         string "HMP scheduler fast CPU mask"
> >         depends on SCHED_HMP
> >         help
> > -          Specify the cpuids of the fast CPUs in the system as a list string,
> > +          Leave empty to use device tree information.
> > +         Specify the cpuids of the fast CPUs in the system as a list string,
> >           e.g. cpuid 0+1 should be specified as 0-1.
> >
> >  config HMP_SLOW_CPU_MASK
> >         string "HMP scheduler slow CPU mask"
> >         depends on SCHED_HMP
> >         help
> > +         Leave empty to use device tree information.
> >           Specify the cpuids of the slow CPUs in the system as a list string,
> >           e.g. cpuid 0+1 should be specified as 0-1.
> >
> > diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
> > index 26c12c6..7682e12 100644
> > --- a/arch/arm/kernel/topology.c
> > +++ b/arch/arm/kernel/topology.c
> > @@ -317,6 +317,75 @@ void store_cpu_topology(unsigned int cpuid)
> >                 cpu_topology[cpuid].socket_id, mpidr);
> >  }
> >
> > +
> > +#ifdef CONFIG_SCHED_HMP
> > +
> > +static const char * const little_cores[] = {
> > +       "arm,cortex-a7",
> > +       NULL,
> > +};
> > +
> > +static bool is_little_cpu(struct device_node *cn)
> > +{
> > +       const char * const *lc;
> > +       for (lc = little_cores; *lc; lc++)
> > +               if (of_device_is_compatible(cn, *lc))
> > +                       return true;
> > +       return false;
> > +}
> > +
> > +void __init arch_get_fast_and_slow_cpus(struct cpumask *fast,
> > +                                       struct cpumask *slow)
> > +{
> > +       struct device_node *cn = NULL;
> > +       int cpu = 0;
> > +
> > +       cpumask_clear(fast);
> > +       cpumask_clear(slow);
> > +
> > +       /*
> > +        * Use the config options if they are given. This helps testing
> > +        * HMP scheduling on systems without a big.LITTLE architecture.
> > +        */
> > +       if (strlen(CONFIG_HMP_FAST_CPU_MASK) && strlen(CONFIG_HMP_SLOW_CPU_MASK)) {
> > +               if (cpulist_parse(CONFIG_HMP_FAST_CPU_MASK, fast))
> > +                       WARN(1, "Failed to parse HMP fast cpu mask!\n");
> > +               if (cpulist_parse(CONFIG_HMP_SLOW_CPU_MASK, slow))
> > +                       WARN(1, "Failed to parse HMP slow cpu mask!\n");
> > +               return;
> > +       }
> > +
> > +       /*
> > +        * Else, parse device tree for little cores.
> > +        */
> > +       while ((cn = of_find_node_by_type(cn, "cpu"))) {
> > +
> > +               if (cpu >= num_possible_cpus())
> > +                       break;
> > +
> > +               if (is_little_cpu(cn))
> > +                       cpumask_set_cpu(cpu, slow);
> > +               else
> > +                       cpumask_set_cpu(cpu, fast);
> > +
> > +               cpu++;
> > +       }
> > +
> > +       if (!cpumask_empty(fast) && !cpumask_empty(slow))
> > +               return;
> > +
> > +       /*
> > +        * We didn't find both big and little cores so let's call all cores
> > +        * fast as this will keep the system running, with all cores being
> > +        * treated equal.
> > +        */
> > +       cpumask_setall(fast);
> > +       cpumask_clear(slow);
> > +}
> > +
> > +#endif /* CONFIG_SCHED_HMP */
> 
> All above calls to of_*() routines have dependency on CONFIG_OF
> 

It would be very easy to blame someone else here... :) I will fix it.

Thanks,
Morten

> --
> viresh
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 06/10] ARM: sched: Use device-tree to provide fast/slow CPU list for HMP
  2012-10-10 10:17     ` Morten Rasmussen
@ 2012-10-10 10:33       ` Viresh Kumar
  0 siblings, 0 replies; 27+ messages in thread
From: Viresh Kumar @ 2012-10-10 10:33 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: paulmck, pjt, peterz, suresh.b.siddha, linaro-sched-sig,
	linaro-dev, linux-kernel, Arvind Chauhan, Amit Kucheria,
	Robin Randhawa

On 10 October 2012 15:47, Morten Rasmussen <Morten.Rasmussen@arm.com> wrote:
> On Thu, Oct 04, 2012 at 07:49:32AM +0100, Viresh Kumar wrote:

>> > This patch is reuse of a patch by Jon Medhurst <tixy@linaro.org> with a
>> > few bits left out.
>>
>> Then probably he must be the author of this commit? Also a SOB is required
>> from him here.
>>
>
> I don't know what the correct procedure is for this sort of partial
> patch reuse. Since I didn't know better, I adopted Tixy's own reference
> style that he used in one of his patches which is an extension of a
> previous patch by me. I will of course fix it to follow normal procedure
> if there is one.

AFAIK, if you have used only some part of the earlier patch, then you just
need to add the original author's SOB.
But if you have picked up most of the original patch, which I feel is the
case here, you must keep the original author in author & SOB + your SOB.

> It would be very easy to blame someone else here... :) I will fix it.

:)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 06/10] ARM: sched: Use device-tree to provide fast/slow CPU list for HMP
  2012-09-21 18:32 ` [RFC PATCH 06/10] ARM: sched: Use device-tree to provide fast/slow CPU list for HMP morten.rasmussen
  2012-10-04  6:49   ` Viresh Kumar
@ 2012-10-10 11:04   ` Morten Rasmussen
  2012-10-10 11:29     ` Jon Medhurst (Tixy)
  1 sibling, 1 reply; 27+ messages in thread
From: Morten Rasmussen @ 2012-10-10 11:04 UTC (permalink / raw)
  To: tixy
  Cc: linaro-sched-sig, linaro-dev, linux-kernel, pjt, peterz,
	suresh.b.siddha, paulmck

Hi Tixy,

Could you have a look at my code-stealing patch below? Since it is
basically a trimmed version of one of your patches, I would prefer to
put you as author and have your SOB on it. What is your opinion?

Thanks,
Morten

On Fri, Sep 21, 2012 at 07:32:21PM +0100, Morten Rasmussen wrote:
> From: Morten Rasmussen <morten.rasmussen@arm.com>
> 
> We can't rely on Kconfig options to set the fast and slow CPU lists for
> HMP scheduling if we want a single kernel binary to support multiple
> devices with different CPU topology. E.g. TC2 (ARM's Test-Chip-2
> big.LITTLE system), Fast Models, or even non big.LITTLE devices.
> 
> This patch adds the function arch_get_fast_and_slow_cpus() to generate
> the lists at run-time by parsing the CPU nodes in device-tree; it
> assumes slow cores are A7s and everything else is fast. The function
> still supports the old Kconfig options as this is useful for testing the
> HMP scheduler on devices without big.LITTLE.
> 
> This patch is a reuse of a patch by Jon Medhurst <tixy@linaro.org> with
> a few bits left out.
> 
> Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
> ---
>  arch/arm/Kconfig           |    4 ++-
>  arch/arm/kernel/topology.c |   69 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 72 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index cb80846..f1271bc 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -1588,13 +1588,15 @@ config HMP_FAST_CPU_MASK
>  	string "HMP scheduler fast CPU mask"
>  	depends on SCHED_HMP
>  	help
> -          Specify the cpuids of the fast CPUs in the system as a list string,
> +          Leave empty to use device tree information.
> +	  Specify the cpuids of the fast CPUs in the system as a list string,
>  	  e.g. cpuid 0+1 should be specified as 0-1.
>  
>  config HMP_SLOW_CPU_MASK
>  	string "HMP scheduler slow CPU mask"
>  	depends on SCHED_HMP
>  	help
> +	  Leave empty to use device tree information.
>  	  Specify the cpuids of the slow CPUs in the system as a list string,
>  	  e.g. cpuid 0+1 should be specified as 0-1.
>  
> diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
> index 26c12c6..7682e12 100644
> --- a/arch/arm/kernel/topology.c
> +++ b/arch/arm/kernel/topology.c
> @@ -317,6 +317,75 @@ void store_cpu_topology(unsigned int cpuid)
>  		cpu_topology[cpuid].socket_id, mpidr);
>  }
>  
> +
> +#ifdef CONFIG_SCHED_HMP
> +
> +static const char * const little_cores[] = {
> +	"arm,cortex-a7",
> +	NULL,
> +};
> +
> +static bool is_little_cpu(struct device_node *cn)
> +{
> +	const char * const *lc;
> +	for (lc = little_cores; *lc; lc++)
> +		if (of_device_is_compatible(cn, *lc))
> +			return true;
> +	return false;
> +}
> +
> +void __init arch_get_fast_and_slow_cpus(struct cpumask *fast,
> +					struct cpumask *slow)
> +{
> +	struct device_node *cn = NULL;
> +	int cpu = 0;
> +
> +	cpumask_clear(fast);
> +	cpumask_clear(slow);
> +
> +	/*
> +	 * Use the config options if they are given. This helps testing
> +	 * HMP scheduling on systems without a big.LITTLE architecture.
> +	 */
> +	if (strlen(CONFIG_HMP_FAST_CPU_MASK) && strlen(CONFIG_HMP_SLOW_CPU_MASK)) {
> +		if (cpulist_parse(CONFIG_HMP_FAST_CPU_MASK, fast))
> +			WARN(1, "Failed to parse HMP fast cpu mask!\n");
> +		if (cpulist_parse(CONFIG_HMP_SLOW_CPU_MASK, slow))
> +			WARN(1, "Failed to parse HMP slow cpu mask!\n");
> +		return;
> +	}
> +
> +	/*
> +	 * Else, parse device tree for little cores.
> +	 */
> +	while ((cn = of_find_node_by_type(cn, "cpu"))) {
> +
> +		if (cpu >= num_possible_cpus())
> +			break;
> +
> +		if (is_little_cpu(cn))
> +			cpumask_set_cpu(cpu, slow);
> +		else
> +			cpumask_set_cpu(cpu, fast);
> +
> +		cpu++;
> +	}
> +
> +	if (!cpumask_empty(fast) && !cpumask_empty(slow))
> +		return;
> +
> +	/*
> +	 * We didn't find both big and little cores so let's call all cores
> +	 * fast as this will keep the system running, with all cores being
> +	 * treated equal.
> +	 */
> +	cpumask_setall(fast);
> +	cpumask_clear(slow);
> +}
> +
> +#endif /* CONFIG_SCHED_HMP */
> +
> +
>  /*
>   * init_cpu_topology is called at boot when only one cpu is running
>   * which prevent simultaneous write access to cpu_topology array
> -- 
> 1.7.9.5
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 06/10] ARM: sched: Use device-tree to provide fast/slow CPU list for HMP
  2012-10-10 11:04   ` Morten Rasmussen
@ 2012-10-10 11:29     ` Jon Medhurst (Tixy)
  0 siblings, 0 replies; 27+ messages in thread
From: Jon Medhurst (Tixy) @ 2012-10-10 11:29 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: linaro-sched-sig, linaro-dev, linux-kernel, pjt, peterz,
	suresh.b.siddha, paulmck

On Wed, 2012-10-10 at 12:04 +0100, Morten Rasmussen wrote:
> Hi Tixy,
> 
> Could you have a look at my code-stealing patch below? Since it is
> basically a trimmed version of one of your patches, I would prefer to
> put you as author and have your SOB on it. What is your opinion?

Yes, I can agree with that opinion (and my employer likes to count
their patch totals ;-), so please feel free to add:

From: Jon Medhurst <tixy@linaro.org>
Signed-off-by: Jon Medhurst <tixy@linaro.org>

Thanks

-- 
Tixy


> Thanks,
> Morten
> 
> On Fri, Sep 21, 2012 at 07:32:21PM +0100, Morten Rasmussen wrote:
> > From: Morten Rasmussen <morten.rasmussen@arm.com>
> > 
> > We can't rely on Kconfig options to set the fast and slow CPU lists for
> > HMP scheduling if we want a single kernel binary to support multiple
> > devices with different CPU topology. E.g. TC2 (ARM's Test-Chip-2
> > big.LITTLE system), Fast Models, or even non big.LITTLE devices.
> > 
> > This patch adds the function arch_get_fast_and_slow_cpus() to generate
> > the lists at run-time by parsing the CPU nodes in device-tree; it
> > assumes slow cores are A7s and everything else is fast. The function
> > still supports the old Kconfig options as this is useful for testing the
> > HMP scheduler on devices without big.LITTLE.
> > 
> > This patch is a reuse of a patch by Jon Medhurst <tixy@linaro.org> with
> > a few bits left out.
> > 
> > Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
> > ---
> >  arch/arm/Kconfig           |    4 ++-
> >  arch/arm/kernel/topology.c |   69 ++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 72 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > index cb80846..f1271bc 100644
> > --- a/arch/arm/Kconfig
> > +++ b/arch/arm/Kconfig
> > @@ -1588,13 +1588,15 @@ config HMP_FAST_CPU_MASK
> >  	string "HMP scheduler fast CPU mask"
> >  	depends on SCHED_HMP
> >  	help
> > -          Specify the cpuids of the fast CPUs in the system as a list string,
> > +          Leave empty to use device tree information.
> > +	  Specify the cpuids of the fast CPUs in the system as a list string,
> >  	  e.g. cpuid 0+1 should be specified as 0-1.
> >  
> >  config HMP_SLOW_CPU_MASK
> >  	string "HMP scheduler slow CPU mask"
> >  	depends on SCHED_HMP
> >  	help
> > +	  Leave empty to use device tree information.
> >  	  Specify the cpuids of the slow CPUs in the system as a list string,
> >  	  e.g. cpuid 0+1 should be specified as 0-1.
> >  
> > diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
> > index 26c12c6..7682e12 100644
> > --- a/arch/arm/kernel/topology.c
> > +++ b/arch/arm/kernel/topology.c
> > @@ -317,6 +317,75 @@ void store_cpu_topology(unsigned int cpuid)
> >  		cpu_topology[cpuid].socket_id, mpidr);
> >  }
> >  
> > +
> > +#ifdef CONFIG_SCHED_HMP
> > +
> > +static const char * const little_cores[] = {
> > +	"arm,cortex-a7",
> > +	NULL,
> > +};
> > +
> > +static bool is_little_cpu(struct device_node *cn)
> > +{
> > +	const char * const *lc;
> > +	for (lc = little_cores; *lc; lc++)
> > +		if (of_device_is_compatible(cn, *lc))
> > +			return true;
> > +	return false;
> > +}
> > +
> > +void __init arch_get_fast_and_slow_cpus(struct cpumask *fast,
> > +					struct cpumask *slow)
> > +{
> > +	struct device_node *cn = NULL;
> > +	int cpu = 0;
> > +
> > +	cpumask_clear(fast);
> > +	cpumask_clear(slow);
> > +
> > +	/*
> > +	 * Use the config options if they are given. This helps testing
> > +	 * HMP scheduling on systems without a big.LITTLE architecture.
> > +	 */
> > +	if (strlen(CONFIG_HMP_FAST_CPU_MASK) && strlen(CONFIG_HMP_SLOW_CPU_MASK)) {
> > +		if (cpulist_parse(CONFIG_HMP_FAST_CPU_MASK, fast))
> > +			WARN(1, "Failed to parse HMP fast cpu mask!\n");
> > +		if (cpulist_parse(CONFIG_HMP_SLOW_CPU_MASK, slow))
> > +			WARN(1, "Failed to parse HMP slow cpu mask!\n");
> > +		return;
> > +	}
> > +
> > +	/*
> > +	 * Else, parse device tree for little cores.
> > +	 */
> > +	while ((cn = of_find_node_by_type(cn, "cpu"))) {
> > +
> > +		if (cpu >= num_possible_cpus())
> > +			break;
> > +
> > +		if (is_little_cpu(cn))
> > +			cpumask_set_cpu(cpu, slow);
> > +		else
> > +			cpumask_set_cpu(cpu, fast);
> > +
> > +		cpu++;
> > +	}
> > +
> > +	if (!cpumask_empty(fast) && !cpumask_empty(slow))
> > +		return;
> > +
> > +	/*
> > +	 * We didn't find both big and little cores so let's call all cores
> > +	 * fast as this will keep the system running, with all cores being
> > +	 * treated equal.
> > +	 */
> > +	cpumask_setall(fast);
> > +	cpumask_clear(slow);
> > +}
> > +
> > +#endif /* CONFIG_SCHED_HMP */
> > +
> > +
> >  /*
> >   * init_cpu_topology is called at boot when only one cpu is running
> >   * which prevent simultaneous write access to cpu_topology array
> > -- 
> > 1.7.9.5
> > 
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 07/10] ARM: sched: Setup SCHED_HMP domains
  2012-10-04  6:58   ` Viresh Kumar
@ 2012-10-10 13:29     ` Morten Rasmussen
  0 siblings, 0 replies; 27+ messages in thread
From: Morten Rasmussen @ 2012-10-10 13:29 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: paulmck, pjt, peterz, suresh.b.siddha, linaro-sched-sig,
	linaro-dev, linux-kernel, Arvind Chauhan, Robin Randhawa,
	Amit Kucheria

On Thu, Oct 04, 2012 at 07:58:45AM +0100, Viresh Kumar wrote:
> On 22 September 2012 00:02,  <morten.rasmussen@arm.com> wrote:
> > diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
> 
> > +void __init arch_get_hmp_domains(struct list_head *hmp_domains_list)
> > +{
> > +       struct cpumask hmp_fast_cpu_mask;
> > +       struct cpumask hmp_slow_cpu_mask;
> 
> These can be merged into a single line.
> 
> > +       struct hmp_domain *domain;
> > +
> > +       arch_get_fast_and_slow_cpus(&hmp_fast_cpu_mask, &hmp_slow_cpu_mask);
> > +
> > +       /*
> > +        * Initialize hmp_domains
> > +        * Must be ordered with respect to compute capacity.
> > +        * Fastest domain at head of list.
> > +        */
> > +       domain = (struct hmp_domain *)
> > +               kmalloc(sizeof(struct hmp_domain), GFP_KERNEL);
> 
> should be:
> 
> domain = kmalloc(sizeof(*domain), GFP_KERNEL);
> 
> > +       cpumask_copy(&domain->cpus, &hmp_slow_cpu_mask);
> 
> what if kmalloc failed?
> 
> > +       list_add(&domain->hmp_domains, hmp_domains_list);
> > +       domain = (struct hmp_domain *)
> > +               kmalloc(sizeof(struct hmp_domain), GFP_KERNEL);
> 
> It would be better to kmalloc only once with size 2 * sizeof(*domain).
> 
> > +       cpumask_copy(&domain->cpus, &hmp_fast_cpu_mask);
> > +       list_add(&domain->hmp_domains, hmp_domains_list);
> 
> Also, it would be better to create a macro for the above two lines to
> remove the code redundancy.
> 

Agree on all of the above.

Thanks,
Morten
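
For illustration, a minimal sketch of how arch_get_hmp_domains() could look
with the above comments folded in. The hmp_domain_add() helper, its error
handling and the includes are assumptions made for this sketch, not part of
the posted series:

#include <linux/cpumask.h>
#include <linux/list.h>
#include <linux/slab.h>

/* Allocate and register one hmp_domain: no cast, sizeof(*domain), NULL check. */
static int __init hmp_domain_add(struct list_head *list,
				 const struct cpumask *cpus)
{
	struct hmp_domain *domain = kmalloc(sizeof(*domain), GFP_KERNEL);

	if (!domain)
		return -ENOMEM;

	cpumask_copy(&domain->cpus, cpus);
	list_add(&domain->hmp_domains, list);
	return 0;
}

void __init arch_get_hmp_domains(struct list_head *hmp_domains_list)
{
	struct cpumask hmp_fast_cpu_mask, hmp_slow_cpu_mask;

	arch_get_fast_and_slow_cpus(&hmp_fast_cpu_mask, &hmp_slow_cpu_mask);

	/*
	 * The slow domain is added first so that list_add(), which inserts
	 * at the head, leaves the fastest domain at the head of the list.
	 */
	if (hmp_domain_add(hmp_domains_list, &hmp_slow_cpu_mask) ||
	    hmp_domain_add(hmp_domains_list, &hmp_fast_cpu_mask))
		WARN(1, "Failed to allocate HMP domains!\n");
}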


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 04/10] sched: Introduce priority-based task migration filter
  2012-10-09 16:40     ` Morten Rasmussen
@ 2012-10-24  2:32       ` li guang
  0 siblings, 0 replies; 27+ messages in thread
From: li guang @ 2012-10-24  2:32 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: Viresh Kumar, paulmck, pjt, peterz, suresh.b.siddha,
	linaro-sched-sig, linaro-dev, linux-kernel, Amit Kucheria,
	Arvind Chauhan, Robin Randhawa

On Tue, 2012-10-09 at 17:40 +0100, Morten Rasmussen wrote:
> On Thu, Oct 04, 2012 at 07:27:00AM +0100, Viresh Kumar wrote:
> > On 22 September 2012 00:02,  <morten.rasmussen@arm.com> wrote:
> > 
> > > +config SCHED_HMP_PRIO_FILTER
> > > +       bool "(EXPERIMENTAL) Filter HMP migrations by task priority"
> > > +       depends on SCHED_HMP
> > 
> > Should it depend on EXPERIMENTAL?
> > 
> > > +       help
> > > +         Enables task priority based HMP migration filter. Any task with
> > > +         a NICE value above the threshold will always be on low-power cpus
> > > +         with less compute capacity.
> > > +
> > > +config SCHED_HMP_PRIO_FILTER_VAL
> > > +       int "NICE priority threshold"
> > > +       default 5
> > > +       depends on SCHED_HMP_PRIO_FILTER
> > > +
> > >  config HAVE_ARM_SCU
> > >         bool
> > >         help
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 490f1f0..8f0f3b9 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -3129,9 +3129,12 @@ static int __init hmp_cpu_mask_setup(void)
> > >   * hmp_down_threshold: max. load allowed for tasks migrating to a slower cpu
> > >   * The default values (512, 256) offer good responsiveness, but may need
> > >   * tweaking suit particular needs.
> > > + *
> > > + * hmp_up_prio: Only up migrate task with high priority (<hmp_up_prio)
> > >   */
> > >  unsigned int hmp_up_threshold = 512;
> > >  unsigned int hmp_down_threshold = 256;

The hmp_*_threshold variables could perhaps become sysctl_hmp_*_threshold
and appear under /proc/sys/kernel, so that they can be tuned to sensible
values at run-time; a rough sketch of the idea follows.
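
For illustration only; the sysctl_* names, the table and the registration
path are assumptions made for this sketch, not something the posted series
contains:

#include <linux/init.h>
#include <linux/sysctl.h>

unsigned int sysctl_hmp_up_threshold = 512;
unsigned int sysctl_hmp_down_threshold = 256;

/* Exposes /proc/sys/kernel/hmp_up_threshold and hmp_down_threshold. */
static struct ctl_table hmp_sysctl_table[] = {
	{
		.procname	= "hmp_up_threshold",
		.data		= &sysctl_hmp_up_threshold,
		.maxlen		= sizeof(unsigned int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec,
	},
	{
		.procname	= "hmp_down_threshold",
		.data		= &sysctl_hmp_down_threshold,
		.maxlen		= sizeof(unsigned int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec,
	},
	{ }
};

static int __init hmp_sysctl_init(void)
{
	if (!register_sysctl("kernel", hmp_sysctl_table))
		WARN(1, "Failed to register HMP sysctl table!\n");
	return 0;
}
late_initcall(hmp_sysctl_init);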

> > > +unsigned int hmp_up_prio = NICE_TO_PRIO(CONFIG_SCHED_HMP_PRIO_FILTER_VAL);
> > >
> > >  static unsigned int hmp_up_migration(int cpu, struct sched_entity *se);
> > >  static unsigned int hmp_down_migration(int cpu, struct sched_entity *se);
> > > @@ -5491,6 +5494,12 @@ static unsigned int hmp_up_migration(int cpu, struct sched_entity *se)
> > >         if (hmp_cpu_is_fastest(cpu))
> > >                 return 0;
> > >
> > > +#ifdef CONFIG_SCHED_HMP_PRIO_FILTER
> > > +       /* Filter by task priority */
> > > +       if (p->prio >= hmp_up_prio)
> > > +               return 0;
> > > +#endif
> > > +
> > >         if (cpumask_intersects(&hmp_faster_domain(cpu)->cpus,
> > >                                         tsk_cpus_allowed(p))
> > >                 && se->avg.load_avg_ratio > hmp_up_threshold) {
> > > @@ -5507,6 +5516,12 @@ static unsigned int hmp_down_migration(int cpu, struct sched_entity *se)
> > >         if (hmp_cpu_is_slowest(cpu))
> > >                 return 0;
> > >
> > > +#ifdef CONFIG_SCHED_HMP_PRIO_FILTER
> > > +       /* Filter by task priority */
> > > +       if (p->prio >= hmp_up_prio)
> > > +               return 1;
> > > +#endif
> > 
> > Even if the cpumask_intersects() check below fails?
> > 
> 
> No. Good catch :)
> 
> > >         if (cpumask_intersects(&hmp_slower_domain(cpu)->cpus,
> > >                                         tsk_cpus_allowed(p))
> > >                 && se->avg.load_avg_ratio < hmp_down_threshold) {
> > 
> > --
> > viresh
> > 
> 
> Thanks,
> Morten
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
liguang    lig.fnst@cn.fujitsu.com
FNST linux kernel team
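
Regarding the down-migration exchange quoted above: a sketch of
hmp_down_migration() with the cpumask check done before the priority filter,
so that the filter only short-circuits when the task can actually run in the
slower domain. This is an illustration of the fix under discussion, not the
updated patch itself:

static unsigned int hmp_down_migration(int cpu, struct sched_entity *se)
{
	struct task_struct *p = task_of(se);

	if (hmp_cpu_is_slowest(cpu))
		return 0;

	/* The task must be allowed to run in the slower domain at all. */
	if (!cpumask_intersects(&hmp_slower_domain(cpu)->cpus,
				tsk_cpus_allowed(p)))
		return 0;

#ifdef CONFIG_SCHED_HMP_PRIO_FILTER
	/*
	 * Low-priority tasks (prio >= hmp_up_prio, i.e. a NICE value above
	 * the configured threshold) are always candidates for down-migration.
	 */
	if (p->prio >= hmp_up_prio)
		return 1;
#endif

	/* Otherwise migrate down only if the tracked load is low enough. */
	if (se->avg.load_avg_ratio < hmp_down_threshold)
		return 1;

	return 0;
}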


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2012-10-24  2:35 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-21 18:32 [RFC PATCH 00/10] sched: Task placement for heterogeneous MP systems morten.rasmussen
2012-09-21 18:32 ` [RFC PATCH 01/10] sched: entity load-tracking load_avg_ratio morten.rasmussen
2012-09-21 18:32 ` [RFC PATCH 02/10] sched: Task placement for heterogeneous systems based on task load-tracking morten.rasmussen
2012-10-04  6:02   ` Viresh Kumar
2012-10-04  6:54     ` Amit Kucheria
2012-10-09 15:56     ` Morten Rasmussen
2012-10-09 16:58       ` Viresh Kumar
2012-09-21 18:32 ` [RFC PATCH 03/10] sched: Forced task migration on heterogeneous systems morten.rasmussen
2012-10-04  6:18   ` Viresh Kumar
2012-09-21 18:32 ` [RFC PATCH 04/10] sched: Introduce priority-based task migration filter morten.rasmussen
2012-10-04  4:37   ` Viresh Kumar
2012-10-04  6:27   ` Viresh Kumar
2012-10-09 16:40     ` Morten Rasmussen
2012-10-24  2:32       ` li guang
2012-09-21 18:32 ` [RFC PATCH 05/10] ARM: Add HMP scheduling support for ARM architecture morten.rasmussen
2012-09-21 18:32 ` [RFC PATCH 06/10] ARM: sched: Use device-tree to provide fast/slow CPU list for HMP morten.rasmussen
2012-10-04  6:49   ` Viresh Kumar
2012-10-10 10:17     ` Morten Rasmussen
2012-10-10 10:33       ` Viresh Kumar
2012-10-10 11:04   ` Morten Rasmussen
2012-10-10 11:29     ` Jon Medhurst (Tixy)
2012-09-21 18:32 ` [RFC PATCH 07/10] ARM: sched: Setup SCHED_HMP domains morten.rasmussen
2012-10-04  6:58   ` Viresh Kumar
2012-10-10 13:29     ` Morten Rasmussen
2012-09-21 18:32 ` [RFC PATCH 08/10] sched: Add ftrace events for entity load-tracking morten.rasmussen
2012-09-21 18:32 ` [RFC PATCH 09/10] sched: Add HMP task migration ftrace event morten.rasmussen
2012-09-21 18:32 ` [RFC PATCH 10/10] sched: SCHED_HMP multi-domain task migration control morten.rasmussen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).