* [PATCH 0/7] sched/fair: Migrate 'misfit' tasks on asymmetric capacity systems
@ 2018-02-15 16:20 Morten Rasmussen
  2018-02-15 16:20 ` [PATCH 1/7] sched: Add static_key for asymmetric cpu capacity optimizations Morten Rasmussen
                   ` (7 more replies)
  0 siblings, 8 replies; 31+ messages in thread
From: Morten Rasmussen @ 2018-02-15 16:20 UTC (permalink / raw)
  To: peterz, mingo
  Cc: valentin.schneider, dietmar.eggemann, vincent.guittot,
	linux-kernel, Morten Rasmussen

On asymmetric cpu capacity systems (e.g. Arm big.LITTLE) it is crucial
for performance that cpu intensive tasks are aggressively migrated to
high capacity cpus as soon as those become available. The capacity
awareness tweaks already in the wake-up path can't handle this, as such
tasks might run or be runnable forever and hence never pass through the
wake-up path again. If they happen to be placed on a low capacity cpu
from the beginning, they are stuck there forever while high capacity
cpus may have become available in the meantime.

To address this issue this patch set introduces a new "misfit"
load-balancing scenario in periodic/nohz/newly idle balance which tweaks
the load-balance conditions to ignore load per capacity in certain
cases. Since misfit tasks are commonly running alone on a cpu, more
aggressive active load-balancing is needed too.

The fundamental idea of this patch set has been in Android kernels for a
long time and is absolutely essential for consistent performance on
asymmetric cpu capacity systems.

The patches have been tested on:
   1. Arm Juno (r0): 2+4 Cortex A57/A53
   2. Hikey960: 4+4 Cortex A73/A53

Test case:
Big cpus are always kept busy. Pin a set of shorter-running sysbench
tasks to the big cpus, while creating a longer-running set of unpinned
sysbench tasks.

    REQUESTS=1000
    BIGS="1 2"
    LITTLES="0 3 4 5"
 
    # Don't care about the score for those, just keep the bigs busy
    for i in $BIGS; do
        taskset -c $i sysbench --max-requests=$((REQUESTS / 4)) \
        --test=cpu  run &>/dev/null &
    done
 
    for i in $LITTLES; do
        sysbench --max-requests=$REQUESTS --test=cpu run \
	| grep "total time:" &
    done
 
    wait

Results:
Single runs with completion time of each task
Juno (tip)
    total time:                          1.2608s
    total time:                          1.2995s
    total time:                          1.5954s
    total time:                          1.7463s

Juno (misfit)
    total time:                          1.2575s
    total time:                          1.3004s
    total time:                          1.5860s
    total time:                          1.5871s

Hikey960 (tip)
    total time:                          1.7431s
    total time:                          2.2914s
    total time:                          2.5976s
    total time:                          1.7280s

Hikey960 (misfit)
    total time:                          1.7866s
    total time:                          1.7513s
    total time:                          1.6918s
    total time:                          1.6965s

10 run summary (tracking longest running task for each run)
	Juno		Hikey960
	avg	max	avg	max
tip     1.7465  1.7469  2.5997  2.6131 
misfit  1.6016  1.6192  1.8506  1.9666

Morten Rasmussen (4):
  sched: Add static_key for asymmetric cpu capacity optimizations
  sched/fair: Add group_misfit_task load-balance type
  sched/fair: Consider misfit tasks when load-balancing
  sched/fair: Avoid unnecessary balancing of asymmetric capacity groups

Valentin Schneider (3):
  sched/fair: Kick nohz balance if rq->misfit_task
  sched: Rename root_domain->overload to should_idle_balance
  sched/fair: Set sd->should_idle_balance when misfit

 kernel/sched/fair.c     | 141 +++++++++++++++++++++++++++++++++++++++++-------
 kernel/sched/sched.h    |  15 ++++--
 kernel/sched/topology.c |  10 ++++
 3 files changed, 142 insertions(+), 24 deletions(-)

-- 
2.7.4


* [PATCH 1/7] sched: Add static_key for asymmetric cpu capacity optimizations
  2018-02-15 16:20 [PATCH 0/7] sched/fair: Migrate 'misfit' tasks on asymmetric capacity systems Morten Rasmussen
@ 2018-02-15 16:20 ` Morten Rasmussen
  2018-02-16 13:47   ` Peter Zijlstra
  2018-02-15 16:20 ` [PATCH 2/7] sched/fair: Add group_misfit_task load-balance type Morten Rasmussen
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Morten Rasmussen @ 2018-02-15 16:20 UTC (permalink / raw)
  To: peterz, mingo
  Cc: valentin.schneider, dietmar.eggemann, vincent.guittot,
	linux-kernel, Morten Rasmussen

The existing asymmetric cpu capacity code should cause minimal overhead
for others. Putting it behind a static_key, as has been done for SMT
optimizations, would make it easier to extend and improve without
causing harm to others moving forward.

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 kernel/sched/fair.c     |  3 +++
 kernel/sched/sched.h    |  1 +
 kernel/sched/topology.c | 10 ++++++++++
 3 files changed, 14 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1070803cb423..452ad2e6f1a0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6291,6 +6291,9 @@ static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
 {
 	long min_cap, max_cap;
 
+	if (!static_branch_unlikely(&sched_asym_cpucapacity))
+		return 0;
+
 	min_cap = min(capacity_orig_of(prev_cpu), capacity_orig_of(cpu));
 	max_cap = cpu_rq(cpu)->rd->max_cpu_capacity;
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2e95505e23c6..a06184906640 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1095,6 +1095,7 @@ DECLARE_PER_CPU(int, sd_llc_id);
 DECLARE_PER_CPU(struct sched_domain_shared *, sd_llc_shared);
 DECLARE_PER_CPU(struct sched_domain *, sd_numa);
 DECLARE_PER_CPU(struct sched_domain *, sd_asym);
+extern struct static_key_false sched_asym_cpucapacity;
 
 struct sched_group_capacity {
 	atomic_t ref;
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 034cbed7f88b..517c57d312df 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -393,6 +393,7 @@ DEFINE_PER_CPU(int, sd_llc_id);
 DEFINE_PER_CPU(struct sched_domain_shared *, sd_llc_shared);
 DEFINE_PER_CPU(struct sched_domain *, sd_numa);
 DEFINE_PER_CPU(struct sched_domain *, sd_asym);
+DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity);
 
 static void update_top_cache_domain(int cpu)
 {
@@ -420,6 +421,13 @@ static void update_top_cache_domain(int cpu)
 	rcu_assign_pointer(per_cpu(sd_asym, cpu), sd);
 }
 
+static void update_asym_cpucapacity(int cpu)
+{
+	if (!static_branch_unlikely(&sched_asym_cpucapacity) &&
+	    lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY))
+		static_branch_enable(&sched_asym_cpucapacity);
+}
+
 /*
  * Attach the domain 'sd' to 'cpu' as its base domain. Callers must
  * hold the hotplug lock.
@@ -1697,6 +1705,8 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 
 		cpu_attach_domain(sd, d.rd, i);
 	}
+
+	update_asym_cpucapacity(cpumask_first(cpu_map));
 	rcu_read_unlock();
 
 	if (rq && sched_debug_enabled) {
-- 
2.7.4


* [PATCH 2/7] sched/fair: Add group_misfit_task load-balance type
  2018-02-15 16:20 [PATCH 0/7] sched/fair: Migrate 'misfit' tasks on asymmetric capacity systems Morten Rasmussen
  2018-02-15 16:20 ` [PATCH 1/7] sched: Add static_key for asymmetric cpu capacity optimizations Morten Rasmussen
@ 2018-02-15 16:20 ` Morten Rasmussen
  2018-02-19 13:56   ` Peter Zijlstra
  2018-02-15 16:20 ` [PATCH 3/7] sched/fair: Consider misfit tasks when load-balancing Morten Rasmussen
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Morten Rasmussen @ 2018-02-15 16:20 UTC (permalink / raw)
  To: peterz, mingo
  Cc: valentin.schneider, dietmar.eggemann, vincent.guittot,
	linux-kernel, Morten Rasmussen

To maximize throughput in systems with asymmetric cpu capacities (e.g.
ARM big.LITTLE), load-balancing has to consider task and cpu utilization
as well as per-cpu compute capacity in addition to the current average
load based load-balancing policy. Tasks with high utilization that are
scheduled on a lower capacity cpu need to be identified and migrated to
a higher capacity cpu if possible to maximize throughput.

To implement this policy, a new group_type (load-balance scenario) is
added: group_misfit_task. This represents
scenarios where a sched_group has one or more tasks that are not
suitable for its per-cpu capacity. group_misfit_task is only considered
if the system is not overloaded or imbalanced (group_imbalanced or
group_overloaded).

Identifying misfit tasks requires the rq lock to be held. To avoid
taking remote rq locks to examine source sched_groups for misfit tasks,
each cpu is responsible for tracking misfit tasks itself and updating
the rq->misfit_task flag. This means checking task utilization when
tasks are scheduled and on each sched_tick.
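
The fit check used below compares task utilization against the cpu
capacity scaled by the existing ~20% capacity_margin (1280), i.e. a task
is considered to fit if its utilization stays below roughly 80% of the
cpu's capacity. A stand-alone sketch of that arithmetic, using made-up
capacity and utilization values:

    /*
     * User-space sketch of the fit check only; capacity_margin mirrors
     * the existing ~20% margin in fair.c, and the capacity/utilization
     * numbers below are purely illustrative.
     */
    #include <stdio.h>

    #define SCHED_CAPACITY_SCALE    1024
    static const unsigned long capacity_margin = 1280;     /* ~20% */

    static int task_fits_capacity(unsigned long task_util, unsigned long capacity)
    {
            return capacity * SCHED_CAPACITY_SCALE > task_util * capacity_margin;
    }

    int main(void)
    {
            unsigned long util = 400;       /* hypothetical always-running task */

            printf("util=%lu, capacity  446: fits=%d\n", util, task_fits_capacity(util, 446));
            printf("util=%lu, capacity 1024: fits=%d\n", util, task_fits_capacity(util, 1024));
            return 0;
    }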

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 kernel/sched/fair.c  | 57 +++++++++++++++++++++++++++++++++++++++++++---------
 kernel/sched/sched.h |  2 ++
 2 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 452ad2e6f1a0..bdca97ea516e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -712,6 +712,7 @@ static u64 sched_vslice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 
 static int select_idle_sibling(struct task_struct *p, int prev_cpu, int cpu);
 static unsigned long task_h_load(struct task_struct *p);
+static unsigned long capacity_of(int cpu);
 
 /* Give new sched_entity start runnable values to heavy its load in infant time */
 void init_entity_runnable_average(struct sched_entity *se)
@@ -1422,7 +1423,6 @@ bool should_numa_migrate_memory(struct task_struct *p, struct page * page,
 static unsigned long weighted_cpuload(struct rq *rq);
 static unsigned long source_load(int cpu, int type);
 static unsigned long target_load(int cpu, int type);
-static unsigned long capacity_of(int cpu);
 
 /* Cached statistics for all CPUs within a node */
 struct numa_stats {
@@ -3869,6 +3869,30 @@ static inline unsigned long cfs_rq_load_avg(struct cfs_rq *cfs_rq)
 
 static int idle_balance(struct rq *this_rq, struct rq_flags *rf);
 
+static inline unsigned long task_util(struct task_struct *p);
+static inline int task_fits_capacity(struct task_struct *p, long capacity)
+{
+	return capacity * 1024 > task_util(p) * capacity_margin;
+}
+
+static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
+{
+	if (!static_branch_unlikely(&sched_asym_cpucapacity))
+		return;
+
+	if (!p) {
+		rq->misfit_task_load = 0;
+		return;
+	}
+
+	if (task_fits_capacity(p, capacity_of(cpu_of(rq)))) {
+		rq->misfit_task_load = 0;
+		return;
+	}
+
+	rq->misfit_task_load = task_h_load(p);
+}
+
 #else /* CONFIG_SMP */
 
 static inline int
@@ -3898,6 +3922,8 @@ static inline int idle_balance(struct rq *rq, struct rq_flags *rf)
 	return 0;
 }
 
+static inline void update_misfit_status(struct task_struct *p, struct rq *rq) {}
+
 #endif /* CONFIG_SMP */
 
 static void check_spread(struct cfs_rq *cfs_rq, struct sched_entity *se)
@@ -5767,7 +5793,6 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
 	return affine;
 }
 
-static inline unsigned long task_util(struct task_struct *p);
 static unsigned long cpu_util_wake(int cpu, struct task_struct *p);
 
 static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
@@ -6304,7 +6329,7 @@ static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
 	/* Bring task utilization in sync with prev_cpu */
 	sync_entity_load_avg(&p->se);
 
-	return min_cap * 1024 < task_util(p) * capacity_margin;
+	return task_fits_capacity(p, min_cap);
 }
 
 /*
@@ -6733,9 +6758,12 @@ done: __maybe_unused
 	if (hrtick_enabled(rq))
 		hrtick_start_fair(rq, p);
 
+	update_misfit_status(p, rq);
+
 	return p;
 
 idle:
+	update_misfit_status(NULL, rq);
 	new_tasks = idle_balance(rq, rf);
 
 	/*
@@ -6941,6 +6969,13 @@ static unsigned long __read_mostly max_load_balance_interval = HZ/10;
 
 enum fbq_type { regular, remote, all };
 
+enum group_type {
+	group_other = 0,
+	group_misfit_task,
+	group_imbalanced,
+	group_overloaded,
+};
+
 #define LBF_ALL_PINNED	0x01
 #define LBF_NEED_BREAK	0x02
 #define LBF_DST_PINNED  0x04
@@ -7453,12 +7488,6 @@ static unsigned long task_h_load(struct task_struct *p)
 
 /********** Helpers for find_busiest_group ************************/
 
-enum group_type {
-	group_other = 0,
-	group_imbalanced,
-	group_overloaded,
-};
-
 /*
  * sg_lb_stats - stats of a sched_group required for load_balancing
  */
@@ -7474,6 +7503,7 @@ struct sg_lb_stats {
 	unsigned int group_weight;
 	enum group_type group_type;
 	int group_no_capacity;
+	int group_misfit_task_load; /* A cpu has a task too big for its capacity */
 #ifdef CONFIG_NUMA_BALANCING
 	unsigned int nr_numa_running;
 	unsigned int nr_preferred_running;
@@ -7773,6 +7803,9 @@ group_type group_classify(struct sched_group *group,
 	if (sg_imbalanced(group))
 		return group_imbalanced;
 
+	if (sgs->group_misfit_task_load)
+		return group_misfit_task;
+
 	return group_other;
 }
 
@@ -7822,6 +7855,10 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 		 */
 		if (!nr_running && idle_cpu(i))
 			sgs->idle_cpus++;
+
+		if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
+		    !sgs->group_misfit_task_load && rq->misfit_task_load)
+			sgs->group_misfit_task_load = rq->misfit_task_load;
 	}
 
 	/* Adjust by relative CPU capacity of the group */
@@ -9436,6 +9473,8 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
 
 	if (static_branch_unlikely(&sched_numa_balancing))
 		task_tick_numa(rq, curr);
+
+	update_misfit_status(curr, rq);
 }
 
 /*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index a06184906640..7d324b706e67 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -771,6 +771,8 @@ struct rq {
 	struct callback_head *balance_callback;
 
 	unsigned char idle_balance;
+
+	unsigned int misfit_task_load;
 	/* For active balancing */
 	int active_balance;
 	int push_cpu;
-- 
2.7.4


* [PATCH 3/7] sched/fair: Consider misfit tasks when load-balancing
  2018-02-15 16:20 [PATCH 0/7] sched/fair: Migrate 'misfit' tasks on asymmetric capacity systems Morten Rasmussen
  2018-02-15 16:20 ` [PATCH 1/7] sched: Add static_key for asymmetric cpu capacity optimizations Morten Rasmussen
  2018-02-15 16:20 ` [PATCH 2/7] sched/fair: Add group_misfit_task load-balance type Morten Rasmussen
@ 2018-02-15 16:20 ` Morten Rasmussen
  2018-02-15 16:20 ` [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups Morten Rasmussen
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Morten Rasmussen @ 2018-02-15 16:20 UTC (permalink / raw)
  To: peterz, mingo
  Cc: valentin.schneider, dietmar.eggemann, vincent.guittot,
	linux-kernel, Morten Rasmussen

On asymmetric cpu capacity systems load intensive tasks can end up on
cpus that don't suit their compute demand. In these scenarios 'misfit'
tasks should be migrated to cpus with higher compute capacity to ensure
better throughput. group_misfit_task indicates this scenario, but tweaks
to the load-balance code are needed to make the migrations happen.

Misfit balancing only makes sense between a source group of lower
per-cpu capacity and a destination group of higher compute capacity.
Otherwise, misfit balancing is ignored. group_misfit_task has the lowest
priority, so any imbalance due to overload is dealt with first.

The modifications are:

1. Only pick a group containing misfit tasks as the busiest group if the
   destination group has higher capacity and has spare capacity.
2. When the busiest group is a 'misfit' group, skip the usual average
   load and group capacity checks.
3. Set the imbalance for 'misfit' balancing sufficiently high for a task
   to be pulled ignoring average load.
4. Pick the first cpu with the rq->misfit flag raised as the source cpu.
5. If the misfit task is alone on the source cpu, go for active
   balancing.

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 kernel/sched/fair.c | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bdca97ea516e..0d70838ed711 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7004,6 +7004,7 @@ struct lb_env {
 	unsigned int		loop_max;
 
 	enum fbq_type		fbq_type;
+	enum group_type		src_grp_type;
 	struct list_head	tasks;
 };
 
@@ -7894,6 +7895,14 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 {
 	struct sg_lb_stats *busiest = &sds->busiest_stat;
 
+	/*
+	 * Don't try to pull misfit tasks we can't help.
+	 */
+	if (sgs->group_type == group_misfit_task &&
+	    (!group_smaller_cpu_capacity(sg, sds->local) ||
+	     !group_has_capacity(env, &sds->local_stat)))
+		return false;
+
 	if (sgs->group_type > busiest->group_type)
 		return true;
 
@@ -8199,8 +8208,9 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 	 * factors in sg capacity and sgs with smaller group_type are
 	 * skipped when updating the busiest sg:
 	 */
-	if (busiest->avg_load <= sds->avg_load ||
-	    local->avg_load >= sds->avg_load) {
+	if (busiest->group_type != group_misfit_task &&
+	    (busiest->avg_load <= sds->avg_load ||
+	     local->avg_load >= sds->avg_load)) {
 		env->imbalance = 0;
 		return fix_small_imbalance(env, sds);
 	}
@@ -8234,6 +8244,12 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 		(sds->avg_load - local->avg_load) * local->group_capacity
 	) / SCHED_CAPACITY_SCALE;
 
+	/* Boost imbalance to allow misfit task to be balanced. */
+	if (busiest->group_type == group_misfit_task) {
+		env->imbalance = max_t(long, env->imbalance,
+				       busiest->group_misfit_task_load);
+	}
+
 	/*
 	 * if *imbalance is less than the average load per runnable task
 	 * there is no guarantee that any tasks will be moved so we'll have
@@ -8300,6 +8316,10 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 	    busiest->group_no_capacity)
 		goto force_balance;
 
+	/* Misfit tasks should be dealt with regardless of the avg load */
+	if (busiest->group_type == group_misfit_task)
+		goto force_balance;
+
 	/*
 	 * If the local group is busier than the selected busiest group
 	 * don't try and pull any tasks.
@@ -8337,6 +8357,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 
 force_balance:
 	/* Looks like there is an imbalance. Compute it */
+	env->src_grp_type = busiest->group_type;
 	calculate_imbalance(env, &sds);
 	return sds.busiest;
 
@@ -8384,6 +8405,14 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 		if (rt > env->fbq_type)
 			continue;
 
+		/*
+		 * For ASYM_CPUCAPACITY domains with misfit tasks we ignore
+		 * load.
+		 */
+		if (env->src_grp_type == group_misfit_task &&
+		    rq->misfit_task_load)
+			return rq;
+
 		capacity = capacity_of(i);
 
 		wl = weighted_cpuload(rq);
@@ -8453,6 +8482,9 @@ static int need_active_balance(struct lb_env *env)
 			return 1;
 	}
 
+	if (env->src_grp_type == group_misfit_task)
+		return 1;
+
 	return unlikely(sd->nr_balance_failed > sd->cache_nice_tries+2);
 }
 
-- 
2.7.4


* [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups
  2018-02-15 16:20 [PATCH 0/7] sched/fair: Migrate 'misfit' tasks on asymmetric capacity systems Morten Rasmussen
                   ` (2 preceding siblings ...)
  2018-02-15 16:20 ` [PATCH 3/7] sched/fair: Consider misfit tasks when load-balancing Morten Rasmussen
@ 2018-02-15 16:20 ` Morten Rasmussen
  2018-02-19 14:50   ` Peter Zijlstra
  2018-02-19 15:10   ` Peter Zijlstra
  2018-02-15 16:20 ` [PATCH 5/7] sched/fair: Kick nohz balance if rq->misfit_task Morten Rasmussen
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 31+ messages in thread
From: Morten Rasmussen @ 2018-02-15 16:20 UTC (permalink / raw)
  To: peterz, mingo
  Cc: valentin.schneider, dietmar.eggemann, vincent.guittot,
	linux-kernel, Morten Rasmussen

On systems with asymmetric cpu capacities, a skewed load distribution
might yield better throughput than balancing load per group capacity.
For example, preferring high capacity cpus for compute intensive tasks
leaving low capacity cpus idle rather than balancing the number of idle
cpus across different cpu types. Instead, let load-balance back off if
the busiest group isn't really overloaded.
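
The backoff is implemented via a new group_similar_cpu_capacity()
helper (see the hunk below), which treats two groups as comparable when
their minimum capacities are within 12.5% of each other (the max >> 3
shift). A quick user-space sketch of that arithmetic, with illustrative
rather than measured capacity values:

    #include <stdio.h>
    #include <stdlib.h>

    /* Same arithmetic as group_similar_cpu_capacity() below. */
    static int similar_capacity(long a, long b)
    {
            long diff = a - b;
            long max = a > b ? a : b;

            return labs(diff) < (max >> 3); /* within 12.5% of the larger value */
    }

    int main(void)
    {
            /* illustrative min_capacity values */
            printf("1024 vs 930: %d\n", similar_capacity(1024, 930)); /* 94 < 128   -> similar   */
            printf("1024 vs 446: %d\n", similar_capacity(1024, 446)); /* 578 >= 128 -> different */
            return 0;
    }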

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 kernel/sched/fair.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0d70838ed711..e036aef3f10a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7794,6 +7794,19 @@ group_smaller_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
 						ref->sgc->min_capacity * 1024;
 }
 
+/*
+ * group_similar_cpu_capacity: Returns true if the minimum capacity of the
+ * compared groups differ by less than 12.5%.
+ */
+static inline bool
+group_similar_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
+{
+	long diff = sg->sgc->min_capacity - ref->sgc->min_capacity;
+	long max = max(sg->sgc->min_capacity, ref->sgc->min_capacity);
+
+	return abs(diff) < max >> 3;
+}
+
 static inline enum
 group_type group_classify(struct sched_group *group,
 			  struct sg_lb_stats *sgs)
@@ -7925,6 +7938,15 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 	    group_smaller_cpu_capacity(sds->local, sg))
 		return false;
 
+	/*
+	 * Candidate sg doesn't face any severe imbalance issues so
+	 * don't disturb unless the groups are of similar capacity
+	 * where balancing is more harmless.
+	 */
+	if (sgs->group_type == group_other &&
+		!group_similar_cpu_capacity(sds->local, sg))
+		return false;
+
 asym_packing:
 	/* This is the busiest node in its class. */
 	if (!(env->sd->flags & SD_ASYM_PACKING))
-- 
2.7.4


* [PATCH 5/7] sched/fair: Kick nohz balance if rq->misfit_task
  2018-02-15 16:20 [PATCH 0/7] sched/fair: Migrate 'misfit' tasks on asymmetric capacity systems Morten Rasmussen
                   ` (3 preceding siblings ...)
  2018-02-15 16:20 ` [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups Morten Rasmussen
@ 2018-02-15 16:20 ` Morten Rasmussen
  2018-02-15 16:20 ` [PATCH 6/7] sched: Rename root_domain->overload to should_idle_balance Morten Rasmussen
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Morten Rasmussen @ 2018-02-15 16:20 UTC (permalink / raw)
  To: peterz, mingo
  Cc: valentin.schneider, dietmar.eggemann, vincent.guittot,
	linux-kernel, Morten Rasmussen

From: Valentin Schneider <valentin.schneider@arm.com>

There already are a few conditions in nohz_kick_needed() to ensure
a nohz kick is triggered, but they are not enough for some misfit
task scenarios. Excluding asym packing, those are:

* rq->nr_running >= 2: Not relevant here because we are running a
misfit task, which needs to be migrated regardless, potentially through
active balance.
* sds->nr_busy_cpus > 1: If the misfit task is the only task running
on a group of low capacity cpus, this evaluates to false.
* rq->cfs.h_nr_running >= 1 && check_cpu_capacity(): Not relevant here,
the misfit task needs to be migrated regardless of rt/IRQ pressure.

As such, this commit adds an rq->misfit_task condition to trigger a
nohz kick.

The idea to kick a nohz balance for misfit tasks originally came from
Leo Yan <leo.yan@linaro.org>, and a similar patch was submitted for
the Android Common Kernel - see [1].

[1]: https://lists.linaro.org/pipermail/eas-dev/2016-September/000551.html

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 kernel/sched/fair.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e036aef3f10a..1439b784b1f0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9411,6 +9411,9 @@ static inline bool nohz_kick_needed(struct rq *rq)
 	if (rq->nr_running >= 2)
 		return true;
 
+	if (rq->misfit_task_load)
+		return true;
+
 	rcu_read_lock();
 	sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
 	if (sds) {
-- 
2.7.4


* [PATCH 6/7] sched: Rename root_domain->overload to should_idle_balance
  2018-02-15 16:20 [PATCH 0/7] sched/fair: Migrate 'misfit' tasks on asymmetric capacity systems Morten Rasmussen
                   ` (4 preceding siblings ...)
  2018-02-15 16:20 ` [PATCH 5/7] sched/fair: Kick nohz balance if rq->misfit_task Morten Rasmussen
@ 2018-02-15 16:20 ` Morten Rasmussen
  2018-02-16  9:14   ` Juri Lelli
  2018-02-15 16:20 ` [PATCH 7/7] sched/fair: Set sd->should_idle_balance when misfit Morten Rasmussen
  2018-02-28  7:46 ` [PATCH 0/7] sched/fair: Migrate 'misfit' tasks on asymmetric capacity systems Gaku Inami
  7 siblings, 1 reply; 31+ messages in thread
From: Morten Rasmussen @ 2018-02-15 16:20 UTC (permalink / raw)
  To: peterz, mingo
  Cc: valentin.schneider, dietmar.eggemann, vincent.guittot,
	linux-kernel, Morten Rasmussen

From: Valentin Schneider <valentin.schneider@arm.com>

The name "overload" is not very explicit, especially since it doesn't
use any concept of "load" coming from load-tracking signals. For now it
simply tracks if any of the CPUs in root_domain has more than one
runnable task, and is then used to decide whether idle balance should be
performed.

As such, this commit changes that flag's name to 'should_idle_balance',
which makes its role more explicit.

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 kernel/sched/fair.c  | 18 +++++++++---------
 kernel/sched/sched.h | 12 ++++++++----
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1439b784b1f0..2d2302b7b584 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7830,12 +7830,12 @@ group_type group_classify(struct sched_group *group,
  * @load_idx: Load index of sched_domain of this_cpu for load calc.
  * @local_group: Does group contain this_cpu.
  * @sgs: variable to hold the statistics for this group.
- * @overload: Indicate more than one runnable task for any CPU.
+ * @should_idle_balance: Indicate groups could need idle balance.
  */
 static inline void update_sg_lb_stats(struct lb_env *env,
 			struct sched_group *group, int load_idx,
 			int local_group, struct sg_lb_stats *sgs,
-			bool *overload)
+			bool *should_idle_balance)
 {
 	unsigned long load;
 	int i, nr_running;
@@ -7857,7 +7857,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 
 		nr_running = rq->nr_running;
 		if (nr_running > 1)
-			*overload = true;
+			*should_idle_balance = true;
 
 #ifdef CONFIG_NUMA_BALANCING
 		sgs->nr_numa_running += rq->nr_numa_running;
@@ -8016,7 +8016,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 	struct sg_lb_stats *local = &sds->local_stat;
 	struct sg_lb_stats tmp_sgs;
 	int load_idx, prefer_sibling = 0;
-	bool overload = false;
+	bool should_idle_balance = false;
 
 	if (child && child->flags & SD_PREFER_SIBLING)
 		prefer_sibling = 1;
@@ -8038,7 +8038,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 		}
 
 		update_sg_lb_stats(env, sg, load_idx, local_group, sgs,
-						&overload);
+						&should_idle_balance);
 
 		if (local_group)
 			goto next_group;
@@ -8078,9 +8078,9 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 		env->fbq_type = fbq_classify_group(&sds->busiest_stat);
 
 	if (!env->sd->parent) {
-		/* update overload indicator if we are at root domain */
-		if (env->dst_rq->rd->overload != overload)
-			env->dst_rq->rd->overload = overload;
+		/* update idle_balance indicator if we are at root domain */
+		if (env->dst_rq->rd->should_idle_balance != should_idle_balance)
+			env->dst_rq->rd->should_idle_balance = should_idle_balance;
 	}
 }
 
@@ -8878,7 +8878,7 @@ static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
 	rq_unpin_lock(this_rq, rf);
 
 	if (this_rq->avg_idle < sysctl_sched_migration_cost ||
-	    !this_rq->rd->overload) {
+	    !this_rq->rd->should_idle_balance) {
 		rcu_read_lock();
 		sd = rcu_dereference_check_sched_domain(this_rq->sd);
 		if (sd)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7d324b706e67..4215438667e5 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -650,8 +650,12 @@ struct root_domain {
 	cpumask_var_t span;
 	cpumask_var_t online;
 
-	/* Indicate more than one runnable task for any CPU */
-	bool overload;
+	/*
+	 * Indicate whether the idle balance can be used to solve
+	 * imbalance within the root domain.
+	 * e.g. There is more than one runnable task for any CPU
+	 */
+	bool should_idle_balance;
 
 	/*
 	 * The bit corresponding to a CPU gets set here if such CPU has more
@@ -1610,8 +1614,8 @@ static inline void add_nr_running(struct rq *rq, unsigned count)
 
 	if (prev_nr < 2 && rq->nr_running >= 2) {
 #ifdef CONFIG_SMP
-		if (!rq->rd->overload)
-			rq->rd->overload = true;
+		if (!rq->rd->should_idle_balance)
+			rq->rd->should_idle_balance = true;
 #endif
 	}
 
-- 
2.7.4


* [PATCH 7/7] sched/fair: Set sd->should_idle_balance when misfit
  2018-02-15 16:20 [PATCH 0/7] sched/fair: Migrate 'misfit' tasks on asymmetric capacity systems Morten Rasmussen
                   ` (5 preceding siblings ...)
  2018-02-15 16:20 ` [PATCH 6/7] sched: Rename root_domain->overload to should_idle_balance Morten Rasmussen
@ 2018-02-15 16:20 ` Morten Rasmussen
  2018-02-28  7:46 ` [PATCH 0/7] sched/fair: Migrate 'misfit' tasks on asymmetric capacity systems Gaku Inami
  7 siblings, 0 replies; 31+ messages in thread
From: Morten Rasmussen @ 2018-02-15 16:20 UTC (permalink / raw)
  To: peterz, mingo
  Cc: valentin.schneider, dietmar.eggemann, vincent.guittot,
	linux-kernel, Morten Rasmussen

From: Valentin Schneider <valentin.schneider@arm.com>

Idle balance is a great opportunity to pull a misfit task. However,
there are scenarios where misfit tasks are present but idle balance is
prevented by the should_idle_balance flag.

A good example of this is a workload of n identical tasks. Let's suppose
we have a 2+2 Arm big.LITTLE system. We then spawn 4 fairly
CPU-intensive tasks - for the sake of simplicity let's say they are just
CPU hogs, even when running on big CPUs.

They are identical tasks, so on an SMP system they should all end at
(roughly) the same time. However, in our case the LITTLE CPUs are less
capable than the big CPUs, so tasks running on the LITTLEs will take
longer to complete.

This means that the big CPUs will complete their work earlier, at which
point they should pull the tasks from the LITTLEs. What we want to
happen is summarized as follows:

a,b,c,d are our CPU-hogging tasks
_ signifies idling

LITTLE_0 | a a a a _ _
LITTLE_1 | b b b b _ _
---------|-------------
  big_0  | c c c c a a
  big_1  | d d d d b b
                  ^
                  ^
    Tasks end on the big CPUs, idle balance happens
    and the misfit tasks are pulled straight away

This however won't happen, because currently the should_idle_balance
flag is only set when there is any CPU that has more than one runnable
task - which may very well not be the case here if our CPU-hogging
workload is all there is to run.

As such, this commit sets the should_idle_balance flag in
update_sg_lb_stats when a group is flagged as having a misfit task.

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 kernel/sched/fair.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2d2302b7b584..d080a144f87f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7871,8 +7871,10 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 			sgs->idle_cpus++;
 
 		if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
-		    !sgs->group_misfit_task_load && rq->misfit_task_load)
+		    !sgs->group_misfit_task_load && rq->misfit_task_load) {
 			sgs->group_misfit_task_load = rq->misfit_task_load;
+			*should_idle_balance = true;
+		}
 	}
 
 	/* Adjust by relative CPU capacity of the group */
-- 
2.7.4


* Re: [PATCH 6/7] sched: Rename root_domain->overload to should_idle_balance
  2018-02-15 16:20 ` [PATCH 6/7] sched: Rename root_domain->overload to should_idle_balance Morten Rasmussen
@ 2018-02-16  9:14   ` Juri Lelli
  2018-02-16  9:49     ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Juri Lelli @ 2018-02-16  9:14 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: peterz, mingo, valentin.schneider, dietmar.eggemann,
	vincent.guittot, linux-kernel

Hi,

On 15/02/18 16:20, Morten Rasmussen wrote:
> From: Valentin Schneider <valentin.schneider@arm.com>
> 
> The name "overload" is not very explicit, especially since it doesn't
> use any concept of "load" coming from load-tracking signals. For now it
> simply tracks if any of the CPUs in root_domain has more than one
> runnable task, and is then used to decide whether idle balance should be
> performed.
> 
> As such, this commit changes that flag's name to 'should_idle_balance',
> which makes its role more explicit.
> 
> cc: Ingo Molnar <mingo@redhat.com>
> cc: Peter Zijlstra <peterz@infradead.org>
> Suggested-by: Patrick Bellasi <patrick.bellasi@arm.com>
> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
> Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
> ---

[...]

> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 7d324b706e67..4215438667e5 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -650,8 +650,12 @@ struct root_domain {
>  	cpumask_var_t span;
>  	cpumask_var_t online;
>  
> -	/* Indicate more than one runnable task for any CPU */
> -	bool overload;
> +	/*
> +	 * Indicate whether the idle balance can be used to solve
> +	 * imbalance within the root domain.
> +	 * e.g. There is more than one runnable task for any CPU
> +	 */
> +	bool should_idle_balance;

Current name is however consistent with RT/DL's naming convention

 [...]
 /*
  * The bit corresponding to a CPU gets set here if such CPU has more
  * than one runnable -deadline task (as it is below for RT tasks).
  */
 cpumask_var_t dlo_mask;

 [...]

 /*
  * The "RT overload" flag: it gets set if a CPU has more than
  * one runnable RT task.
  */
 cpumask_var_t rto_mask;

Not a big deal, though. Just wanted to point that out. :)

Best,

- Juri


* Re: [PATCH 6/7] sched: Rename root_domain->overload to should_idle_balance
  2018-02-16  9:14   ` Juri Lelli
@ 2018-02-16  9:49     ` Peter Zijlstra
  2018-02-16 11:59       ` Valentin Schneider
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2018-02-16  9:49 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Morten Rasmussen, mingo, valentin.schneider, dietmar.eggemann,
	vincent.guittot, linux-kernel

On Fri, Feb 16, 2018 at 10:14:02AM +0100, Juri Lelli wrote:
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index 7d324b706e67..4215438667e5 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -650,8 +650,12 @@ struct root_domain {
> >  	cpumask_var_t span;
> >  	cpumask_var_t online;
> >  
> > -	/* Indicate more than one runnable task for any CPU */
> > -	bool overload;
> > +	/*
> > +	 * Indicate whether the idle balance can be used to solve
> > +	 * imbalance within the root domain.
> > +	 * e.g. There is more than one runnable task for any CPU
> > +	 */
> > +	bool should_idle_balance;
> 
> Current name is however consistent with RT/DL's naming convention

Yeah, not a fan either. We've consistently used the term to mean
nr_running>1. The thing to fix there is the stupid bool, not the name.


* Re: [PATCH 6/7] sched: Rename root_domain->overload to should_idle_balance
  2018-02-16  9:49     ` Peter Zijlstra
@ 2018-02-16 11:59       ` Valentin Schneider
  0 siblings, 0 replies; 31+ messages in thread
From: Valentin Schneider @ 2018-02-16 11:59 UTC (permalink / raw)
  To: Peter Zijlstra, Juri Lelli
  Cc: Morten Rasmussen, mingo, dietmar.eggemann, vincent.guittot, linux-kernel

On 02/16/2018 09:49 AM, Peter Zijlstra wrote:
> On Fri, Feb 16, 2018 at 10:14:02AM +0100, Juri Lelli wrote:
>>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>>> index 7d324b706e67..4215438667e5 100644
>>> --- a/kernel/sched/sched.h
>>> +++ b/kernel/sched/sched.h
>>> @@ -650,8 +650,12 @@ struct root_domain {
>>>  	cpumask_var_t span;
>>>  	cpumask_var_t online;
>>>  
>>> -	/* Indicate more than one runnable task for any CPU */
>>> -	bool overload;
>>> +	/*
>>> +	 * Indicate whether the idle balance can be used to solve
>>> +	 * imbalance within the root domain.
>>> +	 * e.g. There is more than one runnable task for any CPU
>>> +	 */
>>> +	bool should_idle_balance;
>>
>> Current name is however consistent with RT/DL's naming convention

I saw that it was already used elsewhere in fair but didn't know about
RT/DL, thanks for pointing that out.

> 
> Yeah, not a fan either. We've consistently used the term to mean
> nr_running>1. The thing to fix there is the stupid bool, not the name.
> 

So yeah the other thing that doesn't help here is that we're cramming 
several meanings into rq->rd->overload:
- is there an overloaded group
- is there a group with misfit task(s)

So it didn't make sense to keep it named "overload". Perhaps a better way of
handling this would be to keep exposing which is which instead of merging it
all in a bool. Something along those lines:

@update_sg_lb_stats():
[...]
 nr_running = rq->nr_running;
 if (nr_running > 1)
-    *overload = true;
+    sds->balance_status |= LB_STATUS_OVERLOAD;

[...]

 if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
 !sgs->group_misfit_task_load && rq->misfit_task_load) {
     sgs->group_misfit_task_load = rq->misfit_task_load;
-     *should_idle_balance = true;
+     sds->balance_status |= LB_STATUS_MISFIT;
 }

@update_sd_lb_stats():

[...]
 if (!env->sd->parent) {
     /* update overload indicator if we are at root domain */
-     if (env->dst_rq->rd->overload != overload)
-         env->dst_rq->rd->overload = overload;
+     if (env->dst_rq->rd->balance_status != sds->balance_status)
+         env->dst_rq->rd->balance_status = sds->balance_status;
 }
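
For completeness, the LB_STATUS_* names above are just placeholders; one
possible (purely illustrative) way to declare them would be:

    #define LB_STATUS_OVERLOAD  0x1     /* some CPU has more than one runnable task */
    #define LB_STATUS_MISFIT    0x2     /* some CPU is running a misfit task */

with sd_lb_stats and root_domain carrying an "unsigned int balance_status"
instead of the bool.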


* Re: [PATCH 1/7] sched: Add static_key for asymmetric cpu capacity optimizations
  2018-02-15 16:20 ` [PATCH 1/7] sched: Add static_key for asymmetric cpu capacity optimizations Morten Rasmussen
@ 2018-02-16 13:47   ` Peter Zijlstra
  2018-02-16 15:41     ` Morten Rasmussen
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2018-02-16 13:47 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: mingo, valentin.schneider, dietmar.eggemann, vincent.guittot,
	linux-kernel

On Thu, Feb 15, 2018 at 04:20:48PM +0000, Morten Rasmussen wrote:
> +static void update_asym_cpucapacity(int cpu)
> +{
> +	if (!static_branch_unlikely(&sched_asym_cpucapacity) &&
> +	    lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY))
> +		static_branch_enable(&sched_asym_cpucapacity);
> +}

That looks odd, why not just:

	if (lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY))
		static_branch_enable(&sched_asym_cpucapacity);

? possibly with:

	else
		static_branch_disable(&sched_asym_cpucapacity);

if you want to play funny games :-)


* Re: [PATCH 1/7] sched: Add static_key for asymmetric cpu capacity optimizations
  2018-02-16 13:47   ` Peter Zijlstra
@ 2018-02-16 15:41     ` Morten Rasmussen
  2018-02-16 16:51       ` Peter Zijlstra
  2018-02-16 17:39       ` Quentin Perret
  0 siblings, 2 replies; 31+ messages in thread
From: Morten Rasmussen @ 2018-02-16 15:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Morten Rasmussen, mingo, valentin.schneider, dietmar.eggemann,
	vincent.guittot, linux-kernel

On Fri, Feb 16, 2018 at 02:47:04PM +0100, Peter Zijlstra wrote:
> On Thu, Feb 15, 2018 at 04:20:48PM +0000, Morten Rasmussen wrote:
> > +static void update_asym_cpucapacity(int cpu)
> > +{
> > +	if (!static_branch_unlikely(&sched_asym_cpucapacity) &&
> > +	    lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY))
> > +		static_branch_enable(&sched_asym_cpucapacity);
> > +}
> 
> That looks odd, why not just:
> 
> 	if (lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY))
> 		static_branch_enable(&sched_asym_cpucapacity);

I actually had that initially and then I misread the implementation of
static_key_enable() as if it triggers the WARN_ON_ONCE() condition if I
enable an already enabled static key. But I see now that it should be
safe to do. 

> ? possibly with:
> 
> 	else
> 		static_branch_disable(&sched_asym_cpucapacity);
> 
> if you want to play funny games :-)

I thought about that too. It could make certain hotplug scenarios even
more expensive. I think we want the sched_asym_cpucapacity code to behave
even if SD_ASYM_CPUCAPACITY isn't set anywhere, so the static key would
be permanently set from the point we detect asymmetry and leave it set. This
would be in line with how the smt static key works.


* Re: [PATCH 1/7] sched: Add static_key for asymmetric cpu capacity optimizations
  2018-02-16 15:41     ` Morten Rasmussen
@ 2018-02-16 16:51       ` Peter Zijlstra
  2018-02-19 11:49         ` Morten Rasmussen
  2018-02-16 17:39       ` Quentin Perret
  1 sibling, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2018-02-16 16:51 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: Morten Rasmussen, mingo, valentin.schneider, dietmar.eggemann,
	vincent.guittot, linux-kernel

On Fri, Feb 16, 2018 at 03:41:01PM +0000, Morten Rasmussen wrote:
> On Fri, Feb 16, 2018 at 02:47:04PM +0100, Peter Zijlstra wrote:
> > On Thu, Feb 15, 2018 at 04:20:48PM +0000, Morten Rasmussen wrote:
> > > +static void update_asym_cpucapacity(int cpu)
> > > +{
> > > +	if (!static_branch_unlikely(&sched_asym_cpucapacity) &&
> > > +	    lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY))
> > > +		static_branch_enable(&sched_asym_cpucapacity);
> > > +}
> > 
> > That looks odd, why not just:
> > 
> > 	if (lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY))
> > 		static_branch_enable(&sched_asym_cpucapacity);
> 
> I actually had that initially and then I misread the implementation of
> static_key_enable() as if it triggers the WARN_ON_ONCE() condition if I
> enable an already enabled static key. But I see now that it should be
> safe to do. 

Right, that WARN is there for when we use enable/disable on a key with a
value outside of [0,1].

> > ? possibly with:
> > 
> > 	else
> > 		static_branch_disable(&sched_asym_cpucapacity);
> > 
> > if you want to play funny games :-)
> 
> I thought about that too. It could make certain hotplug scenarios even
> more expensive. I think we want the sched_asym_cpucapacity code to behave
> even if SD_ASYM_CPUCAPACITY isn't set anywhere, so the static key would
> > be permanently set from the point we detect asymmetry and leave it set. This
> would be in line with how the smt static key works.

Fair enough..


* Re: [PATCH 1/7] sched: Add static_key for asymmetric cpu capacity optimizations
  2018-02-16 15:41     ` Morten Rasmussen
  2018-02-16 16:51       ` Peter Zijlstra
@ 2018-02-16 17:39       ` Quentin Perret
  2018-02-16 17:43         ` Peter Zijlstra
  1 sibling, 1 reply; 31+ messages in thread
From: Quentin Perret @ 2018-02-16 17:39 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: Peter Zijlstra, Morten Rasmussen, mingo, valentin.schneider,
	dietmar.eggemann, vincent.guittot, linux-kernel

Hi Morten,

On Friday 16 Feb 2018 at 15:41:01 (+0000), Morten Rasmussen wrote:
> On Fri, Feb 16, 2018 at 02:47:04PM +0100, Peter Zijlstra wrote:
> > On Thu, Feb 15, 2018 at 04:20:48PM +0000, Morten Rasmussen wrote:
> > > +static void update_asym_cpucapacity(int cpu)
> > > +{
> > > +	if (!static_branch_unlikely(&sched_asym_cpucapacity) &&
> > > +	    lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY))
> > > +		static_branch_enable(&sched_asym_cpucapacity);
> > > +}
> > 
> > That looks odd, why not just:
> > 
> > 	if (lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY))
> > 		static_branch_enable(&sched_asym_cpucapacity);
> 
> I actually had that initially and then I misread the implementation of
> static_key_enable() as if it triggers the WARN_ON_ONCE() condition if I
> enable an already enabled static key. But I see now that it should be
> safe to do.

AFAIU it should be safe, but without your check you'll have to go through
cpus_read_lock()/unlock() every time a CPU is hotplugged. There is probably
no good reason to re-do that again and again if the state of the key
never changes. I don't know how expensive those lock/unlock operations are,
but I bet they are more expensive than testing static_branch_unlikely() :)

> 
> > ? possibly with:
> > 
> > 	else
> > 		static_branch_disable(&sched_asym_cpucapacity);
> > 
> > if you want to play funny games :-)
> 
> I thought about that too. It could make certain hotplug scenarios even
> more expensive. I think we want the sched_asym_cpucapacity code to behave
> even if SD_ASYM_CPUCAPACITY isn't set anywhere, so the static key would
> be permanently set from the point we detect asymmetry and leave it set. This
> would be in line with how the smt static key works.

Thanks !
Quentin


* Re: [PATCH 1/7] sched: Add static_key for asymmetric cpu capacity optimizations
  2018-02-16 17:39       ` Quentin Perret
@ 2018-02-16 17:43         ` Peter Zijlstra
  0 siblings, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2018-02-16 17:43 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Morten Rasmussen, Morten Rasmussen, mingo, valentin.schneider,
	dietmar.eggemann, vincent.guittot, linux-kernel

On Fri, Feb 16, 2018 at 05:39:27PM +0000, Quentin Perret wrote:
> AFAIU it should be safe, but without your check you'll have to go through
> cpus_read_lock()/unlock() every time a CPU is hotplugged. There is probably
> no good reason to re-do that again and again if the state of the key
> never changes. I don't know how expensive those lock/unlock operations are,
> but I bet they are more expensive than testing static_branch_unlikely() :)

This is not a performance critical path, more obvious code is more
better.

Also cpus_read_lock() is dirt cheap most of the time.


* Re: [PATCH 1/7] sched: Add static_key for asymmetric cpu capacity optimizations
  2018-02-16 16:51       ` Peter Zijlstra
@ 2018-02-19 11:49         ` Morten Rasmussen
  0 siblings, 0 replies; 31+ messages in thread
From: Morten Rasmussen @ 2018-02-19 11:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Morten Rasmussen, mingo, valentin.schneider, dietmar.eggemann,
	vincent.guittot, linux-kernel

On Fri, Feb 16, 2018 at 05:51:23PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 16, 2018 at 03:41:01PM +0000, Morten Rasmussen wrote:
> > On Fri, Feb 16, 2018 at 02:47:04PM +0100, Peter Zijlstra wrote:
> > > ? possibly with:
> > > 
> > > 	else
> > > 		static_branch_disable(&sched_asym_cpucapacity);
> > > 
> > > if you want to play funny games :-)
> > 
> > I thought about that too. It could make certain hotplug scenarios even
> > more expensive. I think we want the sched_asym_cpucapacity code to behave
> > even if SD_ASYM_CPUCAPACITY isn't set anywhere, so the static key would
> > be permanently set from the point we detect asymmetry and leave it set. This
> > would be in line with how the smt static key works.
> 
> Fair enough..

Yeah, we can always add the 'else' bit later if we find a good reason to
do so.


* Re: [PATCH 2/7] sched/fair: Add group_misfit_task load-balance type
  2018-02-15 16:20 ` [PATCH 2/7] sched/fair: Add group_misfit_task load-balance type Morten Rasmussen
@ 2018-02-19 13:56   ` Peter Zijlstra
  2018-02-19 13:58     ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2018-02-19 13:56 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: mingo, valentin.schneider, dietmar.eggemann, vincent.guittot,
	linux-kernel

On Thu, Feb 15, 2018 at 04:20:49PM +0000, Morten Rasmussen wrote:
> @@ -6733,9 +6758,12 @@ done: __maybe_unused
>  	if (hrtick_enabled(rq))
>  		hrtick_start_fair(rq, p);
>  
> +	update_misfit_status(p, rq);
> +
>  	return p;
>  
>  idle:
> +	update_misfit_status(NULL, rq);
>  	new_tasks = idle_balance(rq, rf);
>  
>  	/*

So we set a pointer when picking a task (or tick). We clear said pointer
when idle.

> @@ -7822,6 +7855,10 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>  		 */
>  		if (!nr_running && idle_cpu(i))
>  			sgs->idle_cpus++;
> +
> +		if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
> +		    !sgs->group_misfit_task_load && rq->misfit_task_load)
> +			sgs->group_misfit_task_load = rq->misfit_task_load;
>  	}
>  
>  	/* Adjust by relative CPU capacity of the group */

And we read said pointer from another CPU, without holding the
respective rq->lock.

What happens, if right after we set sgs->group_misfit_task_load, our
task decides to exit?


* Re: [PATCH 2/7] sched/fair: Add group_misfit_task load-balance type
  2018-02-19 13:56   ` Peter Zijlstra
@ 2018-02-19 13:58     ` Peter Zijlstra
  2018-02-20 16:01       ` Morten Rasmussen
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2018-02-19 13:58 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: mingo, valentin.schneider, dietmar.eggemann, vincent.guittot,
	linux-kernel

On Mon, Feb 19, 2018 at 02:56:44PM +0100, Peter Zijlstra wrote:
> On Thu, Feb 15, 2018 at 04:20:49PM +0000, Morten Rasmussen wrote:
> > @@ -6733,9 +6758,12 @@ done: __maybe_unused
> >  	if (hrtick_enabled(rq))
> >  		hrtick_start_fair(rq, p);
> >  
> > +	update_misfit_status(p, rq);
> > +
> >  	return p;
> >  
> >  idle:
> > +	update_misfit_status(NULL, rq);
> >  	new_tasks = idle_balance(rq, rf);
> >  
> >  	/*
> 
> So we set a pointer when picking a task (or tick). We clear said pointer
> when idle.

N/m, I can't read today. You only store the load, not the actual task.


* Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups
  2018-02-15 16:20 ` [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups Morten Rasmussen
@ 2018-02-19 14:50   ` Peter Zijlstra
  2018-02-19 14:53     ` Peter Zijlstra
  2018-02-19 15:10   ` Peter Zijlstra
  1 sibling, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2018-02-19 14:50 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: mingo, valentin.schneider, dietmar.eggemann, vincent.guittot,
	linux-kernel

On Thu, Feb 15, 2018 at 04:20:51PM +0000, Morten Rasmussen wrote:
> On systems with asymmetric cpu capacities, a skewed load distribution
> might yield better throughput than balancing load per group capacity.
> For example, preferring high capacity cpus for compute intensive tasks
> leaving low capacity cpus idle rather than balancing the number of idle
> cpus across different cpu types. Instead, let load-balance back off if
> the busiest group isn't really overloaded.

I'm sorry. I just can't seem to make sense of that today. What?


* Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups
  2018-02-19 14:50   ` Peter Zijlstra
@ 2018-02-19 14:53     ` Peter Zijlstra
  2018-02-20 16:22       ` Morten Rasmussen
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2018-02-19 14:53 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: mingo, valentin.schneider, dietmar.eggemann, vincent.guittot,
	linux-kernel

On Mon, Feb 19, 2018 at 03:50:12PM +0100, Peter Zijlstra wrote:
> On Thu, Feb 15, 2018 at 04:20:51PM +0000, Morten Rasmussen wrote:
> > On systems with asymmetric cpu capacities, a skewed load distribution
> > might yield better throughput than balancing load per group capacity.
> > For example, preferring high capacity cpus for compute intensive tasks
> > leaving low capacity cpus idle rather than balancing the number of idle
> > cpus across different cpu types. Instead, let load-balance back off if
> > the busiest group isn't really overloaded.
> 
> I'm sorry. I just can't seem to make sense of that today. What?

Aah, you're saying that if we have 4+4 big.LITTLE and 4 runnable tasks,
we would like them running on our big cluster and leave the small
cluster entirely idle, instead of running 2+2.

So what about this DynamiQ nonsense? Or is that so unstructured we
can't really do anything sensible with it?


* Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups
  2018-02-15 16:20 ` [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups Morten Rasmussen
  2018-02-19 14:50   ` Peter Zijlstra
@ 2018-02-19 15:10   ` Peter Zijlstra
  2018-02-20 16:33     ` Morten Rasmussen
  1 sibling, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2018-02-19 15:10 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: mingo, valentin.schneider, dietmar.eggemann, vincent.guittot,
	linux-kernel

On Thu, Feb 15, 2018 at 04:20:51PM +0000, Morten Rasmussen wrote:
> +/*
> + * group_similar_cpu_capacity: Returns true if the minimum capacity of the
> + * compared groups differ by less than 12.5%.
> + */
> +static inline bool
> +group_similar_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
> +{
> +	long diff = sg->sgc->min_capacity - ref->sgc->min_capacity;
> +	long max = max(sg->sgc->min_capacity, ref->sgc->min_capacity);
> +
> +	return abs(diff) < max >> 3;
> +}

This seems a fairly random and dodgy heuristic.


* Re: [PATCH 2/7] sched/fair: Add group_misfit_task load-balance type
  2018-02-19 13:58     ` Peter Zijlstra
@ 2018-02-20 16:01       ` Morten Rasmussen
  0 siblings, 0 replies; 31+ messages in thread
From: Morten Rasmussen @ 2018-02-20 16:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, valentin.schneider, dietmar.eggemann, vincent.guittot,
	linux-kernel

On Mon, Feb 19, 2018 at 02:58:42PM +0100, Peter Zijlstra wrote:
> On Mon, Feb 19, 2018 at 02:56:44PM +0100, Peter Zijlstra wrote:
> > On Thu, Feb 15, 2018 at 04:20:49PM +0000, Morten Rasmussen wrote:
> > > @@ -6733,9 +6758,12 @@ done: __maybe_unused
> > >  	if (hrtick_enabled(rq))
> > >  		hrtick_start_fair(rq, p);
> > >  
> > > +	update_misfit_status(p, rq);
> > > +
> > >  	return p;
> > >  
> > >  idle:
> > > +	update_misfit_status(NULL, rq);
> > >  	new_tasks = idle_balance(rq, rf);
> > >  
> > >  	/*
> > 
> > So we set a pointer when picking a task (or tick). We clear said pointer
> > when idle.
> 
> N/m, I can't read today. You only store the load, not the actual task.

It is a very valid question though.

Storing the load of the misfit task isn't the perfect solution as there
is no guarantee that we actually end up pulling the misfit task if some
other task happens to be on the rq when we balance. However, as I think
you are hinting, we will get into all sorts of interesting races if we
store a pointer to the actual task.

In most real scenarios it appears to be 'good enough'. The typical
misfit scenario is one always-running task and potentially a few small
tasks showing up periodically. In that case we are most likely to see
the always-running task when we are balancing.
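
For reference, a minimal sketch of the bookkeeping being discussed --
storing only a load value on the rq, never a task pointer. Helper and
field names here are illustrative rather than quoted from the patch:

static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
{
	/* No current task, or the task fits this cpu: nothing to flag. */
	if (!p || task_fits_capacity(p, capacity_of(cpu_of(rq)))) {
		rq->misfit_task_load = 0;
		return;
	}

	/* Remember how big the misfit is, not which task it is. */
	rq->misfit_task_load = task_h_load(p);
}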

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups
  2018-02-19 14:53     ` Peter Zijlstra
@ 2018-02-20 16:22       ` Morten Rasmussen
  0 siblings, 0 replies; 31+ messages in thread
From: Morten Rasmussen @ 2018-02-20 16:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, valentin.schneider, dietmar.eggemann, vincent.guittot,
	linux-kernel

On Mon, Feb 19, 2018 at 03:53:35PM +0100, Peter Zijlstra wrote:
> On Mon, Feb 19, 2018 at 03:50:12PM +0100, Peter Zijlstra wrote:
> > On Thu, Feb 15, 2018 at 04:20:51PM +0000, Morten Rasmussen wrote:
> > > On systems with asymmetric cpu capacities, a skewed load distribution
> > > might yield better throughput than balancing load per group capacity.
> > > For example, preferring high capacity cpus for compute intensive tasks
> > > leaving low capacity cpus idle rather than balancing the number of idle
> > > cpus across different cpu types. Instead, let load-balance back off if
> > > the busiest group isn't really overloaded.
> > 
> > I'm sorry. I just can't seem to make sense of that today. What?
> 
> Aah, you're saying that if we have 4+4 big.LITTLE and 4 runnable tasks,
> we would like them running on our big cluster and leave the small
> cluster entirely idle, instead of running 2+2.

Yes. Ideally, when we are busy, we want to fill up the big cpus first, one
task on each, and if we have more tasks we start using little cpus
before putting additional tasks on any of the bigs. The load per group
capacity metric is sort of working against that up until roughly the
point where we have enough tasks for all the cpus.

> So what about this DynamiQ nonsense? Or is that so unstructured we
> can't really do anything sensible with it?

With DynamiQ we don't have any grouping of the cpu types provided by the
architecture, so we are going to end up balancing between individual cpus.
I hope these tweaks will work for DynamiQ too: always-running tasks on
little cpus should be flagged as misfits and be picked up by big cpus.
It is still to be verified though.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups
  2018-02-19 15:10   ` Peter Zijlstra
@ 2018-02-20 16:33     ` Morten Rasmussen
  2018-02-20 18:26       ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Morten Rasmussen @ 2018-02-20 16:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, valentin.schneider, dietmar.eggemann, vincent.guittot,
	linux-kernel

On Mon, Feb 19, 2018 at 04:10:11PM +0100, Peter Zijlstra wrote:
> On Thu, Feb 15, 2018 at 04:20:51PM +0000, Morten Rasmussen wrote:
> > +/*
> > + * group_similar_cpu_capacity: Returns true if the minimum capacity of the
> > + * compared groups differ by less than 12.5%.
> > + */
> > +static inline bool
> > +group_similar_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
> > +{
> > +	long diff = sg->sgc->min_capacity - ref->sgc->min_capacity;
> > +	long max = max(sg->sgc->min_capacity, ref->sgc->min_capacity);
> > +
> > +	return abs(diff) < max >> 3;
> > +}
> 
> This seems a fairly random and dodgy heuristic.

I can't deny that :-)

We need to somehow figure out if we are doing asymmetric cpu capacity
balancing or normal SMP balancing. We probably don't care about
migrating tasks if the capacities are nearly identical. But how much is
'nearly'?

We could make it strictly equal as long as sgc->min_capacity is based on
capacity_orig. If we let things like rt-pressure influence
sgc->min_capacity, it might become a mess.

We could tie it to sd->imbalance_pct to make it slightly less arbitrary,
or we could try to drop the margin entirely.
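
One possible shape of the imbalance_pct variant, as an untested sketch
(it assumes the caller can pass in the sched_domain, e.g. via the
lb_env, which the helper as posted doesn't take):

static inline bool
group_similar_cpu_capacity(struct lb_env *env, struct sched_group *sg,
			   struct sched_group *ref)
{
	unsigned long smaller = min(sg->sgc->min_capacity, ref->sgc->min_capacity);
	unsigned long larger  = max(sg->sgc->min_capacity, ref->sgc->min_capacity);

	/* "Similar" if the larger capacity is within imbalance_pct of the smaller. */
	return larger * 100 < smaller * env->sd->imbalance_pct;
}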

Alternative solutions and preferences are welcome...

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups
  2018-02-20 16:33     ` Morten Rasmussen
@ 2018-02-20 18:26       ` Peter Zijlstra
  2018-02-23 16:38         ` Morten Rasmussen
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2018-02-20 18:26 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: mingo, valentin.schneider, dietmar.eggemann, vincent.guittot,
	linux-kernel

On Tue, Feb 20, 2018 at 04:33:52PM +0000, Morten Rasmussen wrote:
> On Mon, Feb 19, 2018 at 04:10:11PM +0100, Peter Zijlstra wrote:
> > On Thu, Feb 15, 2018 at 04:20:51PM +0000, Morten Rasmussen wrote:
> > > +/*
> > > + * group_similar_cpu_capacity: Returns true if the minimum capacity of the
> > > + * compared groups differ by less than 12.5%.
> > > + */
> > > +static inline bool
> > > +group_similar_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
> > > +{
> > > +	long diff = sg->sgc->min_capacity - ref->sgc->min_capacity;
> > > +	long max = max(sg->sgc->min_capacity, ref->sgc->min_capacity);
> > > +
> > > +	return abs(diff) < max >> 3;
> > > +}
> > 
> > This seems a fairly random and dodgy heuristic.
> 
> I can't deny that :-)
> 
> We need to somehow figure out if we are doing asymmetric cpu capacity
> balancing or normal SMP balancing. We probably don't care about
> migrating tasks if the capacities are nearly identical. But how much is
> 'nearly'?
> 
> We could make it strictly equal as long as sgc->min_capacity is based on
> capacity_orig. If we let things like rt-pressure influence
> sgc->min_capacity, it might become a mess.

See, that is the problem: I think this min_capacity thing is
influenced by rt-pressure and the like.

See update_cpu_capacity(): min_capacity is set after we apply the RT scale
factor thingy, and then update_group_capacity() filters the min over the
whole group. The thing only ever goes down.

But this means that if a big CPU has a very high IRQ/RT load, its
capacity will dip below that of a little core and min_capacity for the
big group as a whole will appear smaller than that of the little group.
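
For reference, that flow is roughly the following (a paraphrase, not a
verbatim quote of the current code):

static void update_cpu_capacity(struct sched_domain *sd, int cpu)
{
	unsigned long capacity = arch_scale_cpu_capacity(sd, cpu);
	struct sched_group *sdg = sd->groups;

	cpu_rq(cpu)->cpu_capacity_orig = capacity;

	/* Scale down by the time eaten by RT/IRQ on this cpu... */
	capacity *= scale_rt_capacity(cpu);
	capacity >>= SCHED_CAPACITY_SHIFT;

	if (!capacity)
		capacity = 1;

	cpu_rq(cpu)->cpu_capacity = capacity;
	sdg->sgc->capacity = capacity;
	/* ...so min_capacity starts from the scaled value, not capacity_orig. */
	sdg->sgc->min_capacity = capacity;
}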

Or am I now terminally confused again?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups
  2018-02-20 18:26       ` Peter Zijlstra
@ 2018-02-23 16:38         ` Morten Rasmussen
  2018-02-23 16:47           ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Morten Rasmussen @ 2018-02-23 16:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, valentin.schneider, dietmar.eggemann, vincent.guittot,
	linux-kernel

On Tue, Feb 20, 2018 at 07:26:05PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 20, 2018 at 04:33:52PM +0000, Morten Rasmussen wrote:
> > On Mon, Feb 19, 2018 at 04:10:11PM +0100, Peter Zijlstra wrote:
> > > On Thu, Feb 15, 2018 at 04:20:51PM +0000, Morten Rasmussen wrote:
> > > > +/*
> > > > + * group_similar_cpu_capacity: Returns true if the minimum capacity of the
> > > > + * compared groups differ by less than 12.5%.
> > > > + */
> > > > +static inline bool
> > > > +group_similar_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
> > > > +{
> > > > +	long diff = sg->sgc->min_capacity - ref->sgc->min_capacity;
> > > > +	long max = max(sg->sgc->min_capacity, ref->sgc->min_capacity);
> > > > +
> > > > +	return abs(diff) < max >> 3;
> > > > +}
> > > 
> > > This seems a fairly random and dodgy heuristic.
> > 
> > I can't deny that :-)
> > 
> > We need to somehow figure out if we are doing asymmetric cpu capacity
> > balancing or normal SMP balancing. We probably don't care about
> > migrating tasks if the capacities are nearly identical. But how much is
> > 'nearly'?
> > 
> > We could make it strictly equal as long as sgc->min_capacity is based on
> > capacity_orig. If we let things like rt-pressure influence
> > sgc->min_capacity, it might become a mess.
> 
> See, that is the problem: I think this min_capacity thing is
> influenced by rt-pressure and the like.
> 
> See update_cpu_capacity(): min_capacity is set after we apply the RT scale
> factor thingy, and then update_group_capacity() filters the min over the
> whole group. The thing only ever goes down.
> 
> But this means that if a big CPU has a very high IRQ/RT load, its
> capacity will dip below that of a little core and min_capacity for the
> big group as a whole will appear smaller than that of the little group.
> 
> Or am I now terminally confused again?

No, I think you are right, or I'm equally confused.

I don't think we can avoid having some sort of margin to decide when
capacities are significantly different if we want to keep this patch.

Looking more into this, I realized that we already have a similar
comparison and margin in group_smaller_cpu_capacity(), so I'm going to
look into using that instead if possible.

The commit message could use some clarification too, as we do in fact
already use group_smaller_cpu_capacity() to bail out of pulling big
tasks from big cpus to littles. However, there are cases where it is
fine to have sum_nr_running > group_weight on the big side, and cases
where it is fine to have the bigs idling while the littles are lightly
utilized; those are the cases this patch should address.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups
  2018-02-23 16:38         ` Morten Rasmussen
@ 2018-02-23 16:47           ` Peter Zijlstra
  2018-02-26 15:06             ` Morten Rasmussen
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2018-02-23 16:47 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: mingo, valentin.schneider, dietmar.eggemann, vincent.guittot,
	linux-kernel

On Fri, Feb 23, 2018 at 04:38:06PM +0000, Morten Rasmussen wrote:
> > Or am I now terminally confused again?
> 
> No, I think you are right, or I'm equally confused.

:-)

Would it make sense to also track max_capacity, but then based on the
value before RT scaling?

That should readily be able to distinguish between big and little
clusters, although then DynamiQ would still completely ruin things.
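
A minimal sketch of that idea, assuming an sgc->max_capacity field
filled in from capacity_orig (i.e. before the RT scale factor), which
the posted series doesn't have:

/*
 * With max_capacity based on capacity_orig, RT/IRQ pressure cannot
 * make a big group look smaller than a little group.
 */
static inline bool
group_smaller_max_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
{
	return sg->sgc->max_capacity < ref->sgc->max_capacity;
}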

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups
  2018-02-23 16:47           ` Peter Zijlstra
@ 2018-02-26 15:06             ` Morten Rasmussen
  0 siblings, 0 replies; 31+ messages in thread
From: Morten Rasmussen @ 2018-02-26 15:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, valentin.schneider, dietmar.eggemann, vincent.guittot,
	linux-kernel

On Fri, Feb 23, 2018 at 05:47:52PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 23, 2018 at 04:38:06PM +0000, Morten Rasmussen wrote:
> > > Or am I now terminally confused again?
> > 
> > No, I think you are right, or I'm equally confused.
> 
> :-)
> 
> Would it make sense to also track max_capacity, but then based on the
> value before RT scaling ?
> 
> That should readily be able to distinguish between big and little
> clusters, although then DynamiQ would still completely ruin things.

IIRC, I did actually track max_capacity to begin with for the wake_cap()
stuff, but someone suggested using min_capacity instead to factor in
the RT scaling as it could potentially help some use-cases.

I can add unscaled max_capacity tracking and use that, as this is
primarily a solution for asymmetric cpu capacity systems.

Whether we track max or min shouldn't really matter if it is based on
original capacity, unless you have a DynamiQ system. For a DynamiQ
system it depends on how it is configured. If it is a single DynamiQ
cluster we will have just one sched_domain with per-cpu sched_groups,
so we don't have balancing between mixed groups. If we have multiple
DynamiQ clusters, we can have mixed groups, homogeneous groups, or both
depending on the system configuration. Homogeneous groups should be
okay, and mixed groups could work okay, I think, as long as all groups
have the same mix; a mix of differently mixed groups is going to be a
challenge. Most likely we would have to treat all these groups as one
and ignore cache boundaries for scheduling decisions.

We are slowly getting into the mess of which capacity should be used
for various conditions. Is it the original capacity (unscaled and at
the highest frequency), do we subtract the RT utilization, and what if
the thermal framework has disabled some of the higher frequencies?

In particular, the constraints imposed by RT and thermal are not
permanent and might change depending on the use-case and how we place
the tasks. But that is a longer discussion to be had.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH 0/7] sched/fair: Migrate 'misfit' tasks on asymmetric capacity systems
  2018-02-15 16:20 [PATCH 0/7] sched/fair: Migrate 'misfit' tasks on asymmetric capacity systems Morten Rasmussen
                   ` (6 preceding siblings ...)
  2018-02-15 16:20 ` [PATCH 7/7] sched/fair: Set sd->should_idle_balance when misfit Morten Rasmussen
@ 2018-02-28  7:46 ` Gaku Inami
  2018-03-01 11:59   ` Valentin Schneider
  7 siblings, 1 reply; 31+ messages in thread
From: Gaku Inami @ 2018-02-28  7:46 UTC (permalink / raw)
  To: Morten Rasmussen, peterz, mingo
  Cc: valentin.schneider, dietmar.eggemann, vincent.guittot, linux-kernel

Hi

> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Morten Rasmussen
> Sent: Friday, February 16, 2018 1:21 AM
> To: peterz@infradead.org; mingo@redhat.com
> Cc: valentin.schneider@arm.com; dietmar.eggemann@arm.com; vincent.guittot@linaro.org; linux-kernel@vger.kernel.org;
> Morten Rasmussen <morten.rasmussen@arm.com>
> Subject: [PATCH 0/7] sched/fair: Migrate 'misfit' tasks on asymmetric capacity systems
> 
<snip>
> 
> The patches have been tested on:
>    1. Arm Juno (r0): 2+4 Cortex A57/A53
>    2. Hikey960: 4+4 Cortex A73/A53
> 
> Test case:
> Big cpus are always kept busy. Pin a shorter running sysbench tasks to
> big cpus, while creating a longer running set of unpinned sysbench
> tasks.

Tested-by: Gaku Inami <gaku.inami.xh@renesas.com>

I ran the same test on another SoC. It looks fine.

The patches have been tested on:
   3. Renesas R-Car H3 : 4+4 Cortex A57/A53

Results:
Single runs with completion time of each task
R-Car H3 (tip)
    total time:                          0.9415s
    total time:                          0.9582s
    total time:                          1.3275s
    total time:                          1.6752s

R-Car H3 (misfit)
    total time:                          0.9312s
    total time:                          0.9429s
    total time:                          0.9525s
    total time:                          0.9660s

10 run summary (tracking longest running task for each run)
	R-Car H3
	avg	max
tip     1.6752  1.6753
misfit  0.9890  1.0204

Regards,
Inami

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/7] sched/fair: Migrate 'misfit' tasks on asymmetric capacity systems
  2018-02-28  7:46 ` [PATCH 0/7] sched/fair: Migrate 'misfit' tasks on asymmetric capacity systems Gaku Inami
@ 2018-03-01 11:59   ` Valentin Schneider
  0 siblings, 0 replies; 31+ messages in thread
From: Valentin Schneider @ 2018-03-01 11:59 UTC (permalink / raw)
  To: Gaku Inami, Morten Rasmussen, peterz, mingo
  Cc: dietmar.eggemann, vincent.guittot, linux-kernel

Hi,

On 02/28/2018 07:46 AM, Gaku Inami wrote:
> Hi
> 
[...]
> 
> Tested-by: Gaku Inami <gaku.inami.xh@renesas.com>
> 
> I ran the same test on another SoC. It looks fine.
> 

Thanks for testing on your side!

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2018-03-01 11:59 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-15 16:20 [PATCH 0/7] sched/fair: Migrate 'misfit' tasks on asymmetric capacity systems Morten Rasmussen
2018-02-15 16:20 ` [PATCH 1/7] sched: Add static_key for asymmetric cpu capacity optimizations Morten Rasmussen
2018-02-16 13:47   ` Peter Zijlstra
2018-02-16 15:41     ` Morten Rasmussen
2018-02-16 16:51       ` Peter Zijlstra
2018-02-19 11:49         ` Morten Rasmussen
2018-02-16 17:39       ` Quentin Perret
2018-02-16 17:43         ` Peter Zijlstra
2018-02-15 16:20 ` [PATCH 2/7] sched/fair: Add group_misfit_task load-balance type Morten Rasmussen
2018-02-19 13:56   ` Peter Zijlstra
2018-02-19 13:58     ` Peter Zijlstra
2018-02-20 16:01       ` Morten Rasmussen
2018-02-15 16:20 ` [PATCH 3/7] sched/fair: Consider misfit tasks when load-balancing Morten Rasmussen
2018-02-15 16:20 ` [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups Morten Rasmussen
2018-02-19 14:50   ` Peter Zijlstra
2018-02-19 14:53     ` Peter Zijlstra
2018-02-20 16:22       ` Morten Rasmussen
2018-02-19 15:10   ` Peter Zijlstra
2018-02-20 16:33     ` Morten Rasmussen
2018-02-20 18:26       ` Peter Zijlstra
2018-02-23 16:38         ` Morten Rasmussen
2018-02-23 16:47           ` Peter Zijlstra
2018-02-26 15:06             ` Morten Rasmussen
2018-02-15 16:20 ` [PATCH 5/7] sched/fair: Kick nohz balance if rq->misfit_task Morten Rasmussen
2018-02-15 16:20 ` [PATCH 6/7] sched: Rename root_domain->overload to should_idle_balance Morten Rasmussen
2018-02-16  9:14   ` Juri Lelli
2018-02-16  9:49     ` Peter Zijlstra
2018-02-16 11:59       ` Valentin Schneider
2018-02-15 16:20 ` [PATCH 7/7] sched/fair: Set sd->should_idle_balance when misfit Morten Rasmussen
2018-02-28  7:46 ` [PATCH 0/7] sched/fair: Migrate 'misfit' tasks on asymmetric capacity systems Gaku Inami
2018-03-01 11:59   ` Valentin Schneider
