* [PATCH v4 00/12] sched: consolidation of cpu_capacity
@ 2014-07-28 17:51 Vincent Guittot
  2014-07-28 17:51 ` [PATCH v4 01/12] sched: fix imbalance flag reset Vincent Guittot
                   ` (12 more replies)
  0 siblings, 13 replies; 21+ messages in thread
From: Vincent Guittot @ 2014-07-28 17:51 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, preeti, linux, linux-arm-kernel
  Cc: riel, Morten.Rasmussen, efault, nicolas.pitre, linaro-kernel,
	daniel.lezcano, dietmar.eggemann, Vincent Guittot

Part of this patchset was previously part of the larger tasks packing patchset
[1]. I have split the latter into (at least) 3 different patchsets to make
things easier:
-configuration of sched_domain topology [2]
-update and consolidation of cpu_capacity (this patchset)
-tasks packing algorithm

SMT systems are no longer the only systems that can have CPUs with an original
capacity different from the default value. We need to extend the use of
cpu_capacity_orig to all kinds of platforms so the scheduler will have both the
maximum capacity (cpu_capacity_orig/capacity_orig) and the current capacity
(cpu_capacity/capacity) of CPUs and sched_groups. A new function,
arch_scale_cpu_capacity, has been created and replaces arch_scale_smt_capacity,
which was SMT specific in the computation of the capacity of a CPU.
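
As a reference, here is a minimal user-space sketch of how the new weak hook
behaves (the struct and the flag value below are simplified stand-ins rather
than the kernel definitions; see patches 04 and 05 for the real code):

#include <stdio.h>

#define SCHED_CAPACITY_SCALE	1024UL
#define SD_SHARE_CPUCAPACITY	0x1	/* illustrative flag value */

struct sched_domain {			/* reduced to the fields used here */
	int flags;
	unsigned long span_weight;
	unsigned long smt_gain;
};

/* Weak default of patch 04: keep the previous SMT scaling, otherwise
 * return the full default capacity. An arch (ARM in patch 05) can
 * override this hook and return its own per-cpu scale. */
static unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
{
	if ((sd->flags & SD_SHARE_CPUCAPACITY) && (sd->span_weight > 1))
		return sd->smt_gain / sd->span_weight;

	return SCHED_CAPACITY_SCALE;
}

int main(void)
{
	struct sched_domain smt = { SD_SHARE_CPUCAPACITY, 2, 1178 };
	struct sched_domain mc  = { 0, 4, 0 };

	printf("SMT sibling: %lu\n", arch_scale_cpu_capacity(&smt, 0)); /* 589 */
	printf("default:     %lu\n", arch_scale_cpu_capacity(&mc, 0));  /* 1024 */
	return 0;
}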

During load balance, the scheduler evaluates the number of tasks that a group
of CPUs can handle. The current method assumes that tasks have a fixed load of
SCHED_LOAD_SCALE and CPUs have a default capacity of SCHED_CAPACITY_SCALE.
This assumption generates wrong decisions by creating ghost cores and by
removing real ones when the original capacity of CPUs is different from the
default SCHED_CAPACITY_SCALE. We no longer try to evaluate the number of
available cores based on the group_capacity but instead detect when the
group is fully utilized.
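
To illustrate the new check, below is a user-space sketch mirroring the
group_is_overloaded() test added in patch 11 (the capacities, task counts and
imbalance_pct used in main() are made-up numbers):

#include <stdio.h>

/* A group is considered out of capacity when it runs more tasks than
 * it has CPUs and its utilization exceeds its original capacity scaled
 * by the domain's imbalance_pct. */
static int group_is_overloaded(unsigned long group_capacity_orig,
			       unsigned long group_utilization,
			       unsigned int sum_nr_running,
			       unsigned int group_weight,
			       unsigned int imbalance_pct)
{
	if (sum_nr_running <= group_weight)
		return 0;

	if ((group_capacity_orig * 100) <
	    (group_utilization * imbalance_pct))
		return 1;

	return 0;
}

int main(void)
{
	/* 2 little CPUs of original capacity 606 each, 3 tasks, nearly full */
	printf("overloaded: %d\n",
	       group_is_overloaded(2 * 606, 1150, 3, 2, 125));	/* 1 */
	/* same group but mostly idle: no ghost core, no useless pull */
	printf("overloaded: %d\n",
	       group_is_overloaded(2 * 606, 400, 3, 2, 125));	/* 0 */
	return 0;
}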

Now that we have the original capacity of CPUs and their activity/utilization,
we can evaluate more accurately the capacity and the level of utilization of a
group of CPUs.
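
For example, the per-CPU utilization fed into the group statistics is computed
as in patch 10; here is a standalone sketch of the arithmetic (the inputs used
in main() are illustrative only):

#include <stdio.h>

#define SCHED_LOAD_SHIFT	10
#define SCHED_LOAD_SCALE	(1UL << SCHED_LOAD_SHIFT)

/* Scale the CPU's usage_load_avg by its current capacity and saturate
 * just above the capacity when the CPU is fully loaded by more than
 * one task (see get_cpu_utilization() in patch 10). */
static unsigned long get_cpu_utilization(unsigned long usage_load_avg,
					 unsigned long capacity)
{
	if (usage_load_avg >= SCHED_LOAD_SCALE)
		return capacity + 1;

	return (usage_load_avg * capacity) >> SCHED_LOAD_SHIFT;
}

int main(void)
{
	/* half-busy CPU whose current capacity is reduced to 800 */
	printf("utilization: %lu\n", get_cpu_utilization(512, 800));  /* 400 */
	/* fully loaded CPU: reported just above its capacity */
	printf("utilization: %lu\n", get_cpu_utilization(1024, 800)); /* 801 */
	return 0;
}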

This patchset mainly replaces the old capacity method with a new one and keeps
the policy almost unchanged, although we could certainly take advantage of this
new statistic in several other places of the load balancer.

Test results:
Below are the results of 4 kinds of tests:
- hackbench -l 500 -s 4096
- perf bench sched pipe -l 400000
- scp of 100MB file on the platform
- ebizzy with various number of threads
on 4 kernels:
- tip = tip/sched/core
- step1 = tip + patches(1-8)
- patchset = tip + whole patchset
- patchset+irq = tip + this patchset + irq accounting

Each test has been run 6 times and the figures below show the stdev and the
diff compared to the tip kernel.

Dual A7				 tip   |  +step1           |  +patchset        |  patchset+irq	
			         stdev |  results    stdev |  results    stdev |  results    stdev
hackbench (lower is better) (+/-)0.64% | -0.19% (+/-)0.73% |  0.58% (+/-)1.29% |  0.20% (+/-)1.00%
perf	(lower is better)   (+/-)0.28% |  1.22% (+/-)0.17% |  1.29% (+/-)0.06% |  2.85% (+/-)0.33%
scp			    (+/-)4.81% |  2.61% (+/-)0.28% |  2.39% (+/-)0.22% | 82.18% (+/-)3.30%
ebizzy	-t 1		    (+/-)2.31% | -1.32% (+/-)1.90% | -0.79% (+/-)2.88% |  3.10% (+/-)2.32%
ebizzy	-t 2		    (+/-)0.70% |  8.29% (+/-)6.66% |  1.93% (+/-)5.47% |  2.72% (+/-)5.72%
ebizzy	-t 4		    (+/-)3.54% |  5.57% (+/-)8.00% |  0.36% (+/-)9.00% |  2.53% (+/-)3.17%
ebizzy	-t 6		    (+/-)2.36% | -0.43% (+/-)3.29% | -1.93% (+/-)3.47% |  0.57% (+/-)0.75%
ebizzy	-t 8		    (+/-)1.65% | -0.45% (+/-)0.93% | -1.95% (+/-)1.52% | -1.18% (+/-)1.61%
ebizzy	-t 10		    (+/-)2.55% | -0.98% (+/-)3.06% | -1.18% (+/-)6.17% | -2.33% (+/-)3.28%
ebizzy	-t 12		    (+/-)6.22% |  0.17% (+/-)5.63% |  2.98% (+/-)7.11% |  1.19% (+/-)4.68%
ebizzy	-t 14		    (+/-)5.38% | -0.14% (+/-)5.33% |  2.49% (+/-)4.93% |  1.43% (+/-)6.55%
															
Quad A15			 tip	| +step1	    | +patchset         | patchset+irq
			         stdev  | results     stdev | results    stdev  | results    stdev
hackbench (lower is better)  (+/-)0.78% |  0.87% (+/-)1.72% |  0.91% (+/-)2.02% |  3.30% (+/-)2.02%
perf	(lower is better)    (+/-)2.03% | -0.31% (+/-)0.76% | -2.38% (+/-)1.37% |  1.42% (+/-)3.14%
scp			     (+/-)0.04% |  0.51% (+/-)1.37% |  1.79% (+/-)0.84% |  1.77% (+/-)0.38%
ebizzy	-t 1		     (+/-)0.41% |  2.05% (+/-)0.38% |  2.08% (+/-)0.24% |  0.17% (+/-)0.62%
ebizzy	-t 2		     (+/-)0.78% |  0.60% (+/-)0.63% |  0.43% (+/-)0.48% |  1.61% (+/-)0.38%
ebizzy	-t 4		     (+/-)0.58% | -0.10% (+/-)0.97% | -0.65% (+/-)0.76% | -0.75% (+/-)0.86%
ebizzy	-t 6		     (+/-)0.31% |  1.07% (+/-)1.12% | -0.16% (+/-)0.87% | -0.76% (+/-)0.22%
ebizzy	-t 8		     (+/-)0.95% | -0.30% (+/-)0.85% | -0.79% (+/-)0.28% | -1.66% (+/-)0.21%
ebizzy	-t 10		     (+/-)0.31% |  0.04% (+/-)0.97% | -1.44% (+/-)1.54% | -0.55% (+/-)0.62%
ebizzy	-t 12		     (+/-)8.35% | -1.89% (+/-)7.64% |  0.75% (+/-)5.30% | -1.18% (+/-)8.16%
ebizzy	-t 14		    (+/-)13.17% |  6.22% (+/-)4.71% |  5.25% (+/-)9.14% |  5.87% (+/-)5.77%

I haven't been able to fully test the patchset on an SMT system to check that
the regression reported by Preeti has been solved, but the various tests that I
have done don't show any regression so far.
The correction of SD_PREFER_SIBLING mode and the use of the latter at SMT level
should have fixed the regression.

The usage_avg_contrib is based on the current implementation of the
load avg tracking. I also have a version of the usage_avg_contrib that is based
on the new implementation [3] but haven't provided the patches and results as
[3] is still under review. I can provide changes on top of [3] to adapt how
usage_avg_contrib is computed to the new mechanism.
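
For reference, a standalone sketch of how the per-task usage contribution is
derived from the tracked sums in patch 09 (scale_load()/scale_load_down() are
dropped here and the input values in main() are made up):

#include <stdio.h>

#define SCHED_LOAD_SCALE	1024UL

/* The running time sum is scaled to SCHED_LOAD_SCALE over the tracked
 * period, exactly like load_avg_contrib but based on running time
 * instead of runnable time. */
static unsigned long usage_avg_contrib(unsigned int running_avg_sum,
				       unsigned int runnable_avg_period)
{
	unsigned long contrib;

	contrib = running_avg_sum * SCHED_LOAD_SCALE;
	contrib /= runnable_avg_period + 1;
	return contrib;
}

int main(void)
{
	/* a task that has been running ~25% of the tracked period */
	printf("usage_avg_contrib: %lu\n",
	       usage_avg_contrib(11780, 47120));	/* ~255 */
	return 0;
}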

TODO: manage conflict with the next version of [4]

Change since V3:
 - add usage_avg_contrib statistic which sums the running time of tasks on a rq
 - use usage_avg_contrib instead of runnable_avg_sum for cpu_utilization
 - fix the replacement of power by capacity
 - update some comments

Change since V2:
 - rebase on top of capacity renaming
 - fix wake_affine statistic update
 - rework nohz_kick_needed
 - optimize the active migration of a task from CPU with reduced capacity
 - rename group_activity to group_utilization and remove unused total_utilization
 - repair SD_PREFER_SIBLING and use it for SMT level
 - reorder patchset to gather patches with same topics

Change since V1:
 - add 3 fixes
 - correct some commit messages
 - replace capacity computation by activity
 - take into account current cpu capacity

[1] https://lkml.org/lkml/2013/10/18/121
[2] https://lkml.org/lkml/2014/3/19/377
[3] https://lkml.org/lkml/2014/7/18/110
[4] https://lkml.org/lkml/2014/7/25/589

Vincent Guittot (12):
  sched: fix imbalance flag reset
  sched: remove a wake_affine condition
  sched: fix avg_load computation
  sched: Allow all archs to set the capacity_orig
  ARM: topology: use new cpu_capacity interface
  sched: add per rq cpu_capacity_orig
  sched: test the cpu's capacity in wake affine
  sched: move cfs task on a CPU with higher capacity
  sched: add usage_load_avg
  sched: get CPU's utilization statistic
  sched: replace capacity_factor by utilization
  sched: add SD_PREFER_SIBLING for SMT level

 arch/arm/kernel/topology.c |   4 +-
 include/linux/sched.h      |   4 +-
 kernel/sched/core.c        |   3 +-
 kernel/sched/fair.c        | 350 ++++++++++++++++++++++++++-------------------
 kernel/sched/sched.h       |   3 +-
 5 files changed, 207 insertions(+), 157 deletions(-)

-- 
1.9.1



* [PATCH v4 01/12] sched: fix imbalance flag reset
  2014-07-28 17:51 [PATCH v4 00/12] sched: consolidation of cpu_capacity Vincent Guittot
@ 2014-07-28 17:51 ` Vincent Guittot
  2014-11-23 10:25   ` Wanpeng Li
  2014-11-24 23:47   ` Wanpeng Li
  2014-07-28 17:51 ` [PATCH v4 02/12] sched: remove a wake_affine condition Vincent Guittot
                   ` (11 subsequent siblings)
  12 siblings, 2 replies; 21+ messages in thread
From: Vincent Guittot @ 2014-07-28 17:51 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, preeti, linux, linux-arm-kernel
  Cc: riel, Morten.Rasmussen, efault, nicolas.pitre, linaro-kernel,
	daniel.lezcano, dietmar.eggemann, Vincent Guittot

The imbalance flag can stay set whereas there is no imbalance.

Let's assume that we have 3 tasks that run on a dual-core / dual-cluster system.
We will have some idle load balances which are triggered during the tick.
Unfortunately, the tick is also used to queue background work, so we can reach
the situation where a short work has been queued on a CPU which already runs a
task. The load balance will detect this imbalance (2 tasks on 1 CPU and an idle
CPU) and will try to pull the waiting task onto the idle CPU. The waiting task
is a worker thread that is pinned to a CPU, so an imbalance due to pinned tasks
is detected and the imbalance flag is set.
Then, we will not be able to clear the flag because we have at most 1 task on
each CPU, but the imbalance flag will trigger useless active load balances
between the idle CPU and the busy CPU.

We need to reset the imbalance flag as soon as we have reached a balanced
state. If all tasks are pinned, we don't consider that a balanced state and
leave the imbalance flag set.

Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 923fe32..7eb9126 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6672,10 +6672,8 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 		if (sd_parent) {
 			int *group_imbalance = &sd_parent->groups->sgc->imbalance;
 
-			if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0) {
+			if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0)
 				*group_imbalance = 1;
-			} else if (*group_imbalance)
-				*group_imbalance = 0;
 		}
 
 		/* All tasks on this runqueue were pinned by CPU affinity */
@@ -6686,7 +6684,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 				env.loop_break = sched_nr_migrate_break;
 				goto redo;
 			}
-			goto out_balanced;
+			goto out_all_pinned;
 		}
 	}
 
@@ -6760,6 +6758,23 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 	goto out;
 
 out_balanced:
+	/*
+	 * We reach balance although we may have faced some affinity
+	 * constraints. Clear the imbalance flag if it was set.
+	 */
+	if (sd_parent) {
+		int *group_imbalance = &sd_parent->groups->sgc->imbalance;
+
+		if (*group_imbalance)
+			*group_imbalance = 0;
+	}
+
+out_all_pinned:
+	/*
+	 * We reach balance because all tasks are pinned at this level so
+	 * we can't migrate them. Let the imbalance flag set so parent level
+	 * can try to migrate them.
+	 */
 	schedstat_inc(sd, lb_balanced[idle]);
 
 	sd->nr_balance_failed = 0;
-- 
1.9.1



* [PATCH v4 02/12] sched: remove a wake_affine condition
  2014-07-28 17:51 [PATCH v4 00/12] sched: consolidation of cpu_capacity Vincent Guittot
  2014-07-28 17:51 ` [PATCH v4 01/12] sched: fix imbalance flag reset Vincent Guittot
@ 2014-07-28 17:51 ` Vincent Guittot
  2014-07-28 17:51 ` [PATCH v4 03/12] sched: fix avg_load computation Vincent Guittot
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Vincent Guittot @ 2014-07-28 17:51 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, preeti, linux, linux-arm-kernel
  Cc: riel, Morten.Rasmussen, efault, nicolas.pitre, linaro-kernel,
	daniel.lezcano, dietmar.eggemann, Vincent Guittot

I have tried to understand the meaning of the condition:
 (this_load <= load &&
  this_load + target_load(prev_cpu, idx) <= tl_per_task)
but I failed to find a use case that can take advantage of it and I haven't
found a clear description in the previous commits' logs.
Furthermore, the comment for the condition refers to the task_hot function that
was used before being replaced by the current condition:
/*
 * This domain has SD_WAKE_AFFINE and
 * p is cache cold in this domain, and
 * there is no bad imbalance.
 */

If we look more closely at the condition below:
 this_load + target_load(prev_cpu, idx) <= tl_per_task

When sync is clear, we have:
 tl_per_task = runnable_load_avg / nr_running
 this_load = max(runnable_load_avg, cpuload[idx])
 target_load =  max(runnable_load_avg', cpuload'[idx])

It implies that runnable_load_avg' == 0 and nr_running <= 1 in order to match
the condition. This implies that runnable_load_avg == 0 too because of the
condition this_load <= load.
But if this_load is null, balanced is already set and the test is redundant.

If sync is set, it's not as straightforward as above (especially if cgroups
are involved) but the policy should be similar, as we have removed a task that's
going to sleep in order to get more accurate load and this_load values.

The current conclusion is that this additional condition doesn't give any
benefit, so we can remove it.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 30 ++++++------------------------
 1 file changed, 6 insertions(+), 24 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7eb9126..57f8d8c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4285,7 +4285,6 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 {
 	s64 this_load, load;
 	int idx, this_cpu, prev_cpu;
-	unsigned long tl_per_task;
 	struct task_group *tg;
 	unsigned long weight;
 	int balanced;
@@ -4343,32 +4342,15 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 		balanced = this_eff_load <= prev_eff_load;
 	} else
 		balanced = true;
-
-	/*
-	 * If the currently running task will sleep within
-	 * a reasonable amount of time then attract this newly
-	 * woken task:
-	 */
-	if (sync && balanced)
-		return 1;
-
 	schedstat_inc(p, se.statistics.nr_wakeups_affine_attempts);
-	tl_per_task = cpu_avg_load_per_task(this_cpu);
 
-	if (balanced ||
-	    (this_load <= load &&
-	     this_load + target_load(prev_cpu, idx) <= tl_per_task)) {
-		/*
-		 * This domain has SD_WAKE_AFFINE and
-		 * p is cache cold in this domain, and
-		 * there is no bad imbalance.
-		 */
-		schedstat_inc(sd, ttwu_move_affine);
-		schedstat_inc(p, se.statistics.nr_wakeups_affine);
+	if (!balanced)
+		return 0;
 
-		return 1;
-	}
-	return 0;
+	schedstat_inc(sd, ttwu_move_affine);
+	schedstat_inc(p, se.statistics.nr_wakeups_affine);
+
+	return 1;
 }
 
 /*
-- 
1.9.1



* [PATCH v4 03/12] sched: fix avg_load computation
  2014-07-28 17:51 [PATCH v4 00/12] sched: consolidation of cpu_capacity Vincent Guittot
  2014-07-28 17:51 ` [PATCH v4 01/12] sched: fix imbalance flag reset Vincent Guittot
  2014-07-28 17:51 ` [PATCH v4 02/12] sched: remove a wake_affine condition Vincent Guittot
@ 2014-07-28 17:51 ` Vincent Guittot
  2014-07-28 17:51 ` [PATCH v4 04/12] sched: Allow all archs to set the capacity_orig Vincent Guittot
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Vincent Guittot @ 2014-07-28 17:51 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, preeti, linux, linux-arm-kernel
  Cc: riel, Morten.Rasmussen, efault, nicolas.pitre, linaro-kernel,
	daniel.lezcano, dietmar.eggemann, Vincent Guittot

The computation of avg_load and avg_load_per_task should only take into
account the number of cfs tasks. The non-cfs tasks are already taken into
account by decreasing the cpu's capacity and they will be tracked in the
CPU's utilization (group_utilization) introduced in the next patches.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 57f8d8c..647d0a6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4095,7 +4095,7 @@ static unsigned long capacity_of(int cpu)
 static unsigned long cpu_avg_load_per_task(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
-	unsigned long nr_running = ACCESS_ONCE(rq->nr_running);
+	unsigned long nr_running = ACCESS_ONCE(rq->cfs.h_nr_running);
 	unsigned long load_avg = rq->cfs.runnable_load_avg;
 
 	if (nr_running)
@@ -5909,7 +5909,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 			load = source_load(i, load_idx);
 
 		sgs->group_load += load;
-		sgs->sum_nr_running += rq->nr_running;
+		sgs->sum_nr_running += rq->cfs.h_nr_running;
 
 		if (rq->nr_running > 1)
 			*overload = true;
-- 
1.9.1



* [PATCH v4 04/12] sched: Allow all archs to set the capacity_orig
  2014-07-28 17:51 [PATCH v4 00/12] sched: consolidation of cpu_capacity Vincent Guittot
                   ` (2 preceding siblings ...)
  2014-07-28 17:51 ` [PATCH v4 03/12] sched: fix avg_load computation Vincent Guittot
@ 2014-07-28 17:51 ` Vincent Guittot
  2014-07-28 17:51 ` [PATCH v4 05/12] ARM: topology: use new cpu_capacity interface Vincent Guittot
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Vincent Guittot @ 2014-07-28 17:51 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, preeti, linux, linux-arm-kernel
  Cc: riel, Morten.Rasmussen, efault, nicolas.pitre, linaro-kernel,
	daniel.lezcano, dietmar.eggemann, Vincent Guittot

capacity_orig is only changed for systems with an SMT sched_domain level, in
order to reflect the lower capacity of CPUs. Heterogeneous systems also have to
reflect an original capacity that is different from the default value.

Create a more generic function arch_scale_cpu_capacity that can also be used by
non-SMT platforms to set capacity_orig.

The weak behavior of arch_scale_cpu_capacity is the previous SMT one, in order
to keep backward compatibility in the use of capacity_orig.

arch_scale_smt_capacity and default_scale_smt_capacity have been removed as
they were not used anywhere else than in arch_scale_cpu_capacity.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 25 ++++++-------------------
 1 file changed, 6 insertions(+), 19 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 647d0a6..00ff5f3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5642,19 +5642,12 @@ unsigned long __weak arch_scale_freq_capacity(struct sched_domain *sd, int cpu)
 	return default_scale_capacity(sd, cpu);
 }
 
-static unsigned long default_scale_smt_capacity(struct sched_domain *sd, int cpu)
+unsigned long __weak arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
 {
-	unsigned long weight = sd->span_weight;
-	unsigned long smt_gain = sd->smt_gain;
+	if ((sd->flags & SD_SHARE_CPUCAPACITY) && (sd->span_weight > 1))
+		return sd->smt_gain / sd->span_weight;
 
-	smt_gain /= weight;
-
-	return smt_gain;
-}
-
-unsigned long __weak arch_scale_smt_capacity(struct sched_domain *sd, int cpu)
-{
-	return default_scale_smt_capacity(sd, cpu);
+	return SCHED_CAPACITY_SCALE;
 }
 
 static unsigned long scale_rt_capacity(int cpu)
@@ -5693,18 +5686,12 @@ static unsigned long scale_rt_capacity(int cpu)
 
 static void update_cpu_capacity(struct sched_domain *sd, int cpu)
 {
-	unsigned long weight = sd->span_weight;
 	unsigned long capacity = SCHED_CAPACITY_SCALE;
 	struct sched_group *sdg = sd->groups;
 
-	if ((sd->flags & SD_SHARE_CPUCAPACITY) && weight > 1) {
-		if (sched_feat(ARCH_CAPACITY))
-			capacity *= arch_scale_smt_capacity(sd, cpu);
-		else
-			capacity *= default_scale_smt_capacity(sd, cpu);
+	capacity *= arch_scale_cpu_capacity(sd, cpu);
 
-		capacity >>= SCHED_CAPACITY_SHIFT;
-	}
+	capacity >>= SCHED_CAPACITY_SHIFT;
 
 	sdg->sgc->capacity_orig = capacity;
 
-- 
1.9.1



* [PATCH v4 05/12] ARM: topology: use new cpu_capacity interface
  2014-07-28 17:51 [PATCH v4 00/12] sched: consolidation of cpu_capacity Vincent Guittot
                   ` (3 preceding siblings ...)
  2014-07-28 17:51 ` [PATCH v4 04/12] sched: Allow all archs to set the capacity_orig Vincent Guittot
@ 2014-07-28 17:51 ` Vincent Guittot
  2014-07-28 17:51 ` [PATCH 06/12] sched: add per rq cpu_capacity_orig Vincent Guittot
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Vincent Guittot @ 2014-07-28 17:51 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, preeti, linux, linux-arm-kernel
  Cc: riel, Morten.Rasmussen, efault, nicolas.pitre, linaro-kernel,
	daniel.lezcano, dietmar.eggemann, Vincent Guittot

Use the new arch_scale_cpu_capacity in order to reflect the original capacity
of a CPU instead of arch_scale_freq_capacity, which is more about scaling the
capacity with the current frequency of the CPU.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 arch/arm/kernel/topology.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index e35d880..89cfdd6 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -42,7 +42,7 @@
  */
 static DEFINE_PER_CPU(unsigned long, cpu_scale);
 
-unsigned long arch_scale_freq_capacity(struct sched_domain *sd, int cpu)
+unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
 {
 	return per_cpu(cpu_scale, cpu);
 }
@@ -166,7 +166,7 @@ static void update_cpu_capacity(unsigned int cpu)
 	set_capacity_scale(cpu, cpu_capacity(cpu) / middle_capacity);
 
 	printk(KERN_INFO "CPU%u: update cpu_capacity %lu\n",
-		cpu, arch_scale_freq_capacity(NULL, cpu));
+		cpu, arch_scale_cpu_capacity(NULL, cpu));
 }
 
 #else
-- 
1.9.1



* [PATCH 06/12] sched: add per rq cpu_capacity_orig
  2014-07-28 17:51 [PATCH v4 00/12] sched: consolidation of cpu_capacity Vincent Guittot
                   ` (4 preceding siblings ...)
  2014-07-28 17:51 ` [PATCH v4 05/12] ARM: topology: use new cpu_capacity interface Vincent Guittot
@ 2014-07-28 17:51 ` Vincent Guittot
  2014-07-28 17:51 ` [PATCH v4 07/12] sched: test the cpu's capacity in wake affine Vincent Guittot
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Vincent Guittot @ 2014-07-28 17:51 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, preeti, linux, linux-arm-kernel
  Cc: riel, Morten.Rasmussen, efault, nicolas.pitre, linaro-kernel,
	daniel.lezcano, dietmar.eggemann, Vincent Guittot

This new field, cpu_capacity_orig, reflects the original capacity of a CPU,
unlike cpu_capacity which reflects the current capacity that can be altered by
frequency and rt tasks.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/core.c  | 2 +-
 kernel/sched/fair.c  | 1 +
 kernel/sched/sched.h | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e1a2f31..fc5a275 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6976,7 +6976,7 @@ void __init sched_init(void)
 #ifdef CONFIG_SMP
 		rq->sd = NULL;
 		rq->rd = NULL;
-		rq->cpu_capacity = SCHED_CAPACITY_SCALE;
+		rq->cpu_capacity = rq->cpu_capacity_orig = SCHED_CAPACITY_SCALE;
 		rq->post_schedule = 0;
 		rq->active_balance = 0;
 		rq->next_balance = jiffies;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 00ff5f3..f6258bc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5693,6 +5693,7 @@ static void update_cpu_capacity(struct sched_domain *sd, int cpu)
 
 	capacity >>= SCHED_CAPACITY_SHIFT;
 
+	cpu_rq(cpu)->cpu_capacity_orig = capacity;
 	sdg->sgc->capacity_orig = capacity;
 
 	if (sched_feat(ARCH_CAPACITY))
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0191ed5..5a8ef50 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -571,6 +571,7 @@ struct rq {
 	struct sched_domain *sd;
 
 	unsigned long cpu_capacity;
+	unsigned long cpu_capacity_orig;
 
 	unsigned char idle_balance;
 	/* For active balancing */
-- 
1.9.1



* [PATCH v4 07/12] sched: test the cpu's capacity in wake affine
  2014-07-28 17:51 [PATCH v4 00/12] sched: consolidation of cpu_capacity Vincent Guittot
                   ` (5 preceding siblings ...)
  2014-07-28 17:51 ` [PATCH 06/12] sched: add per rq cpu_capacity_orig Vincent Guittot
@ 2014-07-28 17:51 ` Vincent Guittot
  2014-07-28 17:51 ` [PATCH v4 08/12] sched: move cfs task on a CPU with higher capacity Vincent Guittot
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Vincent Guittot @ 2014-07-28 17:51 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, preeti, linux, linux-arm-kernel
  Cc: riel, Morten.Rasmussen, efault, nicolas.pitre, linaro-kernel,
	daniel.lezcano, dietmar.eggemann, Vincent Guittot

Currently the task always wakes affine on this_cpu if the latter is idle.
Before waking up the task on this_cpu, we check that this_cpu's capacity is not
significantly reduced because of RT tasks or irq activity.

Use cases where the number of irqs and/or the time spent under irq is
significant will benefit from this, because the task that is woken up by an irq
or softirq will not use the same CPU as the irq (and softirq) but an idle one.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f6258bc..6843016 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4284,6 +4284,7 @@ static int wake_wide(struct task_struct *p)
 static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 {
 	s64 this_load, load;
+	s64 this_eff_load, prev_eff_load;
 	int idx, this_cpu, prev_cpu;
 	struct task_group *tg;
 	unsigned long weight;
@@ -4327,21 +4328,21 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 	 * Otherwise check if either cpus are near enough in load to allow this
 	 * task to be woken on this_cpu.
 	 */
-	if (this_load > 0) {
-		s64 this_eff_load, prev_eff_load;
+	this_eff_load = 100;
+	this_eff_load *= capacity_of(prev_cpu);
+
+	prev_eff_load = 100 + (sd->imbalance_pct - 100) / 2;
+	prev_eff_load *= capacity_of(this_cpu);
 
-		this_eff_load = 100;
-		this_eff_load *= capacity_of(prev_cpu);
+	if (this_load > 0) {
 		this_eff_load *= this_load +
 			effective_load(tg, this_cpu, weight, weight);
 
-		prev_eff_load = 100 + (sd->imbalance_pct - 100) / 2;
-		prev_eff_load *= capacity_of(this_cpu);
 		prev_eff_load *= load + effective_load(tg, prev_cpu, 0, weight);
+	}
+
+	balanced = this_eff_load <= prev_eff_load;
 
-		balanced = this_eff_load <= prev_eff_load;
-	} else
-		balanced = true;
 	schedstat_inc(p, se.statistics.nr_wakeups_affine_attempts);
 
 	if (!balanced)
-- 
1.9.1



* [PATCH v4 08/12] sched: move cfs task on a CPU with higher capacity
  2014-07-28 17:51 [PATCH v4 00/12] sched: consolidation of cpu_capacity Vincent Guittot
                   ` (6 preceding siblings ...)
  2014-07-28 17:51 ` [PATCH v4 07/12] sched: test the cpu's capacity in wake affine Vincent Guittot
@ 2014-07-28 17:51 ` Vincent Guittot
  2014-07-28 18:43   ` Rik van Riel
  2014-07-28 17:51 ` [PATCH 09/12] sched: add usage_load_avg Vincent Guittot
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 21+ messages in thread
From: Vincent Guittot @ 2014-07-28 17:51 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, preeti, linux, linux-arm-kernel
  Cc: riel, Morten.Rasmussen, efault, nicolas.pitre, linaro-kernel,
	daniel.lezcano, dietmar.eggemann, Vincent Guittot

If the CPU is used for handling a lot of IRQs, trigger a load balance to check
if it's worth moving its tasks on another CPU that has more capacity.

As a sidenote, this will not generate more spurious ilb because we already
trigger an ilb if there is more than 1 busy cpu. If this cpu is the only one
that has a task, we will trigger the ilb once to migrate the task.

The nohz_kick_needed function has been cleaned up a bit while adding the new
test.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 69 +++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 49 insertions(+), 20 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6843016..1cde8dd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5969,6 +5969,14 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 			return true;
 	}
 
+	/*
+	 * The group capacity is reduced probably because of activity from other
+	 * sched class or interrupts which use part of the available capacity
+	 */
+	if ((sg->sgc->capacity_orig * 100) > (sgs->group_capacity *
+				env->sd->imbalance_pct))
+		return true;
+
 	return false;
 }
 
@@ -6455,13 +6463,23 @@ static int need_active_balance(struct lb_env *env)
 	struct sched_domain *sd = env->sd;
 
 	if (env->idle == CPU_NEWLY_IDLE) {
+		int src_cpu = env->src_cpu;
 
 		/*
 		 * ASYM_PACKING needs to force migrate tasks from busy but
 		 * higher numbered CPUs in order to pack all tasks in the
 		 * lowest numbered CPUs.
 		 */
-		if ((sd->flags & SD_ASYM_PACKING) && env->src_cpu > env->dst_cpu)
+		if ((sd->flags & SD_ASYM_PACKING) && src_cpu > env->dst_cpu)
+			return 1;
+
+		/*
+		 * If the CPUs share their cache and the src_cpu's capacity is
+		 * reduced because of other sched_class or IRQs, we trig an
+		 * active balance to move the task
+		 */
+		if ((capacity_orig_of(src_cpu) * 100) > (capacity_of(src_cpu) *
+				sd->imbalance_pct))
 			return 1;
 	}
 
@@ -6563,6 +6581,8 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 
 	schedstat_add(sd, lb_imbalance[idle], env.imbalance);
 
+	env.src_cpu = busiest->cpu;
+
 	ld_moved = 0;
 	if (busiest->nr_running > 1) {
 		/*
@@ -6572,7 +6592,6 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 		 * correctly treated as an imbalance.
 		 */
 		env.flags |= LBF_ALL_PINNED;
-		env.src_cpu   = busiest->cpu;
 		env.src_rq    = busiest;
 		env.loop_max  = min(sysctl_sched_nr_migrate, busiest->nr_running);
 
@@ -7262,10 +7281,12 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 
 /*
  * Current heuristic for kicking the idle load balancer in the presence
- * of an idle cpu is the system.
+ * of an idle cpu in the system.
  *   - This rq has more than one task.
- *   - At any scheduler domain level, this cpu's scheduler group has multiple
- *     busy cpu's exceeding the group's capacity.
+ *   - This rq has at least one CFS task and the capacity of the CPU is
+ *     significantly reduced because of RT tasks or IRQs.
+ *   - At parent of LLC scheduler domain level, this cpu's scheduler group has
+ *     multiple busy cpu.
  *   - For SD_ASYM_PACKING, if the lower numbered cpu's in the scheduler
  *     domain span are idle.
  */
@@ -7275,9 +7296,10 @@ static inline int nohz_kick_needed(struct rq *rq)
 	struct sched_domain *sd;
 	struct sched_group_capacity *sgc;
 	int nr_busy, cpu = rq->cpu;
+	bool kick = false;
 
 	if (unlikely(rq->idle_balance))
-		return 0;
+		return false;
 
        /*
 	* We may be recently in ticked or tickless idle mode. At the first
@@ -7291,38 +7313,45 @@ static inline int nohz_kick_needed(struct rq *rq)
 	 * balancing.
 	 */
 	if (likely(!atomic_read(&nohz.nr_cpus)))
-		return 0;
+		return false;
 
 	if (time_before(now, nohz.next_balance))
-		return 0;
+		return false;
 
 	if (rq->nr_running >= 2)
-		goto need_kick;
+		return true;
 
 	rcu_read_lock();
 	sd = rcu_dereference(per_cpu(sd_busy, cpu));
-
 	if (sd) {
 		sgc = sd->groups->sgc;
 		nr_busy = atomic_read(&sgc->nr_busy_cpus);
 
-		if (nr_busy > 1)
-			goto need_kick_unlock;
+		if (nr_busy > 1) {
+			kick = true;
+			goto unlock;
+		}
+
 	}
 
-	sd = rcu_dereference(per_cpu(sd_asym, cpu));
+	sd = rcu_dereference(rq->sd);
+	if (sd) {
+		if ((rq->cfs.h_nr_running >= 1) &&
+				((rq->cpu_capacity * sd->imbalance_pct) <
+				(rq->cpu_capacity_orig * 100))) {
+			kick = true;
+			goto unlock;
+		}
+	}
 
+	sd = rcu_dereference(per_cpu(sd_asym, cpu));
 	if (sd && (cpumask_first_and(nohz.idle_cpus_mask,
 				  sched_domain_span(sd)) < cpu))
-		goto need_kick_unlock;
+		kick = true;
 
+unlock:
 	rcu_read_unlock();
-	return 0;
-
-need_kick_unlock:
-	rcu_read_unlock();
-need_kick:
-	return 1;
+	return kick;
 }
 #else
 static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle) { }
-- 
1.9.1



* [PATCH 09/12] sched: add usage_load_avg
  2014-07-28 17:51 [PATCH v4 00/12] sched: consolidation of cpu_capacity Vincent Guittot
                   ` (7 preceding siblings ...)
  2014-07-28 17:51 ` [PATCH v4 08/12] sched: move cfs task on a CPU with higher capacity Vincent Guittot
@ 2014-07-28 17:51 ` Vincent Guittot
  2014-07-28 17:51 ` [PATCH v4 10/12] sched: get CPU's utilization statistic Vincent Guittot
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Vincent Guittot @ 2014-07-28 17:51 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, preeti, linux, linux-arm-kernel
  Cc: riel, Morten.Rasmussen, efault, nicolas.pitre, linaro-kernel,
	daniel.lezcano, dietmar.eggemann, Vincent Guittot

Add new statistics which reflect the average time a task is running on the
CPU and the sum of these running times for the tasks on a runqueue. The latter
is named usage_avg_contrib.

The rq's usage_avg_contrib will be used to check if a rq is overloaded or not
instead of trying to compute how many tasks a group of CPUs can handle.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 include/linux/sched.h |  4 ++--
 kernel/sched/fair.c   | 47 ++++++++++++++++++++++++++++++++++++++++++-----
 kernel/sched/sched.h  |  2 +-
 3 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 0376b05..6893d94 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1073,10 +1073,10 @@ struct sched_avg {
 	 * above by 1024/(1-y).  Thus we only need a u32 to store them for all
 	 * choices of y < 1-2^(-32)*1024.
 	 */
-	u32 runnable_avg_sum, runnable_avg_period;
+	u32 runnable_avg_sum, runnable_avg_period, running_avg_sum;
 	u64 last_runnable_update;
 	s64 decay_count;
-	unsigned long load_avg_contrib;
+	unsigned long load_avg_contrib, usage_avg_contrib;
 };
 
 #ifdef CONFIG_SCHEDSTATS
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1cde8dd..8bd57df 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -676,7 +676,7 @@ void init_task_runnable_average(struct task_struct *p)
 
 	p->se.avg.decay_count = 0;
 	slice = sched_slice(task_cfs_rq(p), &p->se) >> 10;
-	p->se.avg.runnable_avg_sum = slice;
+	p->se.avg.runnable_avg_sum = p->se.avg.running_avg_sum = slice;
 	p->se.avg.runnable_avg_period = slice;
 	__update_task_entity_contrib(&p->se);
 }
@@ -2292,7 +2292,8 @@ static u32 __compute_runnable_contrib(u64 n)
  */
 static __always_inline int __update_entity_runnable_avg(u64 now,
 							struct sched_avg *sa,
-							int runnable)
+							int runnable,
+							int running)
 {
 	u64 delta, periods;
 	u32 runnable_contrib;
@@ -2331,6 +2332,8 @@ static __always_inline int __update_entity_runnable_avg(u64 now,
 		delta_w = 1024 - delta_w;
 		if (runnable)
 			sa->runnable_avg_sum += delta_w;
+		if (running)
+			sa->running_avg_sum += delta_w;
 		sa->runnable_avg_period += delta_w;
 
 		delta -= delta_w;
@@ -2341,6 +2344,8 @@ static __always_inline int __update_entity_runnable_avg(u64 now,
 
 		sa->runnable_avg_sum = decay_load(sa->runnable_avg_sum,
 						  periods + 1);
+		sa->running_avg_sum = decay_load(sa->running_avg_sum,
+						  periods + 1);
 		sa->runnable_avg_period = decay_load(sa->runnable_avg_period,
 						     periods + 1);
 
@@ -2348,12 +2353,16 @@ static __always_inline int __update_entity_runnable_avg(u64 now,
 		runnable_contrib = __compute_runnable_contrib(periods);
 		if (runnable)
 			sa->runnable_avg_sum += runnable_contrib;
+		if (running)
+			sa->running_avg_sum += runnable_contrib;
 		sa->runnable_avg_period += runnable_contrib;
 	}
 
 	/* Remainder of delta accrued against u_0` */
 	if (runnable)
 		sa->runnable_avg_sum += delta;
+	if (running)
+		sa->running_avg_sum += delta;
 	sa->runnable_avg_period += delta;
 
 	return decayed;
@@ -2493,6 +2502,27 @@ static long __update_entity_load_avg_contrib(struct sched_entity *se)
 	return se->avg.load_avg_contrib - old_contrib;
 }
 
+
+static inline void __update_task_entity_usage(struct sched_entity *se)
+{
+	u32 contrib;
+
+	/* avoid overflowing a 32-bit type w/ SCHED_LOAD_SCALE */
+	contrib = se->avg.running_avg_sum * scale_load_down(SCHED_LOAD_SCALE);
+	contrib /= (se->avg.runnable_avg_period + 1);
+	se->avg.usage_avg_contrib = scale_load(contrib);
+}
+
+static long __update_entity_usage_avg_contrib(struct sched_entity *se)
+{
+	long old_contrib = se->avg.usage_avg_contrib;
+
+	if (entity_is_task(se))
+		__update_task_entity_usage(se);
+
+	return se->avg.usage_avg_contrib - old_contrib;
+}
+
 static inline void subtract_blocked_load_contrib(struct cfs_rq *cfs_rq,
 						 long load_contrib)
 {
@@ -2509,7 +2539,7 @@ static inline void update_entity_load_avg(struct sched_entity *se,
 					  int update_cfs_rq)
 {
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
-	long contrib_delta;
+	long contrib_delta, usage_delta;
 	u64 now;
 
 	/*
@@ -2521,16 +2551,20 @@ static inline void update_entity_load_avg(struct sched_entity *se,
 	else
 		now = cfs_rq_clock_task(group_cfs_rq(se));
 
-	if (!__update_entity_runnable_avg(now, &se->avg, se->on_rq))
+	if (!__update_entity_runnable_avg(now, &se->avg, se->on_rq,
+					cfs_rq->curr == se))
 		return;
 
 	contrib_delta = __update_entity_load_avg_contrib(se);
+	usage_delta = __update_entity_usage_avg_contrib(se);
 
 	if (!update_cfs_rq)
 		return;
 
-	if (se->on_rq)
+	if (se->on_rq) {
 		cfs_rq->runnable_load_avg += contrib_delta;
+		cfs_rq->usage_load_avg += usage_delta;
+	}
 	else
 		subtract_blocked_load_contrib(cfs_rq, -contrib_delta);
 }
@@ -2607,6 +2641,7 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
 	}
 
 	cfs_rq->runnable_load_avg += se->avg.load_avg_contrib;
+	cfs_rq->usage_load_avg += se->avg.usage_avg_contrib;
 	/* we force update consideration on load-balancer moves */
 	update_cfs_rq_blocked_load(cfs_rq, !wakeup);
 }
@@ -2625,6 +2660,7 @@ static inline void dequeue_entity_load_avg(struct cfs_rq *cfs_rq,
 	update_cfs_rq_blocked_load(cfs_rq, !sleep);
 
 	cfs_rq->runnable_load_avg -= se->avg.load_avg_contrib;
+	cfs_rq->usage_load_avg -= se->avg.usage_avg_contrib;
 	if (sleep) {
 		cfs_rq->blocked_load_avg += se->avg.load_avg_contrib;
 		se->avg.decay_count = atomic64_read(&cfs_rq->decay_counter);
@@ -2962,6 +2998,7 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 		 */
 		update_stats_wait_end(cfs_rq, se);
 		__dequeue_entity(cfs_rq, se);
+		update_entity_load_avg(se, 1);
 	}
 
 	update_stats_curr_start(cfs_rq, se);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 5a8ef50..e5ab9b1 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -336,7 +336,7 @@ struct cfs_rq {
 	 * This allows for the description of both thread and group usage (in
 	 * the FAIR_GROUP_SCHED case).
 	 */
-	unsigned long runnable_load_avg, blocked_load_avg;
+	unsigned long runnable_load_avg, blocked_load_avg, usage_load_avg;
 	atomic64_t decay_counter;
 	u64 last_decay;
 	atomic_long_t removed_load;
-- 
1.9.1



* [PATCH v4 10/12] sched: get CPU's utilization statistic
  2014-07-28 17:51 [PATCH v4 00/12] sched: consolidation of cpu_capacity Vincent Guittot
                   ` (8 preceding siblings ...)
  2014-07-28 17:51 ` [PATCH 09/12] sched: add usage_load_avg Vincent Guittot
@ 2014-07-28 17:51 ` Vincent Guittot
  2014-07-28 17:51 ` [PATCH v4 11/12] sched: replace capacity_factor by utilization Vincent Guittot
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Vincent Guittot @ 2014-07-28 17:51 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, preeti, linux, linux-arm-kernel
  Cc: riel, Morten.Rasmussen, efault, nicolas.pitre, linaro-kernel,
	daniel.lezcano, dietmar.eggemann, Vincent Guittot

Monitor the utilization level of each group of each sched_domain level. The
utilization is the amount of cpu_capacity that is currently used on a CPU or
group of CPUs. We use the usage_load_avg to evaluate this utilization level.
In the special case where the CPU is fully loaded by more than 1 task, the
utilization level is set above the cpu_capacity in order to reflect the
overload of the CPU.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8bd57df..8524760 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4129,6 +4129,11 @@ static unsigned long capacity_of(int cpu)
 	return cpu_rq(cpu)->cpu_capacity;
 }
 
+static unsigned long capacity_orig_of(int cpu)
+{
+	return cpu_rq(cpu)->cpu_capacity_orig;
+}
+
 static unsigned long cpu_avg_load_per_task(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
@@ -4517,6 +4522,17 @@ static int select_idle_sibling(struct task_struct *p, int target)
 	return target;
 }
 
+static int get_cpu_utilization(int cpu)
+{
+	unsigned long usage = cpu_rq(cpu)->cfs.usage_load_avg;
+	unsigned long capacity = capacity_of(cpu);
+
+	if (usage >= SCHED_LOAD_SCALE)
+		return capacity + 1;
+
+	return (usage * capacity) >> SCHED_LOAD_SHIFT;
+}
+
 /*
  * select_task_rq_fair: Select target runqueue for the waking task in domains
  * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
@@ -5596,6 +5612,7 @@ struct sg_lb_stats {
 	unsigned long sum_weighted_load; /* Weighted load of group's tasks */
 	unsigned long load_per_task;
 	unsigned long group_capacity;
+	unsigned long group_utilization; /* Total utilization of the group */
 	unsigned int sum_nr_running; /* Nr tasks running in the group */
 	unsigned int group_capacity_factor;
 	unsigned int idle_cpus;
@@ -5935,6 +5952,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 			load = source_load(i, load_idx);
 
 		sgs->group_load += load;
+		sgs->group_utilization += get_cpu_utilization(i);
 		sgs->sum_nr_running += rq->cfs.h_nr_running;
 
 		if (rq->nr_running > 1)
@@ -6108,7 +6126,6 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 		/* Now, start updating sd_lb_stats */
 		sds->total_load += sgs->group_load;
 		sds->total_capacity += sgs->group_capacity;
-
 		sg = sg->next;
 	} while (sg != env->sd->groups);
 
-- 
1.9.1



* [PATCH v4 11/12] sched: replace capacity_factor by utilization
  2014-07-28 17:51 [PATCH v4 00/12] sched: consolidation of cpu_capacity Vincent Guittot
                   ` (9 preceding siblings ...)
  2014-07-28 17:51 ` [PATCH v4 10/12] sched: get CPU's utilization statistic Vincent Guittot
@ 2014-07-28 17:51 ` Vincent Guittot
  2014-07-28 17:51 ` [PATCH v4 12/12] sched: add SD_PREFER_SIBLING for SMT level Vincent Guittot
  2014-07-28 18:52 ` [PATCH v4 00/12] sched: consolidation of cpu_capacity Rik van Riel
  12 siblings, 0 replies; 21+ messages in thread
From: Vincent Guittot @ 2014-07-28 17:51 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, preeti, linux, linux-arm-kernel
  Cc: riel, Morten.Rasmussen, efault, nicolas.pitre, linaro-kernel,
	daniel.lezcano, dietmar.eggemann, Vincent Guittot

The scheduler tries to compute how many tasks a group of CPUs can handle by
assuming that a task's load is SCHED_LOAD_SCALE and a CPU's capacity is
SCHED_CAPACITY_SCALE.
We can now have a better idea of the capacity of a group of CPUs and of its
utilization, thanks to the rework of group_capacity_orig and the new
group_utilization. We can now deduce how much capacity is still available.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 115 +++++++++++++++++++++-------------------------------
 1 file changed, 47 insertions(+), 68 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8524760..292ee7c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5612,13 +5612,13 @@ struct sg_lb_stats {
 	unsigned long sum_weighted_load; /* Weighted load of group's tasks */
 	unsigned long load_per_task;
 	unsigned long group_capacity;
+	unsigned long group_capacity_orig;
 	unsigned long group_utilization; /* Total utilization of the group */
 	unsigned int sum_nr_running; /* Nr tasks running in the group */
-	unsigned int group_capacity_factor;
 	unsigned int idle_cpus;
 	unsigned int group_weight;
 	int group_imb; /* Is there an imbalance in the group ? */
-	int group_has_free_capacity;
+	int group_out_of_capacity;
 #ifdef CONFIG_NUMA_BALANCING
 	unsigned int nr_numa_running;
 	unsigned int nr_preferred_running;
@@ -5838,31 +5838,6 @@ void update_group_capacity(struct sched_domain *sd, int cpu)
 }
 
 /*
- * Try and fix up capacity for tiny siblings, this is needed when
- * things like SD_ASYM_PACKING need f_b_g to select another sibling
- * which on its own isn't powerful enough.
- *
- * See update_sd_pick_busiest() and check_asym_packing().
- */
-static inline int
-fix_small_capacity(struct sched_domain *sd, struct sched_group *group)
-{
-	/*
-	 * Only siblings can have significantly less than SCHED_CAPACITY_SCALE
-	 */
-	if (!(sd->flags & SD_SHARE_CPUCAPACITY))
-		return 0;
-
-	/*
-	 * If ~90% of the cpu_capacity is still there, we're good.
-	 */
-	if (group->sgc->capacity * 32 > group->sgc->capacity_orig * 29)
-		return 1;
-
-	return 0;
-}
-
-/*
  * Group imbalance indicates (and tries to solve) the problem where balancing
  * groups is inadequate due to tsk_cpus_allowed() constraints.
  *
@@ -5896,32 +5871,30 @@ static inline int sg_imbalanced(struct sched_group *group)
 	return group->sgc->imbalance;
 }
 
-/*
- * Compute the group capacity factor.
- *
- * Avoid the issue where N*frac(smt_capacity) >= 1 creates 'phantom' cores by
- * first dividing out the smt factor and computing the actual number of cores
- * and limit unit capacity with that.
- */
-static inline int sg_capacity_factor(struct lb_env *env, struct sched_group *group)
+static inline int group_has_free_capacity(struct sg_lb_stats *sgs,
+			struct lb_env *env)
 {
-	unsigned int capacity_factor, smt, cpus;
-	unsigned int capacity, capacity_orig;
+	if ((sgs->group_capacity_orig * 100) >
+			(sgs->group_utilization * env->sd->imbalance_pct))
+		return 1;
 
-	capacity = group->sgc->capacity;
-	capacity_orig = group->sgc->capacity_orig;
-	cpus = group->group_weight;
+	if (sgs->sum_nr_running < sgs->group_weight)
+		return 1;
 
-	/* smt := ceil(cpus / capacity), assumes: 1 < smt_capacity < 2 */
-	smt = DIV_ROUND_UP(SCHED_CAPACITY_SCALE * cpus, capacity_orig);
-	capacity_factor = cpus / smt; /* cores */
+	return 0;
+}
 
-	capacity_factor = min_t(unsigned,
-		capacity_factor, DIV_ROUND_CLOSEST(capacity, SCHED_CAPACITY_SCALE));
-	if (!capacity_factor)
-		capacity_factor = fix_small_capacity(env->sd, group);
+static inline int group_is_overloaded(struct sg_lb_stats *sgs,
+			struct lb_env *env)
+{
+	if (sgs->sum_nr_running <= sgs->group_weight)
+		return 0;
 
-	return capacity_factor;
+	if ((sgs->group_capacity_orig * 100) <
+			(sgs->group_utilization * env->sd->imbalance_pct))
+		return 1;
+
+	return 0;
 }
 
 /**
@@ -5967,6 +5940,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 			sgs->idle_cpus++;
 	}
 
+	sgs->group_capacity_orig = group->sgc->capacity_orig;
 	/* Adjust by relative CPU capacity of the group */
 	sgs->group_capacity = group->sgc->capacity;
 	sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity;
@@ -5977,10 +5951,8 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 	sgs->group_weight = group->group_weight;
 
 	sgs->group_imb = sg_imbalanced(group);
-	sgs->group_capacity_factor = sg_capacity_factor(env, group);
 
-	if (sgs->group_capacity_factor > sgs->sum_nr_running)
-		sgs->group_has_free_capacity = 1;
+	sgs->group_out_of_capacity = group_is_overloaded(sgs, env);
 }
 
 /**
@@ -6004,7 +5976,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 	if (sgs->avg_load <= sds->busiest_stat.avg_load)
 		return false;
 
-	if (sgs->sum_nr_running > sgs->group_capacity_factor)
+	if (sgs->group_out_of_capacity)
 		return true;
 
 	if (sgs->group_imb)
@@ -6105,17 +6077,21 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 
 		/*
 		 * In case the child domain prefers tasks go to siblings
-		 * first, lower the sg capacity factor to one so that we'll try
+		 * first, lower the sg capacity to one so that we'll try
 		 * and move all the excess tasks away. We lower the capacity
 		 * of a group only if the local group has the capacity to fit
-		 * these excess tasks, i.e. nr_running < group_capacity_factor. The
+		 * these excess tasks, i.e. group_capacity > 0. The
 		 * extra check prevents the case where you always pull from the
 		 * heaviest group when it is already under-utilized (possible
 		 * with a large weight task outweighs the tasks on the system).
 		 */
 		if (prefer_sibling && sds->local &&
-		    sds->local_stat.group_has_free_capacity)
-			sgs->group_capacity_factor = min(sgs->group_capacity_factor, 1U);
+		    group_has_free_capacity(&sds->local_stat, env)) {
+			if (sgs->sum_nr_running > 1)
+				sgs->group_out_of_capacity = 1;
+			sgs->group_capacity = min(sgs->group_capacity,
+						SCHED_CAPACITY_SCALE);
+		}
 
 		if (update_sd_pick_busiest(env, sds, sg, sgs)) {
 			sds->busiest = sg;
@@ -6294,11 +6270,12 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 		 * Except of course for the group_imb case, since then we might
 		 * have to drop below capacity to reach cpu-load equilibrium.
 		 */
-		load_above_capacity =
-			(busiest->sum_nr_running - busiest->group_capacity_factor);
-
-		load_above_capacity *= (SCHED_LOAD_SCALE * SCHED_CAPACITY_SCALE);
-		load_above_capacity /= busiest->group_capacity;
+		load_above_capacity = busiest->sum_nr_running *
+					SCHED_LOAD_SCALE;
+		if (load_above_capacity > busiest->group_capacity)
+			load_above_capacity -= busiest->group_capacity;
+		else
+			load_above_capacity = ~0UL;
 	}
 
 	/*
@@ -6361,6 +6338,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 	local = &sds.local_stat;
 	busiest = &sds.busiest_stat;
 
+	/* ASYM feature bypasses nice load balance check */
 	if ((env->idle == CPU_IDLE || env->idle == CPU_NEWLY_IDLE) &&
 	    check_asym_packing(env, &sds))
 		return sds.busiest;
@@ -6381,8 +6359,9 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 		goto force_balance;
 
 	/* SD_BALANCE_NEWIDLE trumps SMP nice when underutilized */
-	if (env->idle == CPU_NEWLY_IDLE && local->group_has_free_capacity &&
-	    !busiest->group_has_free_capacity)
+	if (env->idle == CPU_NEWLY_IDLE &&
+			group_has_free_capacity(local, env) &&
+			busiest->group_out_of_capacity)
 		goto force_balance;
 
 	/*
@@ -6440,7 +6419,7 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 	int i;
 
 	for_each_cpu_and(i, sched_group_cpus(group), env->cpus) {
-		unsigned long capacity, capacity_factor, wl;
+		unsigned long capacity, wl;
 		enum fbq_type rt;
 
 		rq = cpu_rq(i);
@@ -6469,9 +6448,6 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 			continue;
 
 		capacity = capacity_of(i);
-		capacity_factor = DIV_ROUND_CLOSEST(capacity, SCHED_CAPACITY_SCALE);
-		if (!capacity_factor)
-			capacity_factor = fix_small_capacity(env->sd, group);
 
 		wl = weighted_cpuload(i);
 
@@ -6479,7 +6455,10 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 		 * When comparing with imbalance, use weighted_cpuload()
 		 * which is not scaled with the cpu capacity.
 		 */
-		if (capacity_factor && rq->nr_running == 1 && wl > env->imbalance)
+
+		if (rq->nr_running == 1 && wl > env->imbalance &&
+		    ((capacity * env->sd->imbalance_pct) >=
+				(rq->cpu_capacity_orig * 100)))
 			continue;
 
 		/*
-- 
1.9.1



* [PATCH v4 12/12] sched: add SD_PREFER_SIBLING for SMT level
  2014-07-28 17:51 [PATCH v4 00/12] sched: consolidation of cpu_capacity Vincent Guittot
                   ` (10 preceding siblings ...)
  2014-07-28 17:51 ` [PATCH v4 11/12] sched: replace capacity_factor by utilization Vincent Guittot
@ 2014-07-28 17:51 ` Vincent Guittot
  2014-07-28 18:52 ` [PATCH v4 00/12] sched: consolidation of cpu_capacity Rik van Riel
  12 siblings, 0 replies; 21+ messages in thread
From: Vincent Guittot @ 2014-07-28 17:51 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, preeti, linux, linux-arm-kernel
  Cc: riel, Morten.Rasmussen, efault, nicolas.pitre, linaro-kernel,
	daniel.lezcano, dietmar.eggemann, Vincent Guittot

Add the SD_PREFER_SIBLING flag for the SMT level in order to ensure that
the scheduler will put at least 1 task per core.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fc5a275..3e4eea4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6080,6 +6080,7 @@ sd_init(struct sched_domain_topology_level *tl, int cpu)
 	 */
 
 	if (sd->flags & SD_SHARE_CPUCAPACITY) {
+		sd->flags |= SD_PREFER_SIBLING;
 		sd->imbalance_pct = 110;
 		sd->smt_gain = 1178; /* ~15% */
 
-- 
1.9.1



* Re: [PATCH v4 08/12] sched: move cfs task on a CPU with higher capacity
  2014-07-28 17:51 ` [PATCH v4 08/12] sched: move cfs task on a CPU with higher capacity Vincent Guittot
@ 2014-07-28 18:43   ` Rik van Riel
  2014-07-29  7:40     ` Vincent Guittot
  0 siblings, 1 reply; 21+ messages in thread
From: Rik van Riel @ 2014-07-28 18:43 UTC (permalink / raw)
  To: Vincent Guittot, peterz, mingo, linux-kernel, preeti, linux,
	linux-arm-kernel
  Cc: Morten.Rasmussen, efault, nicolas.pitre, linaro-kernel,
	daniel.lezcano, dietmar.eggemann

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 07/28/2014 01:51 PM, Vincent Guittot wrote:
> If the CPU is used for handling lot of IRQs, trig a load balance to
> check if it's worth moving its tasks on another CPU that has more
> capacity.
> 
> As a sidenote, this will note generate more spurious ilb because we
> already trig an ilb if there is more than 1 busy cpu. If this cpu
> is the only one that has a task, we will trig the ilb once for
> migrating the task.
> 
> The nohz_kick_needed function has been cleaned up a bit while
> adding the new test
> 
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> --- 
> kernel/sched/fair.c | 69
> +++++++++++++++++++++++++++++++++++++---------------- 1 file
> changed, 49 insertions(+), 20 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index
> 6843016..1cde8dd 100644 --- a/kernel/sched/fair.c +++
> b/kernel/sched/fair.c @@ -5969,6 +5969,14 @@ static bool
> update_sd_pick_busiest(struct lb_env *env, return true; }
> 
> +	/* +	 * The group capacity is reduced probably because of
> activity from other +	 * sched class or interrupts which use part
> of the available capacity +	 */ +	if ((sg->sgc->capacity_orig *
> 100) > (sgs->group_capacity * +				env->sd->imbalance_pct)) +
> return true; + return false; }

Isn't this already reflected in avg_load, by having avg_load increase
due to the capacity decreasing when a cpu is busy with non-CFS loads?

Also, this part of update_sd_pick_busiest will not be reached in the
!SD_ASYM_PACKING case once my patch is applied, so this is a small
conflict between our series :)

- -- 
All rights reversed
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJT1pm+AAoJEM553pKExN6DysEH/1/s2dY0EWM4mVctRAcyASc+
qD5yQgDZEbnIzuUldtTRNwlAxHSaexLI7oF418RAaUV3oue+OQIesJPhKDVrR2+J
fLo4fhtutuF1SHJ5Zo2fiGBIUI+GuLspT2fXiG2UxXQXtYioVdDeB+cjo6H3xQ3D
R2hR+WeSsLznwFhRnufI1neleIRpqk/Nw1wfdXCyE03kM478rCQjlygMUx6eqURn
jblr7GY7jmtzhYFPY9qnE0za/WHWUVAf4RXSjCcuYwZqdhbzmHPKpJyiC3cl9XGz
kNzAjqVzvSyCBblaOnJfrRyCWmBdatp5eZQCzZK9/l+YpbkkzQJNlUMAAVV1eNw=
=qD2A
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v4 00/12] sched: consolidation of cpu_capacity
  2014-07-28 17:51 [PATCH v4 00/12] sched: consolidation of cpu_capacity Vincent Guittot
                   ` (11 preceding siblings ...)
  2014-07-28 17:51 ` [PATCH v4 12/12] sched: add SD_PREFER_SIBLING for SMT level Vincent Guittot
@ 2014-07-28 18:52 ` Rik van Riel
  12 siblings, 0 replies; 21+ messages in thread
From: Rik van Riel @ 2014-07-28 18:52 UTC (permalink / raw)
  To: Vincent Guittot, peterz, mingo, linux-kernel, preeti, linux,
	linux-arm-kernel
  Cc: Morten.Rasmussen, efault, nicolas.pitre, linaro-kernel,
	daniel.lezcano, dietmar.eggemann

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 07/28/2014 01:51 PM, Vincent Guittot wrote:

> The usage_avg_contrib is based on the current implementation of
> the load avg tracking. I also have a version of the
> usage_avg_contrib that is based on the new implementation [3] but
> haven't provided the patches and results as [3] is still under
> review. I can provide changes on top of [3] to adapt how
> usage_avg_contrib is computed to the new mechanism.
> 
> TODO: manage conflict with the next version of [4]

FWIW, your series makes sense to me, and only the first hunk of patch
8/12 looks like it would conflict with my patches, in that your patch
changes code that will only be reached in exceptional circumstances.

- -- 
All rights reversed
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJT1pviAAoJEM553pKExN6D4IoH/3Ou8H8kM5zhgDc4sDY5h+0F
qFDZIwdCWtKL36QfuZXIS/SXzikzs8XjxG7PIDj/sIEcpzs5jzGvlqAr/nTbQGlo
2ONnXobwSYSPbf8tEKrX4sdPS4YH1sd8+DgHC9TdYOgnWhdjpvfrwA9vFHmsZZvC
SxlLKCwqdWgtQPCJiGGJGdQOaB+OsNAJcoyo8OyokdOUrpMX30bfLT+8TiAiB/zS
uKrjzj3yZbrd/CRgpTSYopZgtzxHciZLLuIAurw7BJrUPL8Q1SL1HQ5eC5q/eS07
hXJrD6/o2DDfWoyeT2eDdJ9QkZKxT3vDKqC4WyVYn/2JZMzD5C+PY3XmQk7uYbU=
=zV+U
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v4 08/12] sched: move cfs task on a CPU with higher capacity
  2014-07-28 18:43   ` Rik van Riel
@ 2014-07-29  7:40     ` Vincent Guittot
  0 siblings, 0 replies; 21+ messages in thread
From: Vincent Guittot @ 2014-07-29  7:40 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Preeti U Murthy,
	Russell King - ARM Linux, LAK, Morten Rasmussen, Mike Galbraith,
	Nicolas Pitre, linaro-kernel, Daniel Lezcano, Dietmar Eggemann

On 28 July 2014 20:43, Rik van Riel <riel@redhat.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 07/28/2014 01:51 PM, Vincent Guittot wrote:
>> If the CPU is used for handling a lot of IRQs, trigger a load balance
>> to check if it's worth moving its tasks to another CPU that has more
>> capacity.
>>
>> As a sidenote, this will not generate more spurious ilbs because we
>> already trigger an ilb if there is more than 1 busy cpu. If this cpu
>> is the only one that has a task, we will trigger the ilb once to
>> migrate the task.
>>
>> The nohz_kick_needed function has been cleaned up a bit while
>> adding the new test.
>>
>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>> ---
>>  kernel/sched/fair.c | 69 +++++++++++++++++++++++++++++++++++++----------------
>>  1 file changed, 49 insertions(+), 20 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 6843016..1cde8dd 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5969,6 +5969,14 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>>  		return true;
>>  	}
>>
>> +	/*
>> +	 * The group capacity is reduced probably because of activity from other
>> +	 * sched class or interrupts which use part of the available capacity
>> +	 */
>> +	if ((sg->sgc->capacity_orig * 100) > (sgs->group_capacity *
>> +				env->sd->imbalance_pct))
>> +		return true;
>> +
>>  	return false;
>>  }
>
> Isn't this already reflected in avg_load, by having avg_load increase
> due to the capacity decreasing when a cpu is busy with non-CFS loads?

Yes, the avg_load should probably increase, but that doesn't mean the
group will be selected for load balancing.
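
To make the comparison concrete, here is the check that the hunk above adds
to update_sd_pick_busiest(), rewritten as a standalone helper. This is a
minimal sketch only: the field names come from the quoted diff, but the
simplified signature and helper name are illustrative assumptions, not the
actual kernel code.

	#include <stdbool.h>

	/*
	 * Sketch: true when the group's current capacity has dropped by more
	 * than the imbalance threshold below its original capacity, e.g.
	 * because RT activity or IRQs consume part of it.
	 */
	static bool group_capacity_reduced(unsigned long capacity_orig,
					   unsigned long group_capacity,
					   unsigned int imbalance_pct)
	{
		/*
		 * Same comparison as (sg->sgc->capacity_orig * 100) >
		 * (sgs->group_capacity * env->sd->imbalance_pct) in the hunk.
		 */
		return capacity_orig * 100 > group_capacity * imbalance_pct;
	}

For example, with imbalance_pct = 110 (the SMT value set earlier in this
series), the check fires once more than roughly 9% of the original capacity
is consumed by something other than CFS tasks.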

>
> Also, this part of update_sd_pick_busiest will not be reached in the
> !SD_ASYM_PACKING case once my patch is applied, so this is a small
> conflict between our series :)

Yes, this will conflict with your change, but the conflict should be
easy to resolve as it only adds a new condition for picking the group.
>
> - --
> All rights reversed
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iQEcBAEBAgAGBQJT1pm+AAoJEM553pKExN6DysEH/1/s2dY0EWM4mVctRAcyASc+
> qD5yQgDZEbnIzuUldtTRNwlAxHSaexLI7oF418RAaUV3oue+OQIesJPhKDVrR2+J
> fLo4fhtutuF1SHJ5Zo2fiGBIUI+GuLspT2fXiG2UxXQXtYioVdDeB+cjo6H3xQ3D
> R2hR+WeSsLznwFhRnufI1neleIRpqk/Nw1wfdXCyE03kM478rCQjlygMUx6eqURn
> jblr7GY7jmtzhYFPY9qnE0za/WHWUVAf4RXSjCcuYwZqdhbzmHPKpJyiC3cl9XGz
> kNzAjqVzvSyCBblaOnJfrRyCWmBdatp5eZQCzZK9/l+YpbkkzQJNlUMAAVV1eNw=
> =qD2A
> -----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v4 01/12] sched: fix imbalance flag reset
  2014-07-28 17:51 ` [PATCH v4 01/12] sched: fix imbalance flag reset Vincent Guittot
@ 2014-11-23 10:25   ` Wanpeng Li
  2014-11-24 10:31     ` Vincent Guittot
  2014-11-24 23:47   ` Wanpeng Li
  1 sibling, 1 reply; 21+ messages in thread
From: Wanpeng Li @ 2014-11-23 10:25 UTC (permalink / raw)
  To: Vincent Guittot, peterz, mingo, linux-kernel, preeti, linux,
	linux-arm-kernel
  Cc: riel, Morten.Rasmussen, efault, nicolas.pitre, linaro-kernel,
	daniel.lezcano, dietmar.eggemann

Hi Vincent,
On 7/29/14, 1:51 AM, Vincent Guittot wrote:
> The imbalance flag can stay set even though there is no imbalance.
>
> Let's assume that we have 3 tasks that run on a dual-core / dual-cluster system.
> We will have some idle load balances that are triggered during the tick.
> Unfortunately, the tick is also used to queue background work, so we can reach
> the situation where a short work item has been queued on a CPU which already
> runs a task. The load balance will detect this imbalance (2 tasks on 1 CPU and
> an idle CPU) and will try to pull the waiting task onto the idle CPU. The
> waiting task is a worker thread that is pinned to a CPU, so an imbalance due
> to pinned tasks is detected and the imbalance flag is set.

Is the waiting task the third task, or one of the '2 tasks on 1 CPU'?

Regards,
Wanpeng Li

> Then, we will not be able to clear the flag because we have at most 1 task on
> each CPU, but the imbalance flag will trigger useless active load balances
> between the idle CPU and the busy CPU.
>
> We need to reset the imbalance flag as soon as we have reached a balanced
> state. If all tasks are pinned, we don't consider that a balanced state and
> leave the imbalance flag set.
>
> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>   kernel/sched/fair.c | 23 +++++++++++++++++++----
>   1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 923fe32..7eb9126 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6672,10 +6672,8 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>   		if (sd_parent) {
>   			int *group_imbalance = &sd_parent->groups->sgc->imbalance;
>   
> -			if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0) {
> +			if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0)
>   				*group_imbalance = 1;
> -			} else if (*group_imbalance)
> -				*group_imbalance = 0;
>   		}
>   
>   		/* All tasks on this runqueue were pinned by CPU affinity */
> @@ -6686,7 +6684,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>   				env.loop_break = sched_nr_migrate_break;
>   				goto redo;
>   			}
> -			goto out_balanced;
> +			goto out_all_pinned;
>   		}
>   	}
>   
> @@ -6760,6 +6758,23 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>   	goto out;
>   
>   out_balanced:
> +	/*
> +	 * We reach balance although we may have faced some affinity
> +	 * constraints. Clear the imbalance flag if it was set.
> +	 */
> +	if (sd_parent) {
> +		int *group_imbalance = &sd_parent->groups->sgc->imbalance;
> +
> +		if (*group_imbalance)
> +			*group_imbalance = 0;
> +	}
> +
> +out_all_pinned:
> +	/*
> +	 * We reach balance because all tasks are pinned at this level so
> +	 * we can't migrate them. Let the imbalance flag set so parent level
> +	 * can try to migrate them.
> +	 */
>   	schedstat_inc(sd, lb_balanced[idle]);
>   
>   	sd->nr_balance_failed = 0;


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v4 01/12] sched: fix imbalance flag reset
  2014-11-23 10:25   ` Wanpeng Li
@ 2014-11-24 10:31     ` Vincent Guittot
  0 siblings, 0 replies; 21+ messages in thread
From: Vincent Guittot @ 2014-11-24 10:31 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Preeti U Murthy,
	Russell King - ARM Linux, LAK, Rik van Riel, Morten Rasmussen,
	Mike Galbraith, Nicolas Pitre, linaro-kernel, Daniel Lezcano,
	Dietmar Eggemann

On 23 November 2014 at 11:25, Wanpeng Li <kernellwp@gmail.com> wrote:
> Hi Vincent,
> On 7/29/14, 1:51 AM, Vincent Guittot wrote:
>>
>> The imbalance flag can stay set even though there is no imbalance.
>>
>> Let's assume that we have 3 tasks that run on a dual-core / dual-cluster
>> system.
>> We will have some idle load balances that are triggered during the tick.
>> Unfortunately, the tick is also used to queue background work, so we can
>> reach the situation where a short work item has been queued on a CPU which
>> already runs a task. The load balance will detect this imbalance (2 tasks
>> on 1 CPU and an idle CPU) and will try to pull the waiting task onto the
>> idle CPU. The waiting task is a worker thread that is pinned to a CPU, so
>> an imbalance due to pinned tasks is detected and the imbalance flag is set.
>
>
> Is the waiting task the third task, or one of the '2 tasks on 1 CPU'?

The waiting task is one of the 2 tasks on 1 CPU (the worker)

Regards,
Vincent

>
> Regards,
> Wanpeng Li
>
>
>> Then, we will not be able to clear the flag because we have at most 1 task
>> on
>> each CPU but the imbalance flag will trig to useless active load balance
>> between the idle CPU and the busy CPU.
>>
>> We need to reset of the imbalance flag as soon as we have reached a
>> balanced
>> state. If all tasks are pinned, we don't consider that as a balanced state
>> and
>> let the imbalance flag set.
>>
>> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>> ---
>>   kernel/sched/fair.c | 23 +++++++++++++++++++----
>>   1 file changed, 19 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 923fe32..7eb9126 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6672,10 +6672,8 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>>  		if (sd_parent) {
>>  			int *group_imbalance = &sd_parent->groups->sgc->imbalance;
>>
>> -			if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0) {
>> +			if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0)
>>  				*group_imbalance = 1;
>> -			} else if (*group_imbalance)
>> -				*group_imbalance = 0;
>>  		}
>>
>>  		/* All tasks on this runqueue were pinned by CPU affinity */
>> @@ -6686,7 +6684,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>>  				env.loop_break = sched_nr_migrate_break;
>>  				goto redo;
>>  			}
>> -			goto out_balanced;
>> +			goto out_all_pinned;
>>  		}
>>  	}
>>
>> @@ -6760,6 +6758,23 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>>  	goto out;
>>
>>  out_balanced:
>> +	/*
>> +	 * We reach balance although we may have faced some affinity
>> +	 * constraints. Clear the imbalance flag if it was set.
>> +	 */
>> +	if (sd_parent) {
>> +		int *group_imbalance = &sd_parent->groups->sgc->imbalance;
>> +
>> +		if (*group_imbalance)
>> +			*group_imbalance = 0;
>> +	}
>> +
>> +out_all_pinned:
>> +	/*
>> +	 * We reach balance because all tasks are pinned at this level so
>> +	 * we can't migrate them. Let the imbalance flag set so parent level
>> +	 * can try to migrate them.
>> +	 */
>>  	schedstat_inc(sd, lb_balanced[idle]);
>>
>>  	sd->nr_balance_failed = 0;
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v4 01/12] sched: fix imbalance flag reset
  2014-07-28 17:51 ` [PATCH v4 01/12] sched: fix imbalance flag reset Vincent Guittot
  2014-11-23 10:25   ` Wanpeng Li
@ 2014-11-24 23:47   ` Wanpeng Li
  2014-11-25  9:04     ` Vincent Guittot
  1 sibling, 1 reply; 21+ messages in thread
From: Wanpeng Li @ 2014-11-24 23:47 UTC (permalink / raw)
  To: Vincent Guittot, peterz, mingo, linux-kernel, preeti, linux,
	linux-arm-kernel
  Cc: riel, Morten.Rasmussen, efault, nicolas.pitre, linaro-kernel,
	daniel.lezcano, dietmar.eggemann

Hi Vincent,
On 7/29/14, 1:51 AM, Vincent Guittot wrote:
> The imbalance flag can stay set even though there is no imbalance.
>
> Let's assume that we have 3 tasks that run on a dual-core / dual-cluster system.
> We will have some idle load balances that are triggered during the tick.
> Unfortunately, the tick is also used to queue background work, so we can reach
> the situation where a short work item has been queued on a CPU which already
> runs a task. The load balance will detect this imbalance (2 tasks on 1 CPU and
> an idle CPU) and will try to pull the waiting task onto the idle CPU. The
> waiting task is a worker thread that is pinned to a CPU, so an imbalance due
> to pinned tasks is detected and the imbalance flag is set.
> Then, we will not be able to clear the flag because we have at most 1 task on
> each CPU, but the imbalance flag will trigger useless active load balances
> between the idle CPU and the busy CPU.
>
> We need to reset the imbalance flag as soon as we have reached a balanced
> state. If all tasks are pinned, we don't consider that a balanced state and
> leave the imbalance flag set.
>
> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>   kernel/sched/fair.c | 23 +++++++++++++++++++----
>   1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 923fe32..7eb9126 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6672,10 +6672,8 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>   		if (sd_parent) {
>   			int *group_imbalance = &sd_parent->groups->sgc->imbalance;
>   
> -			if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0) {
> +			if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0)
>   				*group_imbalance = 1;
> -			} else if (*group_imbalance)
> -				*group_imbalance = 0;

As you mentioned above, "We need to reset the imbalance flag as soon
as we have reached a balanced state." I think the code before your
patch already does this; what did I miss? Thanks a lot for your patience. ;-)

Regards,
Wanpeng Li

>   		}
>   
>   		/* All tasks on this runqueue were pinned by CPU affinity */
> @@ -6686,7 +6684,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>   				env.loop_break = sched_nr_migrate_break;
>   				goto redo;
>   			}
> -			goto out_balanced;
> +			goto out_all_pinned;
>   		}
>   	}
>   
> @@ -6760,6 +6758,23 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>   	goto out;
>   
>   out_balanced:
> +	/*
> +	 * We reach balance although we may have faced some affinity
> +	 * constraints. Clear the imbalance flag if it was set.
> +	 */
> +	if (sd_parent) {
> +		int *group_imbalance = &sd_parent->groups->sgc->imbalance;
> +
> +		if (*group_imbalance)
> +			*group_imbalance = 0;
> +	}
> +
> +out_all_pinned:
> +	/*
> +	 * We reach balance because all tasks are pinned at this level so
> +	 * we can't migrate them. Let the imbalance flag set so parent level
> +	 * can try to migrate them.
> +	 */
>   	schedstat_inc(sd, lb_balanced[idle]);
>   
>   	sd->nr_balance_failed = 0;


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v4 01/12] sched: fix imbalance flag reset
  2014-11-24 23:47   ` Wanpeng Li
@ 2014-11-25  9:04     ` Vincent Guittot
  2014-11-25 10:13       ` Wanpeng Li
  0 siblings, 1 reply; 21+ messages in thread
From: Vincent Guittot @ 2014-11-25  9:04 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Preeti U Murthy,
	Russell King - ARM Linux, LAK, Rik van Riel, Morten Rasmussen,
	Mike Galbraith, Nicolas Pitre, linaro-kernel, Daniel Lezcano,
	Dietmar Eggemann

On 25 November 2014 at 00:47, Wanpeng Li <kernellwp@gmail.com> wrote:
> Hi Vincent,
> On 7/29/14, 1:51 AM, Vincent Guittot wrote:
>>
>> The imbalance flag can stay set even though there is no imbalance.
>>
>> Let's assume that we have 3 tasks that run on a dual-core / dual-cluster
>> system.
>> We will have some idle load balances that are triggered during the tick.
>> Unfortunately, the tick is also used to queue background work, so we can
>> reach the situation where a short work item has been queued on a CPU which
>> already runs a task. The load balance will detect this imbalance (2 tasks
>> on 1 CPU and an idle CPU) and will try to pull the waiting task onto the
>> idle CPU. The waiting task is a worker thread that is pinned to a CPU, so
>> an imbalance due to pinned tasks is detected and the imbalance flag is set.
>> Then, we will not be able to clear the flag because we have at most 1 task
>> on each CPU, but the imbalance flag will trigger useless active load
>> balances between the idle CPU and the busy CPU.
>>
>> We need to reset the imbalance flag as soon as we have reached a balanced
>> state. If all tasks are pinned, we don't consider that a balanced state
>> and leave the imbalance flag set.
>>
>> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>> ---
>>   kernel/sched/fair.c | 23 +++++++++++++++++++----
>>   1 file changed, 19 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 923fe32..7eb9126 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6672,10 +6672,8 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>>  		if (sd_parent) {
>>  			int *group_imbalance = &sd_parent->groups->sgc->imbalance;
>>
>> -			if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0) {
>> +			if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0)
>>  				*group_imbalance = 1;
>> -			} else if (*group_imbalance)
>> -				*group_imbalance = 0;
>
>
> As you mentioned above, "We need to reset the imbalance flag as soon as
> we have reached a balanced state." I think the code before your patch
> already does this; what did I miss? Thanks a lot for your patience. ;-)

The previous code was called only when busiest->nr_running > 1. The
background activity will be on the rq only 1 tick every few seconds, and
we will set group_imbalance when the background activity is on the rq.
Then, during the next load balances, group_imbalance is still set but
we can't clear it because we have only 1 task per rq.
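
Condensed from the hunk quoted above, the tail of load_balance() after the
patch looks roughly like the sketch below (everything else elided). The point
is that the flag is now cleared on every balanced exit instead of only on the
path that requires busiest->nr_running > 1.

	/* Sketch of the patched exit paths of load_balance(); details elided. */
	out_balanced:
		/*
		 * Balance was reached despite possible affinity constraints:
		 * clear the parent group's imbalance flag if it was set.
		 */
		if (sd_parent) {
			int *group_imbalance = &sd_parent->groups->sgc->imbalance;

			if (*group_imbalance)
				*group_imbalance = 0;
		}

	out_all_pinned:
		/*
		 * All tasks are pinned at this level, so keep the imbalance
		 * flag set and let the parent level try to migrate them.
		 */
		schedstat_inc(sd, lb_balanced[idle]);
		sd->nr_balance_failed = 0;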

Regards,
Vincent

>
> Regards,
> Wanpeng Li
>
>
>>  		}
>>
>>  		/* All tasks on this runqueue were pinned by CPU affinity */
>> @@ -6686,7 +6684,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>>  				env.loop_break = sched_nr_migrate_break;
>>  				goto redo;
>>  			}
>> -			goto out_balanced;
>> +			goto out_all_pinned;
>>  		}
>>  	}
>>
>> @@ -6760,6 +6758,23 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>>  	goto out;
>>
>>  out_balanced:
>> +	/*
>> +	 * We reach balance although we may have faced some affinity
>> +	 * constraints. Clear the imbalance flag if it was set.
>> +	 */
>> +	if (sd_parent) {
>> +		int *group_imbalance = &sd_parent->groups->sgc->imbalance;
>> +
>> +		if (*group_imbalance)
>> +			*group_imbalance = 0;
>> +	}
>> +
>> +out_all_pinned:
>> +	/*
>> +	 * We reach balance because all tasks are pinned at this level so
>> +	 * we can't migrate them. Let the imbalance flag set so parent level
>> +	 * can try to migrate them.
>> +	 */
>>  	schedstat_inc(sd, lb_balanced[idle]);
>>
>>  	sd->nr_balance_failed = 0;
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v4 01/12] sched: fix imbalance flag reset
  2014-11-25  9:04     ` Vincent Guittot
@ 2014-11-25 10:13       ` Wanpeng Li
  0 siblings, 0 replies; 21+ messages in thread
From: Wanpeng Li @ 2014-11-25 10:13 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Preeti U Murthy,
	Russell King - ARM Linux, LAK, Rik van Riel, Morten Rasmussen,
	Mike Galbraith, Nicolas Pitre, linaro-kernel, Daniel Lezcano,
	Dietmar Eggemann

Hi Vincent,
On 11/25/14, 5:04 PM, Vincent Guittot wrote:
> On 25 November 2014 at 00:47, Wanpeng Li <kernellwp@gmail.com> wrote:
>> Hi Vincent,
>> On 7/29/14, 1:51 AM, Vincent Guittot wrote:
>>> The imbalance flag can stay set even though there is no imbalance.
>>>
>>> Let's assume that we have 3 tasks that run on a dual-core / dual-cluster
>>> system.
>>> We will have some idle load balances that are triggered during the tick.
>>> Unfortunately, the tick is also used to queue background work, so we can
>>> reach the situation where a short work item has been queued on a CPU which
>>> already runs a task. The load balance will detect this imbalance (2 tasks
>>> on 1 CPU and an idle CPU) and will try to pull the waiting task onto the
>>> idle CPU. The waiting task is a worker thread that is pinned to a CPU, so
>>> an imbalance due to pinned tasks is detected and the imbalance flag is set.
>>> Then, we will not be able to clear the flag because we have at most 1 task
>>> on each CPU, but the imbalance flag will trigger useless active load
>>> balances between the idle CPU and the busy CPU.
>>>
>>> We need to reset the imbalance flag as soon as we have reached a balanced
>>> state. If all tasks are pinned, we don't consider that a balanced state
>>> and leave the imbalance flag set.
>>>
>>> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
>>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>>> ---
>>>    kernel/sched/fair.c | 23 +++++++++++++++++++----
>>>    1 file changed, 19 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 923fe32..7eb9126 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -6672,10 +6672,8 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>>>  		if (sd_parent) {
>>>  			int *group_imbalance = &sd_parent->groups->sgc->imbalance;
>>>
>>> -			if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0) {
>>> +			if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0)
>>>  				*group_imbalance = 1;
>>> -			} else if (*group_imbalance)
>>> -				*group_imbalance = 0;
>>
>> As you mentioned above, "We need to reset the imbalance flag as soon as
>> we have reached a balanced state." I think the code before your patch
>> already does this; what did I miss? Thanks a lot for your patience. ;-)
> The previous code was called only when busiest->nr_running > 1. The
> background activity will be on the rq only 1 tick every few seconds, and
> we will set group_imbalance when the background activity is on the rq.
> Then, during the next load balances, group_imbalance is still set but
> we can't clear it because we have only 1 task per rq.

I think there is no load balance in that case, since busiest->nr_running > 1
is not true even if the patch is not applied.

Regards,
Wanpeng Li

>
> Regards,
> Vincent
>
>> Regards,
>> Wanpeng Li
>>
>>
>>>  		}
>>>
>>>  		/* All tasks on this runqueue were pinned by CPU affinity */
>>> @@ -6686,7 +6684,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>>>  				env.loop_break = sched_nr_migrate_break;
>>>  				goto redo;
>>>  			}
>>> -			goto out_balanced;
>>> +			goto out_all_pinned;
>>>  		}
>>>  	}
>>>
>>> @@ -6760,6 +6758,23 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>>>  	goto out;
>>>
>>>  out_balanced:
>>> +	/*
>>> +	 * We reach balance although we may have faced some affinity
>>> +	 * constraints. Clear the imbalance flag if it was set.
>>> +	 */
>>> +	if (sd_parent) {
>>> +		int *group_imbalance = &sd_parent->groups->sgc->imbalance;
>>> +
>>> +		if (*group_imbalance)
>>> +			*group_imbalance = 0;
>>> +	}
>>> +
>>> +out_all_pinned:
>>> +	/*
>>> +	 * We reach balance because all tasks are pinned at this level so
>>> +	 * we can't migrate them. Let the imbalance flag set so parent level
>>> +	 * can try to migrate them.
>>> +	 */
>>>  	schedstat_inc(sd, lb_balanced[idle]);
>>>
>>>  	sd->nr_balance_failed = 0;
>>


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2014-11-25 10:14 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-28 17:51 [PATCH v4 00/12] sched: consolidation of cpu_capacity Vincent Guittot
2014-07-28 17:51 ` [PATCH v4 01/12] sched: fix imbalance flag reset Vincent Guittot
2014-11-23 10:25   ` Wanpeng Li
2014-11-24 10:31     ` Vincent Guittot
2014-11-24 23:47   ` Wanpeng Li
2014-11-25  9:04     ` Vincent Guittot
2014-11-25 10:13       ` Wanpeng Li
2014-07-28 17:51 ` [PATCH v4 02/12] sched: remove a wake_affine condition Vincent Guittot
2014-07-28 17:51 ` [PATCH v4 03/12] sched: fix avg_load computation Vincent Guittot
2014-07-28 17:51 ` [PATCH v4 04/12] sched: Allow all archs to set the capacity_orig Vincent Guittot
2014-07-28 17:51 ` [PATCH v4 05/12] ARM: topology: use new cpu_capacity interface Vincent Guittot
2014-07-28 17:51 ` [PATCH 06/12] sched: add per rq cpu_capacity_orig Vincent Guittot
2014-07-28 17:51 ` [PATCH v4 07/12] sched: test the cpu's capacity in wake affine Vincent Guittot
2014-07-28 17:51 ` [PATCH v4 08/12] sched: move cfs task on a CPU with higher capacity Vincent Guittot
2014-07-28 18:43   ` Rik van Riel
2014-07-29  7:40     ` Vincent Guittot
2014-07-28 17:51 ` [PATCH 09/12] sched: add usage_load_avg Vincent Guittot
2014-07-28 17:51 ` [PATCH v4 10/12] sched: get CPU's utilization statistic Vincent Guittot
2014-07-28 17:51 ` [PATCH v4 11/12] sched: replace capacity_factor by utilization Vincent Guittot
2014-07-28 17:51 ` [PATCH v4 12/12] sched: add SD_PREFER_SIBLING for SMT level Vincent Guittot
2014-07-28 18:52 ` [PATCH v4 00/12] sched: consolidation of cpu_capacity Rik van Riel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).