linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/4] sched: remove cpu_load decay.
@ 2013-11-22  6:37 Alex Shi
  2013-11-22  6:37 ` [RFC PATCH 1/4] sched: shortcut to remove load_idx effect Alex Shi
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Alex Shi @ 2013-11-22  6:37 UTC (permalink / raw)
  To: mingo, peterz, morten.rasmussen, vincent.guittot, daniel.lezcano,
	fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
	fengguang.wu
  Cc: james.hogan, alex.shi, jason.low2, gregkh, hanjun.guo, linux-kernel

cpu_load decays over time based on the rq's past cpu load, while the newer sched_avg decays each task's load over time. So we now have two kinds of decay for cpu_load, which is redundant and adds overhead in sched_tick and elsewhere.

This patchset tries to remove the cpu_load decay, and fixes a nohz_full bug along the way.

There are 5 load_idx values used for cpu_load in sched_domain. busy_idx and idle_idx are usually not zero, but newidle_idx, wake_idx and forkexec_idx are all zero on every arch. The first patch is a shortcut that removes the load_idx effect; it is just a one-line change. :)
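
For reference, the per-tick decay this series removes is, for each idx
(this is the formula from the comment deleted in patch 2):

	load = (2^idx - 1) / 2^idx * load + 1 / 2^idx * cur_load

For example, with idx = 2 and a steady cur_load of 128, a cpu_load[2]
starting at 0 moves through roughly 32, 56, 74, ... toward 128, one
step per tick. sched_avg already tracks the same information, just on
the per-task side.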

I have tested the patchset on my PandaBoard ES (2 ARM Cortex-A9 cores).
hackbench thread/pipe performance increased by nearly 10% with this
patchset! That did surprise me!

	latest kernel 527d1511310a89		+ this patchset
hackbench -T -g 10 -f 40
	23.25"					21.7"
	23.16"					19.99"
	24.24"					21.53"
hackbench -p -g 10 -f 40
	26.52"					22.48"
	23.89"					24.00"
	25.65"					23.06"
hackbench -P -g 10 -f 40
	20.14"					19.37"
	19.96"					19.76"
	21.76"					21.54"

The git tree for this patchset at:
 git@github.com:alexshi/power-scheduling.git no-load-idx 
Fengguang has included this tree in his kernel testing system and I have not received any regression report so far, so I assume it is fine on x86 systems.

But since this scheduler change affects all archs, and hackbench is the only benchmark I have found for this patchset so far, I'd like to see more testing and discussion of it.

Regards
Alex



* [RFC PATCH 1/4] sched: shortcut to remove load_idx effect
  2013-11-22  6:37 [RFC PATCH 0/4] sched: remove cpu_load decay Alex Shi
@ 2013-11-22  6:37 ` Alex Shi
  2013-11-22  6:37 ` [RFC PATCH 2/4] sched: change rq->cpu_load[load_idx] array to rq->cpu_load Alex Shi
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Alex Shi @ 2013-11-22  6:37 UTC (permalink / raw)
  To: mingo, peterz, morten.rasmussen, vincent.guittot, daniel.lezcano,
	fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
	fengguang.wu
  Cc: james.hogan, alex.shi, jason.low2, gregkh, hanjun.guo, linux-kernel

Shortcut to remove the rq->cpu_load[load_idx] effect in the scheduler.
Of the five load_idx values, only busy_idx and idle_idx are non-zero;
newidle_idx, wake_idx and forkexec_idx are all zero on all archs.

So forcing the idx to zero here fully removes the load_idx effect.

Signed-off-by: Alex Shi <alex.shi@linaro.org>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e8b652e..ce683aa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5633,7 +5633,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 	if (child && child->flags & SD_PREFER_SIBLING)
 		prefer_sibling = 1;
 
-	load_idx = get_sd_load_idx(env->sd, env->idle);
+	load_idx = 0;
 
 	do {
 		struct sg_lb_stats *sgs = &tmp_sgs;
-- 
1.8.1.2



* [RFC PATCH 2/4] sched: change rq->cpu_load[load_idx] array to rq->cpu_load
  2013-11-22  6:37 [RFC PATCH 0/4] sched: remove cpu_load decay Alex Shi
  2013-11-22  6:37 ` [RFC PATCH 1/4] sched: shortcut to remove load_idx effect Alex Shi
@ 2013-11-22  6:37 ` Alex Shi
  2013-11-22  6:37 ` [RFC PATCH 3/4] sched: clean up __update_cpu_load Alex Shi
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Alex Shi @ 2013-11-22  6:37 UTC (permalink / raw)
  To: mingo, peterz, morten.rasmussen, vincent.guittot, daniel.lezcano,
	fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
	fengguang.wu
  Cc: james.hogan, alex.shi, jason.low2, gregkh, hanjun.guo, linux-kernel

Since the load_idx effect has been removed from load balancing, we no
longer need the load_idx decays in the scheduler. That saves some time
in sched_tick and other places.

Signed-off-by: Alex Shi <alex.shi@linaro.org>
---
 arch/ia64/include/asm/topology.h  |  5 ---
 arch/metag/include/asm/topology.h |  5 ---
 arch/tile/include/asm/topology.h  |  6 ---
 include/linux/sched.h             |  5 ---
 include/linux/topology.h          |  8 ----
 kernel/sched/core.c               | 60 ++++++++-----------------
 kernel/sched/debug.c              |  6 +--
 kernel/sched/fair.c               | 79 +++++++++------------------------
 kernel/sched/proc.c               | 92 ++-------------------------------------
 kernel/sched/sched.h              |  3 +-
 10 files changed, 43 insertions(+), 226 deletions(-)

diff --git a/arch/ia64/include/asm/topology.h b/arch/ia64/include/asm/topology.h
index a2496e4..54e5b17 100644
--- a/arch/ia64/include/asm/topology.h
+++ b/arch/ia64/include/asm/topology.h
@@ -55,11 +55,6 @@ void build_cpu_to_node_map(void);
 	.busy_factor		= 64,			\
 	.imbalance_pct		= 125,			\
 	.cache_nice_tries	= 2,			\
-	.busy_idx		= 2,			\
-	.idle_idx		= 1,			\
-	.newidle_idx		= 0,			\
-	.wake_idx		= 0,			\
-	.forkexec_idx		= 0,			\
 	.flags			= SD_LOAD_BALANCE	\
 				| SD_BALANCE_NEWIDLE	\
 				| SD_BALANCE_EXEC	\
diff --git a/arch/metag/include/asm/topology.h b/arch/metag/include/asm/topology.h
index 8e9c0b3..d1d15cd 100644
--- a/arch/metag/include/asm/topology.h
+++ b/arch/metag/include/asm/topology.h
@@ -13,11 +13,6 @@
 	.busy_factor		= 32,			\
 	.imbalance_pct		= 125,			\
 	.cache_nice_tries	= 2,			\
-	.busy_idx		= 3,			\
-	.idle_idx		= 2,			\
-	.newidle_idx		= 0,			\
-	.wake_idx		= 0,			\
-	.forkexec_idx		= 0,			\
 	.flags			= SD_LOAD_BALANCE	\
 				| SD_BALANCE_FORK	\
 				| SD_BALANCE_EXEC	\
diff --git a/arch/tile/include/asm/topology.h b/arch/tile/include/asm/topology.h
index d15c0d8..05f6ffe 100644
--- a/arch/tile/include/asm/topology.h
+++ b/arch/tile/include/asm/topology.h
@@ -57,12 +57,6 @@ static inline const struct cpumask *cpumask_of_node(int node)
 	.busy_factor		= 64,					\
 	.imbalance_pct		= 125,					\
 	.cache_nice_tries	= 1,					\
-	.busy_idx		= 2,					\
-	.idle_idx		= 1,					\
-	.newidle_idx		= 0,					\
-	.wake_idx		= 0,					\
-	.forkexec_idx		= 0,					\
-									\
 	.flags			= 1*SD_LOAD_BALANCE			\
 				| 1*SD_BALANCE_NEWIDLE			\
 				| 1*SD_BALANCE_EXEC			\
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7e35d4b..a23e02d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -815,11 +815,6 @@ struct sched_domain {
 	unsigned int busy_factor;	/* less balancing by factor if busy */
 	unsigned int imbalance_pct;	/* No balance until over watermark */
 	unsigned int cache_nice_tries;	/* Leave cache hot tasks for # tries */
-	unsigned int busy_idx;
-	unsigned int idle_idx;
-	unsigned int newidle_idx;
-	unsigned int wake_idx;
-	unsigned int forkexec_idx;
 	unsigned int smt_gain;
 
 	int nohz_idle;			/* NOHZ IDLE status */
diff --git a/include/linux/topology.h b/include/linux/topology.h
index 12ae6ce..863fad3 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -121,9 +121,6 @@ int arch_update_cpu_topology(void);
 	.busy_factor		= 64,					\
 	.imbalance_pct		= 125,					\
 	.cache_nice_tries	= 1,					\
-	.busy_idx		= 2,					\
-	.wake_idx		= 0,					\
-	.forkexec_idx		= 0,					\
 									\
 	.flags			= 1*SD_LOAD_BALANCE			\
 				| 1*SD_BALANCE_NEWIDLE			\
@@ -151,11 +148,6 @@ int arch_update_cpu_topology(void);
 	.busy_factor		= 64,					\
 	.imbalance_pct		= 125,					\
 	.cache_nice_tries	= 1,					\
-	.busy_idx		= 2,					\
-	.idle_idx		= 1,					\
-	.newidle_idx		= 0,					\
-	.wake_idx		= 0,					\
-	.forkexec_idx		= 0,					\
 									\
 	.flags			= 1*SD_LOAD_BALANCE			\
 				| 1*SD_BALANCE_NEWIDLE			\
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c180860..9528f75 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4279,61 +4279,42 @@ static void sd_free_ctl_entry(struct ctl_table **tablep)
 	*tablep = NULL;
 }
 
-static int min_load_idx = 0;
-static int max_load_idx = CPU_LOAD_IDX_MAX-1;
-
 static void
 set_table_entry(struct ctl_table *entry,
 		const char *procname, void *data, int maxlen,
-		umode_t mode, proc_handler *proc_handler,
-		bool load_idx)
+		umode_t mode, proc_handler *proc_handler)
 {
 	entry->procname = procname;
 	entry->data = data;
 	entry->maxlen = maxlen;
 	entry->mode = mode;
 	entry->proc_handler = proc_handler;
-
-	if (load_idx) {
-		entry->extra1 = &min_load_idx;
-		entry->extra2 = &max_load_idx;
-	}
 }
 
 static struct ctl_table *
 sd_alloc_ctl_domain_table(struct sched_domain *sd)
 {
-	struct ctl_table *table = sd_alloc_ctl_entry(13);
+	struct ctl_table *table = sd_alloc_ctl_entry(8);
 
 	if (table == NULL)
 		return NULL;
 
 	set_table_entry(&table[0], "min_interval", &sd->min_interval,
-		sizeof(long), 0644, proc_doulongvec_minmax, false);
+		sizeof(long), 0644, proc_doulongvec_minmax);
 	set_table_entry(&table[1], "max_interval", &sd->max_interval,
-		sizeof(long), 0644, proc_doulongvec_minmax, false);
-	set_table_entry(&table[2], "busy_idx", &sd->busy_idx,
-		sizeof(int), 0644, proc_dointvec_minmax, true);
-	set_table_entry(&table[3], "idle_idx", &sd->idle_idx,
-		sizeof(int), 0644, proc_dointvec_minmax, true);
-	set_table_entry(&table[4], "newidle_idx", &sd->newidle_idx,
-		sizeof(int), 0644, proc_dointvec_minmax, true);
-	set_table_entry(&table[5], "wake_idx", &sd->wake_idx,
-		sizeof(int), 0644, proc_dointvec_minmax, true);
-	set_table_entry(&table[6], "forkexec_idx", &sd->forkexec_idx,
-		sizeof(int), 0644, proc_dointvec_minmax, true);
-	set_table_entry(&table[7], "busy_factor", &sd->busy_factor,
-		sizeof(int), 0644, proc_dointvec_minmax, false);
-	set_table_entry(&table[8], "imbalance_pct", &sd->imbalance_pct,
-		sizeof(int), 0644, proc_dointvec_minmax, false);
-	set_table_entry(&table[9], "cache_nice_tries",
+		sizeof(long), 0644, proc_doulongvec_minmax);
+	set_table_entry(&table[2], "busy_factor", &sd->busy_factor,
+		sizeof(int), 0644, proc_dointvec_minmax);
+	set_table_entry(&table[3], "imbalance_pct", &sd->imbalance_pct,
+		sizeof(int), 0644, proc_dointvec_minmax);
+	set_table_entry(&table[4], "cache_nice_tries",
 		&sd->cache_nice_tries,
-		sizeof(int), 0644, proc_dointvec_minmax, false);
-	set_table_entry(&table[10], "flags", &sd->flags,
-		sizeof(int), 0644, proc_dointvec_minmax, false);
-	set_table_entry(&table[11], "name", sd->name,
-		CORENAME_MAX_SIZE, 0444, proc_dostring, false);
-	/* &table[12] is terminator */
+		sizeof(int), 0644, proc_dointvec_minmax);
+	set_table_entry(&table[5], "flags", &sd->flags,
+		sizeof(int), 0644, proc_dointvec_minmax);
+	set_table_entry(&table[6], "name", sd->name,
+		CORENAME_MAX_SIZE, 0444, proc_dostring);
+	/* &table[7] is terminator */
 
 	return table;
 }
@@ -5425,11 +5406,6 @@ sd_numa_init(struct sched_domain_topology_level *tl, int cpu)
 		.busy_factor		= 32,
 		.imbalance_pct		= 125,
 		.cache_nice_tries	= 2,
-		.busy_idx		= 3,
-		.idle_idx		= 2,
-		.newidle_idx		= 0,
-		.wake_idx		= 0,
-		.forkexec_idx		= 0,
 
 		.flags			= 1*SD_LOAD_BALANCE
 					| 1*SD_BALANCE_NEWIDLE
@@ -6178,7 +6154,7 @@ DECLARE_PER_CPU(cpumask_var_t, load_balance_mask);
 
 void __init sched_init(void)
 {
-	int i, j;
+	int i;
 	unsigned long alloc_size = 0, ptr;
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
@@ -6279,9 +6255,7 @@ void __init sched_init(void)
 		init_tg_rt_entry(&root_task_group, &rq->rt, NULL, i, NULL);
 #endif
 
-		for (j = 0; j < CPU_LOAD_IDX_MAX; j++)
-			rq->cpu_load[j] = 0;
-
+		rq->cpu_load = 0;
 		rq->last_load_update_tick = jiffies;
 
 #ifdef CONFIG_SMP
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 5c34d18..675be71 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -303,11 +303,7 @@ do {									\
 	PN(next_balance);
 	SEQ_printf(m, "  .%-30s: %ld\n", "curr->pid", (long)(task_pid_nr(rq->curr)));
 	PN(clock);
-	P(cpu_load[0]);
-	P(cpu_load[1]);
-	P(cpu_load[2]);
-	P(cpu_load[3]);
-	P(cpu_load[4]);
+	P(cpu_load);
 #undef P
 #undef PN
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ce683aa..bccdd89 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -977,8 +977,8 @@ static inline unsigned long group_weight(struct task_struct *p, int nid)
 }
 
 static unsigned long weighted_cpuload(const int cpu);
-static unsigned long source_load(int cpu, int type);
-static unsigned long target_load(int cpu, int type);
+static unsigned long source_load(int cpu);
+static unsigned long target_load(int cpu);
 static unsigned long power_of(int cpu);
 static long effective_load(struct task_group *tg, int cpu, long wl, long wg);
 
@@ -3794,30 +3794,30 @@ static unsigned long weighted_cpuload(const int cpu)
  * We want to under-estimate the load of migration sources, to
  * balance conservatively.
  */
-static unsigned long source_load(int cpu, int type)
+static unsigned long source_load(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long total = weighted_cpuload(cpu);
 
-	if (type == 0 || !sched_feat(LB_BIAS))
+	if (!sched_feat(LB_BIAS))
 		return total;
 
-	return min(rq->cpu_load[type-1], total);
+	return min(rq->cpu_load, total);
 }
 
 /*
  * Return a high guess at the load of a migration-target cpu weighted
  * according to the scheduling class and "nice" value.
  */
-static unsigned long target_load(int cpu, int type)
+static unsigned long target_load(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long total = weighted_cpuload(cpu);
 
-	if (type == 0 || !sched_feat(LB_BIAS))
+	if (!sched_feat(LB_BIAS))
 		return total;
 
-	return max(rq->cpu_load[type-1], total);
+	return max(rq->cpu_load, total);
 }
 
 static unsigned long power_of(int cpu)
@@ -4017,7 +4017,7 @@ static int wake_wide(struct task_struct *p)
 static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 {
 	s64 this_load, load;
-	int idx, this_cpu, prev_cpu;
+	int this_cpu, prev_cpu;
 	unsigned long tl_per_task;
 	struct task_group *tg;
 	unsigned long weight;
@@ -4030,11 +4030,10 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 	if (wake_wide(p))
 		return 0;
 
-	idx	  = sd->wake_idx;
 	this_cpu  = smp_processor_id();
 	prev_cpu  = task_cpu(p);
-	load	  = source_load(prev_cpu, idx);
-	this_load = target_load(this_cpu, idx);
+	load	  = source_load(prev_cpu);
+	this_load = target_load(this_cpu);
 
 	/*
 	 * If sync wakeup then subtract the (maximum possible)
@@ -4090,7 +4089,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 
 	if (balanced ||
 	    (this_load <= load &&
-	     this_load + target_load(prev_cpu, idx) <= tl_per_task)) {
+	     this_load + target_load(prev_cpu) <= tl_per_task)) {
 		/*
 		 * This domain has SD_WAKE_AFFINE and
 		 * p is cache cold in this domain, and
@@ -4109,8 +4108,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
  * domain.
  */
 static struct sched_group *
-find_idlest_group(struct sched_domain *sd, struct task_struct *p,
-		  int this_cpu, int load_idx)
+find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
 {
 	struct sched_group *idlest = NULL, *group = sd->groups;
 	unsigned long min_load = ULONG_MAX, this_load = 0;
@@ -4135,9 +4133,9 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 		for_each_cpu(i, sched_group_cpus(group)) {
 			/* Bias balancing toward cpus of our domain */
 			if (local_group)
-				load = source_load(i, load_idx);
+				load = source_load(i);
 			else
-				load = target_load(i, load_idx);
+				load = target_load(i);
 
 			avg_load += load;
 		}
@@ -4283,7 +4281,6 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 	}
 
 	while (sd) {
-		int load_idx = sd->forkexec_idx;
 		struct sched_group *group;
 		int weight;
 
@@ -4292,10 +4289,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 			continue;
 		}
 
-		if (sd_flag & SD_BALANCE_WAKE)
-			load_idx = sd->wake_idx;
-
-		group = find_idlest_group(sd, p, cpu, load_idx);
+		group = find_idlest_group(sd, p, cpu);
 		if (!group) {
 			sd = sd->child;
 			continue;
@@ -5238,34 +5232,6 @@ static inline void init_sd_lb_stats(struct sd_lb_stats *sds)
 	};
 }
 
-/**
- * get_sd_load_idx - Obtain the load index for a given sched domain.
- * @sd: The sched_domain whose load_idx is to be obtained.
- * @idle: The idle status of the CPU for whose sd load_idx is obtained.
- *
- * Return: The load index.
- */
-static inline int get_sd_load_idx(struct sched_domain *sd,
-					enum cpu_idle_type idle)
-{
-	int load_idx;
-
-	switch (idle) {
-	case CPU_NOT_IDLE:
-		load_idx = sd->busy_idx;
-		break;
-
-	case CPU_NEWLY_IDLE:
-		load_idx = sd->newidle_idx;
-		break;
-	default:
-		load_idx = sd->idle_idx;
-		break;
-	}
-
-	return load_idx;
-}
-
 static unsigned long default_scale_freq_power(struct sched_domain *sd, int cpu)
 {
 	return SCHED_POWER_SCALE;
@@ -5492,12 +5458,11 @@ static inline int sg_capacity(struct lb_env *env, struct sched_group *group)
  * update_sg_lb_stats - Update sched_group's statistics for load balancing.
  * @env: The load balancing environment.
  * @group: sched_group whose statistics are to be updated.
- * @load_idx: Load index of sched_domain of this_cpu for load calc.
  * @local_group: Does group contain this_cpu.
  * @sgs: variable to hold the statistics for this group.
  */
 static inline void update_sg_lb_stats(struct lb_env *env,
-			struct sched_group *group, int load_idx,
+			struct sched_group *group,
 			int local_group, struct sg_lb_stats *sgs)
 {
 	unsigned long nr_running;
@@ -5513,9 +5478,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 
 		/* Bias balancing toward cpus of our domain */
 		if (local_group)
-			load = target_load(i, load_idx);
+			load = target_load(i);
 		else
-			load = source_load(i, load_idx);
+			load = source_load(i);
 
 		sgs->group_load += load;
 		sgs->sum_nr_running += nr_running;
@@ -5628,13 +5593,11 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 	struct sched_domain *child = env->sd->child;
 	struct sched_group *sg = env->sd->groups;
 	struct sg_lb_stats tmp_sgs;
-	int load_idx, prefer_sibling = 0;
+	int prefer_sibling = 0;
 
 	if (child && child->flags & SD_PREFER_SIBLING)
 		prefer_sibling = 1;
 
-	load_idx = 0;
-
 	do {
 		struct sg_lb_stats *sgs = &tmp_sgs;
 		int local_group;
@@ -5649,7 +5612,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 				update_group_power(env->sd, env->dst_cpu);
 		}
 
-		update_sg_lb_stats(env, sg, load_idx, local_group, sgs);
+		update_sg_lb_stats(env, sg, local_group, sgs);
 
 		if (local_group)
 			goto next_group;
diff --git a/kernel/sched/proc.c b/kernel/sched/proc.c
index 16f5a30..a2435c5 100644
--- a/kernel/sched/proc.c
+++ b/kernel/sched/proc.c
@@ -11,7 +11,7 @@
 unsigned long this_cpu_load(void)
 {
 	struct rq *this = this_rq();
-	return this->cpu_load[0];
+	return this->cpu_load;
 }
 
 
@@ -398,105 +398,19 @@ static void calc_load_account_active(struct rq *this_rq)
  * End of global load-average stuff
  */
 
-/*
- * The exact cpuload at various idx values, calculated at every tick would be
- * load = (2^idx - 1) / 2^idx * load + 1 / 2^idx * cur_load
- *
- * If a cpu misses updates for n-1 ticks (as it was idle) and update gets called
- * on nth tick when cpu may be busy, then we have:
- * load = ((2^idx - 1) / 2^idx)^(n-1) * load
- * load = (2^idx - 1) / 2^idx) * load + 1 / 2^idx * cur_load
- *
- * decay_load_missed() below does efficient calculation of
- * load = ((2^idx - 1) / 2^idx)^(n-1) * load
- * avoiding 0..n-1 loop doing load = ((2^idx - 1) / 2^idx) * load
- *
- * The calculation is approximated on a 128 point scale.
- * degrade_zero_ticks is the number of ticks after which load at any
- * particular idx is approximated to be zero.
- * degrade_factor is a precomputed table, a row for each load idx.
- * Each column corresponds to degradation factor for a power of two ticks,
- * based on 128 point scale.
- * Example:
- * row 2, col 3 (=12) says that the degradation at load idx 2 after
- * 8 ticks is 12/128 (which is an approximation of exact factor 3^8/4^8).
- *
- * With this power of 2 load factors, we can degrade the load n times
- * by looking at 1 bits in n and doing as many mult/shift instead of
- * n mult/shifts needed by the exact degradation.
- */
-#define DEGRADE_SHIFT		7
-static const unsigned char
-		degrade_zero_ticks[CPU_LOAD_IDX_MAX] = {0, 8, 32, 64, 128};
-static const unsigned char
-		degrade_factor[CPU_LOAD_IDX_MAX][DEGRADE_SHIFT + 1] = {
-					{0, 0, 0, 0, 0, 0, 0, 0},
-					{64, 32, 8, 0, 0, 0, 0, 0},
-					{96, 72, 40, 12, 1, 0, 0},
-					{112, 98, 75, 43, 15, 1, 0},
-					{120, 112, 98, 76, 45, 16, 2} };
 
 /*
- * Update cpu_load for any missed ticks, due to tickless idle. The backlog
- * would be when CPU is idle and so we just decay the old load without
- * adding any new load.
- */
-static unsigned long
-decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
-{
-	int j = 0;
-
-	if (!missed_updates)
-		return load;
-
-	if (missed_updates >= degrade_zero_ticks[idx])
-		return 0;
-
-	if (idx == 1)
-		return load >> missed_updates;
-
-	while (missed_updates) {
-		if (missed_updates % 2)
-			load = (load * degrade_factor[idx][j]) >> DEGRADE_SHIFT;
-
-		missed_updates >>= 1;
-		j++;
-	}
-	return load;
-}
-
-/*
- * Update rq->cpu_load[] statistics. This function is usually called every
+ * Update rq->cpu_load statistics. This function is usually called every
  * scheduler tick (TICK_NSEC). With tickless idle this will not be called
  * every tick. We fix it up based on jiffies.
  */
 static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
 			      unsigned long pending_updates)
 {
-	int i, scale;
-
 	this_rq->nr_load_updates++;
 
 	/* Update our load: */
-	this_rq->cpu_load[0] = this_load; /* Fasttrack for idx 0 */
-	for (i = 1, scale = 2; i < CPU_LOAD_IDX_MAX; i++, scale += scale) {
-		unsigned long old_load, new_load;
-
-		/* scale is effectively 1 << i now, and >> i divides by scale */
-
-		old_load = this_rq->cpu_load[i];
-		old_load = decay_load_missed(old_load, pending_updates - 1, i);
-		new_load = this_load;
-		/*
-		 * Round up the averaging division if load is increasing. This
-		 * prevents us from getting stuck on 9 if the load is 10, for
-		 * example.
-		 */
-		if (new_load > old_load)
-			new_load += scale - 1;
-
-		this_rq->cpu_load[i] = (old_load * (scale - 1) + new_load) >> i;
-	}
+	this_rq->cpu_load = this_load; /* Fasttrack for idx 0 */
 
 	sched_avg_update(this_rq);
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 88c85b2..01f6e7a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -413,8 +413,7 @@ struct rq {
 	unsigned int nr_numa_running;
 	unsigned int nr_preferred_running;
 #endif
-	#define CPU_LOAD_IDX_MAX 5
-	unsigned long cpu_load[CPU_LOAD_IDX_MAX];
+	unsigned long cpu_load;
 	unsigned long last_load_update_tick;
 #ifdef CONFIG_NO_HZ_COMMON
 	u64 nohz_stamp;
-- 
1.8.1.2



* [RFC PATCH 3/4] sched: clean up __update_cpu_load
  2013-11-22  6:37 [RFC PATCH 0/4] sched: remove cpu_load decay Alex Shi
  2013-11-22  6:37 ` [RFC PATCH 1/4] sched: shortcut to remove load_idx effect Alex Shi
  2013-11-22  6:37 ` [RFC PATCH 2/4] sched: change rq->cpu_load[load_idx] array to rq->cpu_load Alex Shi
@ 2013-11-22  6:37 ` Alex Shi
  2013-11-22  6:37 ` [RFC PATCH 4/4] sched/nohz_full: give correct cpu load for nohz_full cpu Alex Shi
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Alex Shi @ 2013-11-22  6:37 UTC (permalink / raw)
  To: mingo, peterz, morten.rasmussen, vincent.guittot, daniel.lezcano,
	fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
	fengguang.wu
  Cc: james.hogan, alex.shi, jason.low2, gregkh, hanjun.guo, linux-kernel

Since we no longer decay rq->cpu_load, we do not need pending_updates.
But we still want to update rq->rt_avg, so keep
rq->last_load_update_tick and the __update_cpu_load() function.

After removing load_idx, source_load is equal to target_load most of
the time; they only differ when the source cpu is idle, and at that
time we force cpu_load to 0 (in update_cpu_load_nohz). So cpu_load
stays in struct rq.

Signed-off-by: Alex Shi <alex.shi@linaro.org>
---
 kernel/sched/proc.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/proc.c b/kernel/sched/proc.c
index a2435c5..057bb9b 100644
--- a/kernel/sched/proc.c
+++ b/kernel/sched/proc.c
@@ -404,8 +404,7 @@ static void calc_load_account_active(struct rq *this_rq)
  * scheduler tick (TICK_NSEC). With tickless idle this will not be called
  * every tick. We fix it up based on jiffies.
  */
-static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
-			      unsigned long pending_updates)
+static void __update_cpu_load(struct rq *this_rq, unsigned long this_load)
 {
 	this_rq->nr_load_updates++;
 
@@ -449,7 +448,6 @@ void update_idle_cpu_load(struct rq *this_rq)
 {
 	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
 	unsigned long load = get_rq_runnable_load(this_rq);
-	unsigned long pending_updates;
 
 	/*
 	 * bail if there's load or we're actually up-to-date.
@@ -457,10 +455,9 @@ void update_idle_cpu_load(struct rq *this_rq)
 	if (load || curr_jiffies == this_rq->last_load_update_tick)
 		return;
 
-	pending_updates = curr_jiffies - this_rq->last_load_update_tick;
 	this_rq->last_load_update_tick = curr_jiffies;
 
-	__update_cpu_load(this_rq, load, pending_updates);
+	__update_cpu_load(this_rq, load);
 }
 
 /*
@@ -483,7 +480,7 @@ void update_cpu_load_nohz(void)
 		 * We were idle, this means load 0, the current load might be
 		 * !0 due to remote wakeups and the sort.
 		 */
-		__update_cpu_load(this_rq, 0, pending_updates);
+		__update_cpu_load(this_rq, 0);
 	}
 	raw_spin_unlock(&this_rq->lock);
 }
@@ -499,7 +496,7 @@ void update_cpu_load_active(struct rq *this_rq)
 	 * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
 	 */
 	this_rq->last_load_update_tick = jiffies;
-	__update_cpu_load(this_rq, load, 1);
+	__update_cpu_load(this_rq, load);
 
 	calc_load_account_active(this_rq);
 }
-- 
1.8.1.2



* [RFC PATCH 4/4] sched/nohz_full: give correct cpu load for nohz_full cpu
  2013-11-22  6:37 [RFC PATCH 0/4] sched: remove cpu_load decay Alex Shi
                   ` (2 preceding siblings ...)
  2013-11-22  6:37 ` [RFC PATCH 3/4] sched: clean up __update_cpu_load Alex Shi
@ 2013-11-22  6:37 ` Alex Shi
  2013-11-26 12:59   ` Alex Shi
  2013-11-22 12:13 ` [RFC PATCH 0/4] sched: remove cpu_load decay Daniel Lezcano
  2013-11-27  2:48 ` Alex Shi
  5 siblings, 1 reply; 18+ messages in thread
From: Alex Shi @ 2013-11-22  6:37 UTC (permalink / raw)
  To: mingo, peterz, morten.rasmussen, vincent.guittot, daniel.lezcano,
	fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
	fengguang.wu
  Cc: james.hogan, alex.shi, jason.low2, gregkh, hanjun.guo, linux-kernel

When a nohz_full cpu is in tickless mode, it may update cpu_load via
the following chain:
__tick_nohz_full_check
    tick_nohz_restart_sched_tick
        update_cpu_load_nohz
and it will then be given an incorrect cpu_load of 0.
This patch tries to fix that and provide the correct cpu_load value.

Signed-off-by: Alex Shi <alex.shi@linaro.org>
---
 kernel/sched/proc.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/proc.c b/kernel/sched/proc.c
index 057bb9b..5058e6a 100644
--- a/kernel/sched/proc.c
+++ b/kernel/sched/proc.c
@@ -477,10 +477,16 @@ void update_cpu_load_nohz(void)
 	if (pending_updates) {
 		this_rq->last_load_update_tick = curr_jiffies;
 		/*
-		 * We were idle, this means load 0, the current load might be
-		 * !0 due to remote wakeups and the sort.
+		 * We may still have a task while in NO_HZ_FULL mode; if so,
+		 * use the normal cfs runnable load.
+		 * Or we were idle, this means load 0, the current load might
+		 * be !0 due to remote wakeups and the sort.
 		 */
-		__update_cpu_load(this_rq, 0);
+		if (this_rq->cfs.h_nr_running) {
+			unsigned long load = get_rq_runnable_load(this_rq);
+			__update_cpu_load(this_rq, load);
+		} else
+			__update_cpu_load(this_rq, 0);
 	}
 	raw_spin_unlock(&this_rq->lock);
 }
-- 
1.8.1.2



* Re: [RFC PATCH 0/4] sched: remove cpu_load decay.
  2013-11-22  6:37 [RFC PATCH 0/4] sched: remove cpu_load decay Alex Shi
                   ` (3 preceding siblings ...)
  2013-11-22  6:37 ` [RFC PATCH 4/4] sched/nohz_full: give correct cpu load for nohz_full cpu Alex Shi
@ 2013-11-22 12:13 ` Daniel Lezcano
  2013-11-24  5:00   ` Alex Shi
                     ` (2 more replies)
  2013-11-27  2:48 ` Alex Shi
  5 siblings, 3 replies; 18+ messages in thread
From: Daniel Lezcano @ 2013-11-22 12:13 UTC (permalink / raw)
  To: Alex Shi, mingo, peterz, morten.rasmussen, vincent.guittot,
	fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
	fengguang.wu
  Cc: james.hogan, jason.low2, gregkh, hanjun.guo, linux-kernel

On 11/22/2013 07:37 AM, Alex Shi wrote:
> cpu_load decays over time based on the rq's past cpu load, while the newer sched_avg decays each task's load over time. So we now have two kinds of decay for cpu_load, which is redundant and adds overhead in sched_tick and elsewhere.
>
> This patchset tries to remove the cpu_load decay, and fixes a nohz_full bug along the way.
>
> There are 5 load_idx values used for cpu_load in sched_domain. busy_idx and idle_idx are usually not zero, but newidle_idx, wake_idx and forkexec_idx are all zero on every arch. The first patch is a shortcut that removes the load_idx effect; it is just a one-line change. :)
>
> I have tested the patchset on my PandaBoard ES (2 ARM Cortex-A9 cores).
> hackbench thread/pipe performance increased by nearly 10% with this
> patchset! That did surprise me!
>
> 	latest kernel 527d1511310a89		+ this patchset
> hackbench -T -g 10 -f 40
> 	23.25"					21.7"
> 	23.16"					19.99"
> 	24.24"					21.53"
> hackbench -p -g 10 -f 40
> 	26.52"					22.48"
> 	23.89"					24.00"
> 	25.65"					23.06"
> hackbench -P -g 10 -f 40
> 	20.14"					19.37"
> 	19.96"					19.76"
> 	21.76"					21.54"
>
> The git tree for this patchset at:
>   git@github.com:alexshi/power-scheduling.git no-load-idx
> Fengguang has included this tree in his kernel testing system and I have not received any regression report so far, so I assume it is fine on x86 systems.
>
> But since this scheduler change affects all archs, and hackbench is the only benchmark I have found for this patchset so far, I'd like to see more testing and discussion of it.

Hi Alex,

I tried on my Xeon server (2 x 4 cores) your patchset and got the 
following result:

kernel a5d6e63323fe7799eb0e6  / + patchset

hackbench -T -s 4096 -l 1000 -g 10 -f 40
	  27.604     	     38.556
	  27.397	     38.694
	  26.695	     38.647
	  25.975	     38.528
	  29.586	     38.553
	  25.956	     38.331
	  27.895	     38.472
	  26.874	     38.608
	  26.836	     38.341
	  28.064	     38.626
hackbench -p -s 4096 -l 1000 -g 10 -f 40
	  34.502     	     35.489
	  34.551	     35.389
	  34.027	     35.664
	  34.343	     35.418
	  34.570	     35.423
	  34.386	     35.466
	  34.387	     35.486
	  33.869	     35.212
	  34.600	     35.465
	  34.155	     35.235
hackbench -P -s 4096 -l 1000 -g 10 -f 40
	  39.170     	     38.794
	  39.108	     38.662
	  39.056	     38.946
	  39.120	     38.668
	  38.896	     38.865
	  39.109	     38.803
	  39.020	     38.946
	  39.099	     38.844
	  38.820	     38.872
	  38.923	     39.337



-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog



* Re: [RFC PATCH 0/4] sched: remove cpu_load decay.
  2013-11-22 12:13 ` [RFC PATCH 0/4] sched: remove cpu_load decay Daniel Lezcano
@ 2013-11-24  5:00   ` Alex Shi
  2013-11-24  5:29   ` Alex Shi
  2013-11-25  0:58   ` Alex Shi
  2 siblings, 0 replies; 18+ messages in thread
From: Alex Shi @ 2013-11-24  5:00 UTC (permalink / raw)
  To: Daniel Lezcano, mingo, peterz, morten.rasmussen, vincent.guittot,
	fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
	fengguang.wu
  Cc: james.hogan, jason.low2, gregkh, hanjun.guo, linux-kernel

On 11/22/2013 08:13 PM, Daniel Lezcano wrote:
>>
>> The git tree for this patchset at:
>>   git@github.com:alexshi/power-scheduling.git no-load-idx
>> Fengguang has included this tree in his kernel testing system and I
>> have not received any regression report so far, so I assume it is
>> fine on x86 systems.
>>
>> But since this scheduler change affects all archs, and hackbench is
>> the only benchmark I have found for this patchset so far, I'd like to
>> see more testing and discussion of it.
> 
> Hi Alex,
> 
> I tried on my Xeon server (2 x 4 cores) your patchset and got the
> following result:
> 
> kernel a5d6e63323fe7799eb0e6  / + patchset
> 
> hackbench -T -s 4096 -l 1000 -g 10 -f 40
>       27.604              38.556


Thanks for your testing, Daniel!

Fengguang, how about your kernel test results for this patchset?

-- 
Thanks
    Alex


* Re: [RFC PATCH 0/4] sched: remove cpu_load decay.
  2013-11-22 12:13 ` [RFC PATCH 0/4] sched: remove cpu_load decay Daniel Lezcano
  2013-11-24  5:00   ` Alex Shi
@ 2013-11-24  5:29   ` Alex Shi
  2013-11-26 12:35     ` Daniel Lezcano
  2013-11-25  0:58   ` Alex Shi
  2 siblings, 1 reply; 18+ messages in thread
From: Alex Shi @ 2013-11-24  5:29 UTC (permalink / raw)
  To: Daniel Lezcano, mingo, peterz, morten.rasmussen, vincent.guittot,
	fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
	fengguang.wu
  Cc: james.hogan, jason.low2, gregkh, hanjun.guo, linux-kernel

On 11/22/2013 08:13 PM, Daniel Lezcano wrote:
> 
> Hi Alex,
> 
> I tried on my Xeon server (2 x 4 cores) your patchset and got the
> following result:
> 
> kernel a5d6e63323fe7799eb0e6  / + patchset
> 
> hackbench -T -s 4096 -l 1000 -g 10 -f 40
>       27.604              38.556

I wonder if the following patch helps on your Xeon server?

Btw, you can run vmstat as a background tool, or use 'perf sched', to
see how the scheduler statistics change with this patchset.
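
For example, something along these lines (just an illustration, adjust
the workload as needed):

	vmstat 1 > vmstat.log &
	perf sched record -- hackbench -T -g 10 -f 40
	perf sched latency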

The following are the results for the original kernel and for all 5
patches on a PandaBoard ES.

    latest kernel 527d1511310a89        + this patchset
hackbench -T -g 10 -f 40
    23.25"                    20.79"
    23.16"                    20.4"
    24.24"                    20.29"
hackbench -p -g 10 -f 40
    26.52"                    21.2"
    23.89"                    24.07"
    25.65"                    20.30"
hackbench -P -g 10 -f 40
    20.14"                    19.53"
    19.96"                    20.37"
    21.76"                    20.39"

------
From 4f5efd6c2b1e7293410ad57c3db24dcf3394c4a3 Mon Sep 17 00:00:00 2001
From: Alex Shi <alex.shi@linaro.org>
Date: Sat, 23 Nov 2013 23:18:09 +0800
Subject: [PATCH] sched: aggravate target cpu load to reduce task moving

Task migration can happen when the target cpu's load is only a bit
lower than the source cpu's. To reduce such migrations, aggravate the
target cpu's load with sd->imbalance_pct (with imbalance_pct = 125 the
target looks 25% heavier).

Signed-off-by: Alex Shi <alex.shi@linaro.org>
---
 kernel/sched/fair.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bccdd89..c49b7ba 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -978,7 +978,7 @@ static inline unsigned long group_weight(struct task_struct *p, int nid)
 
 static unsigned long weighted_cpuload(const int cpu);
 static unsigned long source_load(int cpu);
-static unsigned long target_load(int cpu);
+static unsigned long target_load(int cpu, int imbalance_pct);
 static unsigned long power_of(int cpu);
 static long effective_load(struct task_group *tg, int cpu, long wl, long wg);
 
@@ -3809,11 +3809,17 @@ static unsigned long source_load(int cpu)
  * Return a high guess at the load of a migration-target cpu weighted
  * according to the scheduling class and "nice" value.
  */
-static unsigned long target_load(int cpu)
+static unsigned long target_load(int cpu, int imbalance_pct)
 {
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long total = weighted_cpuload(cpu);
 
+	/*
+	 * Without cpu_load decay, cpu_load equals total most of the time,
+	 * so make the target a bit heavier to reduce task migration.
+	 */
+	total = total * imbalance_pct / 100;
+
 	if (!sched_feat(LB_BIAS))
 		return total;
 
@@ -4033,7 +4039,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 	this_cpu  = smp_processor_id();
 	prev_cpu  = task_cpu(p);
 	load	  = source_load(prev_cpu);
-	this_load = target_load(this_cpu);
+	this_load = target_load(this_cpu, 100);
 
 	/*
 	 * If sync wakeup then subtract the (maximum possible)
@@ -4089,7 +4095,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 
 	if (balanced ||
 	    (this_load <= load &&
-	     this_load + target_load(prev_cpu) <= tl_per_task)) {
+	     this_load + target_load(prev_cpu, 100) <= tl_per_task)) {
 		/*
 		 * This domain has SD_WAKE_AFFINE and
 		 * p is cache cold in this domain, and
@@ -4135,7 +4141,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
 			if (local_group)
 				load = source_load(i);
 			else
-				load = target_load(i);
+				load = target_load(i, sd->imbalance_pct);
 
 			avg_load += load;
 		}
@@ -5478,7 +5484,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 
 		/* Bias balancing toward cpus of our domain */
 		if (local_group)
-			load = target_load(i);
+			load = target_load(i, env->sd->imbalance_pct);
 		else
 			load = source_load(i);
 
-- 
1.8.1.2



* Re: [RFC PATCH 0/4] sched: remove cpu_load decay.
  2013-11-22 12:13 ` [RFC PATCH 0/4] sched: remove cpu_load decay Daniel Lezcano
  2013-11-24  5:00   ` Alex Shi
  2013-11-24  5:29   ` Alex Shi
@ 2013-11-25  0:58   ` Alex Shi
  2013-11-25  8:36     ` Daniel Lezcano
  2 siblings, 1 reply; 18+ messages in thread
From: Alex Shi @ 2013-11-25  0:58 UTC (permalink / raw)
  To: Daniel Lezcano, mingo, peterz, morten.rasmussen, vincent.guittot,
	fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
	fengguang.wu
  Cc: james.hogan, jason.low2, gregkh, hanjun.guo, linux-kernel

On 11/22/2013 08:13 PM, Daniel Lezcano wrote:
> 
> Hi Alex,
> 
> I tried on my Xeon server (2 x 4 cores) your patchset and got the
> following result:
> 
> kernel a5d6e63323fe7799eb0e6  / + patchset
> 
> hackbench -T -s 4096 -l 1000 -g 10 -f 40
>       27.604              38.556

Hi Daniel, would you like to give the detailed server info? 2 sockets *
4 cores sounds like it isn't a modern machine.

-- 
Thanks
    Alex


* Re: [RFC PATCH 0/4] sched: remove cpu_load decay.
  2013-11-25  0:58   ` Alex Shi
@ 2013-11-25  8:36     ` Daniel Lezcano
  2013-11-25 12:00       ` Alex Shi
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Lezcano @ 2013-11-25  8:36 UTC (permalink / raw)
  To: Alex Shi, mingo, peterz, morten.rasmussen, vincent.guittot,
	fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
	fengguang.wu
  Cc: james.hogan, jason.low2, gregkh, hanjun.guo, linux-kernel

On 11/25/2013 01:58 AM, Alex Shi wrote:
> On 11/22/2013 08:13 PM, Daniel Lezcano wrote:
>>
>> Hi Alex,
>>
>> I tried on my Xeon server (2 x 4 cores) your patchset and got the
>> following result:
>>
>> kernel a5d6e63323fe7799eb0e6  / + patchset
>>
>> hackbench -T -s 4096 -l 1000 -g 10 -f 40
>>        27.604              38.556
>
> Hi Daniel, would you like give the detailed server info? 2 socket * 4
> cores, sounds it isn't a modern machine.

Well, it is several years old now, that's true, but it is still 
competing with some recent processors :)

Dual Xeon E5345 2.33GHz / 8MB L2 cache / 7GB FB-DIMM memory 667 MHz / 
300GB SSD 3Gb/s


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog



* Re: [RFC PATCH 0/4] sched: remove cpu_load decay.
  2013-11-25  8:36     ` Daniel Lezcano
@ 2013-11-25 12:00       ` Alex Shi
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Shi @ 2013-11-25 12:00 UTC (permalink / raw)
  To: Daniel Lezcano, mingo, peterz, morten.rasmussen, vincent.guittot,
	fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
	fengguang.wu
  Cc: james.hogan, jason.low2, gregkh, hanjun.guo, linux-kernel

On 11/25/2013 04:36 PM, Daniel Lezcano wrote:
> On 11/25/2013 01:58 AM, Alex Shi wrote:
>> On 11/22/2013 08:13 PM, Daniel Lezcano wrote:
>>>
>>> Hi Alex,
>>>
>>> I tried on my Xeon server (2 x 4 cores) your patchset and got the
>>> following result:
>>>
>>> kernel a5d6e63323fe7799eb0e6  / + patchset
>>>
>>> hackbench -T -s 4096 -l 1000 -g 10 -f 40
>>>        27.604              38.556
>>
>> Hi Daniel, would you like give the detailed server info? 2 socket * 4
>> cores, sounds it isn't a modern machine.
> 
> Well it has several years old now, that's true but still competing with
> some recent processors :)
> 
> Bi-Xeon E5345 2.33GHz / 8Mb L2 cache / 7BG FB-DIMM Memory 667 MHz /
> 300GB SSD 3Gb/s
> 
> 


It is a Core 2 CPU, quite old.
Fengguang, do you have a similar box in your test system?

-- 
Thanks
    Alex


* Re: [RFC PATCH 0/4] sched: remove cpu_load decay.
  2013-11-24  5:29   ` Alex Shi
@ 2013-11-26 12:35     ` Daniel Lezcano
  2013-11-26 12:52       ` Alex Shi
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Lezcano @ 2013-11-26 12:35 UTC (permalink / raw)
  To: Alex Shi, mingo, peterz, morten.rasmussen, vincent.guittot,
	fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
	fengguang.wu
  Cc: james.hogan, jason.low2, gregkh, hanjun.guo, linux-kernel

On 11/24/2013 06:29 AM, Alex Shi wrote:
> On 11/22/2013 08:13 PM, Daniel Lezcano wrote:
>>
>> Hi Alex,
>>
>> I tried on my Xeon server (2 x 4 cores) your patchset and got the
>> following result:
>>
>> kernel a5d6e63323fe7799eb0e6  / + patchset
>>
>> hackbench -T -s 4096 -l 1000 -g 10 -f 40
>>        27.604              38.556
>
> I wonder if the following patch helps on your Xeon server?
>
> Btw, you can run vmstat as a background tool, or use 'perf sched', to
> see how the scheduler statistics change with this patchset.
>
> The following are the results for the original kernel and for all 5
> patches on a PandaBoard ES.
>
>      latest kernel 527d1511310a89        + this patchset
> hackbench -T -g 10 -f 40
>      23.25"                    20.79"
>      23.16"                    20.4"
>      24.24"                    20.29"
> hackbench -p -g 10 -f 40
>      26.52"                    21.2"
>      23.89"                    24.07"
>      25.65"                    20.30"
> hackbench -P -g 10 -f 40
>      20.14"                    19.53"
>      19.96"                    20.37"
>      21.76"                    20.39"
>


Here are the new results with your patchset + patch #5.

I have some issues with perf at the moment, so I will fix that and 
send the results afterwards.


527d1511310a          / + patchset + #5

hackbench -T -s 4096 -l 1000 -g 10 -f 40
26.677	     	     30.308
27.914		     28.497
28.390		     30.360
28.048		     28.587
26.344		     29.513
27.848		     28.706
28.315		     30.152
28.232		     29.721
26.549		     28.766
30.340		     38.801
hackbench -p -s 4096 -l 1000 -g 10 -f 40
34.522	     	     35.469
34.545		     34.966
34.469		     35.342
34.115		     35.286
34.457		     35.592
34.561		     35.314
34.459		     35.316
34.054		     35.629
34.532		     35.149
34.459		     34.876
hackbench -P -s 4096 -l 1000 -g 10 -f 40
38.938	     	     30.308
39.363		     28.497
39.340		     30.360
38.909		     28.587
39.095		     29.513
38.869		     28.706
39.041		     30.152
38.939		     29.721
38.992		     28.766
38.947		     38.801


> ------
>  From 4f5efd6c2b1e7293410ad57c3db24dcf3394c4a3 Mon Sep 17 00:00:00 2001
> From: Alex Shi <alex.shi@linaro.org>
> Date: Sat, 23 Nov 2013 23:18:09 +0800
> Subject: [PATCH] sched: aggravate target cpu load to reduce task moving
>
> Task migration can happen when the target cpu's load is only a bit
> lower than the source cpu's. To reduce such migrations, aggravate the
> target cpu's load with sd->imbalance_pct (with imbalance_pct = 125 the
> target looks 25% heavier).
>
> Signed-off-by: Alex Shi <alex.shi@linaro.org>
> ---
>   kernel/sched/fair.c | 18 ++++++++++++------
>   1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bccdd89..c49b7ba 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -978,7 +978,7 @@ static inline unsigned long group_weight(struct task_struct *p, int nid)
>
>   static unsigned long weighted_cpuload(const int cpu);
>   static unsigned long source_load(int cpu);
> -static unsigned long target_load(int cpu);
> +static unsigned long target_load(int cpu, int imbalance_pct);
>   static unsigned long power_of(int cpu);
>   static long effective_load(struct task_group *tg, int cpu, long wl, long wg);
>
> @@ -3809,11 +3809,17 @@ static unsigned long source_load(int cpu)
>    * Return a high guess at the load of a migration-target cpu weighted
>    * according to the scheduling class and "nice" value.
>    */
> -static unsigned long target_load(int cpu)
> +static unsigned long target_load(int cpu, int imbalance_pct)
>   {
>   	struct rq *rq = cpu_rq(cpu);
>   	unsigned long total = weighted_cpuload(cpu);
>
> +	/*
> +	 * Without cpu_load decay, cpu_load equals total most of the time,
> +	 * so make the target a bit heavier to reduce task migration.
> +	 */
> +	total = total * imbalance_pct / 100;
> +
>   	if (!sched_feat(LB_BIAS))
>   		return total;
>
> @@ -4033,7 +4039,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
>   	this_cpu  = smp_processor_id();
>   	prev_cpu  = task_cpu(p);
>   	load	  = source_load(prev_cpu);
> -	this_load = target_load(this_cpu);
> +	this_load = target_load(this_cpu, 100);
>
>   	/*
>   	 * If sync wakeup then subtract the (maximum possible)
> @@ -4089,7 +4095,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
>
>   	if (balanced ||
>   	    (this_load <= load &&
> -	     this_load + target_load(prev_cpu) <= tl_per_task)) {
> +	     this_load + target_load(prev_cpu, 100) <= tl_per_task)) {
>   		/*
>   		 * This domain has SD_WAKE_AFFINE and
>   		 * p is cache cold in this domain, and
> @@ -4135,7 +4141,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
>   			if (local_group)
>   				load = source_load(i);
>   			else
> -				load = target_load(i);
> +				load = target_load(i, sd->imbalance_pct);
>
>   			avg_load += load;
>   		}
> @@ -5478,7 +5484,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>
>   		/* Bias balancing toward cpus of our domain */
>   		if (local_group)
> -			load = target_load(i);
> +			load = target_load(i, env->sd->imbalance_pct);
>   		else
>   			load = source_load(i);
>
>


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog



* Re: [RFC PATCH 0/4] sched: remove cpu_load decay.
  2013-11-26 12:35     ` Daniel Lezcano
@ 2013-11-26 12:52       ` Alex Shi
  2013-11-26 12:57         ` Daniel Lezcano
  2013-11-26 13:01         ` Daniel Lezcano
  0 siblings, 2 replies; 18+ messages in thread
From: Alex Shi @ 2013-11-26 12:52 UTC (permalink / raw)
  To: Daniel Lezcano, mingo, peterz, morten.rasmussen, vincent.guittot,
	fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
	fengguang.wu
  Cc: james.hogan, jason.low2, gregkh, hanjun.guo, linux-kernel

On 11/26/2013 08:35 PM, Daniel Lezcano wrote:
> 
> 
> Here are the new results with your patchset + patch #5.
> 
> I have some issues with perf at the moment, so I will fix that and
> send the results afterwards.

Thanks a lot, Daniel!
The result is pretty good: thread/pipe performance drops slightly, but
process performance increases by about 25%!


> 
> 
> 527d1511310a          / + patchset + #5
> 
> hackbench -T -s 4096 -l 1000 -g 10 -f 40
> 26.677                  30.308
> 27.914             28.497
> 28.390             30.360
> 28.048             28.587
> 26.344             29.513
> 27.848             28.706
> 28.315             30.152
> 28.232             29.721
> 26.549             28.766
> 30.340             38.801
> hackbench -p -s 4096 -l 1000 -g 10 -f 40
> 34.522                  35.469
> 34.545             34.966
> 34.469             35.342
> 34.115             35.286
> 34.457             35.592
> 34.561             35.314
> 34.459             35.316
> 34.054             35.629
> 34.532             35.149
> 34.459             34.876
> hackbench -P -s 4096 -l 1000 -g 10 -f 40
> 38.938                  30.308
> 39.363             28.497
> 39.340             30.360
> 38.909             28.587
> 39.095             29.513
> 38.869             28.706
> 39.041             30.152
> 38.939             29.721
> 38.992             28.766
> 38.947             38.801


-- 
Thanks
    Alex


* Re: [RFC PATCH 0/4] sched: remove cpu_load decay.
  2013-11-26 12:52       ` Alex Shi
@ 2013-11-26 12:57         ` Daniel Lezcano
  2013-11-26 13:01         ` Daniel Lezcano
  1 sibling, 0 replies; 18+ messages in thread
From: Daniel Lezcano @ 2013-11-26 12:57 UTC (permalink / raw)
  To: Alex Shi, mingo, peterz, morten.rasmussen, vincent.guittot,
	fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
	fengguang.wu
  Cc: james.hogan, jason.low2, gregkh, hanjun.guo, linux-kernel

On 11/26/2013 01:52 PM, Alex Shi wrote:
> On 11/26/2013 08:35 PM, Daniel Lezcano wrote:
>>
>>
>> Here are the new results with your patchset + patch #5.
>>
>> I have some issues with perf at the moment, so I will fix that and
>> send the results afterwards.
>
> Thanks a lot, Daniel!
> The result is pretty good: thread/pipe performance drops slightly, but
> process performance increases by about 25%!

Mmh, wait. Let me double-check the results; it seems odd that we get 
so much of a performance increase.

>>
>> 527d1511310a          / + patchset + #5
>>
>> hackbench -T -s 4096 -l 1000 -g 10 -f 40
>> 26.677                  30.308
>> 27.914             28.497
>> 28.390             30.360
>> 28.048             28.587
>> 26.344             29.513
>> 27.848             28.706
>> 28.315             30.152
>> 28.232             29.721
>> 26.549             28.766
>> 30.340             38.801
>> hackbench -p -s 4096 -l 1000 -g 10 -f 40
>> 34.522                  35.469
>> 34.545             34.966
>> 34.469             35.342
>> 34.115             35.286
>> 34.457             35.592
>> 34.561             35.314
>> 34.459             35.316
>> 34.054             35.629
>> 34.532             35.149
>> 34.459             34.876
>> hackbench -P -s 4096 -l 1000 -g 10 -f 40
>> 38.938                  30.308
>> 39.363             28.497
>> 39.340             30.360
>> 38.909             28.587
>> 39.095             29.513
>> 38.869             28.706
>> 39.041             30.152
>> 38.939             29.721
>> 38.992             28.766
>> 38.947             38.801
>
>


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog



* Re: [RFC PATCH 4/4] sched/nohz_full: give correct cpu load for nohz_full cpu
  2013-11-22  6:37 ` [RFC PATCH 4/4] sched/nohz_full: give correct cpu load for nohz_full cpu Alex Shi
@ 2013-11-26 12:59   ` Alex Shi
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Shi @ 2013-11-26 12:59 UTC (permalink / raw)
  To: mingo, peterz, morten.rasmussen, vincent.guittot, daniel.lezcano,
	fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
	fengguang.wu
  Cc: james.hogan, alex.shi, jason.low2, gregkh, hanjun.guo, linux-kernel

On 11/22/2013 02:37 PM, Alex Shi wrote:
> When a nohz_full cpu is in tickless mode, it may update cpu_load via
> the following chain:
> __tick_nohz_full_check
>     tick_nohz_restart_sched_tick
>         update_cpu_load_nohz
> and it will then be given an incorrect cpu_load of 0.
> This patch tries to fix that and provide the correct cpu_load value.

Frederic,

Would you like to give some comments on this patch?


-- 
Thanks
    Alex


* Re: [RFC PATCH 0/4] sched: remove cpu_load decay.
  2013-11-26 12:52       ` Alex Shi
  2013-11-26 12:57         ` Daniel Lezcano
@ 2013-11-26 13:01         ` Daniel Lezcano
  2013-11-26 13:04           ` Alex Shi
  1 sibling, 1 reply; 18+ messages in thread
From: Daniel Lezcano @ 2013-11-26 13:01 UTC (permalink / raw)
  To: Alex Shi, mingo, peterz, morten.rasmussen, vincent.guittot,
	fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
	fengguang.wu
  Cc: james.hogan, jason.low2, gregkh, hanjun.guo, linux-kernel

On 11/26/2013 01:52 PM, Alex Shi wrote:
> On 11/26/2013 08:35 PM, Daniel Lezcano wrote:
>>
>>
>> Here are the new results with your patchset + patch #5.
>>
>> I have some issues with perf at the moment, so I will fix that and
>> send the results afterwards.
>
> Thanks a lot, Daniel!
> The result is pretty good: thread/pipe performance drops slightly, but
> process performance increases by about 25%!
>
>
>>
>>
>> 527d1511310a          / + patchset + #5
>>
>> hackbench -T -s 4096 -l 1000 -g 10 -f 40
>> 26.677                  30.308
>> 27.914             28.497
>> 28.390             30.360
>> 28.048             28.587
>> 26.344             29.513
>> 27.848             28.706
>> 28.315             30.152
>> 28.232             29.721
>> 26.549             28.766
>> 30.340             38.801
>> hackbench -p -s 4096 -l 1000 -g 10 -f 40
>> 34.522                  35.469
>> 34.545             34.966
>> 34.469             35.342
>> 34.115             35.286
>> 34.457             35.592
>> 34.561             35.314
>> 34.459             35.316
>> 34.054             35.629
>> 34.532             35.149
>> 34.459             34.876
>> hackbench -P -s 4096 -l 1000 -g 10 -f 40
>> 38.938                  30.308
>> 39.363             28.497
>> 39.340             30.360
>> 38.909             28.587
>> 39.095             29.513
>> 38.869             28.706
>> 39.041             30.152
>> 38.939             29.721
>> 38.992             28.766
>> 38.947             38.801

Ok, bad copy-paste: the third test run's results with the patchset were wrong.

hackbench -P -s 4096 -l 1000 -g 10 -f 40
38.938	     	     39.585	     	      	     	
39.363		     39.008
39.340		     38.954
38.909		     39.273
39.095		     38.755
38.869		     39.003
39.041		     38.945
38.939		     38.005
38.992		     38.994
38.947		     38.855



-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog



* Re: [RFC PATCH 0/4] sched: remove cpu_load decay.
  2013-11-26 13:01         ` Daniel Lezcano
@ 2013-11-26 13:04           ` Alex Shi
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Shi @ 2013-11-26 13:04 UTC (permalink / raw)
  To: Daniel Lezcano, mingo, peterz, morten.rasmussen, vincent.guittot,
	fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
	fengguang.wu
  Cc: james.hogan, jason.low2, gregkh, hanjun.guo, linux-kernel

On 11/26/2013 09:01 PM, Daniel Lezcano wrote:
> 
> Ok, bad copy-paste, the third test run results with the patchset is wrong.
> 
> hackbench -P -s 4096 -l 1000 -g 10 -f 40
> 38.938                  39.585                               
> 39.363             39.008
> 39.340             38.954
> 38.909             39.273
> 39.095             38.755
> 38.869             39.003
> 39.041             38.945
> 38.939             38.005
> 38.992             38.994
> 38.947             38.855

Oops.
Anyway, at least there is no harm to the hackbench process test. :)

-- 
Thanks
    Alex


* Re: [RFC PATCH 0/4] sched: remove cpu_load decay.
  2013-11-22  6:37 [RFC PATCH 0/4] sched: remove cpu_load decay Alex Shi
                   ` (4 preceding siblings ...)
  2013-11-22 12:13 ` [RFC PATCH 0/4] sched: remove cpu_load decay Daniel Lezcano
@ 2013-11-27  2:48 ` Alex Shi
  5 siblings, 0 replies; 18+ messages in thread
From: Alex Shi @ 2013-11-27  2:48 UTC (permalink / raw)
  To: mingo, peterz, morten.rasmussen, vincent.guittot, daniel.lezcano,
	fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
	fengguang.wu
  Cc: james.hogan, alex.shi, jason.low2, gregkh, hanjun.guo, linux-kernel

On 11/22/2013 02:37 PM, Alex Shi wrote:
> 	latest kernel 527d1511310a89		+ this patchset
> hackbench -T -g 10 -f 40
> 	23.25"					21.7"
> 	23.16"					19.99"
> 	24.24"					21.53"
> hackbench -p -g 10 -f 40
> 	26.52"					22.48"
> 	23.89"					24.00"
> 	25.65"					23.06"
> hackbench -P -g 10 -f 40
> 	20.14"					19.37"
> 	19.96"					19.76"
> 	21.76"					21.54"
> 
> The git tree for this patchset at:
>  git@github.com:alexshi/power-scheduling.git no-load-idx 

Fengguang,

Did your kernel testing find anything unusual with these 3 patches?

-- 
Thanks
    Alex

