* [RFC PATCH 0/4] sched: remove cpu_load decay.
@ 2013-11-22 6:37 Alex Shi
2013-11-22 6:37 ` [RFC PATCH 1/4] sched: shortcut to remove load_idx effect Alex Shi
` (5 more replies)
0 siblings, 6 replies; 18+ messages in thread
From: Alex Shi @ 2013-11-22 6:37 UTC (permalink / raw)
To: mingo, peterz, morten.rasmussen, vincent.guittot, daniel.lezcano,
fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
fengguang.wu
Cc: james.hogan, alex.shi, jason.low2, gregkh, hanjun.guo, linux-kernel
The cpu_load of a rq decays over time according to the rq's past cpu load, while the newer sched_avg decays each task's load over time. So we now have two kinds of decay for cpu_load, which is redundant and adds overhead in sched_tick and other paths.
This patchset tries to remove the cpu_load decay, and fixes a nohz_full bug along the way.
There are 5 load_idx values used for cpu_load in sched_domain. busy_idx and idle_idx are usually non-zero, but newidle_idx, wake_idx and forkexec_idx are all zero on every arch. The first patch is a shortcut that removes the load_idx effect: just a one-line change. :)
I have tested the patchset on my PandaES board (2 ARM Cortex-A9 cores).
hackbench thread/pipe performance increased nearly 10% with this patchset! That really surprised me!
latest kernel 527d1511310a89 + this patchset
hackbench -T -g 10 -f 40
23.25" 21.7"
23.16" 19.99"
24.24" 21.53"
hackbench -p -g 10 -f 40
26.52" 22.48"
23.89" 24.00"
25.65" 23.06"
hackbench -P -g 10 -f 40
20.14" 19.37"
19.96" 19.76"
21.76" 21.54"
The git tree for this patchset at:
git@github.com:alexshi/power-scheduling.git no-load-idx
Since Fengguang has included this tree in his kernel testing system, and I haven't received a regression report so far, I suppose it is fine on x86 systems.
But since this scheduler change affects all archs, and hackbench is the only benchmark I have found for this patchset so far, I'd like to see more testing and discussion of it.
Regards
Alex
^ permalink raw reply [flat|nested] 18+ messages in thread
* [RFC PATCH 1/4] sched: shortcut to remove load_idx effect
2013-11-22 6:37 [RFC PATCH 0/4] sched: remove cpu_load decay Alex Shi
@ 2013-11-22 6:37 ` Alex Shi
2013-11-22 6:37 ` [RFC PATCH 2/4] sched: change rq->cpu_load[load_idx] array to rq->cpu_load Alex Shi
` (4 subsequent siblings)
5 siblings, 0 replies; 18+ messages in thread
From: Alex Shi @ 2013-11-22 6:37 UTC (permalink / raw)
To: mingo, peterz, morten.rasmussen, vincent.guittot, daniel.lezcano,
fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
fengguang.wu
Cc: james.hogan, alex.shi, jason.low2, gregkh, hanjun.guo, linux-kernel
Shortcut to remove the rq->cpu_load[load_idx] effect in the scheduler.
Of the five load_idx values, only busy_idx and idle_idx are non-zero;
newidle_idx, wake_idx and forkexec_idx are all zero on all archs.
So setting the idx to zero here fully removes the load_idx effect.
Signed-off-by: Alex Shi <alex.shi@linaro.org>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e8b652e..ce683aa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5633,7 +5633,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
if (child && child->flags & SD_PREFER_SIBLING)
prefer_sibling = 1;
- load_idx = get_sd_load_idx(env->sd, env->idle);
+ load_idx = 0;
do {
struct sg_lb_stats *sgs = &tmp_sgs;
--
1.8.1.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [RFC PATCH 2/4] sched: change rq->cpu_load[load_idx] array to rq->cpu_load
2013-11-22 6:37 [RFC PATCH 0/4] sched: remove cpu_load decay Alex Shi
2013-11-22 6:37 ` [RFC PATCH 1/4] sched: shortcut to remove load_idx effect Alex Shi
@ 2013-11-22 6:37 ` Alex Shi
2013-11-22 6:37 ` [RFC PATCH 3/4] sched: clean up __update_cpu_load Alex Shi
` (3 subsequent siblings)
5 siblings, 0 replies; 18+ messages in thread
From: Alex Shi @ 2013-11-22 6:37 UTC (permalink / raw)
To: mingo, peterz, morten.rasmussen, vincent.guittot, daniel.lezcano,
fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
fengguang.wu
Cc: james.hogan, alex.shi, jason.low2, gregkh, hanjun.guo, linux-kernel
Since the load_idx effect has been removed from load balancing, we no
longer need the cpu_load decay in the scheduler. That saves some time in
sched_tick and other places.
Signed-off-by: Alex Shi <alex.shi@linaro.org>
---
arch/ia64/include/asm/topology.h | 5 ---
arch/metag/include/asm/topology.h | 5 ---
arch/tile/include/asm/topology.h | 6 ---
include/linux/sched.h | 5 ---
include/linux/topology.h | 8 ----
kernel/sched/core.c | 60 ++++++++-----------------
kernel/sched/debug.c | 6 +--
kernel/sched/fair.c | 79 +++++++++------------------------
kernel/sched/proc.c | 92 ++-------------------------------------
kernel/sched/sched.h | 3 +-
10 files changed, 43 insertions(+), 226 deletions(-)
diff --git a/arch/ia64/include/asm/topology.h b/arch/ia64/include/asm/topology.h
index a2496e4..54e5b17 100644
--- a/arch/ia64/include/asm/topology.h
+++ b/arch/ia64/include/asm/topology.h
@@ -55,11 +55,6 @@ void build_cpu_to_node_map(void);
.busy_factor = 64, \
.imbalance_pct = 125, \
.cache_nice_tries = 2, \
- .busy_idx = 2, \
- .idle_idx = 1, \
- .newidle_idx = 0, \
- .wake_idx = 0, \
- .forkexec_idx = 0, \
.flags = SD_LOAD_BALANCE \
| SD_BALANCE_NEWIDLE \
| SD_BALANCE_EXEC \
diff --git a/arch/metag/include/asm/topology.h b/arch/metag/include/asm/topology.h
index 8e9c0b3..d1d15cd 100644
--- a/arch/metag/include/asm/topology.h
+++ b/arch/metag/include/asm/topology.h
@@ -13,11 +13,6 @@
.busy_factor = 32, \
.imbalance_pct = 125, \
.cache_nice_tries = 2, \
- .busy_idx = 3, \
- .idle_idx = 2, \
- .newidle_idx = 0, \
- .wake_idx = 0, \
- .forkexec_idx = 0, \
.flags = SD_LOAD_BALANCE \
| SD_BALANCE_FORK \
| SD_BALANCE_EXEC \
diff --git a/arch/tile/include/asm/topology.h b/arch/tile/include/asm/topology.h
index d15c0d8..05f6ffe 100644
--- a/arch/tile/include/asm/topology.h
+++ b/arch/tile/include/asm/topology.h
@@ -57,12 +57,6 @@ static inline const struct cpumask *cpumask_of_node(int node)
.busy_factor = 64, \
.imbalance_pct = 125, \
.cache_nice_tries = 1, \
- .busy_idx = 2, \
- .idle_idx = 1, \
- .newidle_idx = 0, \
- .wake_idx = 0, \
- .forkexec_idx = 0, \
- \
.flags = 1*SD_LOAD_BALANCE \
| 1*SD_BALANCE_NEWIDLE \
| 1*SD_BALANCE_EXEC \
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7e35d4b..a23e02d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -815,11 +815,6 @@ struct sched_domain {
unsigned int busy_factor; /* less balancing by factor if busy */
unsigned int imbalance_pct; /* No balance until over watermark */
unsigned int cache_nice_tries; /* Leave cache hot tasks for # tries */
- unsigned int busy_idx;
- unsigned int idle_idx;
- unsigned int newidle_idx;
- unsigned int wake_idx;
- unsigned int forkexec_idx;
unsigned int smt_gain;
int nohz_idle; /* NOHZ IDLE status */
diff --git a/include/linux/topology.h b/include/linux/topology.h
index 12ae6ce..863fad3 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -121,9 +121,6 @@ int arch_update_cpu_topology(void);
.busy_factor = 64, \
.imbalance_pct = 125, \
.cache_nice_tries = 1, \
- .busy_idx = 2, \
- .wake_idx = 0, \
- .forkexec_idx = 0, \
\
.flags = 1*SD_LOAD_BALANCE \
| 1*SD_BALANCE_NEWIDLE \
@@ -151,11 +148,6 @@ int arch_update_cpu_topology(void);
.busy_factor = 64, \
.imbalance_pct = 125, \
.cache_nice_tries = 1, \
- .busy_idx = 2, \
- .idle_idx = 1, \
- .newidle_idx = 0, \
- .wake_idx = 0, \
- .forkexec_idx = 0, \
\
.flags = 1*SD_LOAD_BALANCE \
| 1*SD_BALANCE_NEWIDLE \
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c180860..9528f75 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4279,61 +4279,42 @@ static void sd_free_ctl_entry(struct ctl_table **tablep)
*tablep = NULL;
}
-static int min_load_idx = 0;
-static int max_load_idx = CPU_LOAD_IDX_MAX-1;
-
static void
set_table_entry(struct ctl_table *entry,
const char *procname, void *data, int maxlen,
- umode_t mode, proc_handler *proc_handler,
- bool load_idx)
+ umode_t mode, proc_handler *proc_handler)
{
entry->procname = procname;
entry->data = data;
entry->maxlen = maxlen;
entry->mode = mode;
entry->proc_handler = proc_handler;
-
- if (load_idx) {
- entry->extra1 = &min_load_idx;
- entry->extra2 = &max_load_idx;
- }
}
static struct ctl_table *
sd_alloc_ctl_domain_table(struct sched_domain *sd)
{
- struct ctl_table *table = sd_alloc_ctl_entry(13);
+ struct ctl_table *table = sd_alloc_ctl_entry(8);
if (table == NULL)
return NULL;
set_table_entry(&table[0], "min_interval", &sd->min_interval,
- sizeof(long), 0644, proc_doulongvec_minmax, false);
+ sizeof(long), 0644, proc_doulongvec_minmax);
set_table_entry(&table[1], "max_interval", &sd->max_interval,
- sizeof(long), 0644, proc_doulongvec_minmax, false);
- set_table_entry(&table[2], "busy_idx", &sd->busy_idx,
- sizeof(int), 0644, proc_dointvec_minmax, true);
- set_table_entry(&table[3], "idle_idx", &sd->idle_idx,
- sizeof(int), 0644, proc_dointvec_minmax, true);
- set_table_entry(&table[4], "newidle_idx", &sd->newidle_idx,
- sizeof(int), 0644, proc_dointvec_minmax, true);
- set_table_entry(&table[5], "wake_idx", &sd->wake_idx,
- sizeof(int), 0644, proc_dointvec_minmax, true);
- set_table_entry(&table[6], "forkexec_idx", &sd->forkexec_idx,
- sizeof(int), 0644, proc_dointvec_minmax, true);
- set_table_entry(&table[7], "busy_factor", &sd->busy_factor,
- sizeof(int), 0644, proc_dointvec_minmax, false);
- set_table_entry(&table[8], "imbalance_pct", &sd->imbalance_pct,
- sizeof(int), 0644, proc_dointvec_minmax, false);
- set_table_entry(&table[9], "cache_nice_tries",
+ sizeof(long), 0644, proc_doulongvec_minmax);
+ set_table_entry(&table[2], "busy_factor", &sd->busy_factor,
+ sizeof(int), 0644, proc_dointvec_minmax);
+ set_table_entry(&table[3], "imbalance_pct", &sd->imbalance_pct,
+ sizeof(int), 0644, proc_dointvec_minmax);
+ set_table_entry(&table[4], "cache_nice_tries",
&sd->cache_nice_tries,
- sizeof(int), 0644, proc_dointvec_minmax, false);
- set_table_entry(&table[10], "flags", &sd->flags,
- sizeof(int), 0644, proc_dointvec_minmax, false);
- set_table_entry(&table[11], "name", sd->name,
- CORENAME_MAX_SIZE, 0444, proc_dostring, false);
- /* &table[12] is terminator */
+ sizeof(int), 0644, proc_dointvec_minmax);
+ set_table_entry(&table[5], "flags", &sd->flags,
+ sizeof(int), 0644, proc_dointvec_minmax);
+ set_table_entry(&table[6], "name", sd->name,
+ CORENAME_MAX_SIZE, 0444, proc_dostring);
+ /* &table[7] is terminator */
return table;
}
@@ -5425,11 +5406,6 @@ sd_numa_init(struct sched_domain_topology_level *tl, int cpu)
.busy_factor = 32,
.imbalance_pct = 125,
.cache_nice_tries = 2,
- .busy_idx = 3,
- .idle_idx = 2,
- .newidle_idx = 0,
- .wake_idx = 0,
- .forkexec_idx = 0,
.flags = 1*SD_LOAD_BALANCE
| 1*SD_BALANCE_NEWIDLE
@@ -6178,7 +6154,7 @@ DECLARE_PER_CPU(cpumask_var_t, load_balance_mask);
void __init sched_init(void)
{
- int i, j;
+ int i;
unsigned long alloc_size = 0, ptr;
#ifdef CONFIG_FAIR_GROUP_SCHED
@@ -6279,9 +6255,7 @@ void __init sched_init(void)
init_tg_rt_entry(&root_task_group, &rq->rt, NULL, i, NULL);
#endif
- for (j = 0; j < CPU_LOAD_IDX_MAX; j++)
- rq->cpu_load[j] = 0;
-
+ rq->cpu_load = 0;
rq->last_load_update_tick = jiffies;
#ifdef CONFIG_SMP
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 5c34d18..675be71 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -303,11 +303,7 @@ do { \
PN(next_balance);
SEQ_printf(m, " .%-30s: %ld\n", "curr->pid", (long)(task_pid_nr(rq->curr)));
PN(clock);
- P(cpu_load[0]);
- P(cpu_load[1]);
- P(cpu_load[2]);
- P(cpu_load[3]);
- P(cpu_load[4]);
+ P(cpu_load);
#undef P
#undef PN
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ce683aa..bccdd89 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -977,8 +977,8 @@ static inline unsigned long group_weight(struct task_struct *p, int nid)
}
static unsigned long weighted_cpuload(const int cpu);
-static unsigned long source_load(int cpu, int type);
-static unsigned long target_load(int cpu, int type);
+static unsigned long source_load(int cpu);
+static unsigned long target_load(int cpu);
static unsigned long power_of(int cpu);
static long effective_load(struct task_group *tg, int cpu, long wl, long wg);
@@ -3794,30 +3794,30 @@ static unsigned long weighted_cpuload(const int cpu)
* We want to under-estimate the load of migration sources, to
* balance conservatively.
*/
-static unsigned long source_load(int cpu, int type)
+static unsigned long source_load(int cpu)
{
struct rq *rq = cpu_rq(cpu);
unsigned long total = weighted_cpuload(cpu);
- if (type == 0 || !sched_feat(LB_BIAS))
+ if (!sched_feat(LB_BIAS))
return total;
- return min(rq->cpu_load[type-1], total);
+ return min(rq->cpu_load, total);
}
/*
* Return a high guess at the load of a migration-target cpu weighted
* according to the scheduling class and "nice" value.
*/
-static unsigned long target_load(int cpu, int type)
+static unsigned long target_load(int cpu)
{
struct rq *rq = cpu_rq(cpu);
unsigned long total = weighted_cpuload(cpu);
- if (type == 0 || !sched_feat(LB_BIAS))
+ if (!sched_feat(LB_BIAS))
return total;
- return max(rq->cpu_load[type-1], total);
+ return max(rq->cpu_load, total);
}
static unsigned long power_of(int cpu)
@@ -4017,7 +4017,7 @@ static int wake_wide(struct task_struct *p)
static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
{
s64 this_load, load;
- int idx, this_cpu, prev_cpu;
+ int this_cpu, prev_cpu;
unsigned long tl_per_task;
struct task_group *tg;
unsigned long weight;
@@ -4030,11 +4030,10 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
if (wake_wide(p))
return 0;
- idx = sd->wake_idx;
this_cpu = smp_processor_id();
prev_cpu = task_cpu(p);
- load = source_load(prev_cpu, idx);
- this_load = target_load(this_cpu, idx);
+ load = source_load(prev_cpu);
+ this_load = target_load(this_cpu);
/*
* If sync wakeup then subtract the (maximum possible)
@@ -4090,7 +4089,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
if (balanced ||
(this_load <= load &&
- this_load + target_load(prev_cpu, idx) <= tl_per_task)) {
+ this_load + target_load(prev_cpu) <= tl_per_task)) {
/*
* This domain has SD_WAKE_AFFINE and
* p is cache cold in this domain, and
@@ -4109,8 +4108,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
* domain.
*/
static struct sched_group *
-find_idlest_group(struct sched_domain *sd, struct task_struct *p,
- int this_cpu, int load_idx)
+find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
{
struct sched_group *idlest = NULL, *group = sd->groups;
unsigned long min_load = ULONG_MAX, this_load = 0;
@@ -4135,9 +4133,9 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
for_each_cpu(i, sched_group_cpus(group)) {
/* Bias balancing toward cpus of our domain */
if (local_group)
- load = source_load(i, load_idx);
+ load = source_load(i);
else
- load = target_load(i, load_idx);
+ load = target_load(i);
avg_load += load;
}
@@ -4283,7 +4281,6 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
}
while (sd) {
- int load_idx = sd->forkexec_idx;
struct sched_group *group;
int weight;
@@ -4292,10 +4289,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
continue;
}
- if (sd_flag & SD_BALANCE_WAKE)
- load_idx = sd->wake_idx;
-
- group = find_idlest_group(sd, p, cpu, load_idx);
+ group = find_idlest_group(sd, p, cpu);
if (!group) {
sd = sd->child;
continue;
@@ -5238,34 +5232,6 @@ static inline void init_sd_lb_stats(struct sd_lb_stats *sds)
};
}
-/**
- * get_sd_load_idx - Obtain the load index for a given sched domain.
- * @sd: The sched_domain whose load_idx is to be obtained.
- * @idle: The idle status of the CPU for whose sd load_idx is obtained.
- *
- * Return: The load index.
- */
-static inline int get_sd_load_idx(struct sched_domain *sd,
- enum cpu_idle_type idle)
-{
- int load_idx;
-
- switch (idle) {
- case CPU_NOT_IDLE:
- load_idx = sd->busy_idx;
- break;
-
- case CPU_NEWLY_IDLE:
- load_idx = sd->newidle_idx;
- break;
- default:
- load_idx = sd->idle_idx;
- break;
- }
-
- return load_idx;
-}
-
static unsigned long default_scale_freq_power(struct sched_domain *sd, int cpu)
{
return SCHED_POWER_SCALE;
@@ -5492,12 +5458,11 @@ static inline int sg_capacity(struct lb_env *env, struct sched_group *group)
* update_sg_lb_stats - Update sched_group's statistics for load balancing.
* @env: The load balancing environment.
* @group: sched_group whose statistics are to be updated.
- * @load_idx: Load index of sched_domain of this_cpu for load calc.
* @local_group: Does group contain this_cpu.
* @sgs: variable to hold the statistics for this group.
*/
static inline void update_sg_lb_stats(struct lb_env *env,
- struct sched_group *group, int load_idx,
+ struct sched_group *group,
int local_group, struct sg_lb_stats *sgs)
{
unsigned long nr_running;
@@ -5513,9 +5478,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
/* Bias balancing toward cpus of our domain */
if (local_group)
- load = target_load(i, load_idx);
+ load = target_load(i);
else
- load = source_load(i, load_idx);
+ load = source_load(i);
sgs->group_load += load;
sgs->sum_nr_running += nr_running;
@@ -5628,13 +5593,11 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
struct sched_domain *child = env->sd->child;
struct sched_group *sg = env->sd->groups;
struct sg_lb_stats tmp_sgs;
- int load_idx, prefer_sibling = 0;
+ int prefer_sibling = 0;
if (child && child->flags & SD_PREFER_SIBLING)
prefer_sibling = 1;
- load_idx = 0;
-
do {
struct sg_lb_stats *sgs = &tmp_sgs;
int local_group;
@@ -5649,7 +5612,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
update_group_power(env->sd, env->dst_cpu);
}
- update_sg_lb_stats(env, sg, load_idx, local_group, sgs);
+ update_sg_lb_stats(env, sg, local_group, sgs);
if (local_group)
goto next_group;
diff --git a/kernel/sched/proc.c b/kernel/sched/proc.c
index 16f5a30..a2435c5 100644
--- a/kernel/sched/proc.c
+++ b/kernel/sched/proc.c
@@ -11,7 +11,7 @@
unsigned long this_cpu_load(void)
{
struct rq *this = this_rq();
- return this->cpu_load[0];
+ return this->cpu_load;
}
@@ -398,105 +398,19 @@ static void calc_load_account_active(struct rq *this_rq)
* End of global load-average stuff
*/
-/*
- * The exact cpuload at various idx values, calculated at every tick would be
- * load = (2^idx - 1) / 2^idx * load + 1 / 2^idx * cur_load
- *
- * If a cpu misses updates for n-1 ticks (as it was idle) and update gets called
- * on nth tick when cpu may be busy, then we have:
- * load = ((2^idx - 1) / 2^idx)^(n-1) * load
- * load = (2^idx - 1) / 2^idx) * load + 1 / 2^idx * cur_load
- *
- * decay_load_missed() below does efficient calculation of
- * load = ((2^idx - 1) / 2^idx)^(n-1) * load
- * avoiding 0..n-1 loop doing load = ((2^idx - 1) / 2^idx) * load
- *
- * The calculation is approximated on a 128 point scale.
- * degrade_zero_ticks is the number of ticks after which load at any
- * particular idx is approximated to be zero.
- * degrade_factor is a precomputed table, a row for each load idx.
- * Each column corresponds to degradation factor for a power of two ticks,
- * based on 128 point scale.
- * Example:
- * row 2, col 3 (=12) says that the degradation at load idx 2 after
- * 8 ticks is 12/128 (which is an approximation of exact factor 3^8/4^8).
- *
- * With this power of 2 load factors, we can degrade the load n times
- * by looking at 1 bits in n and doing as many mult/shift instead of
- * n mult/shifts needed by the exact degradation.
- */
-#define DEGRADE_SHIFT 7
-static const unsigned char
- degrade_zero_ticks[CPU_LOAD_IDX_MAX] = {0, 8, 32, 64, 128};
-static const unsigned char
- degrade_factor[CPU_LOAD_IDX_MAX][DEGRADE_SHIFT + 1] = {
- {0, 0, 0, 0, 0, 0, 0, 0},
- {64, 32, 8, 0, 0, 0, 0, 0},
- {96, 72, 40, 12, 1, 0, 0},
- {112, 98, 75, 43, 15, 1, 0},
- {120, 112, 98, 76, 45, 16, 2} };
/*
- * Update cpu_load for any missed ticks, due to tickless idle. The backlog
- * would be when CPU is idle and so we just decay the old load without
- * adding any new load.
- */
-static unsigned long
-decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
-{
- int j = 0;
-
- if (!missed_updates)
- return load;
-
- if (missed_updates >= degrade_zero_ticks[idx])
- return 0;
-
- if (idx == 1)
- return load >> missed_updates;
-
- while (missed_updates) {
- if (missed_updates % 2)
- load = (load * degrade_factor[idx][j]) >> DEGRADE_SHIFT;
-
- missed_updates >>= 1;
- j++;
- }
- return load;
-}
-
-/*
- * Update rq->cpu_load[] statistics. This function is usually called every
+ * Update rq->cpu_load statistics. This function is usually called every
* scheduler tick (TICK_NSEC). With tickless idle this will not be called
* every tick. We fix it up based on jiffies.
*/
static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
unsigned long pending_updates)
{
- int i, scale;
-
this_rq->nr_load_updates++;
/* Update our load: */
- this_rq->cpu_load[0] = this_load; /* Fasttrack for idx 0 */
- for (i = 1, scale = 2; i < CPU_LOAD_IDX_MAX; i++, scale += scale) {
- unsigned long old_load, new_load;
-
- /* scale is effectively 1 << i now, and >> i divides by scale */
-
- old_load = this_rq->cpu_load[i];
- old_load = decay_load_missed(old_load, pending_updates - 1, i);
- new_load = this_load;
- /*
- * Round up the averaging division if load is increasing. This
- * prevents us from getting stuck on 9 if the load is 10, for
- * example.
- */
- if (new_load > old_load)
- new_load += scale - 1;
-
- this_rq->cpu_load[i] = (old_load * (scale - 1) + new_load) >> i;
- }
+ this_rq->cpu_load = this_load; /* Fasttrack for idx 0 */
sched_avg_update(this_rq);
}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 88c85b2..01f6e7a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -413,8 +413,7 @@ struct rq {
unsigned int nr_numa_running;
unsigned int nr_preferred_running;
#endif
- #define CPU_LOAD_IDX_MAX 5
- unsigned long cpu_load[CPU_LOAD_IDX_MAX];
+ unsigned long cpu_load;
unsigned long last_load_update_tick;
#ifdef CONFIG_NO_HZ_COMMON
u64 nohz_stamp;
--
1.8.1.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [RFC PATCH 3/4] sched: clean up __update_cpu_load
2013-11-22 6:37 [RFC PATCH 0/4] sched: remove cpu_load decay Alex Shi
2013-11-22 6:37 ` [RFC PATCH 1/4] sched: shortcut to remove load_idx effect Alex Shi
2013-11-22 6:37 ` [RFC PATCH 2/4] sched: change rq->cpu_load[load_idx] array to rq->cpu_load Alex Shi
@ 2013-11-22 6:37 ` Alex Shi
2013-11-22 6:37 ` [RFC PATCH 4/4] sched/nohz_full: give correct cpu load for nohz_full cpu Alex Shi
` (2 subsequent siblings)
5 siblings, 0 replies; 18+ messages in thread
From: Alex Shi @ 2013-11-22 6:37 UTC (permalink / raw)
To: mingo, peterz, morten.rasmussen, vincent.guittot, daniel.lezcano,
fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
fengguang.wu
Cc: james.hogan, alex.shi, jason.low2, gregkh, hanjun.guo, linux-kernel
Since we no longer decay rq->cpu_load, we don't need pending_updates.
But we still want to update rq->rt_avg, so keep
rq->last_load_update_tick and the __update_cpu_load() function.
After removing load_idx, source_load is equal to target_load most of
the time, except when the source cpu is idle; at that time we force
cpu_load to 0 (in update_cpu_load_nohz).
So we keep cpu_load in the rq.
Signed-off-by: Alex Shi <alex.shi@linaro.org>
---
kernel/sched/proc.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/kernel/sched/proc.c b/kernel/sched/proc.c
index a2435c5..057bb9b 100644
--- a/kernel/sched/proc.c
+++ b/kernel/sched/proc.c
@@ -404,8 +404,7 @@ static void calc_load_account_active(struct rq *this_rq)
* scheduler tick (TICK_NSEC). With tickless idle this will not be called
* every tick. We fix it up based on jiffies.
*/
-static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
- unsigned long pending_updates)
+static void __update_cpu_load(struct rq *this_rq, unsigned long this_load)
{
this_rq->nr_load_updates++;
@@ -449,7 +448,6 @@ void update_idle_cpu_load(struct rq *this_rq)
{
unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
unsigned long load = get_rq_runnable_load(this_rq);
- unsigned long pending_updates;
/*
* bail if there's load or we're actually up-to-date.
@@ -457,10 +455,9 @@ void update_idle_cpu_load(struct rq *this_rq)
if (load || curr_jiffies == this_rq->last_load_update_tick)
return;
- pending_updates = curr_jiffies - this_rq->last_load_update_tick;
this_rq->last_load_update_tick = curr_jiffies;
- __update_cpu_load(this_rq, load, pending_updates);
+ __update_cpu_load(this_rq, load);
}
/*
@@ -483,7 +480,7 @@ void update_cpu_load_nohz(void)
* We were idle, this means load 0, the current load might be
* !0 due to remote wakeups and the sort.
*/
- __update_cpu_load(this_rq, 0, pending_updates);
+ __update_cpu_load(this_rq, 0);
}
raw_spin_unlock(&this_rq->lock);
}
@@ -499,7 +496,7 @@ void update_cpu_load_active(struct rq *this_rq)
* See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
*/
this_rq->last_load_update_tick = jiffies;
- __update_cpu_load(this_rq, load, 1);
+ __update_cpu_load(this_rq, load);
calc_load_account_active(this_rq);
}
--
1.8.1.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [RFC PATCH 4/4] sched/nohz_full: give correct cpu load for nohz_full cpu
2013-11-22 6:37 [RFC PATCH 0/4] sched: remove cpu_load decay Alex Shi
` (2 preceding siblings ...)
2013-11-22 6:37 ` [RFC PATCH 3/4] sched: clean up __update_cpu_load Alex Shi
@ 2013-11-22 6:37 ` Alex Shi
2013-11-26 12:59 ` Alex Shi
2013-11-22 12:13 ` [RFC PATCH 0/4] sched: remove cpu_load decay Daniel Lezcano
2013-11-27 2:48 ` Alex Shi
5 siblings, 1 reply; 18+ messages in thread
From: Alex Shi @ 2013-11-22 6:37 UTC (permalink / raw)
To: mingo, peterz, morten.rasmussen, vincent.guittot, daniel.lezcano,
fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
fengguang.wu
Cc: james.hogan, alex.shi, jason.low2, gregkh, hanjun.guo, linux-kernel
When a nohz_full cpu is in tickless mode, it may update cpu_load via
the following chain:
__tick_nohz_full_check
  tick_nohz_restart_sched_tick
    update_cpu_load_nohz
and then be given an incorrect cpu_load of 0.
This patch tries to fix that and give it the correct cpu_load value.
Signed-off-by: Alex Shi <alex.shi@linaro.org>
---
kernel/sched/proc.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/proc.c b/kernel/sched/proc.c
index 057bb9b..5058e6a 100644
--- a/kernel/sched/proc.c
+++ b/kernel/sched/proc.c
@@ -477,10 +477,16 @@ void update_cpu_load_nohz(void)
if (pending_updates) {
this_rq->last_load_update_tick = curr_jiffies;
/*
- * We were idle, this means load 0, the current load might be
- * !0 due to remote wakeups and the sort.
+ * We may have a task running in NO_HZ_FULL mode, in which
+ * case use the normal cfs load.
+ * Or we were idle, this means load 0, the current load might
+ * be !0 due to remote wakeups and the sort.
*/
- __update_cpu_load(this_rq, 0);
+ if (this_rq->cfs.h_nr_running) {
+ unsigned long load = get_rq_runnable_load(this_rq);
+ __update_cpu_load(this_rq, load);
+ } else
+ __update_cpu_load(this_rq, 0);
}
raw_spin_unlock(&this_rq->lock);
}
--
1.8.1.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 0/4] sched: remove cpu_load decay.
2013-11-22 6:37 [RFC PATCH 0/4] sched: remove cpu_load decay Alex Shi
` (3 preceding siblings ...)
2013-11-22 6:37 ` [RFC PATCH 4/4] sched/nohz_full: give correct cpu load for nohz_full cpu Alex Shi
@ 2013-11-22 12:13 ` Daniel Lezcano
2013-11-24 5:00 ` Alex Shi
` (2 more replies)
2013-11-27 2:48 ` Alex Shi
5 siblings, 3 replies; 18+ messages in thread
From: Daniel Lezcano @ 2013-11-22 12:13 UTC (permalink / raw)
To: Alex Shi, mingo, peterz, morten.rasmussen, vincent.guittot,
fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
fengguang.wu
Cc: james.hogan, jason.low2, gregkh, hanjun.guo, linux-kernel
On 11/22/2013 07:37 AM, Alex Shi wrote:
> The cpu_load of a rq decays over time according to the rq's past cpu load, while the newer sched_avg decays each task's load over time. So we now have two kinds of decay for cpu_load, which is redundant and adds overhead in sched_tick and other paths.
>
> This patchset tries to remove the cpu_load decay, and fixes a nohz_full bug along the way.
>
> There are 5 load_idx values used for cpu_load in sched_domain. busy_idx and idle_idx are usually non-zero, but newidle_idx, wake_idx and forkexec_idx are all zero on every arch. The first patch is a shortcut that removes the load_idx effect: just a one-line change. :)
>
> I have tested the patchset on my PandaES board (2 ARM Cortex-A9 cores).
> hackbench thread/pipe performance increased nearly 10% with this patchset! That really surprised me!
>
> latest kernel 527d1511310a89 + this patchset
> hackbench -T -g 10 -f 40
> 23.25" 21.7"
> 23.16" 19.99"
> 24.24" 21.53"
> hackbench -p -g 10 -f 40
> 26.52" 22.48"
> 23.89" 24.00"
> 25.65" 23.06"
> hackbench -P -g 10 -f 40
> 20.14" 19.37"
> 19.96" 19.76"
> 21.76" 21.54"
>
> The git tree for this patchset at:
> git@github.com:alexshi/power-scheduling.git no-load-idx
> Since Fengguang has included this tree in his kernel testing system, and I haven't received a regression report so far, I suppose it is fine on x86 systems.
>
> But since this scheduler change affects all archs, and hackbench is the only benchmark I have found for this patchset so far, I'd like to see more testing and discussion of it.
Hi Alex,
I tried on my Xeon server (2 x 4 cores) your patchset and got the
following result:
kernel a5d6e63323fe7799eb0e6 / + patchset
hackbench -T -s 4096 -l 1000 -g 10 -f 40
27.604 38.556
27.397 38.694
26.695 38.647
25.975 38.528
29.586 38.553
25.956 38.331
27.895 38.472
26.874 38.608
26.836 38.341
28.064 38.626
hackbench -p -s 4096 -l 1000 -g 10 -f 40
34.502 35.489
34.551 35.389
34.027 35.664
34.343 35.418
34.570 35.423
34.386 35.466
34.387 35.486
33.869 35.212
34.600 35.465
34.155 35.235
hackbench -P -s 4096 -l 1000 -g 10 -f 40
39.170 38.794
39.108 38.662
39.056 38.946
39.120 38.668
38.896 38.865
39.109 38.803
39.020 38.946
39.099 38.844
38.820 38.872
38.923 39.337
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 0/4] sched: remove cpu_load decay.
2013-11-22 12:13 ` [RFC PATCH 0/4] sched: remove cpu_load decay Daniel Lezcano
@ 2013-11-24 5:00 ` Alex Shi
2013-11-24 5:29 ` Alex Shi
2013-11-25 0:58 ` Alex Shi
2 siblings, 0 replies; 18+ messages in thread
From: Alex Shi @ 2013-11-24 5:00 UTC (permalink / raw)
To: Daniel Lezcano, mingo, peterz, morten.rasmussen, vincent.guittot,
fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
fengguang.wu
Cc: james.hogan, jason.low2, gregkh, hanjun.guo, linux-kernel
On 11/22/2013 08:13 PM, Daniel Lezcano wrote:
>>
>> The git tree for this patchset at:
>> git@github.com:alexshi/power-scheduling.git no-load-idx
>> Since Fengguang has included this tree in his kernel testing system,
>> and I haven't received a regression report so far, I suppose it is
>> fine on x86 systems.
>>
>> But since this scheduler change affects all archs, and hackbench is
>> the only benchmark I have found for this patchset so far, I'd like to
>> see more testing and discussion of it.
>
> Hi Alex,
>
> I tried on my Xeon server (2 x 4 cores) your patchset and got the
> following result:
>
> kernel a5d6e63323fe7799eb0e6 / + patchset
>
> hackbench -T -s 4096 -l 1000 -g 10 -f 40
> 27.604 38.556
Thanks for your testing, Daniel!
Fengguang, how about your kernel test results for this patchset?
--
Thanks
Alex
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 0/4] sched: remove cpu_load decay.
2013-11-22 12:13 ` [RFC PATCH 0/4] sched: remove cpu_load decay Daniel Lezcano
2013-11-24 5:00 ` Alex Shi
@ 2013-11-24 5:29 ` Alex Shi
2013-11-26 12:35 ` Daniel Lezcano
2013-11-25 0:58 ` Alex Shi
2 siblings, 1 reply; 18+ messages in thread
From: Alex Shi @ 2013-11-24 5:29 UTC (permalink / raw)
To: Daniel Lezcano, mingo, peterz, morten.rasmussen, vincent.guittot,
fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
fengguang.wu
Cc: james.hogan, jason.low2, gregkh, hanjun.guo, linux-kernel
On 11/22/2013 08:13 PM, Daniel Lezcano wrote:
>
> Hi Alex,
>
> I tried your patchset on my Xeon server (2 x 4 cores) and got the
> following result:
>
> kernel a5d6e63323fe7799eb0e6 / + patchset
>
> hackbench -T -s 4096 -l 1000 -g 10 -f 40
> 27.604 38.556
Wondering if the following patch helps on your Xeon server?
Btw, you can run vmstat as a background tool, or use 'perf sched',
to see how the scheduler statistics change with this patchset.
The following are results for the original kernel and for all 5 patches
on a Pandaboard ES.
latest kernel 527d1511310a89 + this patchset
hackbench -T -g 10 -f 40
23.25" 20.79"
23.16" 20.4"
24.24" 20.29"
hackbench -p -g 10 -f 40
26.52" 21.2"
23.89" 24.07"
25.65" 20.30"
hackbench -P -g 10 -f 40
20.14" 19.53"
19.96" 20.37"
21.76" 20.39"
------
From 4f5efd6c2b1e7293410ad57c3db24dcf3394c4a3 Mon Sep 17 00:00:00 2001
From: Alex Shi <alex.shi@linaro.org>
Date: Sat, 23 Nov 2013 23:18:09 +0800
Subject: [PATCH] sched: aggravate target cpu load to reduce task moving
Task migration happens even when the target cpu load is just a bit less
than the source cpu load. To reduce such migrations, aggravate the
target cpu load with sd->imbalance_pct.
Signed-off-by: Alex Shi <alex.shi@linaro.org>
---
kernel/sched/fair.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bccdd89..c49b7ba 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -978,7 +978,7 @@ static inline unsigned long group_weight(struct task_struct *p, int nid)
static unsigned long weighted_cpuload(const int cpu);
static unsigned long source_load(int cpu);
-static unsigned long target_load(int cpu);
+static unsigned long target_load(int cpu, int imbalance_pct);
static unsigned long power_of(int cpu);
static long effective_load(struct task_group *tg, int cpu, long wl, long wg);
@@ -3809,11 +3809,17 @@ static unsigned long source_load(int cpu)
* Return a high guess at the load of a migration-target cpu weighted
* according to the scheduling class and "nice" value.
*/
-static unsigned long target_load(int cpu)
+static unsigned long target_load(int cpu, int imbalance_pct)
{
struct rq *rq = cpu_rq(cpu);
unsigned long total = weighted_cpuload(cpu);
+ /*
+ * Without cpu_load decay, cpu_load equals total most of the time,
+ * so make the target a bit heavier to reduce task migration.
+ */
+ total = total * imbalance_pct / 100;
+
if (!sched_feat(LB_BIAS))
return total;
@@ -4033,7 +4039,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
this_cpu = smp_processor_id();
prev_cpu = task_cpu(p);
load = source_load(prev_cpu);
- this_load = target_load(this_cpu);
+ this_load = target_load(this_cpu, 100);
/*
* If sync wakeup then subtract the (maximum possible)
@@ -4089,7 +4095,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
if (balanced ||
(this_load <= load &&
- this_load + target_load(prev_cpu) <= tl_per_task)) {
+ this_load + target_load(prev_cpu, 100) <= tl_per_task)) {
/*
* This domain has SD_WAKE_AFFINE and
* p is cache cold in this domain, and
@@ -4135,7 +4141,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
if (local_group)
load = source_load(i);
else
- load = target_load(i);
+ load = target_load(i, sd->imbalance_pct);
avg_load += load;
}
@@ -5478,7 +5484,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
/* Bias balancing toward cpus of our domain */
if (local_group)
- load = target_load(i);
+ load = target_load(i, env->sd->imbalance_pct);
else
load = source_load(i);
--
1.8.1.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 0/4] sched: remove cpu_load decay.
2013-11-22 12:13 ` [RFC PATCH 0/4] sched: remove cpu_load decay Daniel Lezcano
2013-11-24 5:00 ` Alex Shi
2013-11-24 5:29 ` Alex Shi
@ 2013-11-25 0:58 ` Alex Shi
2013-11-25 8:36 ` Daniel Lezcano
2 siblings, 1 reply; 18+ messages in thread
From: Alex Shi @ 2013-11-25 0:58 UTC (permalink / raw)
To: Daniel Lezcano, mingo, peterz, morten.rasmussen, vincent.guittot,
fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
fengguang.wu
Cc: james.hogan, jason.low2, gregkh, hanjun.guo, linux-kernel
On 11/22/2013 08:13 PM, Daniel Lezcano wrote:
>
> Hi Alex,
>
> I tried your patchset on my Xeon server (2 x 4 cores) and got the
> following result:
>
> kernel a5d6e63323fe7799eb0e6 / + patchset
>
> hackbench -T -s 4096 -l 1000 -g 10 -f 40
> 27.604 38.556
Hi Daniel, would you give us the detailed server info? 2 sockets * 4
cores sounds like it isn't a modern machine.
--
Thanks
Alex
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 0/4] sched: remove cpu_load decay.
2013-11-25 0:58 ` Alex Shi
@ 2013-11-25 8:36 ` Daniel Lezcano
2013-11-25 12:00 ` Alex Shi
0 siblings, 1 reply; 18+ messages in thread
From: Daniel Lezcano @ 2013-11-25 8:36 UTC (permalink / raw)
To: Alex Shi, mingo, peterz, morten.rasmussen, vincent.guittot,
fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
fengguang.wu
Cc: james.hogan, jason.low2, gregkh, hanjun.guo, linux-kernel
On 11/25/2013 01:58 AM, Alex Shi wrote:
> On 11/22/2013 08:13 PM, Daniel Lezcano wrote:
>>
>> Hi Alex,
>>
>> I tried your patchset on my Xeon server (2 x 4 cores) and got the
>> following result:
>>
>> kernel a5d6e63323fe7799eb0e6 / + patchset
>>
>> hackbench -T -s 4096 -l 1000 -g 10 -f 40
>> 27.604 38.556
>
> Hi Daniel, would you give us the detailed server info? 2 sockets * 4
> cores sounds like it isn't a modern machine.
Well, it is several years old now, that's true, but it still competes
with some recent processors :)
Bi-Xeon E5345 2.33GHz / 8MB L2 cache / 7GB FB-DIMM memory at 667 MHz /
300GB SSD 3Gb/s
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 0/4] sched: remove cpu_load decay.
2013-11-25 8:36 ` Daniel Lezcano
@ 2013-11-25 12:00 ` Alex Shi
0 siblings, 0 replies; 18+ messages in thread
From: Alex Shi @ 2013-11-25 12:00 UTC (permalink / raw)
To: Daniel Lezcano, mingo, peterz, morten.rasmussen, vincent.guittot,
fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
fengguang.wu
Cc: james.hogan, jason.low2, gregkh, hanjun.guo, linux-kernel
On 11/25/2013 04:36 PM, Daniel Lezcano wrote:
> On 11/25/2013 01:58 AM, Alex Shi wrote:
>> On 11/22/2013 08:13 PM, Daniel Lezcano wrote:
>>>
>>> Hi Alex,
>>>
>>> I tried your patchset on my Xeon server (2 x 4 cores) and got the
>>> following result:
>>>
>>> kernel a5d6e63323fe7799eb0e6 / + patchset
>>>
>>> hackbench -T -s 4096 -l 1000 -g 10 -f 40
>>> 27.604 38.556
>>
>> Hi Daniel, would you give us the detailed server info? 2 sockets * 4
>> cores sounds like it isn't a modern machine.
>
> Well, it is several years old now, that's true, but it still competes
> with some recent processors :)
>
> Bi-Xeon E5345 2.33GHz / 8MB L2 cache / 7GB FB-DIMM memory at 667 MHz /
> 300GB SSD 3Gb/s
>
>
It is a Core 2 CPU, quite old.
Fengguang, do you include a similar box in your test system?
--
Thanks
Alex
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 0/4] sched: remove cpu_load decay.
2013-11-24 5:29 ` Alex Shi
@ 2013-11-26 12:35 ` Daniel Lezcano
2013-11-26 12:52 ` Alex Shi
0 siblings, 1 reply; 18+ messages in thread
From: Daniel Lezcano @ 2013-11-26 12:35 UTC (permalink / raw)
To: Alex Shi, mingo, peterz, morten.rasmussen, vincent.guittot,
fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
fengguang.wu
Cc: james.hogan, jason.low2, gregkh, hanjun.guo, linux-kernel
On 11/24/2013 06:29 AM, Alex Shi wrote:
> On 11/22/2013 08:13 PM, Daniel Lezcano wrote:
>>
>> Hi Alex,
>>
>> I tried your patchset on my Xeon server (2 x 4 cores) and got the
>> following result:
>>
>> kernel a5d6e63323fe7799eb0e6 / + patchset
>>
>> hackbench -T -s 4096 -l 1000 -g 10 -f 40
>> 27.604 38.556
>
> Wondering if the following patch helps on your Xeon server?
>
> Btw, you can run vmstat as a background tool, or use 'perf sched',
> to see how the scheduler statistics change with this patchset.
>
> The following are results for the original kernel and for all 5 patches
> on a Pandaboard ES.
>
> latest kernel 527d1511310a89 + this patchset
> hackbench -T -g 10 -f 40
> 23.25" 20.79"
> 23.16" 20.4"
> 24.24" 20.29"
> hackbench -p -g 10 -f 40
> 26.52" 21.2"
> 23.89" 24.07"
> 25.65" 20.30"
> hackbench -P -g 10 -f 40
> 20.14" 19.53"
> 19.96" 20.37"
> 21.76" 20.39"
>
Here are the new results with your patchset + patch #5.
I have some issues with perf at the moment, so I will fix that up and
send the results afterwards.
527d1511310a / + patchset + #5
hackbench -T -s 4096 -l 1000 -g 10 -f 40
26.677 30.308
27.914 28.497
28.390 30.360
28.048 28.587
26.344 29.513
27.848 28.706
28.315 30.152
28.232 29.721
26.549 28.766
30.340 38.801
hackbench -p -s 4096 -l 1000 -g 10 -f 40
34.522 35.469
34.545 34.966
34.469 35.342
34.115 35.286
34.457 35.592
34.561 35.314
34.459 35.316
34.054 35.629
34.532 35.149
34.459 34.876
hackbench -P -s 4096 -l 1000 -g 10 -f 40
38.938 30.308
39.363 28.497
39.340 30.360
38.909 28.587
39.095 29.513
38.869 28.706
39.041 30.152
38.939 29.721
38.992 28.766
38.947 38.801
> ------
> From 4f5efd6c2b1e7293410ad57c3db24dcf3394c4a3 Mon Sep 17 00:00:00 2001
> From: Alex Shi <alex.shi@linaro.org>
> Date: Sat, 23 Nov 2013 23:18:09 +0800
> Subject: [PATCH] sched: aggravate target cpu load to reduce task moving
>
> Task migration happens even when the target cpu load is just a bit less
> than the source cpu load. To reduce such migrations, aggravate the
> target cpu load with sd->imbalance_pct.
>
> Signed-off-by: Alex Shi <alex.shi@linaro.org>
> ---
> kernel/sched/fair.c | 18 ++++++++++++------
> 1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bccdd89..c49b7ba 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -978,7 +978,7 @@ static inline unsigned long group_weight(struct task_struct *p, int nid)
>
> static unsigned long weighted_cpuload(const int cpu);
> static unsigned long source_load(int cpu);
> -static unsigned long target_load(int cpu);
> +static unsigned long target_load(int cpu, int imbalance_pct);
> static unsigned long power_of(int cpu);
> static long effective_load(struct task_group *tg, int cpu, long wl, long wg);
>
> @@ -3809,11 +3809,17 @@ static unsigned long source_load(int cpu)
> * Return a high guess at the load of a migration-target cpu weighted
> * according to the scheduling class and "nice" value.
> */
> -static unsigned long target_load(int cpu)
> +static unsigned long target_load(int cpu, int imbalance_pct)
> {
> struct rq *rq = cpu_rq(cpu);
> unsigned long total = weighted_cpuload(cpu);
>
> + /*
> + * Without cpu_load decay, cpu_load equals total most of the time,
> + * so make the target a bit heavier to reduce task migration.
> + */
> + total = total * imbalance_pct / 100;
> +
> if (!sched_feat(LB_BIAS))
> return total;
>
> @@ -4033,7 +4039,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
> this_cpu = smp_processor_id();
> prev_cpu = task_cpu(p);
> load = source_load(prev_cpu);
> - this_load = target_load(this_cpu);
> + this_load = target_load(this_cpu, 100);
>
> /*
> * If sync wakeup then subtract the (maximum possible)
> @@ -4089,7 +4095,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
>
> if (balanced ||
> (this_load <= load &&
> - this_load + target_load(prev_cpu) <= tl_per_task)) {
> + this_load + target_load(prev_cpu, 100) <= tl_per_task)) {
> /*
> * This domain has SD_WAKE_AFFINE and
> * p is cache cold in this domain, and
> @@ -4135,7 +4141,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
> if (local_group)
> load = source_load(i);
> else
> - load = target_load(i);
> + load = target_load(i, sd->imbalance_pct);
>
> avg_load += load;
> }
> @@ -5478,7 +5484,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>
> /* Bias balancing toward cpus of our domain */
> if (local_group)
> - load = target_load(i);
> + load = target_load(i, env->sd->imbalance_pct);
> else
> load = source_load(i);
>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 0/4] sched: remove cpu_load decay.
2013-11-26 12:35 ` Daniel Lezcano
@ 2013-11-26 12:52 ` Alex Shi
2013-11-26 12:57 ` Daniel Lezcano
2013-11-26 13:01 ` Daniel Lezcano
0 siblings, 2 replies; 18+ messages in thread
From: Alex Shi @ 2013-11-26 12:52 UTC (permalink / raw)
To: Daniel Lezcano, mingo, peterz, morten.rasmussen, vincent.guittot,
fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
fengguang.wu
Cc: james.hogan, jason.low2, gregkh, hanjun.guo, linux-kernel
On 11/26/2013 08:35 PM, Daniel Lezcano wrote:
>
>
> Here are the new results with your patchset + patch #5.
>
> I have some issues with perf at the moment, so I will fix that up and
> send the results afterwards.
Thanks a lot, Daniel!
The results are pretty good! Thread/pipe performance drops slightly,
but process performance increases by about 25%!
>
>
> 527d1511310a / + patchset + #5
>
> hackbench -T -s 4096 -l 1000 -g 10 -f 40
> 26.677 30.308
> 27.914 28.497
> 28.390 30.360
> 28.048 28.587
> 26.344 29.513
> 27.848 28.706
> 28.315 30.152
> 28.232 29.721
> 26.549 28.766
> 30.340 38.801
> hackbench -p -s 4096 -l 1000 -g 10 -f 40
> 34.522 35.469
> 34.545 34.966
> 34.469 35.342
> 34.115 35.286
> 34.457 35.592
> 34.561 35.314
> 34.459 35.316
> 34.054 35.629
> 34.532 35.149
> 34.459 34.876
> hackbench -P -s 4096 -l 1000 -g 10 -f 40
> 38.938 30.308
> 39.363 28.497
> 39.340 30.360
> 38.909 28.587
> 39.095 29.513
> 38.869 28.706
> 39.041 30.152
> 38.939 29.721
> 38.992 28.766
> 38.947 38.801
--
Thanks
Alex
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 0/4] sched: remove cpu_load decay.
2013-11-26 12:52 ` Alex Shi
@ 2013-11-26 12:57 ` Daniel Lezcano
2013-11-26 13:01 ` Daniel Lezcano
1 sibling, 0 replies; 18+ messages in thread
From: Daniel Lezcano @ 2013-11-26 12:57 UTC (permalink / raw)
To: Alex Shi, mingo, peterz, morten.rasmussen, vincent.guittot,
fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
fengguang.wu
Cc: james.hogan, jason.low2, gregkh, hanjun.guo, linux-kernel
On 11/26/2013 01:52 PM, Alex Shi wrote:
> On 11/26/2013 08:35 PM, Daniel Lezcano wrote:
>>
>>
>> Here are the new results with your patchset + patch #5.
>>
>> I have some issues with perf at the moment, so I will fix that up and
>> send the results afterwards.
>
> Thanks a lot, Daniel!
> The results are pretty good! Thread/pipe performance drops slightly,
> but process performance increases by about 25%!
Mmh, wait. Let me double-check the results; it seems weird that we get
such a large performance increase.
>>
>> 527d1511310a / + patchset + #5
>>
>> hackbench -T -s 4096 -l 1000 -g 10 -f 40
>> 26.677 30.308
>> 27.914 28.497
>> 28.390 30.360
>> 28.048 28.587
>> 26.344 29.513
>> 27.848 28.706
>> 28.315 30.152
>> 28.232 29.721
>> 26.549 28.766
>> 30.340 38.801
>> hackbench -p -s 4096 -l 1000 -g 10 -f 40
>> 34.522 35.469
>> 34.545 34.966
>> 34.469 35.342
>> 34.115 35.286
>> 34.457 35.592
>> 34.561 35.314
>> 34.459 35.316
>> 34.054 35.629
>> 34.532 35.149
>> 34.459 34.876
>> hackbench -P -s 4096 -l 1000 -g 10 -f 40
>> 38.938 30.308
>> 39.363 28.497
>> 39.340 30.360
>> 38.909 28.587
>> 39.095 29.513
>> 38.869 28.706
>> 39.041 30.152
>> 38.939 29.721
>> 38.992 28.766
>> 38.947 38.801
>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 4/4] sched/nohz_full: give correct cpu load for nohz_full cpu
2013-11-22 6:37 ` [RFC PATCH 4/4] sched/nohz_full: give correct cpu load for nohz_full cpu Alex Shi
@ 2013-11-26 12:59 ` Alex Shi
0 siblings, 0 replies; 18+ messages in thread
From: Alex Shi @ 2013-11-26 12:59 UTC (permalink / raw)
To: mingo, peterz, morten.rasmussen, vincent.guittot, daniel.lezcano,
fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
fengguang.wu
Cc: james.hogan, alex.shi, jason.low2, gregkh, hanjun.guo, linux-kernel
On 11/22/2013 02:37 PM, Alex Shi wrote:
> When a nohz_full cpu is in tickless mode, it may update cpu_load via
> the following chain:
> __tick_nohz_full_check
> tick_nohz_restart_sched_tick
> update_cpu_load_nohz
> and then it is set to an incorrect cpu_load: 0.
> This patch tries to fix that and give it the correct cpu_load value.
Frederic,
Would you like to give some comments on this patch?
--
Thanks
Alex
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 0/4] sched: remove cpu_load decay.
2013-11-26 12:52 ` Alex Shi
2013-11-26 12:57 ` Daniel Lezcano
@ 2013-11-26 13:01 ` Daniel Lezcano
2013-11-26 13:04 ` Alex Shi
1 sibling, 1 reply; 18+ messages in thread
From: Daniel Lezcano @ 2013-11-26 13:01 UTC (permalink / raw)
To: Alex Shi, mingo, peterz, morten.rasmussen, vincent.guittot,
fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
fengguang.wu
Cc: james.hogan, jason.low2, gregkh, hanjun.guo, linux-kernel
On 11/26/2013 01:52 PM, Alex Shi wrote:
> On 11/26/2013 08:35 PM, Daniel Lezcano wrote:
>>
>>
>> Here are the new results with your patchset + patch #5.
>>
>> I have some issues with perf at the moment, so I will fix that up and
>> send the results afterwards.
>
> Thanks a lot, Daniel!
> The results are pretty good! Thread/pipe performance drops slightly,
> but process performance increases by about 25%!
>
>
>>
>>
>> 527d1511310a / + patchset + #5
>>
>> hackbench -T -s 4096 -l 1000 -g 10 -f 40
>> 26.677 30.308
>> 27.914 28.497
>> 28.390 30.360
>> 28.048 28.587
>> 26.344 29.513
>> 27.848 28.706
>> 28.315 30.152
>> 28.232 29.721
>> 26.549 28.766
>> 30.340 38.801
>> hackbench -p -s 4096 -l 1000 -g 10 -f 40
>> 34.522 35.469
>> 34.545 34.966
>> 34.469 35.342
>> 34.115 35.286
>> 34.457 35.592
>> 34.561 35.314
>> 34.459 35.316
>> 34.054 35.629
>> 34.532 35.149
>> 34.459 34.876
>> hackbench -P -s 4096 -l 1000 -g 10 -f 40
>> 38.938 30.308
>> 39.363 28.497
>> 39.340 30.360
>> 38.909 28.587
>> 39.095 29.513
>> 38.869 28.706
>> 39.041 30.152
>> 38.939 29.721
>> 38.992 28.766
>> 38.947 38.801
Ok, bad copy-paste; the third test run's results with the patchset are wrong.
hackbench -P -s 4096 -l 1000 -g 10 -f 40
38.938 39.585
39.363 39.008
39.340 38.954
38.909 39.273
39.095 38.755
38.869 39.003
39.041 38.945
38.939 38.005
38.992 38.994
38.947 38.855
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 0/4] sched: remove cpu_load decay.
2013-11-26 13:01 ` Daniel Lezcano
@ 2013-11-26 13:04 ` Alex Shi
0 siblings, 0 replies; 18+ messages in thread
From: Alex Shi @ 2013-11-26 13:04 UTC (permalink / raw)
To: Daniel Lezcano, mingo, peterz, morten.rasmussen, vincent.guittot,
fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
fengguang.wu
Cc: james.hogan, jason.low2, gregkh, hanjun.guo, linux-kernel
On 11/26/2013 09:01 PM, Daniel Lezcano wrote:
>
> Ok, bad copy-paste; the third test run's results with the patchset are wrong.
>
> hackbench -P -s 4096 -l 1000 -g 10 -f 40
> 38.938 39.585
> 39.363 39.008
> 39.340 38.954
> 38.909 39.273
> 39.095 38.755
> 38.869 39.003
> 39.041 38.945
> 38.939 38.005
> 38.992 38.994
> 38.947 38.855
Oops.
Anyway, at least there's no harm to hackbench process testing. :)
--
Thanks
Alex
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 0/4] sched: remove cpu_load decay.
2013-11-22 6:37 [RFC PATCH 0/4] sched: remove cpu_load decay Alex Shi
` (4 preceding siblings ...)
2013-11-22 12:13 ` [RFC PATCH 0/4] sched: remove cpu_load decay Daniel Lezcano
@ 2013-11-27 2:48 ` Alex Shi
5 siblings, 0 replies; 18+ messages in thread
From: Alex Shi @ 2013-11-27 2:48 UTC (permalink / raw)
To: mingo, peterz, morten.rasmussen, vincent.guittot, daniel.lezcano,
fweisbec, linux, tony.luck, fenghua.yu, tglx, akpm, arjan, pjt,
fengguang.wu
Cc: james.hogan, alex.shi, jason.low2, gregkh, hanjun.guo, linux-kernel
On 11/22/2013 02:37 PM, Alex Shi wrote:
> latest kernel 527d1511310a89 + this patchset
> hackbench -T -g 10 -f 40
> 23.25" 21.7"
> 23.16" 19.99"
> 24.24" 21.53"
> hackbench -p -g 10 -f 40
> 26.52" 22.48"
> 23.89" 24.00"
> 25.65" 23.06"
> hackbench -P -g 10 -f 40
> 20.14" 19.37"
> 19.96" 19.76"
> 21.76" 21.54"
>
> The git tree for this patchset is at:
> git@github.com:alexshi/power-scheduling.git no-load-idx
Fengguang,
Did your kernel testing find anything unusual with these 3 patches?
--
Thanks
Alex
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2013-11-27 2:48 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-22 6:37 [RFC PATCH 0/4] sched: remove cpu_load decay Alex Shi
2013-11-22 6:37 ` [RFC PATCH 1/4] sched: shortcut to remove load_idx effect Alex Shi
2013-11-22 6:37 ` [RFC PATCH 2/4] sched: change rq->cpu_load[load_idx] array to rq->cpu_load Alex Shi
2013-11-22 6:37 ` [RFC PATCH 3/4] sched: clean up __update_cpu_load Alex Shi
2013-11-22 6:37 ` [RFC PATCH 4/4] sched/nohz_full: give correct cpu load for nohz_full cpu Alex Shi
2013-11-26 12:59 ` Alex Shi
2013-11-22 12:13 ` [RFC PATCH 0/4] sched: remove cpu_load decay Daniel Lezcano
2013-11-24 5:00 ` Alex Shi
2013-11-24 5:29 ` Alex Shi
2013-11-26 12:35 ` Daniel Lezcano
2013-11-26 12:52 ` Alex Shi
2013-11-26 12:57 ` Daniel Lezcano
2013-11-26 13:01 ` Daniel Lezcano
2013-11-26 13:04 ` Alex Shi
2013-11-25 0:58 ` Alex Shi
2013-11-25 8:36 ` Daniel Lezcano
2013-11-25 12:00 ` Alex Shi
2013-11-27 2:48 ` Alex Shi