* [ RFC patch 0/4]: use runnable load avg in cfs balance instead of instant load
@ 2013-01-24 3:30 Alex Shi
2013-01-24 3:30 ` [PATCH 1/4] sched: update cpu load after task_tick Alex Shi
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Alex Shi @ 2013-01-24 3:30 UTC (permalink / raw)
To: mingo, peterz, tglx, akpm, arjan, bp, pjt, namhyung, efault
Cc: vincent.guittot, gregkh, preeti, viresh.kumar, linux-kernel, alex.shi
This patchset works, but it causes the burst-wakeup benchmark aim9 to drop 5~7%
on my 2-socket machine. The reason is that newly woken tasks still carry a very
light runnable load early on, which misleads the balancer and creates imbalance
during load balancing (a rough sketch of this ramp-up follows the patch list below).
So the series is still immature and is posted only as a reference for anyone who
wants to take it further.
Thanks!
Alex
[PATCH 1/4] sched: update cpu load after task_tick.
[PATCH 2/4] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task
[PATCH 3/4] sched: consider runnable load average in move_tasks
[PATCH 4/4] sched: consider runnable load average in effective_load
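
To illustrate the "too light runnable load" problem mentioned above, here is a
rough, standalone sketch of how a per-entity runnable average turns into a load
contribution. It is illustrative only, not kernel code; the field names just
mirror the ones used by the patches (load.weight, runnable_avg_sum,
runnable_avg_period), and the 50%-runnable figure in the comment is an assumed
example value.

	/*
	 * Approximate load contribution of one sched entity: its weight
	 * scaled by the fraction of time it has been runnable. A freshly
	 * woken task has accumulated only a short runnable_avg_sum, so its
	 * contribution starts small and ramps up as it keeps running.
	 */
	static unsigned long approx_load_contrib(unsigned long weight,
						 unsigned int runnable_avg_sum,
						 unsigned int runnable_avg_period)
	{
		if (!runnable_avg_period)
			return 0;
		return weight * runnable_avg_sum / runnable_avg_period;
	}

	/*
	 * e.g. a nice-0 task (weight 1024) that has been runnable about half
	 * of its tracked period contributes only ~512, while the instant
	 * rq->load.weight would still count the full 1024.
	 */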
* [PATCH 1/4] sched: update cpu load after task_tick.
2013-01-24 3:30 [ RFC patch 0/4]: use runnable load avg in cfs balance instead of instant load Alex Shi
@ 2013-01-24 3:30 ` Alex Shi
2013-01-24 3:30 ` [PATCH 2/4] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task Alex Shi
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Alex Shi @ 2013-01-24 3:30 UTC (permalink / raw)
To: mingo, peterz, tglx, akpm, arjan, bp, pjt, namhyung, efault
Cc: vincent.guittot, gregkh, preeti, viresh.kumar, linux-kernel, alex.shi
To get the latest runnable load info, we need to do the cpu load update after
task_tick().
Signed-off-by: Alex Shi <alex.shi@intel.com>
---
kernel/sched/core.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dbab4b3..4f4714e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2695,8 +2695,8 @@ void scheduler_tick(void)
raw_spin_lock(&rq->lock);
update_rq_clock(rq);
- update_cpu_load_active(rq);
curr->sched_class->task_tick(rq, curr, 0);
+ update_cpu_load_active(rq);
raw_spin_unlock(&rq->lock);
perf_event_task_tick();
--
1.7.0.1
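
For reference, the point of the reordering is only that the cpu load sampling
now sees the runnable averages already refreshed by this tick. A simplified
schematic of the resulting tick path (schematic only; the real functions do
more than shown):

	/*
	 * scheduler_tick(), after this patch (schematic only):
	 *
	 *	update_rq_clock(rq);
	 *	curr->sched_class->task_tick(rq, curr, 0);
	 *		-> updates the per-entity runnable averages
	 *	update_cpu_load_active(rq);
	 *		-> now samples a freshly updated load
	 */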
* [PATCH 2/4] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task
2013-01-24 3:30 [ RFC patch 0/4]: use runnable load avg in cfs balance instead of instant load Alex Shi
2013-01-24 3:30 ` [PATCH 1/4] sched: update cpu load after task_tick Alex Shi
@ 2013-01-24 3:30 ` Alex Shi
2013-01-24 10:08 ` Ingo Molnar
2013-01-24 3:30 ` [PATCH 3/4] sched: consider runnable load average in move_tasks Alex Shi
2013-01-24 3:30 ` [PATCH 4/4] sched: consider runnable load average in effective_load Alex Shi
3 siblings, 1 reply; 8+ messages in thread
From: Alex Shi @ 2013-01-24 3:30 UTC (permalink / raw)
To: mingo, peterz, tglx, akpm, arjan, bp, pjt, namhyung, efault
Cc: vincent.guittot, gregkh, preeti, viresh.kumar, linux-kernel, alex.shi
cpu_load and cpu_avg_load_per_task are the base values used in load balancing.
Update them from the rq's runnable load average, so that the load balancer
takes the runnable load avg into account naturally.
Signed-off-by: Alex Shi <alex.shi@intel.com>
---
kernel/sched/core.c | 8 ++++++++
kernel/sched/fair.c | 4 ++--
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4f4714e..f25b8d8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2539,7 +2539,11 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
void update_idle_cpu_load(struct rq *this_rq)
{
unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
+#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+ unsigned long load = (unsigned long)this_rq->cfs.runnable_load_avg;
+#else
unsigned long load = this_rq->load.weight;
+#endif
unsigned long pending_updates;
/*
@@ -2589,7 +2593,11 @@ static void update_cpu_load_active(struct rq *this_rq)
* See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
*/
this_rq->last_load_update_tick = jiffies;
+#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+ __update_cpu_load(this_rq, this_rq->cfs.runnable_load_avg, 1);
+#else
__update_cpu_load(this_rq, this_rq->load.weight, 1);
+#endif
calc_load_account_active(this_rq);
}
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7697171..9148bcc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2909,7 +2909,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
/* Used instead of source_load when we know the type == 0 */
static unsigned long weighted_cpuload(const int cpu)
{
- return cpu_rq(cpu)->load.weight;
+ return (unsigned long)cpu_rq(cpu)->cfs.runnable_load_avg;
}
/*
@@ -2956,7 +2956,7 @@ static unsigned long cpu_avg_load_per_task(int cpu)
unsigned long nr_running = ACCESS_ONCE(rq->nr_running);
if (nr_running)
- return rq->load.weight / nr_running;
+ return (unsigned long)rq->cfs.runnable_load_avg / nr_running;
return 0;
}
--
1.7.0.1
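
As a rough mental model (the actual accounting lives elsewhere in fair.c and is
not part of this patch), cfs.runnable_load_avg is approximately the sum of the
runnable entities' decayed load contributions, so the two quantities swapped
above compare like this illustrative, non-kernel sketch:

	/* Illustrative only -- not the kernel's real accounting code. */
	struct fake_entity {
		unsigned long weight;		/* se->load.weight             */
		unsigned int  runnable_sum;	/* se->avg.runnable_avg_sum    */
		unsigned int  runnable_period;	/* se->avg.runnable_avg_period */
	};

	/* roughly what weighted_cpuload() now returns for a cpu */
	static unsigned long fake_runnable_load_avg(const struct fake_entity *se,
						    int nr)
	{
		unsigned long sum = 0;
		int i;

		/* each entity contributes its weight scaled by its runnable fraction */
		for (i = 0; i < nr; i++)
			sum += se[i].weight * se[i].runnable_sum /
			       (se[i].runnable_period + 1);
		return sum;
	}

	/* rq->load.weight, by contrast, is just the plain sum of the weights. */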
* [PATCH 3/4] sched: consider runnable load average in move_tasks
2013-01-24 3:30 [ RFC patch 0/4]: use runnable load avg in cfs balance instead of instant load Alex Shi
2013-01-24 3:30 ` [PATCH 1/4] sched: update cpu load after task_tick Alex Shi
2013-01-24 3:30 ` [PATCH 2/4] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task Alex Shi
@ 2013-01-24 3:30 ` Alex Shi
2013-01-24 3:30 ` [PATCH 4/4] sched: consider runnable load average in effective_load Alex Shi
3 siblings, 0 replies; 8+ messages in thread
From: Alex Shi @ 2013-01-24 3:30 UTC (permalink / raw)
To: mingo, peterz, tglx, akpm, arjan, bp, pjt, namhyung, efault
Cc: vincent.guittot, gregkh, preeti, viresh.kumar, linux-kernel, alex.shi
Besides using the runnable load average in the background accounting,
move_tasks() is also a key function in load balancing. It needs to consider
the runnable load average as well, so that the load comparison stays apples
to apples.
Signed-off-by: Alex Shi <alex.shi@intel.com>
---
kernel/sched/fair.c | 11 ++++++++++-
1 files changed, 10 insertions(+), 1 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9148bcc..876a6a0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4192,6 +4192,15 @@ static unsigned long task_h_load(struct task_struct *p);
static const unsigned int sched_nr_migrate_break = 32;
+static unsigned long task_h_load_avg(struct task_struct *p)
+{
+ u32 period = p->se.avg.runnable_avg_period;
+ if (!period)
+ return 0;
+
+ return task_h_load(p) * p->se.avg.runnable_avg_sum / period;
+}
+
/*
* move_tasks tries to move up to imbalance weighted load from busiest to
* this_rq, as part of a balancing operation within domain "sd".
@@ -4227,7 +4236,7 @@ static int move_tasks(struct lb_env *env)
if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
goto next;
- load = task_h_load(p);
+ load = task_h_load_avg(p);
if (sched_feat(LB_MIN) && load < 16 && !env->sd->nr_balance_failed)
goto next;
--
1.7.0.1
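
To make the scaling in task_h_load_avg() concrete, here is a hedged,
standalone worked example; the helper is illustrative, not the kernel
function, and task_h_load() is treated as an opaque input:

	/* Illustrative sketch of the scaling done by task_h_load_avg(). */
	static unsigned long scaled_h_load(unsigned long h_load,
					   unsigned int runnable_avg_sum,
					   unsigned int runnable_avg_period)
	{
		if (!runnable_avg_period)
			return 0;
		return h_load * runnable_avg_sum / runnable_avg_period;
	}

	/*
	 * Example: h_load = 1024 (nice-0 task) runnable ~25% of the time:
	 * scaled_h_load(1024, 256, 1024) ~= 256, so move_tasks() counts this
	 * task as a quarter of a fully busy nice-0 task when deciding how
	 * much load still needs to be migrated.
	 */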
* [PATCH 4/4] sched: consider runnable load average in effective_load
2013-01-24 3:30 [ RFC patch 0/4]: use runnable load avg in cfs balance instead of instant load Alex Shi
` (2 preceding siblings ...)
2013-01-24 3:30 ` [PATCH 3/4] sched: consider runnable load average in move_tasks Alex Shi
@ 2013-01-24 3:30 ` Alex Shi
3 siblings, 0 replies; 8+ messages in thread
From: Alex Shi @ 2013-01-24 3:30 UTC (permalink / raw)
To: mingo, peterz, tglx, akpm, arjan, bp, pjt, namhyung, efault
Cc: vincent.guittot, gregkh, preeti, viresh.kumar, linux-kernel, alex.shi
effective_load() calculates the load change as seen from the
root_task_group. It needs to take the runnable average of the
moved task into account as well.
Thanks to Morten Rasmussen for the reminder about this.
Signed-off-by: Alex Shi <alex.shi@intel.com>
---
kernel/sched/fair.c | 27 ++++++++++++++++++++-------
1 files changed, 20 insertions(+), 7 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 876a6a0..76251be 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2985,7 +2985,8 @@ static void task_waking_fair(struct task_struct *p)
#ifdef CONFIG_FAIR_GROUP_SCHED
/*
- * effective_load() calculates the load change as seen from the root_task_group
+ * effective_load() calculates the runnable load average change as seen from
+ * the root_task_group
*
* Adding load to a group doesn't make a group heavier, but can cause movement
* of group shares between cpus. Assuming the shares were perfectly aligned one
@@ -3033,6 +3034,9 @@ static void task_waking_fair(struct task_struct *p)
* Therefore the effective change in loads on CPU 0 would be 5/56 (3/8 - 2/7)
* times the weight of the group. The effect on CPU 1 would be -4/56 (4/8 -
* 4/7) times the weight of the group.
+ *
+ * After getting the effective_load() of the moved load, scale it by the
+ * sched entity's runnable avg.
*/
static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
{
@@ -3107,6 +3111,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
struct task_group *tg;
unsigned long weight;
int balanced;
+ int runnable_avg;
idx = sd->wake_idx;
this_cpu = smp_processor_id();
@@ -3122,13 +3127,19 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
if (sync) {
tg = task_group(current);
weight = current->se.load.weight;
+ runnable_avg = current->se.avg.runnable_avg_sum * NICE_0_LOAD
+ / (current->se.avg.runnable_avg_period + 1);
- this_load += effective_load(tg, this_cpu, -weight, -weight);
- load += effective_load(tg, prev_cpu, 0, -weight);
+ this_load += effective_load(tg, this_cpu, -weight, -weight)
+ * runnable_avg >> NICE_0_SHIFT;
+ load += effective_load(tg, prev_cpu, 0, -weight)
+ * runnable_avg >> NICE_0_SHIFT;
}
tg = task_group(p);
weight = p->se.load.weight;
+ runnable_avg = p->se.avg.runnable_avg_sum * NICE_0_LOAD
+ / (p->se.avg.runnable_avg_period + 1);
/*
* In low-load situations, where prev_cpu is idle and this_cpu is idle
@@ -3140,16 +3151,18 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
* task to be woken on this_cpu.
*/
if (this_load > 0) {
- s64 this_eff_load, prev_eff_load;
+ s64 this_eff_load, prev_eff_load, tmp_eff_load;
this_eff_load = 100;
this_eff_load *= power_of(prev_cpu);
- this_eff_load *= this_load +
- effective_load(tg, this_cpu, weight, weight);
+ tmp_eff_load = effective_load(tg, this_cpu, weight, weight)
+ * runnable_avg >> NICE_0_SHIFT;
+ this_eff_load *= this_load + tmp_eff_load;
prev_eff_load = 100 + (sd->imbalance_pct - 100) / 2;
prev_eff_load *= power_of(this_cpu);
- prev_eff_load *= load + effective_load(tg, prev_cpu, 0, weight);
+ prev_eff_load *= load + (effective_load(tg, prev_cpu, 0, weight)
+ * runnable_avg >> NICE_0_SHIFT);
balanced = this_eff_load <= prev_eff_load;
} else
--
1.7.0.1
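
The runnable_avg factor introduced above is roughly a fixed-point fraction in
the range [0, NICE_0_LOAD]. A standalone sketch of the scaling, assuming purely
for illustration that NICE_0_SHIFT is 10 and NICE_0_LOAD is 1024 (the real
kernel values depend on the config), and ignoring the kernel's exact types:

	#define EX_NICE_0_SHIFT	10
	#define EX_NICE_0_LOAD	(1L << EX_NICE_0_SHIFT)

	/* Illustrative only: scale an effective_load() result by the task's
	 * runnable fraction, as the wake_affine() changes above do. */
	static long scale_by_runnable_avg(long eff_load,
					  unsigned int runnable_avg_sum,
					  unsigned int runnable_avg_period)
	{
		/* runnable fraction, in 1/EX_NICE_0_LOAD fixed-point units */
		long runnable_avg = runnable_avg_sum * EX_NICE_0_LOAD /
				    (runnable_avg_period + 1);

		/* a task runnable ~50% of the time halves the load change
		 * that effective_load() reports */
		return eff_load * runnable_avg >> EX_NICE_0_SHIFT;
	}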
* Re: [PATCH 2/4] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task
2013-01-24 3:30 ` [PATCH 2/4] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task Alex Shi
@ 2013-01-24 10:08 ` Ingo Molnar
2013-01-24 15:16 ` Alex Shi
0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2013-01-24 10:08 UTC (permalink / raw)
To: Alex Shi
Cc: mingo, peterz, tglx, akpm, arjan, bp, pjt, namhyung, efault,
vincent.guittot, gregkh, preeti, viresh.kumar, linux-kernel
* Alex Shi <alex.shi@intel.com> wrote:
> @@ -2539,7 +2539,11 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
> void update_idle_cpu_load(struct rq *this_rq)
> {
> unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
> +#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
> + unsigned long load = (unsigned long)this_rq->cfs.runnable_load_avg;
> +#else
> unsigned long load = this_rq->load.weight;
> +#endif
I'd not make it conditional - just calculate runnable_load_avg
all the time (even if group scheduling is disabled) and use it
consistently. The last thing we want is to bifurcate scheduler
balancer behavior even further.
Thanks,
Ingo
* Re: [PATCH 2/4] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task
2013-01-24 10:08 ` Ingo Molnar
@ 2013-01-24 15:16 ` Alex Shi
2013-01-25 1:03 ` Alex Shi
0 siblings, 1 reply; 8+ messages in thread
From: Alex Shi @ 2013-01-24 15:16 UTC (permalink / raw)
To: Ingo Molnar
Cc: mingo, peterz, tglx, akpm, arjan, bp, pjt, namhyung, efault,
vincent.guittot, gregkh, preeti, viresh.kumar, linux-kernel
On 01/24/2013 06:08 PM, Ingo Molnar wrote:
>
> * Alex Shi <alex.shi@intel.com> wrote:
>
>> @@ -2539,7 +2539,11 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
>> void update_idle_cpu_load(struct rq *this_rq)
>> {
>> unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
>> +#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
>> + unsigned long load = (unsigned long)this_rq->cfs.runnable_load_avg;
>> +#else
>> unsigned long load = this_rq->load.weight;
>> +#endif
>
> I'd not make it conditional - just calculate runnable_load_avg
> all the time (even if group scheduling is disabled) and use it
> consistently. The last thing we want is to bifurcate scheduler
> balancer behavior even further.
Very glad to see you back, Ingo! :)
This patch set follows my power-aware scheduling patchset. But for a
separate, workable runnable-load-based balancing, it only needs the
other 3 patches, which were already sent to you in another patchset:
[patch v4 06/18] sched: give initial value for runnable avg of sched
[patch v4 07/18] sched: set initial load avg of new forked task
[patch v4 08/18] Revert "sched: Introduce temporary FAIR_GROUP_SCHED
>
> Thanks,
>
> Ingo
>
--
Thanks
Alex
* Re: [PATCH 2/4] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task
2013-01-24 15:16 ` Alex Shi
@ 2013-01-25 1:03 ` Alex Shi
0 siblings, 0 replies; 8+ messages in thread
From: Alex Shi @ 2013-01-25 1:03 UTC (permalink / raw)
To: Alex Shi
Cc: Ingo Molnar, mingo, peterz, tglx, akpm, arjan, bp, pjt, namhyung,
efault, vincent.guittot, gregkh, preeti, viresh.kumar,
linux-kernel
On Thu, Jan 24, 2013 at 11:16 PM, Alex Shi <alex.shi@intel.com> wrote:
> On 01/24/2013 06:08 PM, Ingo Molnar wrote:
>>
>> * Alex Shi <alex.shi@intel.com> wrote:
>>
>>> @@ -2539,7 +2539,11 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
>>> void update_idle_cpu_load(struct rq *this_rq)
>>> {
>>> unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
>>> +#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
>>> + unsigned long load = (unsigned long)this_rq->cfs.runnable_load_avg;
>>> +#else
>>> unsigned long load = this_rq->load.weight;
>>> +#endif
>>
>> I'd not make it conditional - just calculate runnable_load_avg
>> all the time (even if group scheduling is disabled) and use it
>> consistently. The last thing we want is to bifurcate scheduler
>> balancer behavior even further.
>
> Very glad to see you back, Ingo! :)
>
> This patch set follows my power-aware scheduling patchset. But for a
> separate, workable runnable-load-based balancing, it only needs the
> other 3 patches, which were already sent to you in another patchset:
>
> [patch v4 06/18] sched: give initial value for runnable avg of sched
> [patch v4 07/18] sched: set initial load avg of new forked task
> [patch v4 08/18] Revert "sched: Introduce temporary FAIR_GROUP_SCHED
You're right, Ingo! The last revert patch missed the above 2 points.
I will resend the patches as a full new version.
Thread overview: 8+ messages
2013-01-24 3:30 [ RFC patch 0/4]: use runnable load avg in cfs balance instead of instant load Alex Shi
2013-01-24 3:30 ` [PATCH 1/4] sched: update cpu load after task_tick Alex Shi
2013-01-24 3:30 ` [PATCH 2/4] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task Alex Shi
2013-01-24 10:08 ` Ingo Molnar
2013-01-24 15:16 ` Alex Shi
2013-01-25 1:03 ` Alex Shi
2013-01-24 3:30 ` [PATCH 3/4] sched: consider runnable load average in move_tasks Alex Shi
2013-01-24 3:30 ` [PATCH 4/4] sched: consider runnable load average in effective_load Alex Shi