* [ RFC patch 0/4]: use runnable load avg in cfs balance instead of instant load
@ 2013-01-24  3:30 Alex Shi
  2013-01-24  3:30 ` [PATCH 1/4] sched: update cpu load after task_tick Alex Shi
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Alex Shi @ 2013-01-24  3:30 UTC (permalink / raw)
  To: mingo, peterz, tglx, akpm, arjan, bp, pjt, namhyung, efault
  Cc: vincent.guittot, gregkh, preeti, viresh.kumar, linux-kernel, alex.shi

This patchset works, but it causes the burst-wakeup benchmark aim9 to drop
5~7% on my 2-socket machine. The reason is that the runnable load of newly
woken tasks is too light in their early stage, which leads to imbalance
during load balancing (see the sketch after the patch list below).

So it is immature and offered only as a reference for anyone who wants to go further.

Thanks!
Alex

[PATCH 1/4] sched: update cpu load after task_tick.
[PATCH 2/4] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task
[PATCH 3/4] sched: consider runnable load average in move_tasks
[PATCH 4/4] sched: consider runnable load average in effective_load
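
For illustration, a minimal userspace sketch of why a freshly woken task looks
so light. This is not kernel code; it only assumes the 1024us accounting period
and the y^32 = 0.5 decay used by per-entity load tracking, and the names are
made up:

#include <stdio.h>
#include <math.h>

int main(void)
{
	/* decay factor: a contribution halves every 32 periods (~32ms) */
	double y = pow(0.5, 1.0 / 32.0);
	/* after a long sleep: runnable sum decayed to ~0, period saturated */
	double runnable_sum = 0.0;
	double period_sum = 1024.0 / (1.0 - y);
	int ms;

	for (ms = 1; ms <= 64; ms++) {
		runnable_sum = runnable_sum * y + 1024.0; /* ran the whole period */
		period_sum   = period_sum   * y + 1024.0; /* a period elapsed */
		if (ms == 1 || ms == 8 || ms == 32 || ms == 64)
			printf("%2dms after wakeup: ~%2.0f%% of the task weight\n",
			       ms, 100.0 * runnable_sum / period_sum);
	}
	return 0;
}

It prints roughly 2%, 16%, 50% and 75%: for the first few milliseconds after a
burst wakeup a CPU full of such tasks still reports almost no runnable load, so
wakeup-time balancing can badly misjudge how busy such a CPU really is.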


* [PATCH 1/4] sched: update cpu load after task_tick.
  2013-01-24  3:30 [ RFC patch 0/4]: use runnable load avg in cfs balance instead of instant load Alex Shi
@ 2013-01-24  3:30 ` Alex Shi
  2013-01-24  3:30 ` [PATCH 2/4] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task Alex Shi
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Alex Shi @ 2013-01-24  3:30 UTC (permalink / raw)
  To: mingo, peterz, tglx, akpm, arjan, bp, pjt, namhyung, efault
  Cc: vincent.guittot, gregkh, preeti, viresh.kumar, linux-kernel, alex.shi

To get the latest runnable load info, we need to do the cpu load update after
task_tick(): the fair class's task_tick handler is what updates the per-entity
runnable averages, so sampling the cpu load afterwards picks up fresh values.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched/core.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dbab4b3..4f4714e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2695,8 +2695,8 @@ void scheduler_tick(void)
 
 	raw_spin_lock(&rq->lock);
 	update_rq_clock(rq);
-	update_cpu_load_active(rq);
 	curr->sched_class->task_tick(rq, curr, 0);
+	update_cpu_load_active(rq);
 	raw_spin_unlock(&rq->lock);
 
 	perf_event_task_tick();
-- 
1.7.0.1



* [PATCH 2/4] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task
  2013-01-24  3:30 [ RFC patch 0/4]: use runnable load avg in cfs balance instead of instant load Alex Shi
  2013-01-24  3:30 ` [PATCH 1/4] sched: update cpu load after task_tick Alex Shi
@ 2013-01-24  3:30 ` Alex Shi
  2013-01-24 10:08   ` Ingo Molnar
  2013-01-24  3:30 ` [PATCH 3/4] sched: consider runnable load average in move_tasks Alex Shi
  2013-01-24  3:30 ` [PATCH 4/4] sched: consider runnable load average in effective_load Alex Shi
  3 siblings, 1 reply; 8+ messages in thread
From: Alex Shi @ 2013-01-24  3:30 UTC (permalink / raw)
  To: mingo, peterz, tglx, akpm, arjan, bp, pjt, namhyung, efault
  Cc: vincent.guittot, gregkh, preeti, viresh.kumar, linux-kernel, alex.shi

These are the base values used by load balancing. Update them with the rq's
runnable load average, and the load balancer will naturally take the runnable
load avg into account.
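
For intuition, some illustrative arithmetic (made-up numbers, not a
measurement): with two nice-0 tasks on one runqueue, one always runnable and
one runnable roughly half the time, load.weight reads 2048 at any instant both
are queued, while cfs.runnable_load_avg sits near 1024 + 512 = 1536, so
cpu_avg_load_per_task drops from 1024 to about 768.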

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched/core.c |    8 ++++++++
 kernel/sched/fair.c |    4 ++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4f4714e..f25b8d8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2539,7 +2539,11 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
 void update_idle_cpu_load(struct rq *this_rq)
 {
 	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
+#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+	unsigned long load = (unsigned long)this_rq->cfs.runnable_load_avg;
+#else
 	unsigned long load = this_rq->load.weight;
+#endif
 	unsigned long pending_updates;
 
 	/*
@@ -2589,7 +2593,11 @@ static void update_cpu_load_active(struct rq *this_rq)
 	 * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
 	 */
 	this_rq->last_load_update_tick = jiffies;
+#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+	__update_cpu_load(this_rq, this_rq->cfs.runnable_load_avg, 1);
+#else
 	__update_cpu_load(this_rq, this_rq->load.weight, 1);
+#endif
 
 	calc_load_account_active(this_rq);
 }
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7697171..9148bcc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2909,7 +2909,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 /* Used instead of source_load when we know the type == 0 */
 static unsigned long weighted_cpuload(const int cpu)
 {
-	return cpu_rq(cpu)->load.weight;
+	return (unsigned long)cpu_rq(cpu)->cfs.runnable_load_avg;
 }
 
 /*
@@ -2956,7 +2956,7 @@ static unsigned long cpu_avg_load_per_task(int cpu)
 	unsigned long nr_running = ACCESS_ONCE(rq->nr_running);
 
 	if (nr_running)
-		return rq->load.weight / nr_running;
+		return (unsigned long)rq->cfs.runnable_load_avg / nr_running;
 
 	return 0;
 }
-- 
1.7.0.1



* [PATCH 3/4] sched: consider runnable load average in move_tasks
  2013-01-24  3:30 [ RFC patch 0/4]: use runnable load avg in cfs balance instead of instant load Alex Shi
  2013-01-24  3:30 ` [PATCH 1/4] sched: update cpu load after task_tick Alex Shi
  2013-01-24  3:30 ` [PATCH 2/4] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task Alex Shi
@ 2013-01-24  3:30 ` Alex Shi
  2013-01-24  3:30 ` [PATCH 4/4] sched: consider runnable load average in effective_load Alex Shi
  3 siblings, 0 replies; 8+ messages in thread
From: Alex Shi @ 2013-01-24  3:30 UTC (permalink / raw)
  To: mingo, peterz, tglx, akpm, arjan, bp, pjt, namhyung, efault
  Cc: vincent.guittot, gregkh, preeti, viresh.kumar, linux-kernel, alex.shi

Besides using the runnable load average in the background accounting,
move_tasks() is also a key function in load balancing. We need to consider the
runnable load average there too, in order to get an apples-to-apples load
comparison.
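
As an illustrative example (made-up numbers): a task whose task_h_load() is
1024 but whose runnable_avg_sum is only a quarter of its runnable_avg_period
is accounted as roughly 256 by the helper added below, so move_tasks() has to
pull about four such lightly-runnable tasks to cover the same imbalance as one
fully-runnable task.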

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched/fair.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9148bcc..876a6a0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4192,6 +4192,15 @@ static unsigned long task_h_load(struct task_struct *p);
 
 static const unsigned int sched_nr_migrate_break = 32;
 
+static unsigned long task_h_load_avg(struct task_struct *p)
+{
+	u32 period = p->se.avg.runnable_avg_period;
+	if (!period)
+		return 0;
+
+	return task_h_load(p) * p->se.avg.runnable_avg_sum / period;
+}
+
 /*
  * move_tasks tries to move up to imbalance weighted load from busiest to
  * this_rq, as part of a balancing operation within domain "sd".
@@ -4227,7 +4236,7 @@ static int move_tasks(struct lb_env *env)
 		if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
 			goto next;
 
-		load = task_h_load(p);
+		load = task_h_load_avg(p);
 
 		if (sched_feat(LB_MIN) && load < 16 && !env->sd->nr_balance_failed)
 			goto next;
-- 
1.7.0.1



* [PATCH 4/4] sched: consider runnable load average in effective_load
  2013-01-24  3:30 [ RFC patch 0/4]: use runnable load avg in cfs balance instead of instant load Alex Shi
                   ` (2 preceding siblings ...)
  2013-01-24  3:30 ` [PATCH 3/4] sched: consider runnable load average in move_tasks Alex Shi
@ 2013-01-24  3:30 ` Alex Shi
  3 siblings, 0 replies; 8+ messages in thread
From: Alex Shi @ 2013-01-24  3:30 UTC (permalink / raw)
  To: mingo, peterz, tglx, akpm, arjan, bp, pjt, namhyung, efault
  Cc: vincent.guittot, gregkh, preeti, viresh.kumar, linux-kernel, alex.shi

effective_load() calculates the load change as seen from the
root_task_group. It needs to take the runnable average of the
changed task into account.

Thanks to Morten Rasmussen for the reminder about this.
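
The scaling applied in wake_affine() below is plain fixed-point arithmetic.
A tiny standalone sketch with made-up numbers (NICE_0_LOAD/NICE_0_SHIFT are
redefined locally here, purely for illustration):

#include <stdio.h>

#define NICE_0_LOAD	1024
#define NICE_0_SHIFT	10

int main(void)
{
	/* a task that has been runnable for about half of its tracked window */
	unsigned int runnable_avg_sum = 24000, runnable_avg_period = 48000;
	long wl = 300;		/* pretend effective_load() returned this */

	int runnable_avg = runnable_avg_sum * NICE_0_LOAD
				/ (runnable_avg_period + 1);

	printf("runnable_avg = %d\n", runnable_avg);			/* prints 511 */
	printf("scaled load  = %ld\n", wl * runnable_avg >> NICE_0_SHIFT); /* prints 149 */
	return 0;
}

So a task that was runnable about half the time contributes about half of its
effective_load() result to the this_eff_load/prev_eff_load comparison.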

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched/fair.c |   27 ++++++++++++++++++++-------
 1 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 876a6a0..76251be 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2985,7 +2985,8 @@ static void task_waking_fair(struct task_struct *p)
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 /*
- * effective_load() calculates the load change as seen from the root_task_group
+ * effective_load() calculates the runnable load average change as seen from
+ * the root_task_group
  *
  * Adding load to a group doesn't make a group heavier, but can cause movement
  * of group shares between cpus. Assuming the shares were perfectly aligned one
@@ -3033,6 +3034,9 @@ static void task_waking_fair(struct task_struct *p)
  * Therefore the effective change in loads on CPU 0 would be 5/56 (3/8 - 2/7)
  * times the weight of the group. The effect on CPU 1 would be -4/56 (4/8 -
  * 4/7) times the weight of the group.
+ *
+ * After getting the effective_load of the moved load, scale it by the sched
+ * entity's runnable avg.
  */
 static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
 {
@@ -3107,6 +3111,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 	struct task_group *tg;
 	unsigned long weight;
 	int balanced;
+	int runnable_avg;
 
 	idx	  = sd->wake_idx;
 	this_cpu  = smp_processor_id();
@@ -3122,13 +3127,19 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 	if (sync) {
 		tg = task_group(current);
 		weight = current->se.load.weight;
+		runnable_avg = current->se.avg.runnable_avg_sum * NICE_0_LOAD
+				/ (current->se.avg.runnable_avg_period + 1);
 
-		this_load += effective_load(tg, this_cpu, -weight, -weight);
-		load += effective_load(tg, prev_cpu, 0, -weight);
+		this_load += effective_load(tg, this_cpu, -weight, -weight)
+				* runnable_avg >> NICE_0_SHIFT;
+		load += effective_load(tg, prev_cpu, 0, -weight)
+				* runnable_avg >> NICE_0_SHIFT;
 	}
 
 	tg = task_group(p);
 	weight = p->se.load.weight;
+	runnable_avg = p->se.avg.runnable_avg_sum * NICE_0_LOAD
+				/ (p->se.avg.runnable_avg_period + 1);
 
 	/*
 	 * In low-load situations, where prev_cpu is idle and this_cpu is idle
@@ -3140,16 +3151,18 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 	 * task to be woken on this_cpu.
 	 */
 	if (this_load > 0) {
-		s64 this_eff_load, prev_eff_load;
+		s64 this_eff_load, prev_eff_load, tmp_eff_load;
 
 		this_eff_load = 100;
 		this_eff_load *= power_of(prev_cpu);
-		this_eff_load *= this_load +
-			effective_load(tg, this_cpu, weight, weight);
+		tmp_eff_load = effective_load(tg, this_cpu, weight, weight)
+				* runnable_avg >> NICE_0_SHIFT;
+		this_eff_load *= this_load + tmp_eff_load;
 
 		prev_eff_load = 100 + (sd->imbalance_pct - 100) / 2;
 		prev_eff_load *= power_of(this_cpu);
-		prev_eff_load *= load + effective_load(tg, prev_cpu, 0, weight);
+		prev_eff_load *= load + (effective_load(tg, prev_cpu, 0, weight)
+						* runnable_avg >> NICE_0_SHIFT);
 
 		balanced = this_eff_load <= prev_eff_load;
 	} else
-- 
1.7.0.1



* Re: [PATCH 2/4] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task
  2013-01-24  3:30 ` [PATCH 2/4] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task Alex Shi
@ 2013-01-24 10:08   ` Ingo Molnar
  2013-01-24 15:16     ` Alex Shi
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2013-01-24 10:08 UTC (permalink / raw)
  To: Alex Shi
  Cc: mingo, peterz, tglx, akpm, arjan, bp, pjt, namhyung, efault,
	vincent.guittot, gregkh, preeti, viresh.kumar, linux-kernel


* Alex Shi <alex.shi@intel.com> wrote:

> @@ -2539,7 +2539,11 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
>  void update_idle_cpu_load(struct rq *this_rq)
>  {
>  	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
> +#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
> +	unsigned long load = (unsigned long)this_rq->cfs.runnable_load_avg;
> +#else
>  	unsigned long load = this_rq->load.weight;
> +#endif

I'd not make it conditional - just calculate runnable_load_avg 
all the time (even if group scheduling is disabled) and use it 
consistently. The last thing we want is to bifurcate scheduler 
balancer behavior even further.

Thanks,

	Ingo


* Re: [PATCH 2/4] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task
  2013-01-24 10:08   ` Ingo Molnar
@ 2013-01-24 15:16     ` Alex Shi
  2013-01-25  1:03       ` Alex Shi
  0 siblings, 1 reply; 8+ messages in thread
From: Alex Shi @ 2013-01-24 15:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: mingo, peterz, tglx, akpm, arjan, bp, pjt, namhyung, efault,
	vincent.guittot, gregkh, preeti, viresh.kumar, linux-kernel

On 01/24/2013 06:08 PM, Ingo Molnar wrote:
> 
> * Alex Shi <alex.shi@intel.com> wrote:
> 
>> @@ -2539,7 +2539,11 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
>>  void update_idle_cpu_load(struct rq *this_rq)
>>  {
>>  	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
>> +#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
>> +	unsigned long load = (unsigned long)this_rq->cfs.runnable_load_avg;
>> +#else
>>  	unsigned long load = this_rq->load.weight;
>> +#endif
> 
> I'd not make it conditional - just calculate runnable_load_avg 
> all the time (even if group scheduling is disabled) and use it 
> consistently. The last thing we want is to bifurcate scheduler 
> balancer behavior even further.

Very glad to see you back, Ingo! :)

This patch set follows my power-aware scheduling patchset. But a separate,
workable runnable-load-based balancing only needs the other 3 patches, which
I already sent you in another patchset:

[patch v4 06/18] sched: give initial value for runnable avg of sched
[patch v4 07/18] sched: set initial load avg of new forked task
[patch v4 08/18] Revert "sched: Introduce temporary FAIR_GROUP_SCHED


> 
> Thanks,
> 
> 	Ingo
> 


-- 
Thanks
    Alex


* Re: [PATCH 2/4] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task
  2013-01-24 15:16     ` Alex Shi
@ 2013-01-25  1:03       ` Alex Shi
  0 siblings, 0 replies; 8+ messages in thread
From: Alex Shi @ 2013-01-25  1:03 UTC (permalink / raw)
  To: Alex Shi
  Cc: Ingo Molnar, mingo, peterz, tglx, akpm, arjan, bp, pjt, namhyung,
	efault, vincent.guittot, gregkh, preeti, viresh.kumar,
	linux-kernel

On Thu, Jan 24, 2013 at 11:16 PM, Alex Shi <alex.shi@intel.com> wrote:
> On 01/24/2013 06:08 PM, Ingo Molnar wrote:
>>
>> * Alex Shi <alex.shi@intel.com> wrote:
>>
>>> @@ -2539,7 +2539,11 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
>>>  void update_idle_cpu_load(struct rq *this_rq)
>>>  {
>>>      unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
>>> +#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
>>> +    unsigned long load = (unsigned long)this_rq->cfs.runnable_load_avg;
>>> +#else
>>>      unsigned long load = this_rq->load.weight;
>>> +#endif
>>
>> I'd not make it conditional - just calculate runnable_load_avg
>> all the time (even if group scheduling is disabled) and use it
>> consistently. The last thing we want is to bifurcate scheduler
>> balancer behavior even further.
>
> Very glad to see you back, Ingo! :)
>
> This patch set follows my power-aware scheduling patchset. But a separate,
> workable runnable-load-based balancing only needs the other 3 patches, which
> I already sent you in another patchset:
>
> [patch v4 06/18] sched: give initial value for runnable avg of sched
> [patch v4 07/18] sched: set initial load avg of new forked task
> [patch v4 08/18] Revert "sched: Introduce temporary FAIR_GROUP_SCHED

You're right, Ingo! The last revert patch missed the above 2 points.
I will resend new patches as a full version.

