* [RFC PATCH 0/5] enable runnable load avg in load balance
@ 2012-11-17 13:04 Alex Shi
  2012-11-17 13:04 ` [RFC PATCH 1/5] sched: get rq runnable load average for " Alex Shi
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Alex Shi @ 2012-11-17 13:04 UTC (permalink / raw)
  To: mingo, peterz, pjt, preeti, vincent.guittot; +Cc: linux-kernel

This patchset tries to take the runnable load average into account when
comparing cpu load in load balance.

I had seen Preeti's enabling work before this patchset was finished, but I
still think considering the runnable load avg on the rq may be a more
natural way.

BTW, I am wondering whether decaying cpu_load twice is too complicated: once
for the runnable time tracking and once more for CPU_LOAD_IDX. I think I am
missing the reason for the CPU_LOAD_IDX decay. Could anyone do me the favor
of giving some hints on this?

Best Regards!
Alex 

[RFC PATCH 1/5] sched: get rq runnable load average for load balance
[RFC PATCH 2/5] sched: update rq runnable load average in time
[RFC PATCH 3/5] sched: using runnable load avg in cpu_load and
[RFC PATCH 4/5] sched: consider runnable load average in wake_affine
[RFC PATCH 5/5] sched: revert 'Introduce temporary FAIR_GROUP_SCHED

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC PATCH 1/5] sched: get rq runnable load average for load balance
  2012-11-17 13:04 [RFC PATCH 0/5] enable runnable load avg in load balance Alex Shi
@ 2012-11-17 13:04 ` Alex Shi
  2012-11-17 13:04 ` [RFC PATCH 2/5] sched: update rq runnable load average in time Alex Shi
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Alex Shi @ 2012-11-17 13:04 UTC (permalink / raw)
  To: mingo, peterz, pjt, preeti, vincent.guittot; +Cc: linux-kernel

In load balance, the rq load weight is the core of the balance judgement.
Now it's time to consider PJT's runnable load tracking in load balance.

Since we already have the rq runnable_avg_sum and the rq load weight,
the rq runnable load average is easy to get:
	runnable_load(rq) = runnable_avg(rq) * weight(rq)

We then reuse rq->avg.load_avg_contrib to store the value.
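
For example (illustrative numbers only): a runqueue that has spent about half
of the decayed tracking window runnable, i.e. runnable_avg_sum /
(runnable_avg_period + 1) ~= 1/2, with rq->load.weight = 2048, would store a
load_avg_contrib of roughly 1024.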

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched/debug.c |    1 +
 kernel/sched/fair.c  |   20 ++++++++++++++++----
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 2cd3c1b..1cd5639 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -71,6 +71,7 @@ static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group
 		struct sched_avg *avg = &cpu_rq(cpu)->avg;
 		P(avg->runnable_avg_sum);
 		P(avg->runnable_avg_period);
+		P(avg->load_avg_contrib);
 		return;
 	}
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a624d3b..bc60e43 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1439,14 +1439,25 @@ static inline void __update_tg_runnable_avg(struct sched_avg *sa,
 static inline void __update_group_entity_contrib(struct sched_entity *se) {}
 #endif
 
-static inline void __update_task_entity_contrib(struct sched_entity *se)
+static inline void __update_load_avg_contrib(struct sched_avg *sa,
+						struct load_weight *load)
 {
 	u32 contrib;
 
 	/* avoid overflowing a 32-bit type w/ SCHED_LOAD_SCALE */
-	contrib = se->avg.runnable_avg_sum * scale_load_down(se->load.weight);
-	contrib /= (se->avg.runnable_avg_period + 1);
-	se->avg.load_avg_contrib = scale_load(contrib);
+	contrib = sa->runnable_avg_sum * scale_load_down(load->weight);
+	contrib /= (sa->runnable_avg_period + 1);
+	sa->load_avg_contrib = scale_load(contrib);
+}
+
+static inline void __update_task_entity_contrib(struct sched_entity *se)
+{
+	__update_load_avg_contrib(&se->avg, &se->load);
+}
+
+static inline void __update_rq_load_contrib(struct rq *rq)
+{
+	__update_load_avg_contrib(&rq->avg, &rq->load);
 }
 
 /* Compute the current contribution to load_avg by se, return any delta */
@@ -1539,6 +1550,7 @@ static inline void update_rq_runnable_avg(struct rq *rq, int runnable)
 {
 	__update_entity_runnable_avg(rq->clock_task, &rq->avg, runnable);
 	__update_tg_runnable_avg(&rq->avg, &rq->cfs);
+	__update_rq_load_contrib(rq);
 }
 
 /* Add the load generated by se into cfs_rq's child load-average */
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH 2/5] sched: update rq runnable load average in time
  2012-11-17 13:04 [RFC PATCH 0/5] enable runnable load avg in load balance Alex Shi
  2012-11-17 13:04 ` [RFC PATCH 1/5] sched: get rq runnable load average for " Alex Shi
@ 2012-11-17 13:04 ` Alex Shi
  2012-11-17 13:04 ` [RFC PATCH 3/5] sched: using runnable load avg in cpu_load and cpu_avg_load_per_task Alex Shi
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Alex Shi @ 2012-11-17 13:04 UTC (permalink / raw)
  To: mingo, peterz, pjt, preeti, vincent.guittot; +Cc: linux-kernel

Now we have the rq runnable load average value and are preparing to use it
when updating rq cpu_load[].

So we want to make sure the rq cpu_load[] update uses the latest data.
Moving update_cpu_load_active(rq) after task_tick() serves this purpose.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched/core.c |    2 +-
 kernel/sched/fair.c |    1 +
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9dbbe45..bacfee0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2657,8 +2657,8 @@ void scheduler_tick(void)
 
 	raw_spin_lock(&rq->lock);
 	update_rq_clock(rq);
-	update_cpu_load_active(rq);
 	curr->sched_class->task_tick(rq, curr, 0);
+	update_cpu_load_active(rq);
 	raw_spin_unlock(&rq->lock);
 
 	perf_event_task_tick();
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bc60e43..44c07ed 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6011,6 +6011,7 @@ static void nohz_idle_balance(int this_cpu, enum cpu_idle_type idle)
 
 		raw_spin_lock_irq(&rq->lock);
 		update_rq_clock(rq);
+		update_rq_runnable_avg(rq, rq->nr_running);
 		update_idle_cpu_load(rq);
 		raw_spin_unlock_irq(&rq->lock);
 
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH 3/5] sched: using runnable load avg in cpu_load and cpu_avg_load_per_task
  2012-11-17 13:04 [RFC PATCH 0/5] enable runnable load avg in load balance Alex Shi
  2012-11-17 13:04 ` [RFC PATCH 1/5] sched: get rq runnable load average for " Alex Shi
  2012-11-17 13:04 ` [RFC PATCH 2/5] sched: update rq runnable load average in time Alex Shi
@ 2012-11-17 13:04 ` Alex Shi
  2012-11-17 13:04 ` [RFC PATCH 4/5] sched: consider runnable load average in wake_affine and move_tasks Alex Shi
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Alex Shi @ 2012-11-17 13:04 UTC (permalink / raw)
  To: mingo, peterz, pjt, preeti, vincent.guittot; +Cc: linux-kernel

These are the base values in load balance; updating them with the rq runnable
load average lets the load balancer consider the runnable load avg
naturally.

The basic idea of using the runnable load avg is simply to apply the runnable
load coefficient directly in the load balance process, e.g. in load
comparisons between cpus.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched/core.c |    4 ++--
 kernel/sched/fair.c |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bacfee0..ee6d765 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2501,7 +2501,7 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
 void update_idle_cpu_load(struct rq *this_rq)
 {
 	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
-	unsigned long load = this_rq->load.weight;
+	unsigned long load = this_rq->avg.load_avg_contrib;
 	unsigned long pending_updates;
 
 	/*
@@ -2551,7 +2551,7 @@ static void update_cpu_load_active(struct rq *this_rq)
 	 * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
 	 */
 	this_rq->last_load_update_tick = jiffies;
-	__update_cpu_load(this_rq, this_rq->load.weight, 1);
+	__update_cpu_load(this_rq, this_rq->avg.load_avg_contrib, 1);
 
 	calc_load_account_active(this_rq);
 }
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 44c07ed..f918919 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2950,7 +2950,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 /* Used instead of source_load when we know the type == 0 */
 static unsigned long weighted_cpuload(const int cpu)
 {
-	return cpu_rq(cpu)->load.weight;
+	return cpu_rq(cpu)->avg.load_avg_contrib;
 }
 
 /*
@@ -2997,7 +2997,7 @@ static unsigned long cpu_avg_load_per_task(int cpu)
 	unsigned long nr_running = ACCESS_ONCE(rq->nr_running);
 
 	if (nr_running)
-		return rq->load.weight / nr_running;
+		return rq->avg.load_avg_contrib / nr_running;
 
 	return 0;
 }
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH 4/5] sched: consider runnable load average in wake_affine and move_tasks
  2012-11-17 13:04 [RFC PATCH 0/5] enable runnable load avg in load balance Alex Shi
                   ` (2 preceding siblings ...)
  2012-11-17 13:04 ` [RFC PATCH 3/5] sched: using runnable load avg in cpu_load and cpu_avg_load_per_task Alex Shi
@ 2012-11-17 13:04 ` Alex Shi
  2012-11-17 18:09   ` Preeti U Murthy
  2012-11-17 13:04 ` [RFC PATCH 5/5] sched: revert 'Introduce temporary FAIR_GROUP_SCHED dependency ...' Alex Shi
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Alex Shi @ 2012-11-17 13:04 UTC (permalink / raw)
  To: mingo, peterz, pjt, preeti, vincent.guittot; +Cc: linux-kernel

Besides using the runnable load average in the background, wake_affine and
move_tasks are also key functions in load balance. We need to consider
the runnable load average in them in order to make an apples-to-apples
load comparison in load balance.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched/fair.c |   16 ++++++++++------
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f918919..7064a13 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3164,8 +3164,10 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 		tg = task_group(current);
 		weight = current->se.load.weight;
 
-		this_load += effective_load(tg, this_cpu, -weight, -weight);
-		load += effective_load(tg, prev_cpu, 0, -weight);
+		this_load += effective_load(tg, this_cpu, -weight, -weight)
+				* cpu_rq(this_cpu)->avg.load_avg_contrib;
+		load += effective_load(tg, prev_cpu, 0, -weight)
+				* cpu_rq(prev_cpu)->avg.load_avg_contrib;
 	}
 
 	tg = task_group(p);
@@ -3185,12 +3187,14 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 
 		this_eff_load = 100;
 		this_eff_load *= power_of(prev_cpu);
-		this_eff_load *= this_load +
-			effective_load(tg, this_cpu, weight, weight);
+		this_eff_load *= (this_load +
+			effective_load(tg, this_cpu, weight, weight))
+				* cpu_rq(this_cpu)->avg.load_avg_contrib;
 
 		prev_eff_load = 100 + (sd->imbalance_pct - 100) / 2;
 		prev_eff_load *= power_of(this_cpu);
-		prev_eff_load *= load + effective_load(tg, prev_cpu, 0, weight);
+		prev_eff_load *= (load + effective_load(tg, prev_cpu, 0, weight))
+				* cpu_rq(prev_cpu)->avg.load_avg_contrib;
 
 		balanced = this_eff_load <= prev_eff_load;
 	} else
@@ -4229,7 +4233,7 @@ static int move_tasks(struct lb_env *env)
 		if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
 			goto next;
 
-		load = task_h_load(p);
+		load = task_h_load(p) * p->se.avg.load_avg_contrib;
 
 		if (sched_feat(LB_MIN) && load < 16 && !env->failed)
 			goto next;
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH 5/5] sched: revert 'Introduce temporary FAIR_GROUP_SCHED dependency ...'
  2012-11-17 13:04 [RFC PATCH 0/5] enable runnable load avg in load balance Alex Shi
                   ` (3 preceding siblings ...)
  2012-11-17 13:04 ` [RFC PATCH 4/5] sched: consider runnable load average in wake_affine and move_tasks Alex Shi
@ 2012-11-17 13:04 ` Alex Shi
  2012-11-17 13:49 ` [RFC PATCH 0/5] enable runnable load avg in load balance Ricardo Nabinger Sanchez
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Alex Shi @ 2012-11-17 13:04 UTC (permalink / raw)
  To: mingo, peterz, pjt, preeti, vincent.guittot; +Cc: linux-kernel

Revert commit f4e26b120b9de84cb627b so that the load-tracking patchset can be
used in the kernel.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 include/linux/sched.h |    8 +-------
 kernel/sched/core.c   |    7 +------
 kernel/sched/fair.c   |   12 +-----------
 kernel/sched/sched.h  |    9 +--------
 4 files changed, 4 insertions(+), 32 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8f65323..4ce885a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1169,13 +1169,7 @@ struct sched_entity {
 	/* rq "owned" by this entity/group: */
 	struct cfs_rq		*my_q;
 #endif
-/*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
-	/* Per-entity load-tracking */
+#ifdef CONFIG_SMP
 	struct sched_avg	avg;
 #endif
 };
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ee6d765..9f9615d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1526,12 +1526,7 @@ static void __sched_fork(struct task_struct *p)
 	p->se.vruntime			= 0;
 	INIT_LIST_HEAD(&p->se.group_node);
 
-/*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+#ifdef CONFIG_SMP
 	p->se.avg.runnable_avg_period = 0;
 	p->se.avg.runnable_avg_sum = 0;
 #endif
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7064a13..3f7f732 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1149,8 +1149,7 @@ static inline void update_cfs_shares(struct cfs_rq *cfs_rq)
 }
 #endif /* CONFIG_FAIR_GROUP_SCHED */
 
-/* Only depends on SMP, FAIR_GROUP_SCHED may be removed when useful in lb */
-#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+#ifdef CONFIG_SMP
 /*
  * We choose a half-life close to 1 scheduling period.
  * Note: The tables below are dependent on this value.
@@ -3457,7 +3456,6 @@ unlock:
 	return new_cpu;
 }
 
-#ifdef CONFIG_FAIR_GROUP_SCHED
 static void migrate_task_rq_entity(struct task_struct *p, int next_cpu)
 {
 	struct sched_entity *se = &p->se;
@@ -3474,16 +3472,8 @@ static void migrate_task_rq_entity(struct task_struct *p, int next_cpu)
 		atomic64_add(se->avg.load_avg_contrib, &cfs_rq->removed_load);
 	}
 }
-#else
-static void migrate_task_rq_entity(struct task_struct *p, int next_cpu) { }
-#endif
 
 /*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-/*
  * Called immediately before a task is migrated to a new cpu; task_cpu(p) and
  * cfs_rq_of(p) references at time of call are still valid and identify the
  * previous cpu.  However, the caller only guarantees p->pi_lock is held; no
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index bb9475c..3a4a8d6 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -226,12 +226,6 @@ struct cfs_rq {
 #endif
 
 #ifdef CONFIG_SMP
-/*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#ifdef CONFIG_FAIR_GROUP_SCHED
 	/*
 	 * CFS Load tracking
 	 * Under CFS, load is tracked on a per-entity basis and aggregated up.
@@ -241,8 +235,7 @@ struct cfs_rq {
 	u64 runnable_load_avg, blocked_load_avg;
 	atomic64_t decay_counter, removed_load;
 	u64 last_decay;
-#endif /* CONFIG_FAIR_GROUP_SCHED */
-/* These always depend on CONFIG_FAIR_GROUP_SCHED */
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	u32 tg_runnable_contrib;
 	u64 tg_load_contrib;
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/5] enable runnable load avg in load balance
  2012-11-17 13:04 [RFC PATCH 0/5] enable runnable load avg in load balance Alex Shi
                   ` (4 preceding siblings ...)
  2012-11-17 13:04 ` [RFC PATCH 5/5] sched: revert 'Introduce temporary FAIR_GROUP_SCHED dependency ...' Alex Shi
@ 2012-11-17 13:49 ` Ricardo Nabinger Sanchez
  2012-11-17 19:12 ` Preeti U Murthy
  2012-11-26 19:03 ` Benjamin Segall
  7 siblings, 0 replies; 19+ messages in thread
From: Ricardo Nabinger Sanchez @ 2012-11-17 13:49 UTC (permalink / raw)
  To: linux-kernel

On Sat, 17 Nov 2012 21:04:12 +0800, Alex Shi wrote:

> This patchset tries to take the runnable load average into account when
> comparing cpu load in load balance.

I found the wording in the commit messages (pretty much all of them, 
including the introductory message) rather confusing, especially patch 
4/5.


-- 
Ricardo Nabinger Sanchez           http://rnsanchez.wait4.org/
  "Left to themselves, things tend to go from bad to worse."


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 4/5] sched: consider runnable load average in wake_affine and move_tasks
  2012-11-17 13:04 ` [RFC PATCH 4/5] sched: consider runnable load average in wake_affine and move_tasks Alex Shi
@ 2012-11-17 18:09   ` Preeti U Murthy
  2012-11-18  9:36     ` Alex Shi
  0 siblings, 1 reply; 19+ messages in thread
From: Preeti U Murthy @ 2012-11-17 18:09 UTC (permalink / raw)
  To: Alex Shi; +Cc: mingo, peterz, pjt, vincent.guittot, linux-kernel

Hi Alex,

On 11/17/2012 06:34 PM, Alex Shi wrote:
> Besides using the runnable load average in the background, wake_affine and
> move_tasks are also key functions in load balance. We need to consider
> the runnable load average in them in order to make an apples-to-apples
> load comparison in load balance.
> 
> Signed-off-by: Alex Shi <alex.shi@intel.com>
> ---
>  kernel/sched/fair.c |   16 ++++++++++------
>  1 files changed, 10 insertions(+), 6 deletions(-)
> 
> @@ -4229,7 +4233,7 @@ static int move_tasks(struct lb_env *env)
>  		if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
>  			goto next;
> 
> -		load = task_h_load(p);
> +		load = task_h_load(p) * p->se.avg.load_avg_contrib;
Shouldn't the above be just load = p->se.avg.load_avg_contrib? This
metric has already taken p->se.load.weight into account; task_h_load(p)
returns the same.
> 
>  		if (sched_feat(LB_MIN) && load < 16 && !env->failed)
>  			goto next;
> 
Regards
Preeti U Murthy


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/5] enable runnable load avg in load balance
  2012-11-17 13:04 [RFC PATCH 0/5] enable runnable load avg in load balance Alex Shi
                   ` (5 preceding siblings ...)
  2012-11-17 13:49 ` [RFC PATCH 0/5] enable runnable load avg in load balance Ricardo Nabinger Sanchez
@ 2012-11-17 19:12 ` Preeti U Murthy
  2012-11-18  8:35   ` Alex Shi
  2012-11-26 19:03 ` Benjamin Segall
  7 siblings, 1 reply; 19+ messages in thread
From: Preeti U Murthy @ 2012-11-17 19:12 UTC (permalink / raw)
  To: Alex Shi; +Cc: mingo, peterz, pjt, vincent.guittot, linux-kernel

Hi Alex,

On 11/17/2012 06:34 PM, Alex Shi wrote:
> This patchset tries to take the runnable load average into account when
> comparing cpu load in load balance.
> 
> I had seen Preeti's enabling work before this patchset was finished, but I
> still think considering the runnable load avg on the rq may be a more
> natural way.
> 
> BTW, I am wondering whether decaying cpu_load twice is too complicated: once
> for the runnable time tracking and once more for CPU_LOAD_IDX. I think I am
> missing the reason for the CPU_LOAD_IDX decay. Could anyone do me the favor
> of giving some hints on this?

The decay happening for CPU_LOAD_IDX is *more coarse grained* than the
decay that __update_entity_runnable_avg() is performing. While
__update_cpu_load() decays the rq->load.weight *for every jiffy* (~4ms)
passed so far without an update of the load,
__update_entity_runnable_avg() decays the rq->load.weight *for every
1ms* when called from update_rq_runnable_avg().

Before the introduction of PJT's series, __update_cpu_load() seems to be
the only place where the decay of older rq load was happening (so as to
give the older load less importance). But with PJT's series, since the
older rq load gets decayed in __update_entity_runnable_avg() in a more
fine grained fashion, perhaps you are right: when CPU_LOAD_IDX gets
updated, we don't need to decay the load once again there.
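
For reference, the coarse per-index decay looks roughly like this (a
simplified sketch of the loop in __update_cpu_load(); the handling of missed
updates via decay_load_missed() is omitted, so treat it as an approximation
rather than the exact code):

	int i, scale;

	/* cpu_load[0] is the instantaneous load; higher indexes move slower */
	this_rq->cpu_load[0] = this_load;
	for (i = 1, scale = 2; i < CPU_LOAD_IDX_MAX; i++, scale += scale) {
		unsigned long old_load = this_rq->cpu_load[i];
		unsigned long new_load = this_load;

		/* round up when rising so the average can actually reach the load */
		if (new_load > old_load)
			new_load += scale - 1;

		/* exponential average: the new sample gets weight 1/2^i per tick */
		this_rq->cpu_load[i] = (old_load * (scale - 1) + new_load) >> i;
	}

So cpu_load[1]..cpu_load[4] are progressively slower-moving averages of the
same input, layered on top of whatever decay the input itself already
carries.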
> 
> Best Regards!
> Alex 
> 
> [RFC PATCH 1/5] sched: get rq runnable load average for load balance
> [RFC PATCH 2/5] sched: update rq runnable load average in time
> [RFC PATCH 3/5] sched: using runnable load avg in cpu_load and
> [RFC PATCH 4/5] sched: consider runnable load average in wake_affine
> [RFC PATCH 5/5] sched: revert 'Introduce temporary FAIR_GROUP_SCHED
> 
Regards
Preeti U Murthy


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/5] enable runnable load avg in load balance
  2012-11-17 19:12 ` Preeti U Murthy
@ 2012-11-18  8:35   ` Alex Shi
  0 siblings, 0 replies; 19+ messages in thread
From: Alex Shi @ 2012-11-18  8:35 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: Alex Shi, mingo, peterz, pjt, vincent.guittot, linux-kernel

On Sun, Nov 18, 2012 at 3:12 AM, Preeti U Murthy
<preeti@linux.vnet.ibm.com> wrote:
> Hi Alex,
>
> On 11/17/2012 06:34 PM, Alex Shi wrote:
>> This patchset tries to take the runnable load average into account when
>> comparing cpu load in load balance.
>>
>> I had seen Preeti's enabling work before this patchset was finished, but I
>> still think considering the runnable load avg on the rq may be a more
>> natural way.
>>
>> BTW, I am wondering whether decaying cpu_load twice is too complicated: once
>> for the runnable time tracking and once more for CPU_LOAD_IDX. I think I am
>> missing the reason for the CPU_LOAD_IDX decay. Could anyone do me the favor
>> of giving some hints on this?
>
> The decay happening for CPU_LOAD_IDX is *more coarse grained* than the
> decay that __update_entity_runnable_avg() is performing. While
> __update_cpu_load() decays the rq->load.weight *for every jiffy* (~4ms)
> passed so far without an update of the load,
> __update_entity_runnable_avg() decays the rq->load.weight *for every
> 1ms* when called from update_rq_runnable_avg().
>
> Before the introduction of PJT's series, __update_cpu_load() seems to be
> the only place where the decay of older rq load was happening (so as to
> give the older load less importance). But with PJT's series, since the
> older rq load gets decayed in __update_entity_runnable_avg() in a more
> fine grained fashion, perhaps you are right: when CPU_LOAD_IDX gets
> updated, we don't need to decay the load once again there.


If cpu_load were just a coarser decay, we could remove it. But it has a
different meaning for busy_idx, forkexec_idx, idle_idx and newidle_idx:
each of them implies a different degree of decay. That is the key part, but
I have no idea where their values come from.
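
(For context, the load_idx from the sched domain just selects which
cpu_load[] slot the balancer compares; a rough sketch of
source_load()/target_load(), not verbatim kernel code, with the LB_BIAS
feature check left out:

	/* conservative view of a source cpu: do not overestimate its load */
	static unsigned long source_load(int cpu, int type)
	{
		unsigned long total = weighted_cpuload(cpu);

		if (type == 0)
			return total;

		return min(cpu_rq(cpu)->cpu_load[type - 1], total);
	}

	/* optimistic view of a target cpu: do not underestimate its load */
	static unsigned long target_load(int cpu, int type)
	{
		unsigned long total = weighted_cpuload(cpu);

		if (type == 0)
			return total;

		return max(cpu_rq(cpu)->cpu_load[type - 1], total);
	}

The per-domain busy_idx/idle_idx/newidle_idx/forkexec_idx values that feed
'type' come from the per-level SD_*_INIT topology defaults, at least in the
current code.)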

Thanks!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 4/5] sched: consider runnable load average in wake_affine and move_tasks
  2012-11-17 18:09   ` Preeti U Murthy
@ 2012-11-18  9:36     ` Alex Shi
  0 siblings, 0 replies; 19+ messages in thread
From: Alex Shi @ 2012-11-18  9:36 UTC (permalink / raw)
  To: Preeti U Murthy; +Cc: mingo, peterz, pjt, vincent.guittot, linux-kernel

On 11/18/2012 02:09 AM, Preeti U Murthy wrote:
> Hi Alex,
> 
> On 11/17/2012 06:34 PM, Alex Shi wrote:
>> Besides using the runnable load average in the background, wake_affine and
>> move_tasks are also key functions in load balance. We need to consider
>> the runnable load average in them in order to make an apples-to-apples
>> load comparison in load balance.
>>
>> Signed-off-by: Alex Shi <alex.shi@intel.com>
>> ---
>>  kernel/sched/fair.c |   16 ++++++++++------
>>  1 files changed, 10 insertions(+), 6 deletions(-)
>>
>> @@ -4229,7 +4233,7 @@ static int move_tasks(struct lb_env *env)
>>  		if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
>>  			goto next;
>>
>> -		load = task_h_load(p);
>> +		load = task_h_load(p) * p->se.avg.load_avg_contrib;
> Shouldn't the above be just load = p->se.avg.load_avg_contrib? This
> metric has already taken p->se.load.weight into account; task_h_load(p)
> returns the same.

Thanks for catching this bug!
But task_h_load(p) is clearly not the same as p->se.load.weight when task
groups are in use, so it could be changed to scale task_h_load() by the
runnable fraction instead, which keeps the weight factor counted only once:
+               load = task_h_load(p) * p->se.avg.runnable_avg_sum
+                                       / (p->se.avg.runnable_avg_period + 1);

a fixed patch is here:

----------

From 972296706292dcb5cd2bd3c25fa15566130ba74d Mon Sep 17 00:00:00 2001
From: Alex Shi <alex.shi@intel.com>
Date: Sat, 17 Nov 2012 19:21:48 +0800
Subject: [PATCH 5/9] sched: consider runnable load average in wake_affine and
 move_tasks

Besides using the runnable load average in the background, wake_affine and
move_tasks are also key functions in load balance. We need to consider
the runnable load average in them in order to make an apples-to-apples
load comparison in load balance.

Thanks to Preeti for catching the task_h_load bug.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched/fair.c |   17 +++++++++++------
 1 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f918919..f9f1010 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3164,8 +3164,10 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 		tg = task_group(current);
 		weight = current->se.load.weight;
 
-		this_load += effective_load(tg, this_cpu, -weight, -weight);
-		load += effective_load(tg, prev_cpu, 0, -weight);
+		this_load += effective_load(tg, this_cpu, -weight, -weight)
+				* cpu_rq(this_cpu)->avg.load_avg_contrib;
+		load += effective_load(tg, prev_cpu, 0, -weight)
+				* cpu_rq(prev_cpu)->avg.load_avg_contrib;
 	}
 
 	tg = task_group(p);
@@ -3185,12 +3187,14 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 
 		this_eff_load = 100;
 		this_eff_load *= power_of(prev_cpu);
-		this_eff_load *= this_load +
-			effective_load(tg, this_cpu, weight, weight);
+		this_eff_load *= (this_load +
+			effective_load(tg, this_cpu, weight, weight))
+				* cpu_rq(this_cpu)->avg.load_avg_contrib;
 
 		prev_eff_load = 100 + (sd->imbalance_pct - 100) / 2;
 		prev_eff_load *= power_of(this_cpu);
-		prev_eff_load *= load + effective_load(tg, prev_cpu, 0, weight);
+		prev_eff_load *= (load + effective_load(tg, prev_cpu, 0, weight))
+				* cpu_rq(prev_cpu)->avg.load_avg_contrib;
 
 		balanced = this_eff_load <= prev_eff_load;
 	} else
@@ -4229,7 +4233,8 @@ static int move_tasks(struct lb_env *env)
 		if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
 			goto next;
 
-		load = task_h_load(p);
+		load = task_h_load(p) * p->se.avg.runnable_avg_sum
+					/ (p->se.avg.runnable_avg_period + 1);
 
 		if (sched_feat(LB_MIN) && load < 16 && !env->failed)
 			goto next;
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/5] enable runnable load avg in load balance
  2012-11-17 13:04 [RFC PATCH 0/5] enable runnable load avg in load balance Alex Shi
                   ` (6 preceding siblings ...)
  2012-11-17 19:12 ` Preeti U Murthy
@ 2012-11-26 19:03 ` Benjamin Segall
  2012-11-27  0:37   ` Alex Shi
                     ` (2 more replies)
  7 siblings, 3 replies; 19+ messages in thread
From: Benjamin Segall @ 2012-11-26 19:03 UTC (permalink / raw)
  To: Alex Shi; +Cc: mingo, peterz, pjt, preeti, vincent.guittot, linux-kernel

So, I've been trying out using the runnable averages for load balance in
a few ways, but haven't actually gotten any improvement on the
benchmarks I've run. I'll post my patches once I have the numbers down,
but it's generally been about half a percent to 1% worse on the tests
I've tried.

The basic idea is to use (cfs_rq->runnable_load_avg +
cfs_rq->blocked_load_avg) (which should be equivalent to doing
load_avg_contrib on the rq) for cfs_rqs and possibly the rq, and
p->se.load.weight * p->se.avg.runnable_avg_sum / period for tasks.

I have not yet tried including wake_affine, so this has just involved
h_load (task_load_down and task_h_load), as that makes everything
(besides wake_affine) be based on either the new averages or the
rq->cpu_load averages.
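
As a minimal sketch of that idea (the helper names below are made up for
illustration and are not from the posted patches):

	/* per-rq load: runnable plus blocked tracked load, already weight-scaled */
	static unsigned long cfs_rq_tracked_load(struct cfs_rq *cfs_rq)
	{
		return cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
	}

	/* per-task load: weight scaled by the fraction of time recently runnable;
	 * the +1 just guards against a zero period, as in the existing code */
	static unsigned long task_tracked_load(struct task_struct *p)
	{
		return p->se.load.weight * p->se.avg.runnable_avg_sum /
				(p->se.avg.runnable_avg_period + 1);
	}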

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/5] enable runnable load avg in load balance
  2012-11-26 19:03 ` Benjamin Segall
@ 2012-11-27  0:37   ` Alex Shi
  2012-11-27  1:01     ` Benjamin Segall
  2012-11-27  1:11   ` Alex Shi
  2012-11-27  3:08   ` Preeti U Murthy
  2 siblings, 1 reply; 19+ messages in thread
From: Alex Shi @ 2012-11-27  0:37 UTC (permalink / raw)
  To: Benjamin Segall; +Cc: mingo, peterz, pjt, preeti, vincent.guittot, linux-kernel

On 11/27/2012 03:03 AM, Benjamin Segall wrote:
> So, I've been trying out using the runnable averages for load balance in
> a few ways, but haven't actually gotten any improvement on the
> benchmarks I've run. I'll post my patches once I have the numbers down,
> but it's generally been about half a percent to 1% worse on the tests
> I've tried.
> 
> The basic idea is to use (cfs_rq->runnable_load_avg +
> cfs_rq->blocked_load_avg) (which should be equivalent to doing
> load_avg_contrib on the rq) for cfs_rqs and possibly the rq, and
> p->se.load.weight * p->se.avg.runnable_avg_sum / period for tasks.
> 
> I have not yet tried including wake_affine, so this has just involved
> h_load (task_load_down and task_h_load), as that makes everything
> (besides wake_affine) be based on either the new averages or the
> rq->cpu_load averages.
> 


Which tree is your code based on? tip/master has been changing quickly recently.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/5] enable runnable load avg in load balance
  2012-11-27  0:37   ` Alex Shi
@ 2012-11-27  1:01     ` Benjamin Segall
  0 siblings, 0 replies; 19+ messages in thread
From: Benjamin Segall @ 2012-11-27  1:01 UTC (permalink / raw)
  To: Alex Shi; +Cc: mingo, peterz, pjt, preeti, vincent.guittot, linux-kernel

I ran it on tip/sched/core.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/5] enable runnable load avg in load balance
  2012-11-26 19:03 ` Benjamin Segall
  2012-11-27  0:37   ` Alex Shi
@ 2012-11-27  1:11   ` Alex Shi
  2012-11-27  3:08   ` Preeti U Murthy
  2 siblings, 0 replies; 19+ messages in thread
From: Alex Shi @ 2012-11-27  1:11 UTC (permalink / raw)
  To: Benjamin Segall; +Cc: mingo, peterz, pjt, preeti, vincent.guittot, linux-kernel

On 11/27/2012 03:03 AM, Benjamin Segall wrote:
> So, I've been trying out using the runnable averages for load balance in
> a few ways, but haven't actually gotten any improvement on the
> benchmarks I've run. I'll post my patches once I have the numbers down,
> but it's generally been about half a percent to 1% worse on the tests
> I've tried.

Did you try this RFC patch? And what's the result of it? :)

> 
> The basic idea is to use (cfs_rq->runnable_load_avg +
> cfs_rq->blocked_load_avg) (which should be equivalent to doing
> load_avg_contrib on the rq) for cfs_rqs and possibly the rq, and
> p->se.load.weight * p->se.avg.runnable_avg_sum / period for tasks.
> 
> I have not yet tried including wake_affine, so this has just involved
> h_load (task_load_down and task_h_load), as that makes everything
> (besides wake_affine) be based on either the new averages or the
> rq->cpu_load averages.
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/5] enable runnable load avg in load balance
  2012-11-26 19:03 ` Benjamin Segall
  2012-11-27  0:37   ` Alex Shi
  2012-11-27  1:11   ` Alex Shi
@ 2012-11-27  3:08   ` Preeti U Murthy
  2012-11-27  6:14     ` Alex Shi
  2 siblings, 1 reply; 19+ messages in thread
From: Preeti U Murthy @ 2012-11-27  3:08 UTC (permalink / raw)
  To: Benjamin Segall
  Cc: Alex Shi, mingo, peterz, pjt, vincent.guittot, linux-kernel

Hi everyone,

On 11/27/2012 12:33 AM, Benjamin Segall wrote:
> So, I've been trying out using the runnable averages for load balance in
> a few ways, but haven't actually gotten any improvement on the
> benchmarks I've run. I'll post my patches once I have the numbers down,
> but it's generally been about half a percent to 1% worse on the tests
> I've tried.
> 
> The basic idea is to use (cfs_rq->runnable_load_avg +
> cfs_rq->blocked_load_avg) (which should be equivalent to doing
> load_avg_contrib on the rq) for cfs_rqs and possibly the rq, and
> p->se.load.weight * p->se.avg.runnable_avg_sum / period for tasks.

Why should cfs_rq->blocked_load_avg be included when calculating the load
on the rq? Blocked tasks do not contribute to the active load of the cpu,
right?

When a task goes to sleep, its load is removed from cfs_rq->load.weight
as well, in account_entity_dequeue(), which means the load balancer
considers a sleeping entity as *not* contributing to the active runqueue
load. So shouldn't the new metric consider cfs_rq->runnable_load_avg alone?
> 
> I have not yet tried including wake_affine, so this has just involved
> h_load (task_load_down and task_h_load), as that makes everything
> (besides wake_affine) be based on either the new averages or the
> rq->cpu_load averages.
> 

Yeah, I have been trying to evaluate the performance as well, but with
cfs_rq->runnable_load_avg as the rq load contribution and the task load
computed as mentioned above. I have not completed my experiments, but I
would expect some significant performance difference due to the below
scenario:

                     Task3(10% task)
Task1(100% task)     Task4(10% task)
Task2(100% task)     Task5(10% task)
---------------     ----------------       ----------
CPU1                  CPU2                  CPU3

When cpu3 triggers load balancing:

CASE1:
 without PJT's metric the following loads will be perceived
 CPU1->2048
 CPU2->3042
 Therefore CPU2 might be relieved of one task to result in:


Task1(100% task)     Task4(10% task)
Task2(100% task)     Task5(10% task)       Task3(10% task)
---------------     ----------------       ----------
CPU1                  CPU2                  CPU3

CASE2:
  with PJT's metric the following loads will be perceived
  CPU1->2048
  CPU2->1022
 Therefore CPU1 might be relieved of one task to result in:

                     Task3(10% task)
                     Task4(10% task)
Task2(100% task)     Task5(10% task)     Task1(100% task)
---------------     ----------------       ----------
CPU1                  CPU2                  CPU3


The differences between the above two scenarios include:

1. Reduced latency for Task1 in CASE2, which is the right task to be moved
in the above scenario.

2. Even though in the former case CPU2 is relieved of one task, it is of no
use if Task3 is going to sleep most of the time. This might result in
more load balancing on behalf of cpu3.

What do you guys think?

Thank you

Regards
Preeti U Murthy





^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/5] enable runnable load avg in load balance
  2012-11-27  3:08   ` Preeti U Murthy
@ 2012-11-27  6:14     ` Alex Shi
  2012-11-27  6:45       ` Preeti U Murthy
  0 siblings, 1 reply; 19+ messages in thread
From: Alex Shi @ 2012-11-27  6:14 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: Benjamin Segall, mingo, peterz, pjt, vincent.guittot, linux-kernel

On 11/27/2012 11:08 AM, Preeti U Murthy wrote:
> Hi everyone,
> 
> On 11/27/2012 12:33 AM, Benjamin Segall wrote:
>> So, I've been trying out using the runnable averages for load balance in
>> a few ways, but haven't actually gotten any improvement on the
>> benchmarks I've run. I'll post my patches once I have the numbers down,
>> but it's generally been about half a percent to 1% worse on the tests
>> I've tried.
>>
>> The basic idea is to use (cfs_rq->runnable_load_avg +
>> cfs_rq->blocked_load_avg) (which should be equivalent to doing
>> load_avg_contrib on the rq) for cfs_rqs and possibly the rq, and
>> p->se.load.weight * p->se.avg.runnable_avg_sum / period for tasks.
> 
> Why should cfs_rq->blocked_load_avg be included when calculating the load
> on the rq? Blocked tasks do not contribute to the active load of the cpu,
> right?
> 
> When a task goes to sleep, its load is removed from cfs_rq->load.weight
> as well, in account_entity_dequeue(), which means the load balancer
> considers a sleeping entity as *not* contributing to the active runqueue
> load. So shouldn't the new metric consider cfs_rq->runnable_load_avg alone?
>>
>> I have not yet tried including wake_affine, so this has just involved
>> h_load (task_load_down and task_h_load), as that makes everything
>> (besides wake_affine) be based on either the new averages or the
>> rq->cpu_load averages.
>>
> 
> Yeah, I have been trying to evaluate the performance as well, but with
> cfs_rq->runnable_load_avg as the rq load contribution and the task load
> computed as mentioned above. I have not completed my experiments, but I
> would expect some significant performance difference due to the below
> scenario:
> 
>                      Task3(10% task)
> Task1(100% task)     Task4(10% task)
> Task2(100% task)     Task5(10% task)
> ---------------     ----------------       ----------
> CPU1                  CPU2                  CPU3
> 
> When cpu3 triggers load balancing:
> 
> CASE1:
>  without PJT's metric the following loads will be perceived
>  CPU1->2048
>  CPU2->3042
>  Therefore CPU2 might be relieved of one task to result in:
> 
> 
> Task1(100% task)     Task4(10% task)
> Task2(100% task)     Task5(10% task)       Task3(10% task)
> ---------------     ----------------       ----------
> CPU1                  CPU2                  CPU3
> 
> CASE2:
>   with PJT's metric the following loads will be perceived
>   CPU1->2048
>   CPU2->1022
>  Therefore CPU1 might be relieved of one task to result in:
> 
>                      Task3(10% task)
>                      Task4(10% task)
> Task2(100% task)     Task5(10% task)     Task1(100% task)
> ---------------     ----------------       ----------
> CPU1                  CPU2                  CPU3
> 
> 
> The differences between the above two scenarios include:
> 
> 1. Reduced latency for Task1 in CASE2, which is the right task to be moved
> in the above scenario.
> 
> 2. Even though in the former case CPU2 is relieved of one task, it is of no
> use if Task3 is going to sleep most of the time. This might result in
> more load balancing on behalf of cpu3.
> 
> What do you guys think?

It looks fine, just a question about CASE 1.
Usually cpu2, with three 10% load tasks, will show nr_running == 0 about 70%
of the time. So how do you make rq->nr_running = 3 always?

I guess in most cases load balance would pull task1 or task2 to cpu2 or
cpu3, not the result of CASE 1.


> 
> Thank you
> 
> Regards
> Preeti U Murthy
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/5] enable runnable load avg in load balance
  2012-11-27  6:14     ` Alex Shi
@ 2012-11-27  6:45       ` Preeti U Murthy
  2012-11-27  8:06         ` Alex Shi
  0 siblings, 1 reply; 19+ messages in thread
From: Preeti U Murthy @ 2012-11-27  6:45 UTC (permalink / raw)
  To: Alex Shi
  Cc: Benjamin Segall, mingo, peterz, pjt, vincent.guittot, linux-kernel

Hi,
On 11/27/2012 11:44 AM, Alex Shi wrote:
> On 11/27/2012 11:08 AM, Preeti U Murthy wrote:
>> Hi everyone,
>>
>> On 11/27/2012 12:33 AM, Benjamin Segall wrote:
>>> So, I've been trying out using the runnable averages for load balance in
>>> a few ways, but haven't actually gotten any improvement on the
>>> benchmarks I've run. I'll post my patches once I have the numbers down,
>>> but it's generally been about half a percent to 1% worse on the tests
>>> I've tried.
>>>
>>> The basic idea is to use (cfs_rq->runnable_load_avg +
>>> cfs_rq->blocked_load_avg) (which should be equivalent to doing
>>> load_avg_contrib on the rq) for cfs_rqs and possibly the rq, and
>>> p->se.load.weight * p->se.avg.runnable_avg_sum / period for tasks.
>>
>> Why should cfs_rq->blocked_load_avg be included to calculate the load
>> on the rq? They do not contribute to the active load of the cpu right?
>>
>> When a task goes to sleep its load is removed from cfs_rq->load.weight
>> as well in account_entity_dequeue(). Which means the load balancer
>> considers a sleeping entity as *not* contributing to the active runqueue
>> load.So shouldn't the new metric consider cfs_rq->runnable_load_avg alone?
>>>
>>> I have not yet tried including wake_affine, so this has just involved
>>> h_load (task_load_down and task_h_load), as that makes everything
>>> (besides wake_affine) be based on either the new averages or the
>>> rq->cpu_load averages.
>>>
>>
>> Yeah I have been trying to view the performance as well,but with
>> cfs_rq->runnable_load_avg as the rq load contribution and the task load,
>> same as mentioned above.I have not completed my experiments but I would
>> expect some significant performance difference due to the below scenario:
>>
>>                      Task3(10% task)
>> Task1(100% task)     Task4(10% task)
>> Task2(100% task)     Task5(10% task)
>> ---------------     ----------------       ----------
>> CPU1                  CPU2                  CPU3
>>
>> When cpu3 triggers load balancing:
>>
>> CASE1:
>>  without PJT's metric the following loads will be perceived
>>  CPU1->2048
>>  CPU2->3042
>>  Therefore CPU2 might be relieved of one task to result in:
>>
>>
>> Task1(100% task)     Task4(10% task)
>> Task2(100% task)     Task5(10% task)       Task3(10% task)
>> ---------------     ----------------       ----------
>> CPU1                  CPU2                  CPU3
>>
>> CASE2:
>>   with PJT's metric the following loads will be perceived
>>   CPU1->2048
>>   CPU2->1022
>>  Therefore CPU1 might be relieved of one task to result in:
>>
>>                      Task3(10% task)
>>                      Task4(10% task)
>> Task2(100% task)     Task5(10% task)     Task1(100% task)
>> ---------------     ----------------       ----------
>> CPU1                  CPU2                  CPU3
>>
>>
>> The differences between the above two scenarios include:
>>
>> 1.Reduced latency for Task1 in CASE2,which is the right task to be moved
>> in the above scenario.
>>
>> 2.Even though in the former case CPU2 is relieved of one task,its of no
>> use if Task3 is going to sleep most of the time.This might result in
>> more load balancing on behalf of cpu3.
>>
>> What do you guys think?
> 
> It looks fine, just a question about CASE 1.
> Usually cpu2, with three 10% load tasks, will show nr_running == 0 about 70%
> of the time. So how do you make rq->nr_running = 3 always?
> 
> I guess in most cases load balance would pull task1 or task2 to cpu2 or
> cpu3, not the result of CASE 1.

That's right, Alex. Most of the time the nr_running on CPU2 will be shown
to be 0, or perhaps 1 or 2. But whether you use PJT's metric or not, the
load balancer in such circumstances will behave the same, as you have
rightly pointed out: pull task1/2 to cpu2/3.

But the issue usually arises when all three wake up at the same time on
cpu2, wrongly portraying the load as 3042 if PJT's metric is not used.
This could lead to load balancing one of these short running tasks as
shown by CASE1. This is the situation where, in my opinion, PJT's metric
could make a difference.

Regards
Preeti U Murthy


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/5] enable runnable load avg in load balance
  2012-11-27  6:45       ` Preeti U Murthy
@ 2012-11-27  8:06         ` Alex Shi
  0 siblings, 0 replies; 19+ messages in thread
From: Alex Shi @ 2012-11-27  8:06 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: Benjamin Segall, mingo, peterz, pjt, vincent.guittot, linux-kernel

On 11/27/2012 02:45 PM, Preeti U Murthy wrote:
> Hi,
> On 11/27/2012 11:44 AM, Alex Shi wrote:
>> On 11/27/2012 11:08 AM, Preeti U Murthy wrote:
>>> Hi everyone,
>>>
>>> On 11/27/2012 12:33 AM, Benjamin Segall wrote:
>>>> So, I've been trying out using the runnable averages for load balance in
>>>> a few ways, but haven't actually gotten any improvement on the
>>>> benchmarks I've run. I'll post my patches once I have the numbers down,
>>>> but it's generally been about half a percent to 1% worse on the tests
>>>> I've tried.
>>>>
>>>> The basic idea is to use (cfs_rq->runnable_load_avg +
>>>> cfs_rq->blocked_load_avg) (which should be equivalent to doing
>>>> load_avg_contrib on the rq) for cfs_rqs and possibly the rq, and
>>>> p->se.load.weight * p->se.avg.runnable_avg_sum / period for tasks.
>>>
>>> Why should cfs_rq->blocked_load_avg be included to calculate the load
>>> on the rq? They do not contribute to the active load of the cpu right?
>>>
>>> When a task goes to sleep its load is removed from cfs_rq->load.weight
>>> as well in account_entity_dequeue(). Which means the load balancer
>>> considers a sleeping entity as *not* contributing to the active runqueue
>>> load.So shouldn't the new metric consider cfs_rq->runnable_load_avg alone?
>>>>
>>>> I have not yet tried including wake_affine, so this has just involved
>>>> h_load (task_load_down and task_h_load), as that makes everything
>>>> (besides wake_affine) be based on either the new averages or the
>>>> rq->cpu_load averages.
>>>>
>>>
>>> Yeah I have been trying to view the performance as well,but with
>>> cfs_rq->runnable_load_avg as the rq load contribution and the task load,
>>> same as mentioned above.I have not completed my experiments but I would
>>> expect some significant performance difference due to the below scenario:
>>>
>>>                      Task3(10% task)
>>> Task1(100% task)     Task4(10% task)
>>> Task2(100% task)     Task5(10% task)
>>> ---------------     ----------------       ----------
>>> CPU1                  CPU2                  CPU3
>>>
>>> When cpu3 triggers load balancing:
>>>
>>> CASE1:
>>>  without PJT's metric the following loads will be perceived
>>>  CPU1->2048
>>>  CPU2->3042
>>>  Therefore CPU2 might be relieved of one task to result in:
>>>
>>>
>>> Task1(100% task)     Task4(10% task)
>>> Task2(100% task)     Task5(10% task)       Task3(10% task)
>>> ---------------     ----------------       ----------
>>> CPU1                  CPU2                  CPU3
>>>
>>> CASE2:
>>>   with PJT's metric the following loads will be perceived
>>>   CPU1->2048
>>>   CPU2->1022
>>>  Therefore CPU1 might be relieved of one task to result in:
>>>
>>>                      Task3(10% task)
>>>                      Task4(10% task)
>>> Task2(100% task)     Task5(10% task)     Task1(100% task)
>>> ---------------     ----------------       ----------
>>> CPU1                  CPU2                  CPU3
>>>
>>>
>>> The differences between the above two scenarios include:
>>>
>>> 1.Reduced latency for Task1 in CASE2,which is the right task to be moved
>>> in the above scenario.
>>>
>>> 2.Even though in the former case CPU2 is relieved of one task,its of no
>>> use if Task3 is going to sleep most of the time.This might result in
>>> more load balancing on behalf of cpu3.
>>>
>>> What do you guys think?
>>
>> It looks fine. just a question of CASE 1.
>> Usually the cpu2 with 3 10% load task will show nr_running == 0, at 70%
>> time. So, how you make rq->nr_running = 3 always?
>>
>> Guess in most chance load balance with pull task1 or task2 to cpu2 or
>> cpu3. not the result of CASE 1.
> 
> That's right, Alex. Most of the time the nr_running on CPU2 will be shown
> to be 0, or perhaps 1 or 2. But whether you use PJT's metric or not, the
> load balancer in such circumstances will behave the same, as you have
> rightly pointed out: pull task1/2 to cpu2/3.
> 
> But the issue usually arises when all three wake up at the same time on
> cpu2, wrongly portraying the load as 3042 if PJT's metric is not used.
> This could lead to load balancing one of these short running tasks as
> shown by CASE1. This is the situation where, in my opinion, PJT's metric
> could make a difference.

Sure. And it would be perfect if you can find an appropriate benchmark to
support it.
> 
> Regards
> Preeti U Murthy
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread

Thread overview: 19 messages
2012-11-17 13:04 [RFC PATCH 0/5] enable runnable load avg in load balance Alex Shi
2012-11-17 13:04 ` [RFC PATCH 1/5] sched: get rq runnable load average for " Alex Shi
2012-11-17 13:04 ` [RFC PATCH 2/5] sched: update rq runnable load average in time Alex Shi
2012-11-17 13:04 ` [RFC PATCH 3/5] sched: using runnable load avg in cpu_load and cpu_avg_load_per_task Alex Shi
2012-11-17 13:04 ` [RFC PATCH 4/5] sched: consider runnable load average in wake_affine and move_tasks Alex Shi
2012-11-17 18:09   ` Preeti U Murthy
2012-11-18  9:36     ` Alex Shi
2012-11-17 13:04 ` [RFC PATCH 5/5] sched: revert 'Introduce temporary FAIR_GROUP_SCHED dependency ...' Alex Shi
2012-11-17 13:49 ` [RFC PATCH 0/5] enable runnable load avg in load balance Ricardo Nabinger Sanchez
2012-11-17 19:12 ` Preeti U Murthy
2012-11-18  8:35   ` Alex Shi
2012-11-26 19:03 ` Benjamin Segall
2012-11-27  0:37   ` Alex Shi
2012-11-27  1:01     ` Benjamin Segall
2012-11-27  1:11   ` Alex Shi
2012-11-27  3:08   ` Preeti U Murthy
2012-11-27  6:14     ` Alex Shi
2012-11-27  6:45       ` Preeti U Murthy
2012-11-27  8:06         ` Alex Shi
