* [patch 0/8]: use runnable load avg in balance
@ 2013-05-10 15:17 Alex Shi
  2013-05-10 15:17 ` [patch v6 1/8] Revert "sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking" Alex Shi
                   ` (10 more replies)
  0 siblings, 11 replies; 32+ messages in thread
From: Alex Shi @ 2013-05-10 15:17 UTC (permalink / raw)
  To: mingo, peterz, tglx, akpm, bp, pjt, namhyung, efault, morten.rasmussen
  Cc: vincent.guittot, preeti, viresh.kumar, linux-kernel, alex.shi,
	mgorman, riel, wangyun

This patchset is based on tip/sched/core.

This version changes how the runnable load avg value is set for new tasks
in the 3rd patch.

We also tried to include the blocked load avg in balancing, but found the
performance of many benchmarks dropping. Our guess is that the inflated cpu
load drives tasks to be woken on remote CPUs and causes wrong decisions in
periodic balance.
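
For illustration only (not part of this series): a minimal sketch of the two
per-cpu load signals compared above, using the 3.10-era cfs_rq fields; the
helper names here are invented.

static unsigned long cpu_load_runnable_only(struct rq *rq)
{
	/* the signal this series feeds into the balance paths */
	return rq->cfs.runnable_load_avg;
}

static unsigned long cpu_load_with_blocked(struct rq *rq)
{
	/* the rejected variant: recently-slept tasks still inflate the load */
	return rq->cfs.runnable_load_avg + rq->cfs.blocked_load_avg;
}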

I retested on Intel Core2, NHM, SNB and IVB machines with 2 and 4 sockets,
using the kbuild, aim7, dbench, tbench, hackbench, oltp and netperf loopback
benchmarks. The performance is better now.

On the 4-socket SNB EP machine, hackbench improved by about 50% and the
results became stable. On other machines, hackbench improved by about 2~10%.
oltp improved by about 30% on the NHM EX box.
netperf loopback also improved on the 4-socket SNB EP box.
No clear changes on the other benchmarks.
 
Michael Wang tested a previous version with pgbench on his box:
https://lkml.org/lkml/2013/4/2/1022

And Morten tested a previous version too.
http://comments.gmane.org/gmane.linux.kernel/1463371

Thanks for the comments from Peter, Paul, Morten, Michael and Preeti.
More comments are appreciated!

Regards
Alex

[patch v6 1/8] Revert "sched: Introduce temporary FAIR_GROUP_SCHED
[patch v6 2/8] sched: move few runnable tg variables into CONFIG_SMP
[patch v6 3/8] sched: set initial value of runnable avg for new
[patch v6 4/8] sched: fix slept time double counting in enqueue
[patch v6 5/8] sched: update cpu load after task_tick.
[patch v6 6/8] sched: compute runnable load avg in cpu_load and
[patch v6 7/8] sched: consider runnable load average in move_tasks
[patch v6 8/8] sched: remove blocked_load_avg in tg

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch v6 1/8] Revert "sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking"
  2013-05-10 15:17 [patch 0/8]: use runnable load avg in balance Alex Shi
@ 2013-05-10 15:17 ` Alex Shi
  2013-05-10 15:17 ` [patch v6 2/8] sched: move few runnable tg variables into CONFIG_SMP Alex Shi
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Alex Shi @ 2013-05-10 15:17 UTC (permalink / raw)
  To: mingo, peterz, tglx, akpm, bp, pjt, namhyung, efault, morten.rasmussen
  Cc: vincent.guittot, preeti, viresh.kumar, linux-kernel, alex.shi,
	mgorman, riel, wangyun

Remove the CONFIG_FAIR_GROUP_SCHED dependency that guards the runnable
load-tracking info, so that we can use the runnable load variables.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 include/linux/sched.h |    7 +------
 kernel/sched/core.c   |    7 +------
 kernel/sched/fair.c   |   13 ++-----------
 kernel/sched/sched.h  |   10 ++--------
 4 files changed, 6 insertions(+), 31 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e692a02..9539597 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1161,12 +1161,7 @@ struct sched_entity {
 	struct cfs_rq		*my_q;
 #endif
 
-/*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+#ifdef CONFIG_SMP
 	/* Per-entity load-tracking */
 	struct sched_avg	avg;
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 67d0465..c8db984 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1563,12 +1563,7 @@ static void __sched_fork(struct task_struct *p)
 	p->se.vruntime			= 0;
 	INIT_LIST_HEAD(&p->se.group_node);
 
-/*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+#ifdef CONFIG_SMP
 	p->se.avg.runnable_avg_period = 0;
 	p->se.avg.runnable_avg_sum = 0;
 #endif
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a33e59..9c2f726 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1109,8 +1109,7 @@ static inline void update_cfs_shares(struct cfs_rq *cfs_rq)
 }
 #endif /* CONFIG_FAIR_GROUP_SCHED */
 
-/* Only depends on SMP, FAIR_GROUP_SCHED may be removed when useful in lb */
-#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+#ifdef CONFIG_SMP
 /*
  * We choose a half-life close to 1 scheduling period.
  * Note: The tables below are dependent on this value.
@@ -3394,12 +3393,6 @@ unlock:
 }
 
 /*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#ifdef CONFIG_FAIR_GROUP_SCHED
-/*
  * Called immediately before a task is migrated to a new cpu; task_cpu(p) and
  * cfs_rq_of(p) references at time of call are still valid and identify the
  * previous cpu.  However, the caller only guarantees p->pi_lock is held; no
@@ -3422,7 +3415,6 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
 		atomic64_add(se->avg.load_avg_contrib, &cfs_rq->removed_load);
 	}
 }
-#endif
 #endif /* CONFIG_SMP */
 
 static unsigned long
@@ -6114,9 +6106,8 @@ const struct sched_class fair_sched_class = {
 
 #ifdef CONFIG_SMP
 	.select_task_rq		= select_task_rq_fair,
-#ifdef CONFIG_FAIR_GROUP_SCHED
 	.migrate_task_rq	= migrate_task_rq_fair,
-#endif
+
 	.rq_online		= rq_online_fair,
 	.rq_offline		= rq_offline_fair,
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index cc03cfd..9419764 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -227,12 +227,6 @@ struct cfs_rq {
 #endif
 
 #ifdef CONFIG_SMP
-/*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#ifdef CONFIG_FAIR_GROUP_SCHED
 	/*
 	 * CFS Load tracking
 	 * Under CFS, load is tracked on a per-entity basis and aggregated up.
@@ -242,9 +236,9 @@ struct cfs_rq {
 	u64 runnable_load_avg, blocked_load_avg;
 	atomic64_t decay_counter, removed_load;
 	u64 last_decay;
-#endif /* CONFIG_FAIR_GROUP_SCHED */
-/* These always depend on CONFIG_FAIR_GROUP_SCHED */
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
+	/* Required to track per-cpu representation of a task_group */
 	u32 tg_runnable_contrib;
 	u64 tg_load_contrib;
 #endif /* CONFIG_FAIR_GROUP_SCHED */
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [patch v6 2/8] sched: move few runnable tg variables into CONFIG_SMP
  2013-05-10 15:17 [patch 0/8]: use runnable load avg in balance Alex Shi
  2013-05-10 15:17 ` [patch v6 1/8] Revert "sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking" Alex Shi
@ 2013-05-10 15:17 ` Alex Shi
  2013-05-10 15:17 ` [patch v6 3/8] sched: set initial value of runnable avg for new forked task Alex Shi
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Alex Shi @ 2013-05-10 15:17 UTC (permalink / raw)
  To: mingo, peterz, tglx, akpm, bp, pjt, namhyung, efault, morten.rasmussen
  Cc: vincent.guittot, preeti, viresh.kumar, linux-kernel, alex.shi,
	mgorman, riel, wangyun

The following 2 variables are only used under CONFIG_SMP, so it is better
to move their definition under CONFIG_SMP too.

        atomic64_t load_avg;
        atomic_t runnable_avg;

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched/sched.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9419764..c6634f1 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -114,9 +114,11 @@ struct task_group {
 	unsigned long shares;
 
 	atomic_t load_weight;
+#ifdef	CONFIG_SMP
 	atomic64_t load_avg;
 	atomic_t runnable_avg;
 #endif
+#endif
 
 #ifdef CONFIG_RT_GROUP_SCHED
 	struct sched_rt_entity **rt_se;
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [patch v6 3/8] sched: set initial value of runnable avg for new forked task
  2013-05-10 15:17 [patch 0/8]: use runnable load avg in balance Alex Shi
  2013-05-10 15:17 ` [patch v6 1/8] Revert "sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking" Alex Shi
  2013-05-10 15:17 ` [patch v6 2/8] sched: move few runnable tg variables into CONFIG_SMP Alex Shi
@ 2013-05-10 15:17 ` Alex Shi
  2013-05-16  6:28   ` Alex Shi
  2013-05-10 15:17 ` [patch v6 4/8] sched: fix slept time double counting in enqueue entity Alex Shi
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Alex Shi @ 2013-05-10 15:17 UTC (permalink / raw)
  To: mingo, peterz, tglx, akpm, bp, pjt, namhyung, efault, morten.rasmussen
  Cc: vincent.guittot, preeti, viresh.kumar, linux-kernel, alex.shi,
	mgorman, riel, wangyun

We need to initialize se.avg.{decay_count, load_avg_contrib} for a
newly forked task.
Otherwise, random values in these variables cause a mess when the new
task is enqueued:
    enqueue_task_fair
        enqueue_entity
            enqueue_entity_load_avg

and make fork balancing imbalanced due to the incorrect load_avg_contrib.

Furthermore, Morten Rasmussen noticed that some tasks were not launched
at once after being created. So Paul and Peter suggested giving new tasks
a starting runnable avg time equal to sched_slice().
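
As an aside (illustration only, not part of the patch): a standalone sketch of
why sum == period makes a new task look almost fully loaded, mirroring the
arithmetic of __update_task_entity_contrib(); the weight and slice values
below are made up.

#include <stdio.h>

/* same shape as __update_task_entity_contrib() in 3.10-era fair.c */
static unsigned int contrib(unsigned int weight, unsigned int sum,
			    unsigned int period)
{
	return (unsigned int)((unsigned long long)sum * weight / (period + 1));
}

int main(void)
{
	unsigned int weight = 1024;	/* NICE_0_LOAD, assumed unscaled here */
	unsigned int slice = 6000;	/* sched_slice() >> 10, value invented */

	/* a new task per this patch: runnable_avg_sum == runnable_avg_period */
	printf("new task contrib : %u\n", contrib(weight, slice, slice));

	/* for comparison, a task runnable only half of the tracked time */
	printf("half-busy contrib: %u\n", contrib(weight, slice / 2, slice));
	return 0;
}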

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched/core.c  |    6 ++----
 kernel/sched/fair.c  |   23 +++++++++++++++++++++++
 kernel/sched/sched.h |    2 ++
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c8db984..866c05a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1563,10 +1563,6 @@ static void __sched_fork(struct task_struct *p)
 	p->se.vruntime			= 0;
 	INIT_LIST_HEAD(&p->se.group_node);
 
-#ifdef CONFIG_SMP
-	p->se.avg.runnable_avg_period = 0;
-	p->se.avg.runnable_avg_sum = 0;
-#endif
 #ifdef CONFIG_SCHEDSTATS
 	memset(&p->se.statistics, 0, sizeof(p->se.statistics));
 #endif
@@ -1710,6 +1706,8 @@ void wake_up_new_task(struct task_struct *p)
 	set_task_cpu(p, select_task_rq(p, SD_BALANCE_FORK, 0));
 #endif
 
+	/* Give new task start runnable values */
+	set_task_runnable_avg(p);
 	rq = __task_rq_lock(p);
 	activate_task(rq, p, 0);
 	p->on_rq = 1;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9c2f726..203f236 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -661,6 +661,26 @@ static u64 sched_vslice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	return calc_delta_fair(sched_slice(cfs_rq, se), se);
 }
 
+#ifdef CONFIG_SMP
+static inline void __update_task_entity_contrib(struct sched_entity *se);
+
+/* Give new task start runnable values to heavy its load in infant time */
+void set_task_runnable_avg(struct task_struct *p)
+{
+	u32 slice;
+
+	p->se.avg.decay_count = 0;
+	slice = sched_slice(task_cfs_rq(p), &p->se) >> 10;
+	p->se.avg.runnable_avg_sum = slice;
+	p->se.avg.runnable_avg_period = slice;
+	__update_task_entity_contrib(&p->se);
+}
+#else
+void set_task_runnable_avg(struct task_struct *p)
+{
+}
+#endif
+
 /*
  * Update the current task's runtime statistics. Skip current tasks that
  * are not in our scheduling class.
@@ -1508,6 +1528,9 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
 	 * We track migrations using entity decay_count <= 0, on a wake-up
 	 * migration we use a negative decay count to track the remote decays
 	 * accumulated while sleeping.
+	 *
+	 * When enqueue a new forked task, the se->avg.decay_count == 0, so
+	 * we bypass update_entity_load_avg(), use avg.load_avg_contrib direct.
 	 */
 	if (unlikely(se->avg.decay_count <= 0)) {
 		se->avg.last_runnable_update = rq_of(cfs_rq)->clock_task;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c6634f1..518f3d8a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -900,6 +900,8 @@ extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime
 
 extern void update_idle_cpu_load(struct rq *this_rq);
 
+extern void set_task_runnable_avg(struct task_struct *p);
+
 #ifdef CONFIG_CGROUP_CPUACCT
 #include <linux/cgroup.h>
 /* track cpu usage of a group of tasks and its child groups */
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [patch v6 4/8] sched: fix slept time double counting in enqueue entity
  2013-05-10 15:17 [patch 0/8]: use runnable load avg in balance Alex Shi
                   ` (2 preceding siblings ...)
  2013-05-10 15:17 ` [patch v6 3/8] sched: set initial value of runnable avg for new forked task Alex Shi
@ 2013-05-10 15:17 ` Alex Shi
  2013-05-10 15:17 ` [patch v6 5/8] sched: update cpu load after task_tick Alex Shi
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Alex Shi @ 2013-05-10 15:17 UTC (permalink / raw)
  To: mingo, peterz, tglx, akpm, bp, pjt, namhyung, efault, morten.rasmussen
  Cc: vincent.guittot, preeti, viresh.kumar, linux-kernel, alex.shi,
	mgorman, riel, wangyun

A woken-up migrated task will call __synchronize_entity_decay(se) (see
migrate_task_rq_fair()); it then needs to set
`se->avg.last_runnable_update -= (-se->avg.decay_count) << 20'
before update_entity_load_avg(), in order to avoid the slept time being
counted twice into se.avg.load_avg_contrib, by both
__synchronize_entity_decay() and update_entity_load_avg().

But if the slept task is woken up on the same cpu, it misses the
last_runnable_update adjustment before update_entity_load_avg(se, 0, 1),
so the slept time is decayed twice by both functions.
So we need to remove this double counting of slept time.
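
A toy userspace illustration (not the kernel's fixed-point math) of what the
double counting does: decaying the same slept periods twice roughly squares
the decay, which is the over-decay this patch avoids.

#include <stdio.h>
#include <stdint.h>

/*
 * The real code decays by y per ~1ms period with y^32 == 0.5; this toy only
 * keeps the "halve every 32 periods" part.
 */
static uint64_t decay_load(uint64_t load, unsigned int periods)
{
	while (periods >= 32) {
		load >>= 1;
		periods -= 32;
	}
	return load;
}

int main(void)
{
	uint64_t contrib = 1024;	/* load_avg_contrib before sleeping */
	unsigned int slept = 64;	/* roughly 64ms of sleep, 64 periods */

	printf("decayed once (correct) : %llu\n",
	       (unsigned long long)decay_load(contrib, slept));
	printf("decayed twice (the bug): %llu\n",
	       (unsigned long long)decay_load(decay_load(contrib, slept), slept));
	return 0;
}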

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched/fair.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 203f236..8cd19f3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1551,7 +1551,8 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
 		}
 		wakeup = 0;
 	} else {
-		__synchronize_entity_decay(se);
+		se->avg.last_runnable_update += __synchronize_entity_decay(se)
+							<< 20;
 	}
 
 	/* migrated tasks did not contribute to our blocked load */
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [patch v6 5/8] sched: update cpu load after task_tick.
  2013-05-10 15:17 [patch 0/8]: use runnable load avg in balance Alex Shi
                   ` (3 preceding siblings ...)
  2013-05-10 15:17 ` [patch v6 4/8] sched: fix slept time double counting in enqueue entity Alex Shi
@ 2013-05-10 15:17 ` Alex Shi
  2013-05-10 15:17 ` [patch v6 6/8] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task Alex Shi
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Alex Shi @ 2013-05-10 15:17 UTC (permalink / raw)
  To: mingo, peterz, tglx, akpm, bp, pjt, namhyung, efault, morten.rasmussen
  Cc: vincent.guittot, preeti, viresh.kumar, linux-kernel, alex.shi,
	mgorman, riel, wangyun

To get the latest runnable info, we need to do this cpu load update
after task_tick().

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched/core.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 866c05a..f1f9641 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2684,8 +2684,8 @@ void scheduler_tick(void)
 
 	raw_spin_lock(&rq->lock);
 	update_rq_clock(rq);
-	update_cpu_load_active(rq);
 	curr->sched_class->task_tick(rq, curr, 0);
+	update_cpu_load_active(rq);
 	raw_spin_unlock(&rq->lock);
 
 	perf_event_task_tick();
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [patch v6 6/8] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task
  2013-05-10 15:17 [patch 0/8]: use runnable load avg in balance Alex Shi
                   ` (4 preceding siblings ...)
  2013-05-10 15:17 ` [patch v6 5/8] sched: update cpu load after task_tick Alex Shi
@ 2013-05-10 15:17 ` Alex Shi
  2013-05-13 14:06   ` Peter Zijlstra
  2013-05-10 15:17 ` [patch v6 7/8] sched: consider runnable load average in move_tasks Alex Shi
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Alex Shi @ 2013-05-10 15:17 UTC (permalink / raw)
  To: mingo, peterz, tglx, akpm, bp, pjt, namhyung, efault, morten.rasmussen
  Cc: vincent.guittot, preeti, viresh.kumar, linux-kernel, alex.shi,
	mgorman, riel, wangyun

They are the base values used in load balancing; update them with the rq
runnable load average, then load balancing will consider the runnable
load avg naturally.

We also tried to include blocked_load_avg as cpu load in balancing,
but that caused kbuild/aim7/oltp benchmark performance drops.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched/core.c |   16 ++++++++++++++--
 kernel/sched/fair.c |    5 +++--
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f1f9641..8ab37c3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2528,9 +2528,14 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
 void update_idle_cpu_load(struct rq *this_rq)
 {
 	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
-	unsigned long load = this_rq->load.weight;
+	unsigned long load;
 	unsigned long pending_updates;
 
+#ifdef CONFIG_SMP
+	load = this_rq->cfs.runnable_load_avg;
+#else
+	load = this_rq->load.weight;
+#endif
 	/*
 	 * bail if there's load or we're actually up-to-date.
 	 */
@@ -2574,11 +2579,18 @@ void update_cpu_load_nohz(void)
  */
 static void update_cpu_load_active(struct rq *this_rq)
 {
+	unsigned long load;
+
+#ifdef CONFIG_SMP
+	load = this_rq->cfs.runnable_load_avg;
+#else
+	load = this_rq->load.weight;
+#endif
 	/*
 	 * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
 	 */
 	this_rq->last_load_update_tick = jiffies;
-	__update_cpu_load(this_rq, this_rq->load.weight, 1);
+	__update_cpu_load(this_rq, load, 1);
 
 	calc_load_account_active(this_rq);
 }
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8cd19f3..0159c85 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2920,7 +2920,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 /* Used instead of source_load when we know the type == 0 */
 static unsigned long weighted_cpuload(const int cpu)
 {
-	return cpu_rq(cpu)->load.weight;
+	return cpu_rq(cpu)->cfs.runnable_load_avg;
 }
 
 /*
@@ -2965,9 +2965,10 @@ static unsigned long cpu_avg_load_per_task(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long nr_running = ACCESS_ONCE(rq->nr_running);
+	unsigned long load_avg = rq->cfs.runnable_load_avg;
 
 	if (nr_running)
-		return rq->load.weight / nr_running;
+		return load_avg / nr_running;
 
 	return 0;
 }
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [patch v6 7/8] sched: consider runnable load average in move_tasks
  2013-05-10 15:17 [patch 0/8]: use runnable load avg in balance Alex Shi
                   ` (5 preceding siblings ...)
  2013-05-10 15:17 ` [patch v6 6/8] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task Alex Shi
@ 2013-05-10 15:17 ` Alex Shi
  2013-05-10 15:17 ` [patch v6 8/8] sched: remove blocked_load_avg in tg Alex Shi
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Alex Shi @ 2013-05-10 15:17 UTC (permalink / raw)
  To: mingo, peterz, tglx, akpm, bp, pjt, namhyung, efault, morten.rasmussen
  Cc: vincent.guittot, preeti, viresh.kumar, linux-kernel, alex.shi,
	mgorman, riel, wangyun

Besides the background use of the runnable load average, move_tasks() is
also a key function in load balancing. We need to consider the runnable
load average in it too, in order to get an apples-to-apples load
comparison.
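
Illustration only, with invented numbers: after this patch both the task's
contribution and the cfs_rq denominator used by task_h_load() are load
averages, so the ratio compares like with like.

#include <stdio.h>

/* same shape as task_h_load() after this patch; all numbers are made up */
static unsigned long long task_h_load_example(unsigned long long task_contrib,
					      unsigned long long cfs_h_load,
					      unsigned long long cfs_runnable_avg)
{
	return task_contrib * cfs_h_load / (cfs_runnable_avg + 1);
}

int main(void)
{
	/* a half-busy task (contrib 512) in a group holding 2048 of h_load */
	printf("task h_load: %llu\n", task_h_load_example(512, 2048, 1536));
	return 0;
}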

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched/fair.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0159c85..91e60ac 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4138,11 +4138,11 @@ static int tg_load_down(struct task_group *tg, void *data)
 	long cpu = (long)data;
 
 	if (!tg->parent) {
-		load = cpu_rq(cpu)->load.weight;
+		load = cpu_rq(cpu)->avg.load_avg_contrib;
 	} else {
 		load = tg->parent->cfs_rq[cpu]->h_load;
-		load *= tg->se[cpu]->load.weight;
-		load /= tg->parent->cfs_rq[cpu]->load.weight + 1;
+		load *= tg->se[cpu]->avg.load_avg_contrib;
+		load /= tg->parent->cfs_rq[cpu]->runnable_load_avg + 1;
 	}
 
 	tg->cfs_rq[cpu]->h_load = load;
@@ -4170,8 +4170,8 @@ static unsigned long task_h_load(struct task_struct *p)
 	struct cfs_rq *cfs_rq = task_cfs_rq(p);
 	unsigned long load;
 
-	load = p->se.load.weight;
-	load = div_u64(load * cfs_rq->h_load, cfs_rq->load.weight + 1);
+	load = p->se.avg.load_avg_contrib;
+	load = div_u64(load * cfs_rq->h_load, cfs_rq->runnable_load_avg + 1);
 
 	return load;
 }
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [patch v6 8/8] sched: remove blocked_load_avg in tg
  2013-05-10 15:17 [patch 0/8]: use runnable load avg in balance Alex Shi
                   ` (6 preceding siblings ...)
  2013-05-10 15:17 ` [patch v6 7/8] sched: consider runnable load average in move_tasks Alex Shi
@ 2013-05-10 15:17 ` Alex Shi
  2013-05-14  8:31   ` Peter Zijlstra
                     ` (2 more replies)
  2013-05-14  8:07 ` [patch 0/8]: use runnable load avg in balance Alex Shi
                   ` (2 subsequent siblings)
  10 siblings, 3 replies; 32+ messages in thread
From: Alex Shi @ 2013-05-10 15:17 UTC (permalink / raw)
  To: mingo, peterz, tglx, akpm, bp, pjt, namhyung, efault, morten.rasmussen
  Cc: vincent.guittot, preeti, viresh.kumar, linux-kernel, alex.shi,
	mgorman, riel, wangyun

blocked_load_avg is sometimes too heavy and far bigger than the runnable
load avg, which makes balancing take wrong decisions. So it is better not
to consider it.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched/fair.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 91e60ac..75c200c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1339,7 +1339,7 @@ static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
 	struct task_group *tg = cfs_rq->tg;
 	s64 tg_contrib;
 
-	tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
+	tg_contrib = cfs_rq->runnable_load_avg;
 	tg_contrib -= cfs_rq->tg_load_contrib;
 
 	if (force_update || abs64(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [patch v6 6/8] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task
  2013-05-10 15:17 ` [patch v6 6/8] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task Alex Shi
@ 2013-05-13 14:06   ` Peter Zijlstra
  2013-05-14  0:51     ` Alex Shi
  2013-05-14  7:27     ` Alex Shi
  0 siblings, 2 replies; 32+ messages in thread
From: Peter Zijlstra @ 2013-05-13 14:06 UTC (permalink / raw)
  To: Alex Shi
  Cc: mingo, tglx, akpm, bp, pjt, namhyung, efault, morten.rasmussen,
	vincent.guittot, preeti, viresh.kumar, linux-kernel, mgorman,
	riel, wangyun

On Fri, May 10, 2013 at 11:17:27PM +0800, Alex Shi wrote:
> They are the base values used in load balancing; update them with the rq
> runnable load average, then load balancing will consider the runnable
> load avg naturally.
> 
> We also tried to include blocked_load_avg as cpu load in balancing,
> but that caused kbuild/aim7/oltp benchmark performance drops.
> 
> Signed-off-by: Alex Shi <alex.shi@intel.com>
> ---
>  kernel/sched/core.c |   16 ++++++++++++++--
>  kernel/sched/fair.c |    5 +++--
>  2 files changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f1f9641..8ab37c3 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2528,9 +2528,14 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
>  void update_idle_cpu_load(struct rq *this_rq)
>  {
>  	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
> -	unsigned long load = this_rq->load.weight;
> +	unsigned long load;
>  	unsigned long pending_updates;
>  
> +#ifdef CONFIG_SMP
> +	load = this_rq->cfs.runnable_load_avg;
> +#else
> +	load = this_rq->load.weight;
> +#endif
>  	/*
>  	 * bail if there's load or we're actually up-to-date.
>  	 */
> @@ -2574,11 +2579,18 @@ void update_cpu_load_nohz(void)
>   */
>  static void update_cpu_load_active(struct rq *this_rq)
>  {
> +	unsigned long load;
> +
> +#ifdef CONFIG_SMP
> +	load = this_rq->cfs.runnable_load_avg;
> +#else
> +	load = this_rq->load.weight;
> +#endif
>  	/*
>  	 * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().

This just smells like you want a helper function... :-)

Also it doesn't apply anymore due to Paul Gortmaker moving some of this
stuff about.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch v6 6/8] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task
  2013-05-13 14:06   ` Peter Zijlstra
@ 2013-05-14  0:51     ` Alex Shi
  2013-05-14  7:27     ` Alex Shi
  1 sibling, 0 replies; 32+ messages in thread
From: Alex Shi @ 2013-05-14  0:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, akpm, bp, pjt, namhyung, efault, morten.rasmussen,
	vincent.guittot, preeti, viresh.kumar, linux-kernel, mgorman,
	riel, wangyun

On 05/13/2013 10:06 PM, Peter Zijlstra wrote:
>> >  static void update_cpu_load_active(struct rq *this_rq)
>> >  {
>> > +	unsigned long load;
>> > +
>> > +#ifdef CONFIG_SMP
>> > +	load = this_rq->cfs.runnable_load_avg;
>> > +#else
>> > +	load = this_rq->load.weight;
>> > +#endif
>> >  	/*
>> >  	 * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
> This just smells like you want a helper function... :-)

Yes, thanks for pointing this out!
> 
> Also it doesn't apply anymore due to Paul Gortmaker moving some of this
> stuff about.

Will rebase on this. Thanks again! :)

-- 
Thanks
    Alex

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch v6 6/8] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task
  2013-05-13 14:06   ` Peter Zijlstra
  2013-05-14  0:51     ` Alex Shi
@ 2013-05-14  7:27     ` Alex Shi
  2013-05-16  5:49       ` Michael Wang
  1 sibling, 1 reply; 32+ messages in thread
From: Alex Shi @ 2013-05-14  7:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, akpm, bp, pjt, namhyung, efault, morten.rasmussen,
	vincent.guittot, preeti, viresh.kumar, linux-kernel, mgorman,
	riel, wangyun

On 05/13/2013 10:06 PM, Peter Zijlstra wrote:
>> >  	 * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
> This just smells like you want a helper function... :-)
> 
> Also it doesn't apply anymore due to Paul Gortmaker moving some of this
> stuff about.
> 
> 

Patch updated. Any comments are appreciated! :)
---
>From fe23d908a7f80dc5cca0abf9cefaf1004a67b331 Mon Sep 17 00:00:00 2001
From: Alex Shi <alex.shi@intel.com>
Date: Tue, 14 May 2013 10:11:12 +0800
Subject: [PATCH 6/8] sched: compute runnable load avg in cpu_load and
 cpu_avg_load_per_task

They are the base values used in load balancing; update them with the rq
runnable load average, then load balancing will consider the runnable
load avg naturally.

We also tried to include blocked_load_avg as cpu load in balancing, but
that caused a 6% kbuild performance drop on every Intel machine, and
aim7/oltp drops on some of the 4-socket machines.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched/fair.c |  5 +++--
 kernel/sched/proc.c | 17 +++++++++++++++--
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a534d1f..d2d3e03 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2960,7 +2960,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 /* Used instead of source_load when we know the type == 0 */
 static unsigned long weighted_cpuload(const int cpu)
 {
-	return cpu_rq(cpu)->load.weight;
+	return cpu_rq(cpu)->cfs.runnable_load_avg;
 }
 
 /*
@@ -3005,9 +3005,10 @@ static unsigned long cpu_avg_load_per_task(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long nr_running = ACCESS_ONCE(rq->nr_running);
+	unsigned long load_avg = rq->cfs.runnable_load_avg;
 
 	if (nr_running)
-		return rq->load.weight / nr_running;
+		return load_avg / nr_running;
 
 	return 0;
 }
diff --git a/kernel/sched/proc.c b/kernel/sched/proc.c
index bb3a6a0..ce5cd48 100644
--- a/kernel/sched/proc.c
+++ b/kernel/sched/proc.c
@@ -501,6 +501,18 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
 	sched_avg_update(this_rq);
 }
 
+#ifdef CONFIG_SMP
+unsigned long get_rq_runnable_load(struct rq *rq)
+{
+	return rq->cfs.runnable_load_avg;
+}
+#else
+unsigned long get_rq_runnable_load(struct rq *rq)
+{
+	return rq->load.weight;
+}
+#endif
+
 #ifdef CONFIG_NO_HZ_COMMON
 /*
  * There is no sane way to deal with nohz on smp when using jiffies because the
@@ -522,7 +534,7 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
 void update_idle_cpu_load(struct rq *this_rq)
 {
 	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
-	unsigned long load = this_rq->load.weight;
+	unsigned long load = get_rq_runnable_load(this_rq);
 	unsigned long pending_updates;
 
 	/*
@@ -568,11 +580,12 @@ void update_cpu_load_nohz(void)
  */
 void update_cpu_load_active(struct rq *this_rq)
 {
+	unsigned long load = get_rq_runnable_load(this_rq);
 	/*
 	 * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
 	 */
 	this_rq->last_load_update_tick = jiffies;
-	__update_cpu_load(this_rq, this_rq->load.weight, 1);
+	__update_cpu_load(this_rq, load, 1);
 
 	calc_load_account_active(this_rq);
 }
-- 
1.7.12

-- 
Thanks
    Alex

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [patch 0/8]: use runnable load avg in balance
  2013-05-10 15:17 [patch 0/8]: use runnable load avg in balance Alex Shi
                   ` (7 preceding siblings ...)
  2013-05-10 15:17 ` [patch v6 8/8] sched: remove blocked_load_avg in tg Alex Shi
@ 2013-05-14  8:07 ` Alex Shi
  2013-05-14  9:34 ` Paul Turner
  2013-05-16  7:29 ` Michael Wang
  10 siblings, 0 replies; 32+ messages in thread
From: Alex Shi @ 2013-05-14  8:07 UTC (permalink / raw)
  To: Alex Shi
  Cc: mingo, peterz, tglx, akpm, bp, pjt, namhyung, efault,
	morten.rasmussen, vincent.guittot, preeti, viresh.kumar,
	linux-kernel, mgorman, riel, wangyun

On 05/10/2013 11:17 PM, Alex Shi wrote:
> On the 4-socket SNB EP machine, hackbench improved by about 50% and the
> results became stable. On other machines, hackbench improved by about 2~10%.
> oltp improved by about 30% on the NHM EX box.
> netperf loopback also improved on the 4-socket SNB EP box.
> No clear changes on the other benchmarks.

Paul, Michael, Morten, any comments from you are appreciated! :)

-- 
Thanks
    Alex

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch v6 8/8] sched: remove blocked_load_avg in tg
  2013-05-10 15:17 ` [patch v6 8/8] sched: remove blocked_load_avg in tg Alex Shi
@ 2013-05-14  8:31   ` Peter Zijlstra
  2013-05-14 11:35     ` Alex Shi
  2013-05-14  9:05   ` Paul Turner
  2013-05-29 17:00   ` Jason Low
  2 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2013-05-14  8:31 UTC (permalink / raw)
  To: Alex Shi
  Cc: mingo, tglx, akpm, bp, pjt, namhyung, efault, morten.rasmussen,
	vincent.guittot, preeti, viresh.kumar, linux-kernel, mgorman,
	riel, wangyun

On Fri, May 10, 2013 at 11:17:29PM +0800, Alex Shi wrote:
> blocked_load_avg is sometimes too heavy and far bigger than the runnable
> load avg, which makes balancing take wrong decisions. So it is better not
> to consider it.

Would you happen to have an example around that illustrates this? 

Also, you've just changed the cgroup balancing -- did you run any tests on that?

> Signed-off-by: Alex Shi <alex.shi@intel.com>
> ---
>  kernel/sched/fair.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 91e60ac..75c200c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1339,7 +1339,7 @@ static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
>  	struct task_group *tg = cfs_rq->tg;
>  	s64 tg_contrib;
>  
> -	tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
> +	tg_contrib = cfs_rq->runnable_load_avg;
>  	tg_contrib -= cfs_rq->tg_load_contrib;
>  
>  	if (force_update || abs64(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
> -- 
> 1.7.5.4
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch v6 8/8] sched: remove blocked_load_avg in tg
  2013-05-10 15:17 ` [patch v6 8/8] sched: remove blocked_load_avg in tg Alex Shi
  2013-05-14  8:31   ` Peter Zijlstra
@ 2013-05-14  9:05   ` Paul Turner
  2013-05-14 11:37     ` Alex Shi
  2013-05-29 17:00   ` Jason Low
  2 siblings, 1 reply; 32+ messages in thread
From: Paul Turner @ 2013-05-14  9:05 UTC (permalink / raw)
  To: Alex Shi
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Andrew Morton,
	Borislav Petkov, Namhyung Kim, Mike Galbraith, Morten Rasmussen,
	Vincent Guittot, Preeti U Murthy, Viresh Kumar, LKML, Mel Gorman,
	Rik van Riel, Michael Wang

On Fri, May 10, 2013 at 8:17 AM, Alex Shi <alex.shi@intel.com> wrote:
> blocked_load_avg is sometimes too heavy and far bigger than the runnable
> load avg, which makes balancing take wrong decisions. So it is better not
> to consider it.
>
> Signed-off-by: Alex Shi <alex.shi@intel.com>
> ---
>  kernel/sched/fair.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 91e60ac..75c200c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1339,7 +1339,7 @@ static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
>         struct task_group *tg = cfs_rq->tg;
>         s64 tg_contrib;
>
> -       tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;

Nack -- This is necessary for correct shares distribution.
> +       tg_contrib = cfs_rq->runnable_load_avg;
>         tg_contrib -= cfs_rq->tg_load_contrib;
>
>         if (force_update || abs64(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
> --
> 1.7.5.4
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 0/8]: use runnable load avg in balance
  2013-05-10 15:17 [patch 0/8]: use runnable load avg in balance Alex Shi
                   ` (8 preceding siblings ...)
  2013-05-14  8:07 ` [patch 0/8]: use runnable load avg in balance Alex Shi
@ 2013-05-14  9:34 ` Paul Turner
  2013-05-14 14:35   ` Alex Shi
  2013-05-16  7:29 ` Michael Wang
  10 siblings, 1 reply; 32+ messages in thread
From: Paul Turner @ 2013-05-14  9:34 UTC (permalink / raw)
  To: Alex Shi
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Andrew Morton,
	Borislav Petkov, Namhyung Kim, Mike Galbraith, Morten Rasmussen,
	Vincent Guittot, Preeti U Murthy, Viresh Kumar, LKML, Mel Gorman,
	Rik van Riel, Michael Wang

On Fri, May 10, 2013 at 8:17 AM, Alex Shi <alex.shi@intel.com> wrote:
> This patchset is based on tip/sched/core.
>
> This version changes how the runnable load avg value is set for new tasks
> in the 3rd patch.
>
> We also tried to include the blocked load avg in balancing, but found the
> performance of many benchmarks dropping. Our guess is that the inflated cpu
> load drives tasks to be woken on remote CPUs and causes wrong decisions in
> periodic balance.

Fundamentally, I think we should be exploring this space.

While it's perhaps not surprising that it's not a drop-in, since the
current code was tuned always considering the instantaneous balance, it
seems the likely path to increased balance stability.

Although, if the code is yielding substantive benefits in its current
form we should consider merging it in the interim.

> I retested on Intel Core2, NHM, SNB and IVB machines with 2 and 4 sockets,
> using the kbuild, aim7, dbench, tbench, hackbench, oltp and netperf loopback
> benchmarks. The performance is better now.
>
> On the 4-socket SNB EP machine, hackbench improved by about 50% and the
> results became stable. On other machines, hackbench improved by about 2~10%.
> oltp improved by about 30% on the NHM EX box.
> netperf loopback also improved on the 4-socket SNB EP box.
> No clear changes on the other benchmarks.
>
> Michael Wang tested a previous version with pgbench on his box:
> https://lkml.org/lkml/2013/4/2/1022
>
> And Morten tested a previous version too.
> http://comments.gmane.org/gmane.linux.kernel/1463371
>
> Thanks for the comments from Peter, Paul, Morten, Michael and Preeti.
> More comments are appreciated!
>
> Regards
> Alex
>
> [patch v6 1/8] Revert "sched: Introduce temporary FAIR_GROUP_SCHED
> [patch v6 2/8] sched: move few runnable tg variables into CONFIG_SMP
> [patch v6 3/8] sched: set initial value of runnable avg for new
> [patch v6 4/8] sched: fix slept time double counting in enqueue
> [patch v6 5/8] sched: update cpu load after task_tick.
> [patch v6 6/8] sched: compute runnable load avg in cpu_load and
> [patch v6 7/8] sched: consider runnable load average in move_tasks
> [patch v6 8/8] sched: remove blocked_load_avg in tg

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch v6 8/8] sched: remove blocked_load_avg in tg
  2013-05-14  8:31   ` Peter Zijlstra
@ 2013-05-14 11:35     ` Alex Shi
  2013-05-16  9:23       ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Alex Shi @ 2013-05-14 11:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, akpm, bp, pjt, namhyung, efault, morten.rasmussen,
	vincent.guittot, preeti, viresh.kumar, linux-kernel, mgorman,
	riel, wangyun

On 05/14/2013 04:31 PM, Peter Zijlstra wrote:
> On Fri, May 10, 2013 at 11:17:29PM +0800, Alex Shi wrote:
>> > blocked_load_avg is sometimes too heavy and far bigger than the runnable
>> > load avg, which makes balancing take wrong decisions. So it is better not
>> > to consider it.
> Would you happen to have an example around that illustrates this? 

Sorry, No.
> 
> Also, you've just changed the cgroup balancing -- did you run any tests on that?
> 

I tested all the benchmarks mentioned in the cover letter, aim7, kbuild etc.
with autogroup enabled. There is no clear performance change.
But since the machine just runs the benchmarks without any other load,
that isn't enough.

-- 
Thanks
    Alex

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch v6 8/8] sched: remove blocked_load_avg in tg
  2013-05-14  9:05   ` Paul Turner
@ 2013-05-14 11:37     ` Alex Shi
  0 siblings, 0 replies; 32+ messages in thread
From: Alex Shi @ 2013-05-14 11:37 UTC (permalink / raw)
  To: Paul Turner
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Andrew Morton,
	Borislav Petkov, Namhyung Kim, Mike Galbraith, Morten Rasmussen,
	Vincent Guittot, Preeti U Murthy, Viresh Kumar, LKML, Mel Gorman,
	Rik van Riel, Michael Wang

On 05/14/2013 05:05 PM, Paul Turner wrote:
>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> > index 91e60ac..75c200c 100644
>> > --- a/kernel/sched/fair.c
>> > +++ b/kernel/sched/fair.c
>> > @@ -1339,7 +1339,7 @@ static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
>> >         struct task_group *tg = cfs_rq->tg;
>> >         s64 tg_contrib;
>> >
>> > -       tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
> Nack -- This is necessary for correct shares distribution.

I was going to set this patch as RFC. :)

BTW, did you do some tests on this part?

-- 
Thanks
    Alex

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 0/8]: use runnable load avg in balance
  2013-05-14  9:34 ` Paul Turner
@ 2013-05-14 14:35   ` Alex Shi
  0 siblings, 0 replies; 32+ messages in thread
From: Alex Shi @ 2013-05-14 14:35 UTC (permalink / raw)
  To: Paul Turner
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Andrew Morton,
	Borislav Petkov, Namhyung Kim, Mike Galbraith, Morten Rasmussen,
	Vincent Guittot, Preeti U Murthy, Viresh Kumar, LKML, Mel Gorman,
	Rik van Riel, Michael Wang

On 05/14/2013 05:34 PM, Paul Turner wrote:
>> >
>> > We also tried to include the blocked load avg in balancing, but found the
>> > performance of many benchmarks dropping. Our guess is that the inflated cpu
>> > load drives tasks to be woken on remote CPUs and causes wrong decisions in
>> > periodic balance.
> Fundamentally, I think we should be exploring this space.

I have thought about this, but I cannot figure out a direction or back it
up with a theory.
> 
> While it's perhaps not surprising that it's not a drop-in, since the
> current code was tuned always considering the instantaneous balance, it
> seems the likely path to increased balance stability.
> 
> Although, if the code is yielding substantive benefits in its current
> form we should consider merging it in the interim.

Sorry, I cannot follow you here.
> 



-- 
Thanks
    Alex

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch v6 6/8] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task
  2013-05-14  7:27     ` Alex Shi
@ 2013-05-16  5:49       ` Michael Wang
  2013-05-16  6:58         ` Alex Shi
  0 siblings, 1 reply; 32+ messages in thread
From: Michael Wang @ 2013-05-16  5:49 UTC (permalink / raw)
  To: Alex Shi
  Cc: Peter Zijlstra, mingo, tglx, akpm, bp, pjt, namhyung, efault,
	morten.rasmussen, vincent.guittot, preeti, viresh.kumar,
	linux-kernel, mgorman, riel

Hi, Alex

On 05/14/2013 03:27 PM, Alex Shi wrote:
[snip]
>  }
> diff --git a/kernel/sched/proc.c b/kernel/sched/proc.c

This patch seems to be based on 3.10-rc1, while the one below

[patch v6 3/8] sched: set initial value of runnable avg for new forked task

conflicts with 3.10-rc1... I think it may need some rebase?

Regards,
Michael Wang


> index bb3a6a0..ce5cd48 100644
> --- a/kernel/sched/proc.c
> +++ b/kernel/sched/proc.c
> @@ -501,6 +501,18 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
>  	sched_avg_update(this_rq);
>  }
> 
> +#ifdef CONFIG_SMP
> +unsigned long get_rq_runnable_load(struct rq *rq)
> +{
> +	return rq->cfs.runnable_load_avg;
> +}
> +#else
> +unsigned long get_rq_runnable_load(struct rq *rq)
> +{
> +	return rq->load.weight;
> +}
> +#endif
> +
>  #ifdef CONFIG_NO_HZ_COMMON
>  /*
>   * There is no sane way to deal with nohz on smp when using jiffies because the
> @@ -522,7 +534,7 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
>  void update_idle_cpu_load(struct rq *this_rq)
>  {
>  	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
> -	unsigned long load = this_rq->load.weight;
> +	unsigned long load = get_rq_runnable_load(this_rq);
>  	unsigned long pending_updates;
> 
>  	/*
> @@ -568,11 +580,12 @@ void update_cpu_load_nohz(void)
>   */
>  void update_cpu_load_active(struct rq *this_rq)
>  {
> +	unsigned long load = get_rq_runnable_load(this_rq);
>  	/*
>  	 * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
>  	 */
>  	this_rq->last_load_update_tick = jiffies;
> -	__update_cpu_load(this_rq, this_rq->load.weight, 1);
> +	__update_cpu_load(this_rq, load, 1);
> 
>  	calc_load_account_active(this_rq);
>  }
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch v6 3/8] sched: set initial value of runnable avg for new forked task
  2013-05-10 15:17 ` [patch v6 3/8] sched: set initial value of runnable avg for new forked task Alex Shi
@ 2013-05-16  6:28   ` Alex Shi
  0 siblings, 0 replies; 32+ messages in thread
From: Alex Shi @ 2013-05-16  6:28 UTC (permalink / raw)
  To: Alex Shi
  Cc: mingo, peterz, tglx, akpm, bp, pjt, namhyung, efault,
	morten.rasmussen, vincent.guittot, preeti, viresh.kumar,
	linux-kernel, mgorman, riel, wangyun

On 05/10/2013 11:17 PM, Alex Shi wrote:
> We need to initialize se.avg.{decay_count, load_avg_contrib} for a
> newly forked task.
> Otherwise, random values in these variables cause a mess when the new
> task is enqueued:
>     enqueue_task_fair
>         enqueue_entity
>             enqueue_entity_load_avg
> 
> and make fork balancing imbalanced due to the incorrect load_avg_contrib.
> 
> Furthermore, Morten Rasmussen noticed that some tasks were not launched
> at once after being created. So Paul and Peter suggested giving new tasks
> a starting runnable avg time equal to sched_slice().
> 

Updated; it applies to the latest Linus and tip/sched/core trees.

>From 30ba6d80b256c17861e2c9128fdf41cc048af05a Mon Sep 17 00:00:00 2001
From: Alex Shi <alex.shi@intel.com>
Date: Tue, 14 May 2013 09:41:09 +0800
Subject: [PATCH 3/8] sched: set initial value of runnable avg for new forked
 task

We need to initialize se.avg.{decay_count, load_avg_contrib} for a
newly forked task.
Otherwise, random values in these variables cause a mess when the new
task is enqueued:
    enqueue_task_fair
        enqueue_entity
            enqueue_entity_load_avg

and make fork balancing imbalanced due to the incorrect load_avg_contrib.

Furthermore, Morten Rasmussen noticed that some tasks were not launched
at once after being created. So Paul and Peter suggested giving new tasks
a starting runnable avg time equal to sched_slice().

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched/core.c  |  6 ++----
 kernel/sched/fair.c  | 23 +++++++++++++++++++++++
 kernel/sched/sched.h |  2 ++
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ee1cbc6..920d346 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1596,10 +1596,6 @@ static void __sched_fork(struct task_struct *p)
 	p->se.vruntime			= 0;
 	INIT_LIST_HEAD(&p->se.group_node);
 
-#ifdef CONFIG_SMP
-	p->se.avg.runnable_avg_period = 0;
-	p->se.avg.runnable_avg_sum = 0;
-#endif
 #ifdef CONFIG_SCHEDSTATS
 	memset(&p->se.statistics, 0, sizeof(p->se.statistics));
 #endif
@@ -1743,6 +1739,8 @@ void wake_up_new_task(struct task_struct *p)
 	set_task_cpu(p, select_task_rq(p, SD_BALANCE_FORK, 0));
 #endif
 
+	/* Give new task start runnable values */
+	set_task_runnable_avg(p);
 	rq = __task_rq_lock(p);
 	activate_task(rq, p, 0);
 	p->on_rq = 1;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e8f3c8f..add32a6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -680,6 +680,26 @@ static u64 sched_vslice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	return calc_delta_fair(sched_slice(cfs_rq, se), se);
 }
 
+#ifdef CONFIG_SMP
+static inline void __update_task_entity_contrib(struct sched_entity *se);
+
+/* Give new task start runnable values to heavy its load in infant time */
+void set_task_runnable_avg(struct task_struct *p)
+{
+	u32 slice;
+
+	p->se.avg.decay_count = 0;
+	slice = sched_slice(task_cfs_rq(p), &p->se) >> 10;
+	p->se.avg.runnable_avg_sum = slice;
+	p->se.avg.runnable_avg_period = slice;
+	__update_task_entity_contrib(&p->se);
+}
+#else
+void set_task_runnable_avg(struct task_struct *p)
+{
+}
+#endif
+
 /*
  * Update the current task's runtime statistics. Skip current tasks that
  * are not in our scheduling class.
@@ -1527,6 +1547,9 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
 	 * We track migrations using entity decay_count <= 0, on a wake-up
 	 * migration we use a negative decay count to track the remote decays
 	 * accumulated while sleeping.
+	 *
+	 * When enqueue a new forked task, the se->avg.decay_count == 0, so
+	 * we bypass update_entity_load_avg(), use avg.load_avg_contrib direct.
 	 */
 	if (unlikely(se->avg.decay_count <= 0)) {
 		se->avg.last_runnable_update = rq_of(cfs_rq)->clock_task;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0272fa4..564cecd 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1049,6 +1049,8 @@ extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime
 
 extern void update_idle_cpu_load(struct rq *this_rq);
 
+extern void set_task_runnable_avg(struct task_struct *p);
+
 #ifdef CONFIG_PARAVIRT
 static inline u64 steal_ticks(u64 steal)
 {
-- 
1.7.12


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [patch v6 6/8] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task
  2013-05-16  5:49       ` Michael Wang
@ 2013-05-16  6:58         ` Alex Shi
  0 siblings, 0 replies; 32+ messages in thread
From: Alex Shi @ 2013-05-16  6:58 UTC (permalink / raw)
  To: Michael Wang
  Cc: Peter Zijlstra, mingo, tglx, akpm, bp, pjt, namhyung, efault,
	morten.rasmussen, vincent.guittot, preeti, viresh.kumar,
	linux-kernel, mgorman, riel

On 05/16/2013 01:49 PM, Michael Wang wrote:
> On 05/14/2013 03:27 PM, Alex Shi wrote:
> [snip]
>> >  }
>> > diff --git a/kernel/sched/proc.c b/kernel/sched/proc.c
> This patch seems to be based on 3.10-rc1, while the one below
> 
> [patch v6 3/8] sched: set initial value of runnable avg for new forked task
> 
> conflicts with 3.10-rc1... I think it may need some rebase?

With the updated 3rd patch, the whole patchset works on the latest
tip/sched/core. Thanks for testing, Michael!

-- 
Thanks
    Alex

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 0/8]: use runnable load avg in balance
  2013-05-10 15:17 [patch 0/8]: use runnable load avg in balance Alex Shi
                   ` (9 preceding siblings ...)
  2013-05-14  9:34 ` Paul Turner
@ 2013-05-16  7:29 ` Michael Wang
  2013-05-16  7:35   ` Alex Shi
  2013-05-28 13:31   ` Alex Shi
  10 siblings, 2 replies; 32+ messages in thread
From: Michael Wang @ 2013-05-16  7:29 UTC (permalink / raw)
  To: Alex Shi
  Cc: mingo, peterz, tglx, akpm, bp, pjt, namhyung, efault,
	morten.rasmussen, vincent.guittot, preeti, viresh.kumar,
	linux-kernel, mgorman, riel

On 05/10/2013 11:17 PM, Alex Shi wrote:
> This patchset is based on tip/sched/core.
> 
> This version changes how the runnable load avg value is set for new tasks
> in the 3rd patch.
> 
> We also tried to include the blocked load avg in balancing, but found the
> performance of many benchmarks dropping. Our guess is that the inflated cpu
> load drives tasks to be woken on remote CPUs and causes wrong decisions in
> periodic balance.
> 
> I retested on Intel Core2, NHM, SNB and IVB machines with 2 and 4 sockets,
> using the kbuild, aim7, dbench, tbench, hackbench, oltp and netperf loopback
> benchmarks. The performance is better now.
> 
> On the 4-socket SNB EP machine, hackbench improved by about 50% and the
> results became stable. On other machines, hackbench improved by about 2~10%.
> oltp improved by about 30% on the NHM EX box.
> netperf loopback also improved on the 4-socket SNB EP box.
> No clear changes on the other benchmarks.
> 
> Michael Wang tested a previous version with pgbench on his box:
> https://lkml.org/lkml/2013/4/2/1022

Tested the latest patch set (new 3/8 and 6/8) with pgbench on tip
3.10.0-rc1 and a 12 cpu x86 box; it works well and still benefits ;-)

Regards,
Michael Wang

> 
> And Morten tested a previous version too.
> http://comments.gmane.org/gmane.linux.kernel/1463371
> 
> Thanks for the comments from Peter, Paul, Morten, Michael and Preeti.
> More comments are appreciated!
> 
> Regards
> Alex
> 
> [patch v6 1/8] Revert "sched: Introduce temporary FAIR_GROUP_SCHED
> [patch v6 2/8] sched: move few runnable tg variables into CONFIG_SMP
> [patch v6 3/8] sched: set initial value of runnable avg for new
> [patch v6 4/8] sched: fix slept time double counting in enqueue
> [patch v6 5/8] sched: update cpu load after task_tick.
> [patch v6 6/8] sched: compute runnable load avg in cpu_load and
> [patch v6 7/8] sched: consider runnable load average in move_tasks
> [patch v6 8/8] sched: remove blocked_load_avg in tg
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 0/8]: use runnable load avg in balance
  2013-05-16  7:29 ` Michael Wang
@ 2013-05-16  7:35   ` Alex Shi
  2013-05-28 13:31   ` Alex Shi
  1 sibling, 0 replies; 32+ messages in thread
From: Alex Shi @ 2013-05-16  7:35 UTC (permalink / raw)
  To: Michael Wang
  Cc: mingo, peterz, tglx, akpm, bp, pjt, namhyung, efault,
	morten.rasmussen, vincent.guittot, preeti, viresh.kumar,
	linux-kernel, mgorman, riel

On 05/16/2013 03:29 PM, Michael Wang wrote:
>> > Michael Wang tested a previous version with pgbench on his box:
>> > https://lkml.org/lkml/2013/4/2/1022
> Tested the latest patch set (new 3/8 and 6/8) with pgbench on tip
> 3.10.0-rc1 and a 12 cpu x86 box; it works well and still benefits ;-)


Thanks Michael! :)

-- 
Thanks
    Alex

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch v6 8/8] sched: remove blocked_load_avg in tg
  2013-05-14 11:35     ` Alex Shi
@ 2013-05-16  9:23       ` Peter Zijlstra
  2013-05-23  7:32         ` Changlong Xie
  2013-05-28 13:36         ` Alex Shi
  0 siblings, 2 replies; 32+ messages in thread
From: Peter Zijlstra @ 2013-05-16  9:23 UTC (permalink / raw)
  To: Alex Shi
  Cc: mingo, tglx, akpm, bp, pjt, namhyung, efault, morten.rasmussen,
	vincent.guittot, preeti, viresh.kumar, linux-kernel, mgorman,
	riel, wangyun

On Tue, May 14, 2013 at 07:35:25PM +0800, Alex Shi wrote:

> I tested all the benchmarks mentioned in the cover letter, aim7, kbuild etc.
> with autogroup enabled. There is no clear performance change.
> But since the machine just runs the benchmarks without any other load,
> that isn't enough.

Back when we started with smp-fair cgroup muck someone wrote a test for it. I
_think_ it ended up in the LTP test-suite.

Now I don't know if that's up-to-date enough to catch some of the cases we've
recently fixed (as in the past few years) so it might want to be updated.

Paul, do you guys at Google have some nice test-cases for all this?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch v6 8/8] sched: remove blocked_load_avg in tg
  2013-05-16  9:23       ` Peter Zijlstra
@ 2013-05-23  7:32         ` Changlong Xie
  2013-05-23  8:19           ` Alex Shi
  2013-05-28 13:36         ` Alex Shi
  1 sibling, 1 reply; 32+ messages in thread
From: Changlong Xie @ 2013-05-23  7:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alex Shi, mingo, tglx, akpm, bp, pjt, namhyung, efault,
	morten.rasmussen, vincent.guittot, preeti, viresh.kumar,
	Linux Kernel Mailing List, mgorman, riel, wangyun, Changlong Xie

2013/5/16 Peter Zijlstra <peterz@infradead.org>:
> On Tue, May 14, 2013 at 07:35:25PM +0800, Alex Shi wrote:
>
>> I tested all the benchmarks mentioned in the cover letter, aim7, kbuild etc.
>> with autogroup enabled. There is no clear performance change.
>> But since the machine just runs the benchmarks without any other load,
>> that isn't enough.
>
> Back when we started with smp-fair cgroup muck someone wrote a test for it. I
> _think_ it ended up in the LTP test-suite.
>

Hi Peter

I just downloaded the latest LTP from
http://sourceforge.net/projects/ltp/files/LTP%20Source/ltp-20130503/
and did cgroup benchmark tests on our SB-EP machine with 2S*8CORE*2SMT and
64G memory.

The following is my testing procedure:
1. tar -xvf ltp-full-20130503.tar
2. cd ltp-full-20130503
3. ./configure prefix=/mnt/ltp && make -j32 && sudo make install
4. cd /mnt/ltp

# create general testcase named cgroup_fj
5. echo -e "cgroup_fj  run_cgroup_test_fj.sh" > runtest/cgroup

# we only test cpuset/cpu/cpuacct cgroup benchmark cases, here is my
cgroup_fj_testcases.sh
6. [changlox@lkp-sb03 bin]$ cat testcases/bin/cgroup_fj_testcases.sh
stress 2 2 1 1 1
stress 4 2 1 1 1
stress 5 2 1 1 1
stress 2 1 1 1 2
stress 2 1 1 2 1
stress 2 1 1 2 2
stress 2 1 1 2 3
stress 2 1 2 1 1
stress 2 1 2 1 2
stress 2 1 2 1 3
stress 2 1 2 2 1
stress 2 1 2 2 2
stress 4 1 1 1 2
stress 4 1 2 1 1
stress 4 1 2 1 2
stress 4 1 2 1 3
stress 5 1 1 1 2
stress 5 1 1 2 1
stress 5 1 1 2 2
stress 5 1 1 2 3
stress 5 1 2 1 1
stress 5 1 2 1 2
stress 5 1 2 1 3
stress 5 1 2 2 1
stress 5 1 2 2 2

# run test
7. sudo ./runltp -p -l /tmp/cgroup.results.log  -d /tmp -o
/tmp/cgroup.log -f cgroup

My test results:
3.10-rc1       patch1-7       patch1-8
duration=764   duration=754   duration=750
duration=764   duration=754   duration=751
duration=763   duration=755   duration=751

duration is the number of seconds the test run took.

Tested-by: Changlong Xie <changlongx.xie@intel.com>

> Now I don't know if that's up-to-date enough to catch some of the cases we've
> recently fixed (as in the past few years) so it might want to be updated.
>
> Paul, do you guys at Google have some nice test-cases for all this?



--
Best regards
Changlox

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch v6 8/8] sched: remove blocked_load_avg in tg
  2013-05-23  7:32         ` Changlong Xie
@ 2013-05-23  8:19           ` Alex Shi
  0 siblings, 0 replies; 32+ messages in thread
From: Alex Shi @ 2013-05-23  8:19 UTC (permalink / raw)
  To: Changlong Xie
  Cc: Peter Zijlstra, mingo, tglx, akpm, bp, pjt, namhyung, efault,
	morten.rasmussen, vincent.guittot, preeti, viresh.kumar,
	Linux Kernel Mailing List, mgorman, riel, wangyun, Changlong Xie

On 05/23/2013 03:32 PM, Changlong Xie wrote:
> 2013/5/16 Peter Zijlstra <peterz@infradead.org>:
>> On Tue, May 14, 2013 at 07:35:25PM +0800, Alex Shi wrote:
>>
>>> I tested all the benchmarks mentioned in the cover letter, aim7, kbuild etc.,
>>> with autogroup enabled. There is no clear performance change.
>>> But since the machine only ran the benchmarks without any other load, that
>>> isn't enough.
>>
>> Back when we started with smp-fair cgroup muck someone wrote a test for it. I
>> _think_ it ended up in the LTP test-suite.
>>
> 
> Hi Peter
> 

> my test results:
> 3.10-rc1          patch1-7         patch1-8
> duration=764   duration=754   duration=750
> duration=764   duration=754   duration=751
> duration=763   duration=755   duration=751
> 
> duration is the total test run time in seconds.
> 
> Tested-by: Changlong Xie <changlongx.xie@intel.com>

It seems the 8th patch is helpful for the cgroup case. Thanks, Changlong!

-- 
Thanks
    Alex

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 0/8]: use runnable load avg in balance
  2013-05-16  7:29 ` Michael Wang
  2013-05-16  7:35   ` Alex Shi
@ 2013-05-28 13:31   ` Alex Shi
  2013-05-29 13:28     ` Alex Shi
  1 sibling, 1 reply; 32+ messages in thread
From: Alex Shi @ 2013-05-28 13:31 UTC (permalink / raw)
  To: Michael Wang, mingo, peterz, pjt
  Cc: tglx, akpm, bp, namhyung, efault, morten.rasmussen,
	vincent.guittot, preeti, viresh.kumar, linux-kernel, mgorman,
	riel

On 05/16/2013 03:29 PM, Michael Wang wrote:
>> > This version changed the runnable load avg value setting for new task
>> > in patch 3rd.
>> > 
>> > We also tried to include blocked load avg in balance. but find many benchmark
>> > performance dropping. Guess the too bigger cpu load drive task to be waken
>> > on remote CPU, and cause wrong decision in periodic balance.
>> > 
>> > I retested on Intel core2, NHM, SNB, IVB, 2 and 4 sockets machines with
>> > benchmark kbuild, aim7, dbench, tbench, hackbench, oltp, and netperf loopback
>> > etc. The performance is better now. 
>> > 
>> > On SNB EP 4 sockets machine, the hackbench increased about 50%, and result
>> > become stable. on other machines, hackbench increased about 2~10%.
>> > oltp increased about 30% in NHM EX box.
>> > netperf loopback also increased on SNB EP 4 sockets box.
>> > no clear changes on other benchmarks.
>> > 
>> > Michael Wang had tested previous version on pgbench on his box:
>> > https://lkml.org/lkml/2013/4/2/1022
> Tested the latest patch set (new 3/8 and 6/8) with pgbench on tip
> 3.10.0-rc1 and a 12-cpu x86 box; it works well and still shows a benefit ;-)


Paul:

Would you like to give more comments/ideas on this patch set?

-- 
Thanks
    Alex

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch v6 8/8] sched: remove blocked_load_avg in tg
  2013-05-16  9:23       ` Peter Zijlstra
  2013-05-23  7:32         ` Changlong Xie
@ 2013-05-28 13:36         ` Alex Shi
  1 sibling, 0 replies; 32+ messages in thread
From: Alex Shi @ 2013-05-28 13:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, akpm, bp, pjt, namhyung, efault, morten.rasmussen,
	vincent.guittot, preeti, viresh.kumar, linux-kernel, mgorman,
	riel, wangyun

On 05/16/2013 05:23 PM, Peter Zijlstra wrote:
> On Tue, May 14, 2013 at 07:35:25PM +0800, Alex Shi wrote:
> 
>> > I tested all the benchmarks mentioned in the cover letter, aim7, kbuild etc.,
>> > with autogroup enabled. There is no clear performance change.
>> > But since the machine only ran the benchmarks without any other load, that
>> > isn't enough.
> Back when we started with smp-fair cgroup muck someone wrote a test for it. I
> _think_ it ended up in the LTP test-suite.

Peter:

Copying Changlong's testing results again: the LTP cgroup stress testing
shows that this patchset reduces the stress testing time:

# run test
7. sudo ./runltp -p -l /tmp/cgroup.results.log  -d /tmp -o
/tmp/cgroup.log -f cgroup

my test results:
3.10-rc1          patch1-7         patch1-8
duration=764   duration=754   duration=750
duration=764   duration=754   duration=751
duration=763   duration=755   duration=751

duration is the total test run time in seconds.

Tested-by: Changlong Xie <changlongx.xie@intel.com>

Paul, would you like to give some comments?

> 
> Now I don't know if that's up-to-date enough to catch some of the cases we've
> recently fixed (as in the past few years) so it might want to be updated.
> 
> Paul, do you guys at Google have some nice test-cases for all this?



-- 
Thanks
    Alex

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 0/8]: use runnable load avg in balance
  2013-05-28 13:31   ` Alex Shi
@ 2013-05-29 13:28     ` Alex Shi
  0 siblings, 0 replies; 32+ messages in thread
From: Alex Shi @ 2013-05-29 13:28 UTC (permalink / raw)
  To: Michael Wang, mingo, peterz, pjt
  Cc: tglx, akpm, bp, namhyung, efault, morten.rasmussen,
	vincent.guittot, preeti, viresh.kumar, linux-kernel, mgorman,
	riel

On 05/28/2013 09:31 PM, Alex Shi wrote:
>> > Tested the latest patch set (new 3/8 and 6/8) with pgbench on tip
>> > 3.10.0-rc1 and a 12-cpu x86 box; it works well and still shows a benefit ;-)
> 
> Paul:
> 
> Would you like to give more comments/ideas on this patch set?

Peter,

If there are no further ideas on the blocked_load_avg usage, could we have
the patchset picked up in the tip tree? In any case, we get better
performance on the hackbench/pgbench/cgroup stress etc. benchmarks.

-- 
Thanks
    Alex

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch v6 8/8] sched: remove blocked_load_avg in tg
  2013-05-10 15:17 ` [patch v6 8/8] sched: remove blocked_load_avg in tg Alex Shi
  2013-05-14  8:31   ` Peter Zijlstra
  2013-05-14  9:05   ` Paul Turner
@ 2013-05-29 17:00   ` Jason Low
  2013-05-30  0:44     ` Alex Shi
  2 siblings, 1 reply; 32+ messages in thread
From: Jason Low @ 2013-05-29 17:00 UTC (permalink / raw)
  To: Alex Shi, jason.low2
  Cc: mingo, peterz, tglx, akpm, bp, pjt, namhyung, efault,
	morten.rasmussen, vincent.guittot, preeti, viresh.kumar,
	linux-kernel, mgorman, riel, wangyun

On Fri, 2013-05-10 at 23:17 +0800, Alex Shi wrote:
> blocked_load_avg is sometimes too heavy and far bigger than the runnable
> load avg, which makes load balancing take wrong decisions. So it is better
> not to consider it.
> 
> Signed-off-by: Alex Shi <alex.shi@intel.com>

Hi Alex,

I have been testing these patches with a Java server workload on an 8
socket (80 core) box with Hyperthreading enabled, and I have been seeing
good results with these patches.

When using a 3.10-rc2 tip kernel with patches 1-8, there was about a 40%
improvement in performance of the workload compared to when using the
vanilla 3.10-rc2 tip kernel with no patches. When using a 3.10-rc2 tip
kernel with just patches 1-7, the performance improvement of the
workload over the vanilla 3.10-rc2 tip kernel was about 25%.

Tested-by: Jason Low <jason.low2@hp.com>

Thanks,
Jason


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch v6 8/8] sched: remove blocked_load_avg in tg
  2013-05-29 17:00   ` Jason Low
@ 2013-05-30  0:44     ` Alex Shi
  0 siblings, 0 replies; 32+ messages in thread
From: Alex Shi @ 2013-05-30  0:44 UTC (permalink / raw)
  To: Jason Low, peterz
  Cc: mingo, tglx, akpm, bp, pjt, namhyung, efault, morten.rasmussen,
	vincent.guittot, preeti, viresh.kumar, linux-kernel, mgorman,
	riel, wangyun

On 05/30/2013 01:00 AM, Jason Low wrote:
> On Fri, 2013-05-10 at 23:17 +0800, Alex Shi wrote:
>> blocked_load_avg is sometimes too heavy and far bigger than the runnable
>> load avg, which makes load balancing take wrong decisions. So it is better
>> not to consider it.
>>
>> Signed-off-by: Alex Shi <alex.shi@intel.com>
> 
> Hi Alex,
> 
> I have been testing these patches with a Java server workload on an 8
> socket (80 core) box with Hyperthreading enabled, and I have been seeing
> good results with these patches.
> 
> When using a 3.10-rc2 tip kernel with patches 1-8, there was about a 40%
> improvement in performance of the workload compared to when using the
> vanilla 3.10-rc2 tip kernel with no patches. When using a 3.10-rc2 tip
> kernel with just patches 1-7, the performance improvement of the
> workload over the vanilla 3.10-rc2 tip kernel was about 25%.
> 
> Tested-by: Jason Low <jason.low2@hp.com>
> 

That is impressive!

Thanks a lot for your testing! Just curious, what benchmark are you
using? :)

> Thanks,
> Jason
> 


-- 
Thanks
    Alex

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2013-05-30  0:44 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-10 15:17 [patch 0/8]: use runnable load avg in balance Alex Shi
2013-05-10 15:17 ` [patch v6 1/8] Revert "sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking" Alex Shi
2013-05-10 15:17 ` [patch v6 2/8] sched: move few runnable tg variables into CONFIG_SMP Alex Shi
2013-05-10 15:17 ` [patch v6 3/8] sched: set initial value of runnable avg for new forked task Alex Shi
2013-05-16  6:28   ` Alex Shi
2013-05-10 15:17 ` [patch v6 4/8] sched: fix slept time double counting in enqueue entity Alex Shi
2013-05-10 15:17 ` [patch v6 5/8] sched: update cpu load after task_tick Alex Shi
2013-05-10 15:17 ` [patch v6 6/8] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task Alex Shi
2013-05-13 14:06   ` Peter Zijlstra
2013-05-14  0:51     ` Alex Shi
2013-05-14  7:27     ` Alex Shi
2013-05-16  5:49       ` Michael Wang
2013-05-16  6:58         ` Alex Shi
2013-05-10 15:17 ` [patch v6 7/8] sched: consider runnable load average in move_tasks Alex Shi
2013-05-10 15:17 ` [patch v6 8/8] sched: remove blocked_load_avg in tg Alex Shi
2013-05-14  8:31   ` Peter Zijlstra
2013-05-14 11:35     ` Alex Shi
2013-05-16  9:23       ` Peter Zijlstra
2013-05-23  7:32         ` Changlong Xie
2013-05-23  8:19           ` Alex Shi
2013-05-28 13:36         ` Alex Shi
2013-05-14  9:05   ` Paul Turner
2013-05-14 11:37     ` Alex Shi
2013-05-29 17:00   ` Jason Low
2013-05-30  0:44     ` Alex Shi
2013-05-14  8:07 ` [patch 0/8]: use runnable load avg in balance Alex Shi
2013-05-14  9:34 ` Paul Turner
2013-05-14 14:35   ` Alex Shi
2013-05-16  7:29 ` Michael Wang
2013-05-16  7:35   ` Alex Shi
2013-05-28 13:31   ` Alex Shi
2013-05-29 13:28     ` Alex Shi
