* [PATCH 0/7] sched,numa: improve NUMA convergence times
@ 2014-06-23 15:41 riel
  2014-06-23 15:41 ` [PATCH 1/7] sched,numa: use group's max nid as task's preferred nid riel
                   ` (7 more replies)
  0 siblings, 8 replies; 25+ messages in thread
From: riel @ 2014-06-23 15:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: chegu_vinod, peterz, mgorman, mingo

Running tests like the one below pointed out a number of situations in
which the current NUMA code shows extremely slow task convergence, and
even some situations in which tasks do not converge at all.

 ###
 # 160 tasks will execute (on 4 nodes, 80 CPUs):
 #         -1x     0MB global  shared mem operations
 #         -1x  1000MB process shared mem operations
 #         -1x     0MB thread  local  mem operations
 ###

 ###
 #
 #    0.0%  [0.2 mins]  0/0   1/1  36/2   0/0  [36/3 ] l:  0-0   (  0) {0-2}
 #    0.0%  [0.3 mins] 43/3  37/2  39/2  41/3  [ 6/10] l:  0-1   (  1) {1-2}
 #    0.0%  [0.4 mins] 42/3  38/2  40/2  40/2  [ 4/9 ] l:  1-2   (  1) [50.0%] {1-2}
 #    0.0%  [0.6 mins] 41/3  39/2  40/2  40/2  [ 2/9 ] l:  2-4   (  2) [50.0%] {1-2}
 #    0.0%  [0.7 mins] 40/2  40/2  40/2  40/2  [ 0/8 ] l:  3-5   (  2) [40.0%] (  41.8s converged)

In this example, convergence requires that a task be moved from node
0 to node 1. Before this patch series, the load balancer would have
to perform that task move, because the NUMA code would only consider
a task swap when all the CPUs on a target node are busy...

Various related items have been fixed, and task convergence times are
way down now with various numbers of processes and threads when doing
"perf bench numa mem -m -0 -P 1000 -p X -t Y" runs.

Before the patch series, convergence sometimes did not happen at all,
or randomly got delayed by many minutes.

With the patch series, convergence generally happens in 10-20 seconds,
with a few spikes up to 30-40 seconds, and very rare instances where
things take a few minutes.



* [PATCH 1/7] sched,numa: use group's max nid as task's preferred nid
  2014-06-23 15:41 [PATCH 0/7] sched,numa: improve NUMA convergence times riel
@ 2014-06-23 15:41 ` riel
  2014-06-25 10:31   ` Mel Gorman
  2014-07-05 10:44   ` [tip:sched/core] sched/numa: Use group's max nid as task's " tip-bot for Rik van Riel
  2014-06-23 15:41 ` [PATCH 3/7] sched,numa: use effective_load to balance NUMA loads riel
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 25+ messages in thread
From: riel @ 2014-06-23 15:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: chegu_vinod, peterz, mgorman, mingo

From: Rik van Riel <riel@redhat.com>

From task_numa_placement, always try to consolidate the tasks
in a group on the group's top nid.

In case this task is part of a group that is interleaved over
multiple nodes, task_numa_migrate will set the task's preferred
nid to the best node it could find for the task, so this patch
will cause at most one run through task_numa_migrate.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 kernel/sched/fair.c | 17 +----------------
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1f9c457..4a58e79 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1594,23 +1594,8 @@ static void task_numa_placement(struct task_struct *p)
 
 	if (p->numa_group) {
 		update_numa_active_node_mask(p->numa_group);
-		/*
-		 * If the preferred task and group nids are different,
-		 * iterate over the nodes again to find the best place.
-		 */
-		if (max_nid != max_group_nid) {
-			unsigned long weight, max_weight = 0;
-
-			for_each_online_node(nid) {
-				weight = task_weight(p, nid) + group_weight(p, nid);
-				if (weight > max_weight) {
-					max_weight = weight;
-					max_nid = nid;
-				}
-			}
-		}
-
 		spin_unlock_irq(group_lock);
+		max_nid = max_group_nid;
 	}
 
 	if (max_faults) {
-- 
1.8.5.3



* [PATCH 3/7] sched,numa: use effective_load to balance NUMA loads
  2014-06-23 15:41 [PATCH 0/7] sched,numa: improve NUMA convergence times riel
  2014-06-23 15:41 ` [PATCH 1/7] sched,numa: use group's max nid as task's preferred nid riel
@ 2014-06-23 15:41 ` riel
  2014-06-23 15:41 ` [PATCH 4/7] sched,numa: simplify task_numa_compare riel
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: riel @ 2014-06-23 15:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: chegu_vinod, peterz, mgorman, mingo

From: Rik van Riel <riel@redhat.com>

When CONFIG_FAIR_GROUP_SCHED is enabled, the load that a task places
on a CPU is determined by the group the task is in. This is conveniently
calculated for us by effective_load(), which task_numa_compare should
use.

The active groups on the source and destination CPU can be different,
so the calculation needs to be done separately for each CPU.
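
To make the sign convention concrete: the moving task is charged
negatively on the CPU it leaves and positively on the CPU it arrives at,
and the displaced task (if any) the other way around. Below is a small
standalone sketch of just that bookkeeping; effective_load_stub() is a
made-up stand-in that passes the weight delta straight through, whereas
the real effective_load() scales the delta through the task group
hierarchy on that particular CPU, which is exactly why the two CPUs have
to be handled separately (see the diff below).

#include <stdio.h>

/* Made-up stand-in for effective_load(tg, cpu, wl, wg). */
static long effective_load_stub(int cpu, long delta)
{
	(void)cpu;		/* the real function is CPU specific */
	return delta;
}

int main(void)
{
	long src_load = 2048, dst_load = 1024;	/* made-up CPU loads  */
	long p_weight = 512, cur_weight = 256;	/* made-up task loads */

	/* env->p moves src -> dst. */
	src_load += effective_load_stub(0, -p_weight);
	dst_load += effective_load_stub(1, p_weight);

	/* cur, if present, moves in the opposite direction. */
	src_load += effective_load_stub(0, cur_weight);
	dst_load += effective_load_stub(1, -cur_weight);

	printf("src_load=%ld dst_load=%ld\n", src_load, dst_load);
	return 0;
}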

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 kernel/sched/fair.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 612c963..41b75a6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1151,6 +1151,7 @@ static void task_numa_compare(struct task_numa_env *env,
 	struct rq *src_rq = cpu_rq(env->src_cpu);
 	struct rq *dst_rq = cpu_rq(env->dst_cpu);
 	struct task_struct *cur;
+	struct task_group *tg;
 	long src_load, dst_load;
 	long load;
 	long imp = (groupimp > 0) ? groupimp : taskimp;
@@ -1225,14 +1226,21 @@ static void task_numa_compare(struct task_numa_env *env,
 	 * In the overloaded case, try and keep the load balanced.
 	 */
 balance:
+	src_load = env->src_stats.load;
+	dst_load = env->dst_stats.load;
+
+	/* Calculate the effect of moving env->p from src to dst. */
 	load = task_h_load(env->p);
-	dst_load = env->dst_stats.load + load;
-	src_load = env->src_stats.load - load;
+	tg = task_group(env->p);
+	src_load += effective_load(tg, env->src_cpu, -load, -load);
+	dst_load += effective_load(tg, env->dst_cpu, load, load);
 
 	if (cur) {
+		/* Cur moves in the opposite direction. */
 		load = task_h_load(cur);
-		dst_load -= load;
-		src_load += load;
+		tg = task_group(cur);
+		src_load += effective_load(tg, env->src_cpu, load, load);
+		dst_load += effective_load(tg, env->dst_cpu, -load, -load);
 	}
 
 	if (load_too_imbalanced(src_load, dst_load, env))
-- 
1.8.5.3



* [PATCH 4/7] sched,numa: simplify task_numa_compare
  2014-06-23 15:41 [PATCH 0/7] sched,numa: improve NUMA convergence times riel
  2014-06-23 15:41 ` [PATCH 1/7] sched,numa: use group's max nid as task's preferred nid riel
  2014-06-23 15:41 ` [PATCH 3/7] sched,numa: use effective_load to balance NUMA loads riel
@ 2014-06-23 15:41 ` riel
  2014-06-25 10:39   ` Mel Gorman
  2014-06-23 15:41 ` [PATCH 5/7] sched,numa: examine a task move when examining a task swap riel
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: riel @ 2014-06-23 15:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: chegu_vinod, peterz, mgorman, mingo

From: Rik van Riel <riel@redhat.com>

When a task is part of a numa_group, the comparison should always use
the group weight, in order to make workloads converge.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 kernel/sched/fair.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 41b75a6..2eb845c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1154,7 +1154,7 @@ static void task_numa_compare(struct task_numa_env *env,
 	struct task_group *tg;
 	long src_load, dst_load;
 	long load;
-	long imp = (groupimp > 0) ? groupimp : taskimp;
+	long imp = env->p->numa_group ? groupimp : taskimp;
 
 	rcu_read_lock();
 	cur = ACCESS_ONCE(dst_rq->curr);
@@ -1192,11 +1192,6 @@ static void task_numa_compare(struct task_numa_env *env,
 			 * itself (not part of a group), use the task weight
 			 * instead.
 			 */
-			if (env->p->numa_group)
-				imp = groupimp;
-			else
-				imp = taskimp;
-
 			if (cur->numa_group)
 				imp += group_weight(cur, env->src_nid) -
 				       group_weight(cur, env->dst_nid);
-- 
1.8.5.3



* [PATCH 5/7] sched,numa: examine a task move when examining a task swap
  2014-06-23 15:41 [PATCH 0/7] sched,numa: improve NUMA convergence times riel
                   ` (2 preceding siblings ...)
  2014-06-23 15:41 ` [PATCH 4/7] sched,numa: simplify task_numa_compare riel
@ 2014-06-23 15:41 ` riel
  2014-06-23 15:41 ` [PATCH 6/7] sched,numa: rework best node setting in task_numa_migrate riel
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: riel @ 2014-06-23 15:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: chegu_vinod, peterz, mgorman, mingo

From: Rik van Riel <riel@redhat.com>

Running "perf bench numa mem -0 -m -P 1000 -p 8 -t 20" on a 4
node system results in 160 runnable threads on a system with 80
CPU threads.

Once a process has nearly converged, with 39 threads on one node
and 1 thread on another node, the remaining thread will be unable
to migrate to its preferred node through a task swap.

However, a simple task move would make the workload converge,
without causing an imbalance.

Test for this unlikely occurrence, and attempt a task move to
the preferred nid when it happens.

 # Running main, "perf bench numa mem -p 8 -t 20 -0 -m -P 1000"

 ###
 # 160 tasks will execute (on 4 nodes, 80 CPUs):
 #         -1x     0MB global  shared mem operations
 #         -1x  1000MB process shared mem operations
 #         -1x     0MB thread  local  mem operations
 ###

 ###
 #
 #    0.0%  [0.2 mins]  0/0   1/1  36/2   0/0  [36/3 ] l:  0-0   (  0) {0-2}
 #    0.0%  [0.3 mins] 43/3  37/2  39/2  41/3  [ 6/10] l:  0-1   (  1) {1-2}
 #    0.0%  [0.4 mins] 42/3  38/2  40/2  40/2  [ 4/9 ] l:  1-2   (  1) [50.0%] {1-2}
 #    0.0%  [0.6 mins] 41/3  39/2  40/2  40/2  [ 2/9 ] l:  2-4   (  2) [50.0%] {1-2}
 #    0.0%  [0.7 mins] 40/2  40/2  40/2  40/2  [ 0/8 ] l:  3-5   (  2) [40.0%] (  41.8s converged)

Without this patch, this same perf bench numa mem run had to
rely on the scheduler load balancer to first balance out the
load (moving a random task), before a task swap could complete
the NUMA convergence.

The load balancer does not normally take action unless the load
difference exceeds 25%. Convergence times of over half an hour
have been observed without this patch.

With this patch, the NUMA balancing code will simply migrate the
task, if that does not cause an imbalance.

Also skip examining a CPU in detail if the improvement on that CPU
is no more than the best we already have.
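
Stripped of the locking and the idle-CPU shortcuts, the decision flow
this patch gives task_numa_compare looks roughly like the standalone
sketch below. It is a simplification, not the kernel code: swap_imp and
move_imp stand for the improvement scores the real code derives from
the NUMA fault statistics, and too_imbalanced() is only a placeholder
for load_too_imbalanced(); the real code also folds the displaced
task's load in and re-checks the imbalance before accepting a swap (see
the diff below).

#include <stdbool.h>
#include <stdio.h>

enum action { SKIP, MOVE, SWAP };

/* Placeholder imbalance check, not the kernel's load_too_imbalanced(). */
static bool too_imbalanced(long src_load, long dst_load)
{
	return dst_load > src_load;
}

/* src_load/dst_load are the loads as they would be after moving p. */
static enum action decide(long swap_imp, long move_imp, long best_imp,
			  long src_load, long dst_load)
{
	/* Neither a swap nor a move beats the best candidate so far. */
	if (swap_imp <= best_imp && move_imp <= best_imp)
		return SKIP;

	/*
	 * New in this patch: if a plain move is the bigger win and it
	 * does not leave the two CPUs too imbalanced, take the move.
	 */
	if (move_imp > swap_imp && move_imp > best_imp &&
	    !too_imbalanced(src_load, dst_load))
		return MOVE;

	if (swap_imp <= best_imp)
		return SKIP;

	return SWAP;
}

int main(void)
{
	/* Nearly-converged case: moving wins, swapping would not. */
	printf("%d\n", decide(-5, 10, 0, 4096, 2048));	/* prints 1 (MOVE) */
	return 0;
}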

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 kernel/sched/fair.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2eb845c..d525451 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1155,6 +1155,7 @@ static void task_numa_compare(struct task_numa_env *env,
 	long src_load, dst_load;
 	long load;
 	long imp = env->p->numa_group ? groupimp : taskimp;
+	long moveimp = imp;
 
 	rcu_read_lock();
 	cur = ACCESS_ONCE(dst_rq->curr);
@@ -1201,7 +1202,7 @@ static void task_numa_compare(struct task_numa_env *env,
 		}
 	}
 
-	if (imp < env->best_imp)
+	if (imp <= env->best_imp && moveimp <= env->best_imp)
 		goto unlock;
 
 	if (!cur) {
@@ -1214,7 +1215,8 @@ static void task_numa_compare(struct task_numa_env *env,
 	}
 
 	/* Balance doesn't matter much if we're running a task per cpu */
-	if (src_rq->nr_running == 1 && dst_rq->nr_running == 1)
+	if (imp > env->best_imp && src_rq->nr_running == 1 &&
+			dst_rq->nr_running == 1)
 		goto assign;
 
 	/*
@@ -1230,6 +1232,23 @@ static void task_numa_compare(struct task_numa_env *env,
 	src_load += effective_load(tg, env->src_cpu, -load, -load);
 	dst_load += effective_load(tg, env->dst_cpu, load, load);
 
+	if (moveimp > imp && moveimp > env->best_imp) {
+		/*
+		 * If the improvement from just moving env->p direction is
+		 * better than swapping tasks around, check if a move is
+		 * possible. Store a slightly smaller score than moveimp,
+		 * so an actually idle CPU will win.
+		 */
+		if (!load_too_imbalanced(src_load, dst_load, env)) {
+			imp = moveimp - 1;
+			cur = NULL;
+			goto assign;
+		}
+	}
+
+	if (imp <= env->best_imp)
+		goto unlock;
+
 	if (cur) {
 		/* Cur moves in the opposite direction. */
 		load = task_h_load(cur);
-- 
1.8.5.3



* [PATCH 6/7] sched,numa: rework best node setting in task_numa_migrate
  2014-06-23 15:41 [PATCH 0/7] sched,numa: improve NUMA convergence times riel
                   ` (3 preceding siblings ...)
  2014-06-23 15:41 ` [PATCH 5/7] sched,numa: examine a task move when examining a task swap riel
@ 2014-06-23 15:41 ` riel
  2014-07-05 10:45   ` [tip:sched/core] sched/numa: Rework best node setting in task_numa_migrate() tip-bot for Rik van Riel
  2014-06-23 15:41 ` [PATCH 7/7] sched,numa: change scan period code to match intent riel
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: riel @ 2014-06-23 15:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: chegu_vinod, peterz, mgorman, mingo

From: Rik van Riel <riel@redhat.com>

Fix up the best node setting in task_numa_migrate to deal with a task
in a pseudo-interleaved NUMA group, which is already running in the
best location.

Set the task's preferred nid to the current nid, so task migration is
not retried at a high rate.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 kernel/sched/fair.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d525451..ee35576 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1354,10 +1354,6 @@ static int task_numa_migrate(struct task_struct *p)
 		}
 	}
 
-	/* No better CPU than the current one was found. */
-	if (env.best_cpu == -1)
-		return -EAGAIN;
-
 	/*
 	 * If the task is part of a workload that spans multiple NUMA nodes,
 	 * and is migrating into one of the workload's active nodes, remember
@@ -1366,8 +1362,19 @@ static int task_numa_migrate(struct task_struct *p)
 	 * A task that migrated to a second choice node will be better off
 	 * trying for a better one later. Do not set the preferred node here.
 	 */
-	if (p->numa_group && node_isset(env.dst_nid, p->numa_group->active_nodes))
-		sched_setnuma(p, env.dst_nid);
+	if (p->numa_group) {
+		if (env.best_cpu == -1)
+			nid = env.src_nid;
+		else
+			nid = env.dst_nid;
+
+		if (node_isset(nid, p->numa_group->active_nodes))
+			sched_setnuma(p, env.dst_nid);
+	}
+
+	/* No better CPU than the current one was found. */
+	if (env.best_cpu == -1)
+		return -EAGAIN;
 
 	/*
 	 * Reset the scan period if the task is being rescheduled on an
-- 
1.8.5.3



* [PATCH 7/7] sched,numa: change scan period code to match intent
  2014-06-23 15:41 [PATCH 0/7] sched,numa: improve NUMA convergence times riel
                   ` (4 preceding siblings ...)
  2014-06-23 15:41 ` [PATCH 6/7] sched,numa: rework best node setting in task_numa_migrate riel
@ 2014-06-23 15:41 ` riel
  2014-06-25 10:19   ` Mel Gorman
  2014-07-05 10:45   ` [tip:sched/core] sched/numa: Change " tip-bot for Rik van Riel
  2014-06-23 22:30 ` [PATCH 8/7] sched,numa: do not let a move increase the imbalance Rik van Riel
  2014-06-24 19:14 ` [PATCH 9/7] sched,numa: remove task_h_load from task_numa_compare Rik van Riel
  7 siblings, 2 replies; 25+ messages in thread
From: riel @ 2014-06-23 15:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: chegu_vinod, peterz, mgorman, mingo

From: Rik van Riel <riel@redhat.com>

Reading through the scan period code and comment, it appears the
intent was to slow down NUMA scanning when a majority of accesses
are on the local node, specifically a local:remote ratio of 3:1.

However, the code actually tests local / (local + remote), and
the actual cut-off point was around 30% local accesses, well before
a task has actually converged on a node.

Changing the threshold to 7 means scanning slows down when a task
has around 70% of its accesses local, which appears to match the
intent of the code more closely.
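
As a quick sanity check on those numbers: the ratio is expressed in
NUMA_PERIOD_SLOTS (10) increments, so the old and new cut-off points
fall where the changelog says they do. The snippet below is a
standalone illustration of that arithmetic only, not the kernel's scan
period update code:

#include <stdio.h>

#define NUMA_PERIOD_SLOTS 10

/* Slot value of the local/(local+remote) fault ratio. */
static int ratio_slots(long local, long remote)
{
	return NUMA_PERIOD_SLOTS * local / (local + remote);
}

int main(void)
{
	/* Old threshold of 3: scanning already slows at ~30% local.  */
	printf("30%% local -> %d slots\n", ratio_slots(30, 70));	/* 3 */
	/* New threshold of 7: scanning slows once ~70% is local.     */
	printf("70%% local -> %d slots\n", ratio_slots(70, 30));	/* 7 */
	/* A 3:1 local:remote ratio (75%% local) also lands on 7.     */
	printf("75%% local -> %d slots\n", ratio_slots(75, 25));	/* 7 */
	return 0;
}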

Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Rik van Riel <riel@redhat.com>
---
 kernel/sched/fair.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ee35576..1aaa3b4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1452,12 +1452,12 @@ static void update_numa_active_node_mask(struct numa_group *numa_group)
 /*
  * When adapting the scan rate, the period is divided into NUMA_PERIOD_SLOTS
  * increments. The more local the fault statistics are, the higher the scan
- * period will be for the next scan window. If local/remote ratio is below
- * NUMA_PERIOD_THRESHOLD (where range of ratio is 1..NUMA_PERIOD_SLOTS) the
- * scan period will decrease
+ * period will be for the next scan window. If local/(local+remote) ratio is
+ * below NUMA_PERIOD_THRESHOLD (where range of ratio is 1..NUMA_PERIOD_SLOTS)
+ * the scan period will decrease. Aim for 70% local accesses.
  */
 #define NUMA_PERIOD_SLOTS 10
-#define NUMA_PERIOD_THRESHOLD 3
+#define NUMA_PERIOD_THRESHOLD 7
 
 /*
  * Increase the scan period (slow down scanning) if the majority of
-- 
1.8.5.3



* [PATCH 8/7] sched,numa: do not let a move increase the imbalance
  2014-06-23 15:41 [PATCH 0/7] sched,numa: improve NUMA convergence times riel
                   ` (5 preceding siblings ...)
  2014-06-23 15:41 ` [PATCH 7/7] sched,numa: change scan period code to match intent riel
@ 2014-06-23 22:30 ` Rik van Riel
  2014-06-24 14:38   ` Peter Zijlstra
  2014-06-24 19:14 ` [PATCH 9/7] sched,numa: remove task_h_load from task_numa_compare Rik van Riel
  7 siblings, 1 reply; 25+ messages in thread
From: Rik van Riel @ 2014-06-23 22:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: chegu_vinod, peterz, mgorman, mingo

The HP DL980 system has a different NUMA topology from the 8 node
system I am testing on, and showed some bad behaviour I have not
managed to reproduce. This patch makes sure workloads converge.

When both a task swap and a task move are possible, do not let the
task move cause an increase in the load imbalance. Forcing task swaps
can help untangle workloads that have gotten stuck fighting over the
same nodes, like this run of "perf bench numa -m -0 -p 1000 -p 16 -t 15":

Per-node process memory usage (in MBs)
38035 (process 0      2      0      0      1   1000      0      0      0  1003
38036 (process 1      2      0      0      1      0   1000      0      0  1003
38037 (process 2    230    772      0      1      0      0      0      0  1003
38038 (process 3      1      0      0   1003      0      0      0      0  1004
38039 (process 4      2      0      0      1      0      0    994      6  1003
38040 (process 5      2      0      0      1    994      0      0      6  1003
38041 (process 6      2      0   1000      1      0      0      0      0  1003
38042 (process 7   1003      0      0      1      0      0      0      0  1004
38043 (process 8      2      0      0      1      0   1000      0      0  1003
38044 (process 9      2      0      0      1      0      0      0   1000  1003
38045 (process 1   1002      0      0      1      0      0      0      0  1003
38046 (process 1      3      0    954      1      0      0      0     46  1004
38047 (process 1      2   1000      0      1      0      0      0      0  1003
38048 (process 1      2      0      0      1      0      0   1000      0  1003
38049 (process 1      2      0      0   1001      0      0      0      0  1003
38050 (process 1      2    934      0     67      0      0      0      0  1003

Allowing task moves to increase the imbalance even slightly causes
tasks to move towards node 1, and not towards node 7, which prevents
the workload from converging once the above scenario has been reached.
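
The mechanism in the diff below is to temporarily set
env->imbalance_pct to 100 while evaluating the pure move. As a rough
standalone illustration of what that percentage means (a deliberate
simplification; the kernel's load_too_imbalanced() is more involved
than this), with imbalance_pct == 100 there is simply no tolerated
slack:

#include <stdbool.h>
#include <stdio.h>

/*
 * Simplified check: dst_load may exceed src_load by at most
 * (imbalance_pct - 100) percent.
 */
static bool too_imbalanced(long src_load, long dst_load, int imbalance_pct)
{
	return dst_load * 100 > src_load * imbalance_pct;
}

int main(void)
{
	/* A move that would leave dst 10% busier than src... */
	printf("pct=125: %d\n", too_imbalanced(1000, 1100, 125)); /* 0: tolerated */
	printf("pct=100: %d\n", too_imbalanced(1000, 1100, 100)); /* 1: rejected  */
	return 0;
}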

Reported-and-tested-by: Vinod Chegu <chegu_vinod@hp.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
---
 kernel/sched/fair.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4723234..e98d290 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1314,6 +1314,12 @@ static void task_numa_compare(struct task_numa_env *env,
 
 	if (moveimp > imp && moveimp > env->best_imp) {
 		/*
+		 * A task swap is possible, do not let a task move
+		 * increase the imbalance.
+		 */
+		int imbalance_pct = env->imbalance_pct;
+		env->imbalance_pct = 100;
+		/*
 		 * If the improvement from just moving env->p direction is
 		 * better than swapping tasks around, check if a move is
 		 * possible. Store a slightly smaller score than moveimp,
@@ -1324,6 +1330,8 @@ static void task_numa_compare(struct task_numa_env *env,
 			cur = NULL;
 			goto assign;
 		}
+
+		env->imbalance_pct = imbalance_pct;
 	}
 
 	if (imp <= env->best_imp)



* Re: [PATCH 8/7] sched,numa: do not let a move increase the imbalance
  2014-06-23 22:30 ` [PATCH 8/7] sched,numa: do not let a move increase the imbalance Rik van Riel
@ 2014-06-24 14:38   ` Peter Zijlstra
  2014-06-24 15:30     ` Rik van Riel
  2014-06-25  1:57     ` Rik van Riel
  0 siblings, 2 replies; 25+ messages in thread
From: Peter Zijlstra @ 2014-06-24 14:38 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel, chegu_vinod, mgorman, mingo


On Mon, Jun 23, 2014 at 06:30:11PM -0400, Rik van Riel wrote:
> The HP DL980 system has a different NUMA topology from the 8 node
> system I am testing on, and showed some bad behaviour I have not
> managed to reproduce. This patch makes sure workloads converge.
> 
> When both a task swap and a task move are possible, do not let the
> task move cause an increase in the load imbalance. Forcing task swaps
> can help untangle workloads that have gotten stuck fighting over the
> same nodes, like this run of "perf bench numa -m -0 -p 1000 -p 16 -t 15":
> 
> Per-node process memory usage (in MBs)
> 38035 (process 0      2      0      0      1   1000      0      0      0  1003
> 38036 (process 1      2      0      0      1      0   1000      0      0  1003
> 38037 (process 2    230    772      0      1      0      0      0      0  1003
> 38038 (process 3      1      0      0   1003      0      0      0      0  1004
> 38039 (process 4      2      0      0      1      0      0    994      6  1003
> 38040 (process 5      2      0      0      1    994      0      0      6  1003
> 38041 (process 6      2      0   1000      1      0      0      0      0  1003
> 38042 (process 7   1003      0      0      1      0      0      0      0  1004
> 38043 (process 8      2      0      0      1      0   1000      0      0  1003
> 38044 (process 9      2      0      0      1      0      0      0   1000  1003
> 38045 (process 1   1002      0      0      1      0      0      0      0  1003
> 38046 (process 1      3      0    954      1      0      0      0     46  1004
> 38047 (process 1      2   1000      0      1      0      0      0      0  1003
> 38048 (process 1      2      0      0      1      0      0   1000      0  1003
> 38049 (process 1      2      0      0   1001      0      0      0      0  1003
> 38050 (process 1      2    934      0     67      0      0      0      0  1003
> 
> Allowing task moves to increase the imbalance even slightly causes
> tasks to move towards node 1, and not towards node 7, which prevents
> the workload from converging once the above scenario has been reached.
> 
> Reported-and-tested-by: Vinod Chegu <chegu_vinod@hp.com>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
>  kernel/sched/fair.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4723234..e98d290 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1314,6 +1314,12 @@ static void task_numa_compare(struct task_numa_env *env,
>  
>  	if (moveimp > imp && moveimp > env->best_imp) {
>  		/*
> +		 * A task swap is possible, do not let a task move
> +		 * increase the imbalance.
> +		 */
> +		int imbalance_pct = env->imbalance_pct;
> +		env->imbalance_pct = 100;
> +		/*

I would feel so much better if we could say _why_ this is so.




* Re: [PATCH 8/7] sched,numa: do not let a move increase the imbalance
  2014-06-24 14:38   ` Peter Zijlstra
@ 2014-06-24 15:30     ` Rik van Riel
  2014-06-25  1:57     ` Rik van Riel
  1 sibling, 0 replies; 25+ messages in thread
From: Rik van Riel @ 2014-06-24 15:30 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, chegu_vinod, mgorman, mingo

On Tue, 24 Jun 2014 16:38:20 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Jun 23, 2014 at 06:30:11PM -0400, Rik van Riel wrote:
> > The HP DL980 system has a different NUMA topology from the 8 node
> > system I am testing on, and showed some bad behaviour I have not
> > managed to reproduce. This patch makes sure workloads converge.
> > 
> > When both a task swap and a task move are possible, do not let the
> > task move cause an increase in the load imbalance. Forcing task
> > swaps can help untangle workloads that have gotten stuck fighting
> > over the same nodes, like this run of "perf bench numa -m -0 -p
> > 1000 -p 16 -t 15":
> > 
> > Per-node process memory usage (in MBs)
> > 38035 (process 0      2      0      0      1   1000      0      0      0  1003
> > 38036 (process 1      2      0      0      1      0   1000      0      0  1003
> > 38037 (process 2    230    772      0      1      0      0      0      0  1003
> > 38038 (process 3      1      0      0   1003      0      0      0      0  1004
> > 38039 (process 4      2      0      0      1      0      0    994      6  1003
> > 38040 (process 5      2      0      0      1    994      0      0      6  1003
> > 38041 (process 6      2      0   1000      1      0      0      0      0  1003
> > 38042 (process 7   1003      0      0      1      0      0      0      0  1004
> > 38043 (process 8      2      0      0      1      0   1000      0      0  1003
> > 38044 (process 9      2      0      0      1      0      0      0   1000  1003
> > 38045 (process 1   1002      0      0      1      0      0      0      0  1003
> > 38046 (process 1      3      0    954      1      0      0      0     46  1004
> > 38047 (process 1      2   1000      0      1      0      0      0      0  1003
> > 38048 (process 1      2      0      0      1      0      0   1000      0  1003
> > 38049 (process 1      2      0      0   1001      0      0      0      0  1003
> > 38050 (process 1      2    934      0     67      0      0      0      0  1003
> > 
> > Allowing task moves to increase the imbalance even slightly causes
> > tasks to move towards node 1, and not towards node 7, which prevents
> > the workload from converging once the above scenario has been
> > reached.
> > 
> > Reported-and-tested-by: Vinod Chegu <chegu_vinod@hp.com>
> > Signed-off-by: Rik van Riel <riel@redhat.com>
> > ---
> >  kernel/sched/fair.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 4723234..e98d290 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -1314,6 +1314,12 @@ static void task_numa_compare(struct
> > task_numa_env *env, 
> >  	if (moveimp > imp && moveimp > env->best_imp) {
> >  		/*
> > +		 * A task swap is possible, do not let a task move
> > +		 * increase the imbalance.
> > +		 */
> > +		int imbalance_pct = env->imbalance_pct;
> > +		env->imbalance_pct = 100;
> > +		/*
> 
> I would feel so much better if we could say _why_ this is so.

I can explain why, and will need to think a little about how best
to write it down in a concise form for a comment...

Basically, when we have more numa_groups than nodes on the
system, say 2x the number of nodes, it is possible that one node
is the most desirable node for 3 of the tasks or numa_groups
(node A), while another node is desirable to just 1 group (node B).

If we allow task moves to create an imbalance, the load balancer
will move tasks from groups 1, 2 & 3 from node A to node B,
while the NUMA code is allowed to move tasks back from node B
to node A.

Each of the numa groups is allowed equal movement here. A task
move has a higher improvement than a task swap, so the system
will prefer a task move.

Because the task swaps are never done, the workloads never
"untangle", with two of them winning node A and the other ending
up predominantly on node B until node B becomes its preferred nid.

Does that make sense?


* [PATCH 9/7] sched,numa: remove task_h_load from task_numa_compare
  2014-06-23 15:41 [PATCH 0/7] sched,numa: improve NUMA convergence times riel
                   ` (6 preceding siblings ...)
  2014-06-23 22:30 ` [PATCH 8/7] sched,numa: do not let a move increase the imbalance Rik van Riel
@ 2014-06-24 19:14 ` Rik van Riel
  2014-06-25  5:07   ` Peter Zijlstra
  7 siblings, 1 reply; 25+ messages in thread
From: Rik van Riel @ 2014-06-24 19:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: chegu_vinod, peterz, mgorman, mingo

The function effective_load already makes the calculations that
task_h_load makes. Making them twice can throw off the calculations,
and is generally a bad idea.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 kernel/sched/fair.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1aaa3b4..318a275 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1227,7 +1227,7 @@ static void task_numa_compare(struct task_numa_env *env,
 	dst_load = env->dst_stats.load;
 
 	/* Calculate the effect of moving env->p from src to dst. */
-	load = task_h_load(env->p);
+	load = env->p->se.load.weight;
 	tg = task_group(env->p);
 	src_load += effective_load(tg, env->src_cpu, -load, -load);
 	dst_load += effective_load(tg, env->dst_cpu, load, load);
@@ -1251,7 +1251,7 @@ static void task_numa_compare(struct task_numa_env *env,
 
 	if (cur) {
 		/* Cur moves in the opposite direction. */
-		load = task_h_load(cur);
+		load = cur->se.load.weight;
 		tg = task_group(cur);
 		src_load += effective_load(tg, env->src_cpu, load, load);
 		dst_load += effective_load(tg, env->dst_cpu, -load, -load);


* Re: [PATCH 8/7] sched,numa: do not let a move increase the imbalance
  2014-06-24 14:38   ` Peter Zijlstra
  2014-06-24 15:30     ` Rik van Riel
@ 2014-06-25  1:57     ` Rik van Riel
  1 sibling, 0 replies; 25+ messages in thread
From: Rik van Riel @ 2014-06-25  1:57 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, chegu_vinod, mgorman, mingo


On 06/24/2014 10:38 AM, Peter Zijlstra wrote:

>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 4723234..e98d290 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -1314,6 +1314,12 @@ static void task_numa_compare(struct task_numa_env *env,
>>  
>>  	if (moveimp > imp && moveimp > env->best_imp) {
>>  		/*
>> +		 * A task swap is possible, do not let a task move
>> +		 * increase the imbalance.
>> +		 */
>> +		int imbalance_pct = env->imbalance_pct;
>> +		env->imbalance_pct = 100;
>> +		/*
> 
> I would feel so much better if we could say _why_ this is so.

OK, I think patch 9 supersedes this one.  The imbalance tests
were buggy; that is what caused the problem.

-- 
All rights reversed


* Re: [PATCH 9/7] sched,numa: remove task_h_load from task_numa_compare
  2014-06-24 19:14 ` [PATCH 9/7] sched,numa: remove task_h_load from task_numa_compare Rik van Riel
@ 2014-06-25  5:07   ` Peter Zijlstra
  2014-06-25  5:09     ` Rik van Riel
  2014-06-25  5:21     ` Peter Zijlstra
  0 siblings, 2 replies; 25+ messages in thread
From: Peter Zijlstra @ 2014-06-25  5:07 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel, chegu_vinod, peterz, mgorman, mingo


On Tue, Jun 24, 2014 at 03:14:54PM -0400, Rik van Riel wrote:
> The function effective_load already makes the calculations that
> task_h_load makes. Making them twice can throw off the calculations,
> and is generally a bad idea.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
>  kernel/sched/fair.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 1aaa3b4..318a275 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1227,7 +1227,7 @@ static void task_numa_compare(struct task_numa_env *env,
>  	dst_load = env->dst_stats.load;
>  
>  	/* Calculate the effect of moving env->p from src to dst. */
> -	load = task_h_load(env->p);
> +	load = env->p->se.load.weight;
>  	tg = task_group(env->p);
>  	src_load += effective_load(tg, env->src_cpu, -load, -load);
>  	dst_load += effective_load(tg, env->dst_cpu, load, load);
> @@ -1251,7 +1251,7 @@ static void task_numa_compare(struct task_numa_env *env,
>  
>  	if (cur) {
>  		/* Cur moves in the opposite direction. */
> -		load = task_h_load(cur);
> +		load = cur->se.load.weight;
>  		tg = task_group(cur);
>  		src_load += effective_load(tg, env->src_cpu, load, load);
>  		dst_load += effective_load(tg, env->dst_cpu, -load, -load);

Shall I merge this into patch 3?



* Re: [PATCH 9/7] sched,numa: remove task_h_load from task_numa_compare
  2014-06-25  5:07   ` Peter Zijlstra
@ 2014-06-25  5:09     ` Rik van Riel
  2014-06-25  5:21     ` Peter Zijlstra
  1 sibling, 0 replies; 25+ messages in thread
From: Rik van Riel @ 2014-06-25  5:09 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, chegu_vinod, peterz, mgorman, mingo


On 06/25/2014 01:07 AM, Peter Zijlstra wrote:
> On Tue, Jun 24, 2014 at 03:14:54PM -0400, Rik van Riel wrote:
>> The function effective_load already makes the calculations that 
>> task_h_load makes. Making them twice can throw off the
>> calculations, and is generally a bad idea.
>> 
>> Signed-off-by: Rik van Riel <riel@redhat.com>
>> ---
>>  kernel/sched/fair.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 1aaa3b4..318a275 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -1227,7 +1227,7 @@ static void task_numa_compare(struct task_numa_env *env,
>>  	dst_load = env->dst_stats.load;
>>  
>>  	/* Calculate the effect of moving env->p from src to dst. */
>> -	load = task_h_load(env->p);
>> +	load = env->p->se.load.weight;
>>  	tg = task_group(env->p);
>>  	src_load += effective_load(tg, env->src_cpu, -load, -load);
>>  	dst_load += effective_load(tg, env->dst_cpu, load, load);
>> @@ -1251,7 +1251,7 @@ static void task_numa_compare(struct task_numa_env *env,
>>  
>>  	if (cur) {
>>  		/* Cur moves in the opposite direction. */
>> -		load = task_h_load(cur);
>> +		load = cur->se.load.weight;
>>  		tg = task_group(cur);
>>  		src_load += effective_load(tg, env->src_cpu, load, load);
>>  		dst_load += effective_load(tg, env->dst_cpu, -load, -load);
> 
> Shall I merge this into patch 3?

Please do, I guess that's where it belongs.

-- 
All rights reversed


* Re: [PATCH 9/7] sched,numa: remove task_h_load from task_numa_compare
  2014-06-25  5:07   ` Peter Zijlstra
  2014-06-25  5:09     ` Rik van Riel
@ 2014-06-25  5:21     ` Peter Zijlstra
  2014-06-25  5:25       ` Rik van Riel
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2014-06-25  5:21 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel, chegu_vinod, mgorman, mingo


On Wed, Jun 25, 2014 at 07:07:35AM +0200, Peter Zijlstra wrote:
> Shall I merge this into patch 3?

Which gets me the below, which has a wrong changelog.

task_h_load() already computes the load as seen from the root group.
effective_load() just does a better (and more expensive) job of
computing the task movement implications of a move.

So the total effect of this patch shouldn't be very big; regular load
balancing also only uses task_h_load(), see move_tasks().

Now, we don't run with preemption disabled, don't run as often, etc..,
so maybe we can indeed use the more expensive variant just fine, but
does it really matter?

---
Subject: sched,numa: use effective_load to balance NUMA loads
From: Rik van Riel <riel@redhat.com>
Date: Mon, 23 Jun 2014 11:46:14 -0400

When CONFIG_FAIR_GROUP_SCHED is enabled, the load that a task places
on a CPU is determined by the group the task is in. This is conveniently
calculated for us by effective_load(), which task_numa_compare should
use.

The active groups on the source and destination CPU can be different,
so the calculation needs to be done separately for each CPU.

Cc: mgorman@suse.de
Cc: mingo@kernel.org
Cc: chegu_vinod@hp.com
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1403538378-31571-3-git-send-email-riel@redhat.com
---
 kernel/sched/fair.c |   20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1151,6 +1151,7 @@ static void task_numa_compare(struct tas
 	struct rq *src_rq = cpu_rq(env->src_cpu);
 	struct rq *dst_rq = cpu_rq(env->dst_cpu);
 	struct task_struct *cur;
+	struct task_group *tg;
 	long src_load, dst_load;
 	long load;
 	long imp = (groupimp > 0) ? groupimp : taskimp;
@@ -1225,14 +1226,21 @@ static void task_numa_compare(struct tas
 	 * In the overloaded case, try and keep the load balanced.
 	 */
 balance:
-	load = task_h_load(env->p);
-	dst_load = env->dst_stats.load + load;
-	src_load = env->src_stats.load - load;
+	src_load = env->src_stats.load;
+	dst_load = env->dst_stats.load;
+
+	/* Calculate the effect of moving env->p from src to dst. */
+	load = env->p->se.load.weight;
+	tg = task_group(env->p);
+	src_load += effective_load(tg, env->src_cpu, -load, -load);
+	dst_load += effective_load(tg, env->dst_cpu, load, load);
 
 	if (cur) {
-		load = task_h_load(cur);
-		dst_load -= load;
-		src_load += load;
+		/* Cur moves in the opposite direction. */
+		load = cur->se.load.weight;
+		tg = task_group(cur);
+		src_load += effective_load(tg, env->src_cpu, load, load);
+		dst_load += effective_load(tg, env->dst_cpu, -load, -load);
 	}
 
 	if (load_too_imbalanced(src_load, dst_load, env))





* Re: [PATCH 9/7] sched,numa: remove task_h_load from task_numa_compare
  2014-06-25  5:21     ` Peter Zijlstra
@ 2014-06-25  5:25       ` Rik van Riel
  2014-06-25  5:31         ` Peter Zijlstra
  0 siblings, 1 reply; 25+ messages in thread
From: Rik van Riel @ 2014-06-25  5:25 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, chegu_vinod, mgorman, mingo


On 06/25/2014 01:21 AM, Peter Zijlstra wrote:
> On Wed, Jun 25, 2014 at 07:07:35AM +0200, Peter Zijlstra wrote:
>> Shall I merge this into patch 3?
> 
> Which gets me the below, which has a wrong changelog.
> 
> task_h_load() already computes the load as seen from the root
> group. effective_load() just does a better (and more expensive) job
> of computing the task movement implications of a move.
> 
> So the total effect of this patch shouldn't be very big; regular
> load balancing also only uses task_h_load(), see move_tasks().
> 
> Now, we don't run with preemption disabled, don't run as often,
> etc.., so maybe we can indeed use the more expensive variant just
> fine, but does it really matter?

In my testing, it appears to make a difference between workloads
converging, and workloads sitting with one last thread stuck on
another node that never gets moved...

> ---
> Subject: sched,numa: use effective_load to balance NUMA loads
> From: Rik van Riel <riel@redhat.com>
> Date: Mon, 23 Jun 2014 11:46:14 -0400
>
> When CONFIG_FAIR_GROUP_SCHED is enabled, the load that a task places
> on a CPU is determined by the group the task is in. This is conveniently
> calculated for us by effective_load(), which task_numa_compare should
> use.
>
> The active groups on the source and destination CPU can be different,
> so the calculation needs to be done separately for each CPU.
>
> Cc: mgorman@suse.de
> Cc: mingo@kernel.org
> Cc: chegu_vinod@hp.com
> Signed-off-by: Rik van Riel <riel@redhat.com>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Link: http://lkml.kernel.org/r/1403538378-31571-3-git-send-email-riel@redhat.com
> ---
>  kernel/sched/fair.c |   20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1151,6 +1151,7 @@ static void task_numa_compare(struct tas
>  	struct rq *src_rq = cpu_rq(env->src_cpu);
>  	struct rq *dst_rq = cpu_rq(env->dst_cpu);
>  	struct task_struct *cur;
> +	struct task_group *tg;
>  	long src_load, dst_load;
>  	long load;
>  	long imp = (groupimp > 0) ? groupimp : taskimp;
> @@ -1225,14 +1226,21 @@ static void task_numa_compare(struct tas
>  	 * In the overloaded case, try and keep the load balanced.
>  	 */
>  balance:
> -	load = task_h_load(env->p);
> -	dst_load = env->dst_stats.load + load;
> -	src_load = env->src_stats.load - load;
> +	src_load = env->src_stats.load;
> +	dst_load = env->dst_stats.load;
> +
> +	/* Calculate the effect of moving env->p from src to dst. */
> +	load = env->p->se.load.weight;
> +	tg = task_group(env->p);
> +	src_load += effective_load(tg, env->src_cpu, -load, -load);
> +	dst_load += effective_load(tg, env->dst_cpu, load, load);
>  
>  	if (cur) {
> -		load = task_h_load(cur);
> -		dst_load -= load;
> -		src_load += load;
> +		/* Cur moves in the opposite direction. */
> +		load = cur->se.load.weight;
> +		tg = task_group(cur);
> +		src_load += effective_load(tg, env->src_cpu, load, load);
> +		dst_load += effective_load(tg, env->dst_cpu, -load, -load);
>  	}
>  
>  	if (load_too_imbalanced(src_load, dst_load, env))
> 
> 


-- 
All rights reversed


* Re: [PATCH 9/7] sched,numa: remove task_h_load from task_numa_compare
  2014-06-25  5:25       ` Rik van Riel
@ 2014-06-25  5:31         ` Peter Zijlstra
  2014-06-25  5:39           ` Rik van Riel
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2014-06-25  5:31 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel, chegu_vinod, mgorman, mingo


On Wed, Jun 25, 2014 at 01:25:00AM -0400, Rik van Riel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 06/25/2014 01:21 AM, Peter Zijlstra wrote:
> > On Wed, Jun 25, 2014 at 07:07:35AM +0200, Peter Zijlstra wrote:
> >> Shall I merge this into patch 3?
> > 
> > Which gets me the below, which has a wrong changelog.
> > 
> > task_h_load() already computes the load as seen from the root
> > group. effective_load() just does a better (and more expensive) job
> > of computing the task movement implications of a move.
> > 
> > So the total effect of this patch shouldn't be very big; regular
> > load balancing also only uses task_h_load(), see move_tasks().
> > 
> > Now, we don't run with preemption disabled, don't run as often,
> > etc.., so maybe we can indeed use the more expensive variant just
> > fine, but does it really matter?
> 
> In my testing, it appears to make a difference between workloads
> converging, and workloads sitting with one last thread stuck on
> another node that never gets moved...

Fair enough; can you provide a new Changelog that I can paste in?



* Re: [PATCH 9/7] sched,numa: remove task_h_load from task_numa_compare
  2014-06-25  5:31         ` Peter Zijlstra
@ 2014-06-25  5:39           ` Rik van Riel
  2014-06-25  5:57             ` Peter Zijlstra
  0 siblings, 1 reply; 25+ messages in thread
From: Rik van Riel @ 2014-06-25  5:39 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, chegu_vinod, mgorman, mingo


On 06/25/2014 01:31 AM, Peter Zijlstra wrote:
> On Wed, Jun 25, 2014 at 01:25:00AM -0400, Rik van Riel wrote:
>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
>> 
>> On 06/25/2014 01:21 AM, Peter Zijlstra wrote:
>>> On Wed, Jun 25, 2014 at 07:07:35AM +0200, Peter Zijlstra
>>> wrote:
>>>> Shall I merge this into patch 3?
>>> 
>>> Which gets me the below, which has a wrong changelog.
>>> 
>>> task_h_load() already computes the load as seen from the root 
>>> group. effective_load() just does a better (and more expensive)
>>> job of computing the task movement implications of a move.
>>> 
>>> So the total effect of this patch shouldn't be very big;
>>> regular load balancing also only uses task_h_load(), see
>>> move_tasks().
>>> 
>>> Now, we don't run with preemption disabled, don't run as
>>> often, etc.., so maybe we can indeed use the more expensive
>>> variant just fine, but does it really matter?
>> 
>> In my testing, it appears to make a difference between workloads 
>> converging, and workloads sitting with one last thread stuck on 
>> another node that never gets moved...
> 
> Fair enough; can you provide a new Changelog that I can paste in?

Here it goes:

When CONFIG_FAIR_GROUP_SCHED is enabled, the load that a task places
on a CPU is determined by the group the task is in. The active groups
on the source and destination CPU can be different, resulting in a
different load contribution by the same task at its source and at its
destination. As a result, the load needs to be calculated separately
for each CPU, instead of estimated once with task_h_load.

Getting this calculation right allows some workloads to converge,
where previously the last thread could get stuck on another node,
without being able to migrate to its final destination.




-- 
All rights reversed


* Re: [PATCH 9/7] sched,numa: remove task_h_load from task_numa_compare
  2014-06-25  5:39           ` Rik van Riel
@ 2014-06-25  5:57             ` Peter Zijlstra
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2014-06-25  5:57 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel, chegu_vinod, mgorman, mingo


On Wed, Jun 25, 2014 at 01:39:27AM -0400, Rik van Riel wrote:
> 
> Here it goes:
> 
> When CONFIG_FAIR_GROUP_SCHED is enabled, the load that a task places
> on a CPU is determined by the group the task is in. The active groups
> on the source and destination CPU can be different, resulting in a
> different load contribution by the same task at its source and at its
> destination. As a result, the load needs to be calculated separately
> for each CPU, instead of estimated once with task_h_load.
> 
> Getting this calculation right allows some workloads to converge,
> where previously the last thread could get stuck on another node,
> without being able to migrate to its final destination.
> 

Thanks!

> 
> 
> - -- 
> All rights reversed
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
> 
> iQEcBAEBAgAGBQJTqmCPAAoJEM553pKExN6D4/IH/1Ez7G3jAnYFpQYvH/wSm75V
> kbH+mouLAqeICjHRdXAr1SGuD8i85JeUeDU2+SymdhC+hwZXbvR/aQfX0/ok4kN7
> e7kJbaNS6Lrq3bDjm74aTpMKB+zK2OExqR1DQBXwynbUahAyx3+9uXNDYp35yZwo
> tt+h3Rdrmy2lTTpE0fuEjGc8ODrEJjeWyYAVxT/aQXnwgfXfp6BZ1SEXyRRmrxR0
> BunsgWTO7uBxGGEIZrrm/l7mdIrsi4oAN9C4RA7v6LMR6cUW9Fj5o6iva9X714wG
> txP4/AGowucS5VckN1RIaM8/pzMB3MVuAmCTX4PqWg1jf3eggcQpe5/4/bVYUqQ=
> =QgYM
> -----END PGP SIGNATURE-----

http://www.tummy.com/blogs/2008/09/05/enigmail-message-composition-and-mutt/



* Re: [PATCH 7/7] sched,numa: change scan period code to match intent
  2014-06-23 15:41 ` [PATCH 7/7] sched,numa: change scan period code to match intent riel
@ 2014-06-25 10:19   ` Mel Gorman
  2014-07-05 10:45   ` [tip:sched/core] sched/numa: Change " tip-bot for Rik van Riel
  1 sibling, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2014-06-25 10:19 UTC (permalink / raw)
  To: riel; +Cc: linux-kernel, chegu_vinod, peterz, mingo

On Mon, Jun 23, 2014 at 11:41:35AM -0400, riel@redhat.com wrote:
> From: Rik van Riel <riel@redhat.com>
> 
> Reading through the scan period code and comment, it appears the
> intent was to slow down NUMA scanning when a majority of accesses
> are on the local node, specifically a local:remote ratio of 3:1.
> 
> However, the code actually tests local / (local + remote), and
> the actual cut-off point was around 30% local accesses, well before
> a task has actually converged on a node.
> 
> Changing the threshold to 7 means scanning slows down when a task
> has around 70% of its accesses local, which appears to match the
> intent of the code more closely.
> 
> Cc: Mel Gorman <mgorman@suse.de>
> Signed-off-by: Rik van Riel <riel@redhat.com>

The threshold is indeed very low and was selected to favour slowing
down scanning over convergence time. This was with the intent that we
should never perform worse than disabling NUMA balancing -- an aim that
has had mixed results with recent Java-based workloads. With slower
scanning, we converge eventually, so for long-lived workloads we're ok.
On the other hand, if the scan rate stays continually high and we're not
converging, then system overhead stays consistently high. I considered
the slow convergence to be the lesser of two possible evils.

At the time of writing there were basic workloads that were only seeing about
20-30% locality, hence that threshold. Since then, things have changed that
may affect that decision -- pseudo-interleaving was introduced, for example.

I've no problem with the patch, because it could do with re-evaluation in
the context of the other recent changes, so

Acked-by: Mel Gorman <mgorman@suse.de>

Watch for consistently high scanning activity or high system CPU usage; if
either is reported, it's worth looking to see whether that 70% threshold is
ever being reached.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH 1/7] sched,numa: use group's max nid as task's preferred nid
  2014-06-23 15:41 ` [PATCH 1/7] sched,numa: use group's max nid as task's preferred nid riel
@ 2014-06-25 10:31   ` Mel Gorman
  2014-07-05 10:44   ` [tip:sched/core] sched/numa: Use group's max nid as task's " tip-bot for Rik van Riel
  1 sibling, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2014-06-25 10:31 UTC (permalink / raw)
  To: riel; +Cc: linux-kernel, chegu_vinod, peterz, mingo

On Mon, Jun 23, 2014 at 11:41:29AM -0400, riel@redhat.com wrote:
> From: Rik van Riel <riel@redhat.com>
> 
> From task_numa_placement, always try to consolidate the tasks
> in a group on the group's top nid.
> 
> In case this task is part of a group that is interleaved over
> multiple nodes, task_numa_migrate will set the task's preferred
> nid to the best node it could find for the task, so this patch
> will cause at most one run through task_numa_migrate.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>

That loop was written prior to pseudo-interleaving as well, and in the
context of that change this makes sense and removes a potentially
expensive loop.

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH 4/7] sched,numa: simplify task_numa_compare
  2014-06-23 15:41 ` [PATCH 4/7] sched,numa: simplify task_numa_compare riel
@ 2014-06-25 10:39   ` Mel Gorman
  0 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2014-06-25 10:39 UTC (permalink / raw)
  To: riel; +Cc: linux-kernel, chegu_vinod, peterz, mingo

On Mon, Jun 23, 2014 at 11:41:32AM -0400, riel@redhat.com wrote:
> From: Rik van Riel <riel@redhat.com>
> 
> When a task is part of a numa_group, the comparison should always use
> the group weight, in order to make workloads converge.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs


* [tip:sched/core] sched/numa: Use group's max nid as task's preferred nid
  2014-06-23 15:41 ` [PATCH 1/7] sched,numa: use group's max nid as task's preferred nid riel
  2014-06-25 10:31   ` Mel Gorman
@ 2014-07-05 10:44   ` tip-bot for Rik van Riel
  1 sibling, 0 replies; 25+ messages in thread
From: tip-bot for Rik van Riel @ 2014-07-05 10:44 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, riel, hpa, mingo, torvalds, peterz, tglx

Commit-ID:  f0b8a4afd6a8c500161e45065a91738b490bf5ae
Gitweb:     http://git.kernel.org/tip/f0b8a4afd6a8c500161e45065a91738b490bf5ae
Author:     Rik van Riel <riel@redhat.com>
AuthorDate: Mon, 23 Jun 2014 11:41:29 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 5 Jul 2014 11:17:33 +0200

sched/numa: Use group's max nid as task's preferred nid

From task_numa_placement, always try to consolidate the tasks
in a group on the group's top nid.

In case this task is part of a group that is interleaved over
multiple nodes, task_numa_migrate will set the task's preferred
nid to the best node it could find for the task, so this patch
will cause at most one run through task_numa_migrate.

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: mgorman@suse.de
Cc: chegu_vinod@hp.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1403538095-31256-2-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 17 +----------------
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e3ff3d1..96b2d39 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1594,23 +1594,8 @@ static void task_numa_placement(struct task_struct *p)
 
 	if (p->numa_group) {
 		update_numa_active_node_mask(p->numa_group);
-		/*
-		 * If the preferred task and group nids are different,
-		 * iterate over the nodes again to find the best place.
-		 */
-		if (max_nid != max_group_nid) {
-			unsigned long weight, max_weight = 0;
-
-			for_each_online_node(nid) {
-				weight = task_weight(p, nid) + group_weight(p, nid);
-				if (weight > max_weight) {
-					max_weight = weight;
-					max_nid = nid;
-				}
-			}
-		}
-
 		spin_unlock_irq(group_lock);
+		max_nid = max_group_nid;
 	}
 
 	if (max_faults) {
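
Distilled outside the kernel, the effect of this hunk amounts to the
fallback below; the helper name and plain-int arguments are invented for
illustration, the real code simply assigns max_group_nid inside
task_numa_placement() as shown above.

/* Illustration only: a task that belongs to a numa group now adopts the
 * group's busiest node as its preferred nid; task_numa_migrate() later
 * refines the choice for groups interleaved over several nodes. */
static int preferred_nid(int max_nid, int max_group_nid, int in_group)
{
	return in_group ? max_group_nid : max_nid;
}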

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip:sched/core] sched/numa: Rework best node setting in task_numa_migrate()
  2014-06-23 15:41 ` [PATCH 6/7] sched,numa: rework best node setting in task_numa_migrate riel
@ 2014-07-05 10:45   ` tip-bot for Rik van Riel
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Rik van Riel @ 2014-07-05 10:45 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, riel, hpa, mingo, torvalds, peterz, tglx

Commit-ID:  db015daedb56251b73f956f70b3b8813f80d8ee1
Gitweb:     http://git.kernel.org/tip/db015daedb56251b73f956f70b3b8813f80d8ee1
Author:     Rik van Riel <riel@redhat.com>
AuthorDate: Mon, 23 Jun 2014 11:41:34 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 5 Jul 2014 11:17:39 +0200

sched/numa: Rework best node setting in task_numa_migrate()

Fix up the best node setting in task_numa_migrate() to deal with a task in
a pseudo-interleaved NUMA group that is already running in the best
location.

Set the task's preferred nid to the current nid, so task migration is
not retried at a high rate.

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: mgorman@suse.de
Cc: chegu_vinod@hp.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1403538095-31256-7-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9d1734a..7bb2f46 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1354,10 +1354,6 @@ static int task_numa_migrate(struct task_struct *p)
 		}
 	}
 
-	/* No better CPU than the current one was found. */
-	if (env.best_cpu == -1)
-		return -EAGAIN;
-
 	/*
 	 * If the task is part of a workload that spans multiple NUMA nodes,
 	 * and is migrating into one of the workload's active nodes, remember
@@ -1366,8 +1362,19 @@ static int task_numa_migrate(struct task_struct *p)
 	 * A task that migrated to a second choice node will be better off
 	 * trying for a better one later. Do not set the preferred node here.
 	 */
-	if (p->numa_group && node_isset(env.dst_nid, p->numa_group->active_nodes))
-		sched_setnuma(p, env.dst_nid);
+	if (p->numa_group) {
+		if (env.best_cpu == -1)
+			nid = env.src_nid;
+		else
+			nid = env.dst_nid;
+
+		if (node_isset(nid, p->numa_group->active_nodes))
+			sched_setnuma(p, env.dst_nid);
+	}
+
+	/* No better CPU than the current one was found. */
+	if (env.best_cpu == -1)
+		return -EAGAIN;
 
 	/*
 	 * Reset the scan period if the task is being rescheduled on an
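
Restated in plain C, the choice this hunk makes is the sketch below; the
helper and its int parameters are invented for illustration, the kernel
works directly on the env fields visible in the diff.

/* When no better CPU was found (best_cpu == -1) the task stays put, so
 * the node tested against the group's active_nodes mask is the source
 * node rather than a destination that will never be used. */
static int nid_to_check(int best_cpu, int src_nid, int dst_nid)
{
	return (best_cpu == -1) ? src_nid : dst_nid;
}

Only after that check does the function bail out with -EAGAIN when no
better CPU exists, which is what stops the high-rate retries mentioned in
the changelog.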

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip:sched/core] sched/numa: Change scan period code to match intent
  2014-06-23 15:41 ` [PATCH 7/7] sched,numa: change scan period code to match intent riel
  2014-06-25 10:19   ` Mel Gorman
@ 2014-07-05 10:45   ` tip-bot for Rik van Riel
  1 sibling, 0 replies; 25+ messages in thread
From: tip-bot for Rik van Riel @ 2014-07-05 10:45 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, riel, hpa, mingo, torvalds, peterz, tglx

Commit-ID:  a22b4b012340b988dbe7a58461d6fcc582f34aa0
Gitweb:     http://git.kernel.org/tip/a22b4b012340b988dbe7a58461d6fcc582f34aa0
Author:     Rik van Riel <riel@redhat.com>
AuthorDate: Mon, 23 Jun 2014 11:41:35 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 5 Jul 2014 11:17:40 +0200

sched/numa: Change scan period code to match intent

Reading through the scan period code and comment, it appears the
intent was to slow down NUMA scanning when a majority of accesses
are on the local node, specifically a local:remote ratio of 3:1.

However, the code actually tests local / (local + remote), so the
cut-off point was around 30% local accesses, well before a task has
converged on a node.

Changing the threshold to 7 means scanning slows down when a task
has around 70% of its accesses local, which appears to match the
intent of the code more closely.

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: mgorman@suse.de
Cc: chegu_vinod@hp.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1403538095-31256-8-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7bb2f46..a140c6a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1452,12 +1452,12 @@ static void update_numa_active_node_mask(struct numa_group *numa_group)
 /*
  * When adapting the scan rate, the period is divided into NUMA_PERIOD_SLOTS
  * increments. The more local the fault statistics are, the higher the scan
- * period will be for the next scan window. If local/remote ratio is below
- * NUMA_PERIOD_THRESHOLD (where range of ratio is 1..NUMA_PERIOD_SLOTS) the
- * scan period will decrease
+ * period will be for the next scan window. If local/(local+remote) ratio is
+ * below NUMA_PERIOD_THRESHOLD (where range of ratio is 1..NUMA_PERIOD_SLOTS)
+ * the scan period will decrease. Aim for 70% local accesses.
  */
 #define NUMA_PERIOD_SLOTS 10
-#define NUMA_PERIOD_THRESHOLD 3
+#define NUMA_PERIOD_THRESHOLD 7
 
 /*
  * Increase the scan period (slow down scanning) if the majority of
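
To make the numbers concrete, the standalone program below maps fault
counts onto the 1..NUMA_PERIOD_SLOTS scale described in the changelog
(an illustration of the local * SLOTS / (local + remote) ratio, not the
kernel's scan period code itself).

#include <stdio.h>

#define NUMA_PERIOD_SLOTS	10
#define NUMA_PERIOD_THRESHOLD	7	/* was 3 before this patch */

/* Slot that local/remote fault counts land in on the 1..SLOTS scale. */
static int locality_slot(unsigned long local, unsigned long remote)
{
	if (local + remote == 0)
		return 0;
	return (int)((local * NUMA_PERIOD_SLOTS) / (local + remote));
}

int main(void)
{
	/* 30% local accesses already reached the old threshold of 3 ... */
	printf("30%% local -> slot %d (old threshold 3)\n",
	       locality_slot(30, 70));
	/* ... while the new threshold of 7 needs roughly 70%% local. */
	printf("70%% local -> slot %d (new threshold %d)\n",
	       locality_slot(70, 30), NUMA_PERIOD_THRESHOLD);
	return 0;
}

With the old value, scanning could slow down while 70% of a task's
accesses were still remote; with 7, it only slows down once the task is
mostly local, which matches the 3:1 intent.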

^ permalink raw reply related	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2014-07-05 10:46 UTC | newest]

Thread overview: 25+ messages
2014-06-23 15:41 [PATCH 0/7] sched,numa: improve NUMA convergence times riel
2014-06-23 15:41 ` [PATCH 1/7] sched,numa: use group's max nid as task's preferred nid riel
2014-06-25 10:31   ` Mel Gorman
2014-07-05 10:44   ` [tip:sched/core] sched/numa: Use group's max nid as task's " tip-bot for Rik van Riel
2014-06-23 15:41 ` [PATCH 3/7] sched,numa: use effective_load to balance NUMA loads riel
2014-06-23 15:41 ` [PATCH 4/7] sched,numa: simplify task_numa_compare riel
2014-06-25 10:39   ` Mel Gorman
2014-06-23 15:41 ` [PATCH 5/7] sched,numa: examine a task move when examining a task swap riel
2014-06-23 15:41 ` [PATCH 6/7] sched,numa: rework best node setting in task_numa_migrate riel
2014-07-05 10:45   ` [tip:sched/core] sched/numa: Rework best node setting in task_numa_migrate() tip-bot for Rik van Riel
2014-06-23 15:41 ` [PATCH 7/7] sched,numa: change scan period code to match intent riel
2014-06-25 10:19   ` Mel Gorman
2014-07-05 10:45   ` [tip:sched/core] sched/numa: Change " tip-bot for Rik van Riel
2014-06-23 22:30 ` [PATCH 8/7] sched,numa: do not let a move increase the imbalance Rik van Riel
2014-06-24 14:38   ` Peter Zijlstra
2014-06-24 15:30     ` Rik van Riel
2014-06-25  1:57     ` Rik van Riel
2014-06-24 19:14 ` [PATCH 9/7] sched,numa: remove task_h_load from task_numa_compare Rik van Riel
2014-06-25  5:07   ` Peter Zijlstra
2014-06-25  5:09     ` Rik van Riel
2014-06-25  5:21     ` Peter Zijlstra
2014-06-25  5:25       ` Rik van Riel
2014-06-25  5:31         ` Peter Zijlstra
2014-06-25  5:39           ` Rik van Riel
2014-06-25  5:57             ` Peter Zijlstra
