linux-kernel.vger.kernel.org archive mirror
From: riel@redhat.com
To: linux-kernel@vger.kernel.org
Cc: mingo@kernel.org, peterz@infradead.org, mgorman@suse.de,
	chegu_vinod@hp.com
Subject: [PATCH 4/4] sched,numa: pull workloads towards their preferred nodes
Date: Thu,  8 May 2014 13:23:31 -0400	[thread overview]
Message-ID: <1399569811-14362-5-git-send-email-riel@redhat.com> (raw)
In-Reply-To: <1399569811-14362-1-git-send-email-riel@redhat.com>

From: Rik van Riel <riel@redhat.com>

Give a bonus to nodes near a workload's preferred node. This will pull
workloads towards their preferred node.

For workloads that span multiple NUMA nodes, pseudo-interleaving will
even out the memory use between nodes, causing the preferred node to
move around over time.

This movement eventually places the preferred nodes on opposite sides
of the system, untangling workloads that were spread all over the
system and moving each onto a set of adjacent nodes.

The perturbation introduced by this patch enables the kernel to
reliably untangle two 4-node-wide SPECjbb2005 instances on an 8-node
system, improving average performance from 857814 to 931792 bops.

Signed-off-by: Rik van Riel <riel@redhat.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
---
 kernel/sched/fair.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 99cc829..cffa829 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -932,7 +932,7 @@ static inline unsigned long group_faults_cpu(struct numa_group *group, int nid)
  * the proximity of those nodes.
  */
 static inline unsigned long nearby_nodes_score(struct task_struct *p, int nid,
-						bool task)
+						bool task, bool *preferred_nid)
 {
 	int max_distance = max_node_distance();
 	unsigned long score = 0;
@@ -949,6 +949,15 @@ static inline unsigned long nearby_nodes_score(struct task_struct *p, int nid,
 		int distance;
 		unsigned long faults;
 
+		/*
+		 * Pseudo-interleaving balances out the memory use between the
+		 * nodes where a workload runs, so the preferred node should
+		 * change over time. This helps separate two workloads onto
+		 * separate sides of the system.
+		 */
+		if (p->numa_group && node == p->numa_group->preferred_nid)
+			*preferred_nid = true;
+
 		/* Already scored by the calling function. */
 		if (node == nid)
 			continue;
@@ -989,6 +998,7 @@ static inline unsigned long nearby_nodes_score(struct task_struct *p, int nid,
 static inline unsigned long task_weight(struct task_struct *p, int nid)
 {
 	unsigned long total_faults, score;
+	bool near_preferred_nid = false;
 
 	if (!p->numa_faults_memory)
 		return 0;
@@ -999,7 +1009,7 @@ static inline unsigned long task_weight(struct task_struct *p, int nid)
 		return 0;
 
 	score = 1000 * task_faults(p, nid);
-	score += nearby_nodes_score(p, nid, true);
+	score += nearby_nodes_score(p, nid, true, &near_preferred_nid);
 
 	score /= total_faults;
 
@@ -1009,6 +1019,7 @@ static inline unsigned long task_weight(struct task_struct *p, int nid)
 static inline unsigned long group_weight(struct task_struct *p, int nid)
 {
 	unsigned long total_faults, score;
+	bool near_preferred_nid = false;
 
 	if (!p->numa_group)
 		return 0;
@@ -1019,7 +1030,15 @@ static inline unsigned long group_weight(struct task_struct *p, int nid)
 		return 0;
 
 	score = 1000 * group_faults(p, nid);
-	score += nearby_nodes_score(p, nid, false);
+	score += nearby_nodes_score(p, nid, false, &near_preferred_nid);
+
+	/*
+	 * Pull workloads towards their preferred node, with the minimum
+	 * multiplier required to be a tie-breaker when two groups of nodes
+	 * have the same amount of memory.
+	 */
+	if (near_preferred_nid)
+		score *= (max_node_distance() - LOCAL_DISTANCE);
 
 	score /= total_faults;
 
-- 
1.8.5.3



Thread overview: 16+ messages
2014-05-08 17:23 [PATCH 0/4] sched,numa: task placement for complex NUMA topologies riel
2014-05-08 17:23 ` [PATCH 1/4] numa,x86: store maximum numa node distance riel
2014-05-09  9:45   ` Peter Zijlstra
2014-05-09 15:08     ` Rik van Riel
2014-05-08 17:23 ` [PATCH 2/4] sched,numa: weigh nearby nodes for task placement on complex NUMA topologies riel
2014-05-09  9:53   ` Peter Zijlstra
2014-05-09 15:14     ` Rik van Riel
2014-05-09  9:54   ` Peter Zijlstra
2014-05-09 10:03   ` Peter Zijlstra
2014-05-09 15:16     ` Rik van Riel
2014-05-09 10:11   ` Peter Zijlstra
2014-05-09 15:11     ` Rik van Riel
2014-05-09 10:13   ` Peter Zijlstra
2014-05-09 15:03     ` Rik van Riel
2014-05-08 17:23 ` [PATCH 3/4] sched,numa: store numa_group's preferred nid riel
2014-05-08 17:23 ` riel [this message]
