All of lore.kernel.org
 help / color / mirror / Atom feed
From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Ingo Molnar <mingo@kernel.org>, Peter Zijlstra <peterz@infradead.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Rik van Riel <riel@surriel.com>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: [PATCH v2 15/19] sched/numa: Use group_weights to identify if migration degrades locality
Date: Wed, 20 Jun 2018 22:32:56 +0530	[thread overview]
Message-ID: <1529514181-9842-16-git-send-email-srikar@linux.vnet.ibm.com> (raw)
In-Reply-To: <1529514181-9842-1-git-send-email-srikar@linux.vnet.ibm.com>

On NUMA_BACKPLANE and NUMA_GLUELESS_MESH systems, tasks/memory should be
consolidated to the closest group of nodes. In such a case, relying on
group_fault metric may not always help to consolidate. There can always
be a case where a node closer to the preferred node may have lesser
faults than a node further away from the preferred node. In such a case,
moving to node with more faults might avoid numa consolidation.

Using group_weight would help to consolidate task/memory around the
preferred_node.

While here, to be on the conservative side, don't override migrate thread
degrades locality logic for CPU_NEWLY_IDLE load balancing.

Note: Similar problems exist with should_numa_migrate_memory and will be
dealt separately.

Running SPECjbb2005 on a 4 node machine and comparing bops/JVM
JVMS  LAST_PATCH  WITH_PATCH  %CHANGE
16    25645.4     25960       1.22
1     72142       73550       1.95

Running SPECjbb2005 on a 16 node machine and comparing bops/JVM
JVMS  LAST_PATCH  WITH_PATCH  %CHANGE
8     110199      120071      8.958
1     176303      176249      -0.03

(numbers from v1 based on v4.17-rc5)
Testcase       Time:         Min         Max         Avg      StdDev
numa01.sh      Real:      490.04      774.86      596.26       96.46
numa01.sh       Sys:      151.52      242.88      184.82       31.71
numa01.sh      User:    41418.41    60844.59    48776.09     6564.27
numa02.sh      Real:       60.14       62.94       60.98        1.00
numa02.sh       Sys:       16.11       30.77       21.20        5.28
numa02.sh      User:     5184.33     5311.09     5228.50       44.24
numa03.sh      Real:      790.95      856.35      826.41       24.11
numa03.sh       Sys:      114.93      118.85      117.05        1.63
numa03.sh      User:    60990.99    64959.28    63470.43     1415.44
numa04.sh      Real:      434.37      597.92      504.87       59.70
numa04.sh       Sys:      237.63      397.40      289.74       55.98
numa04.sh      User:    34854.87    41121.83    38572.52     2615.84
numa05.sh      Real:      386.77      448.90      417.22       22.79
numa05.sh       Sys:      149.23      379.95      303.04       79.55
numa05.sh      User:    32951.76    35959.58    34562.18     1034.05

Testcase       Time:         Min         Max         Avg      StdDev 	 %Change
numa01.sh      Real:      493.19      672.88      597.51       59.38 	 -0.20%
numa01.sh       Sys:      150.09      245.48      207.76       34.26 	 -11.0%
numa01.sh      User:    41928.51    53779.17    48747.06     3901.39 	 0.059%
numa02.sh      Real:       60.63       62.87       61.22        0.83 	 -0.39%
numa02.sh       Sys:       16.64       27.97       20.25        4.06 	 4.691%
numa02.sh      User:     5222.92     5309.60     5254.03       29.98 	 -0.48%
numa03.sh      Real:      821.52      902.15      863.60       32.41 	 -4.30%
numa03.sh       Sys:      112.04      130.66      118.35        7.08 	 -1.09%
numa03.sh      User:    62245.16    69165.14    66443.04     2450.32 	 -4.47%
numa04.sh      Real:      414.53      519.57      476.25       37.00 	 6.009%
numa04.sh       Sys:      181.84      335.67      280.41       54.07 	 3.327%
numa04.sh      User:    33924.50    39115.39    37343.78     1934.26 	 3.290%
numa05.sh      Real:      408.30      441.45      417.90       12.05 	 -0.16%
numa05.sh       Sys:      233.41      381.60      295.58       57.37 	 2.523%
numa05.sh      User:    33301.31    35972.50    34335.19      938.94 	 0.661%

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 kernel/sched/fair.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 711b533..9db09a6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7222,8 +7222,8 @@ static int task_hot(struct task_struct *p, struct lb_env *env)
 static int migrate_degrades_locality(struct task_struct *p, struct lb_env *env)
 {
 	struct numa_group *numa_group = rcu_dereference(p->numa_group);
-	unsigned long src_faults, dst_faults;
-	int src_nid, dst_nid;
+	unsigned long src_weight, dst_weight;
+	int src_nid, dst_nid, dist;
 
 	if (!static_branch_likely(&sched_numa_balancing))
 		return -1;
@@ -7250,18 +7250,19 @@ static int migrate_degrades_locality(struct task_struct *p, struct lb_env *env)
 		return 0;
 
 	/* Leaving a core idle is often worse than degrading locality. */
-	if (env->idle != CPU_NOT_IDLE)
+	if (env->idle == CPU_IDLE)
 		return -1;
 
+	dist = node_distance(src_nid, dst_nid);
 	if (numa_group) {
-		src_faults = group_faults(p, src_nid);
-		dst_faults = group_faults(p, dst_nid);
+		src_weight = group_weight(p, src_nid, dist);
+		dst_weight = group_weight(p, dst_nid, dist);
 	} else {
-		src_faults = task_faults(p, src_nid);
-		dst_faults = task_faults(p, dst_nid);
+		src_weight = task_weight(p, src_nid, dist);
+		dst_weight = task_weight(p, dst_nid, dist);
 	}
 
-	return dst_faults < src_faults;
+	return dst_weight < src_weight;
 }
 
 #else
-- 
1.8.3.1


  parent reply	other threads:[~2018-06-20 17:05 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-20 17:02 [PATCH v2 00/19] Fixes for sched/numa_balancing Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 01/19] sched/numa: Remove redundant field Srikar Dronamraju
2018-07-25 14:23   ` [tip:sched/core] " tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 02/19] sched/numa: Evaluate move once per node Srikar Dronamraju
2018-06-21  9:06   ` Mel Gorman
2018-07-25 14:24   ` [tip:sched/core] " tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 03/19] sched/numa: Simplify load_too_imbalanced Srikar Dronamraju
2018-07-25 14:24   ` [tip:sched/core] sched/numa: Simplify load_too_imbalanced() tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 04/19] sched/numa: Set preferred_node based on best_cpu Srikar Dronamraju
2018-06-21  9:17   ` Mel Gorman
2018-07-25 14:25   ` [tip:sched/core] " tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 05/19] sched/numa: Use task faults only if numa_group is not yet setup Srikar Dronamraju
2018-06-21  9:38   ` Mel Gorman
2018-07-25 14:25   ` [tip:sched/core] sched/numa: Use task faults only if numa_group is not yet set up tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 06/19] sched/debug: Reverse the order of printing faults Srikar Dronamraju
2018-07-25 14:26   ` [tip:sched/core] " tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 07/19] sched/numa: Skip nodes that are at hoplimit Srikar Dronamraju
2018-07-25 14:27   ` [tip:sched/core] sched/numa: Skip nodes that are at 'hoplimit' tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 08/19] sched/numa: Remove unused task_capacity from numa_stats Srikar Dronamraju
2018-07-25 14:27   ` [tip:sched/core] sched/numa: Remove unused task_capacity from 'struct numa_stats' tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 09/19] sched/numa: Modify migrate_swap to accept additional params Srikar Dronamraju
2018-07-25 14:28   ` [tip:sched/core] sched/numa: Modify migrate_swap() to accept additional parameters tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 10/19] sched/numa: Stop multiple tasks from moving to the cpu at the same time Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 11/19] sched/numa: Restrict migrating in parallel to the same node Srikar Dronamraju
2018-07-23 10:38   ` Peter Zijlstra
2018-07-23 11:16     ` Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 12/19] sched/numa: Remove numa_has_capacity Srikar Dronamraju
2018-07-25 14:28   ` [tip:sched/core] sched/numa: Remove numa_has_capacity() tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 13/19] mm/migrate: Use xchg instead of spinlock Srikar Dronamraju
2018-06-21  9:51   ` Mel Gorman
2018-07-23 10:54   ` Peter Zijlstra
2018-07-23 11:20     ` Srikar Dronamraju
2018-07-23 14:04       ` Peter Zijlstra
2018-06-20 17:02 ` [PATCH v2 14/19] sched/numa: Updation of scan period need not be in lock Srikar Dronamraju
2018-06-21  9:51   ` Mel Gorman
2018-07-25 14:29   ` [tip:sched/core] sched/numa: Update the scan period without holding the numa_group lock tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` Srikar Dronamraju [this message]
2018-07-25 14:29   ` [tip:sched/core] sched/numa: Use group_weights to identify if migration degrades locality tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 16/19] sched/numa: Detect if node actively handling migration Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 17/19] sched/numa: Pass destination cpu as a parameter to migrate_task_rq Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 18/19] sched/numa: Reset scan rate whenever task moves across nodes Srikar Dronamraju
2018-06-21 10:05   ` Mel Gorman
2018-07-04 11:19     ` Srikar Dronamraju
2018-06-20 17:03 ` [PATCH v2 19/19] sched/numa: Move task_placement closer to numa_migrate_preferred Srikar Dronamraju
2018-06-21 10:06   ` Mel Gorman
2018-07-25 14:30   ` [tip:sched/core] sched/numa: Move task_numa_placement() closer to numa_migrate_preferred() tip-bot for Srikar Dronamraju
2018-06-20 17:03 ` [PATCH v2 00/19] Fixes for sched/numa_balancing Srikar Dronamraju
2018-07-23 13:57 ` Peter Zijlstra
2018-07-23 15:09   ` Srikar Dronamraju
2018-07-23 15:21     ` Peter Zijlstra
2018-07-23 16:29       ` Srikar Dronamraju
2018-07-23 16:47         ` Peter Zijlstra
2018-07-23 15:33     ` Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1529514181-9842-16-git-send-email-srikar@linux.vnet.ibm.com \
    --to=srikar@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=riel@surriel.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.