From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Ingo Molnar <mingo@kernel.org>, Peter Zijlstra <peterz@infradead.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
Mel Gorman <mgorman@techsingularity.net>,
Rik van Riel <riel@surriel.com>,
Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
Thomas Gleixner <tglx@linutronix.de>
Subject: [PATCH v2 04/19] sched/numa: Set preferred_node based on best_cpu
Date: Wed, 20 Jun 2018 22:32:45 +0530 [thread overview]
Message-ID: <1529514181-9842-5-git-send-email-srikar@linux.vnet.ibm.com> (raw)
In-Reply-To: <1529514181-9842-1-git-send-email-srikar@linux.vnet.ibm.com>
Currently preferred node is set to dst_nid which is the last node in the
iteration whose group weight or task weight is greater than the current
node. However it doesn't guarantee that dst_nid has the numa capacity
to move. It also doesn't guarantee that dst_nid has the best_cpu which
is the cpu/node ideal for node migration.
Lets consider faults on a 4 node system with group weight numbers
in different nodes being in 0 < 1 < 2 < 3 proportion. Consider the task
is running on 3 and 0 is its preferred node but its capacity is full.
Consider nodes 1, 2 and 3 have capacity. Then the task should be
migrated to node 1. Currently the task gets moved to node 2. env.dst_nid
points to the last node whose faults were greater than current node.
Modify to set the preferred node based of best_cpu. Earlier setting
preferred node was skipped if nr_active_nodes is 1. This could result in
the task being moved out of the preferred node to a random node during
regular load balancing.
Also while modifying task_numa_migrate(), use sched_setnuma to set
preferred node. This ensures out numa accounting is correct.
Running SPECjbb2005 on a 4 node machine and comparing bops/JVM
JVMS LAST_PATCH WITH_PATCH %CHANGE
16 25122.9 25549.6 1.698
1 73850 73190 -0.89
Running SPECjbb2005 on a 16 node machine and comparing bops/JVM
JVMS LAST_PATCH WITH_PATCH %CHANGE
8 105930 113437 7.08676
1 178624 196130 9.80047
(numbers from v1 based on v4.17-rc5)
Testcase Time: Min Max Avg StdDev
numa01.sh Real: 435.78 653.81 534.58 83.20
numa01.sh Sys: 121.93 187.18 145.90 23.47
numa01.sh User: 37082.81 51402.80 43647.60 5409.75
numa02.sh Real: 60.64 61.63 61.19 0.40
numa02.sh Sys: 14.72 25.68 19.06 4.03
numa02.sh User: 5210.95 5266.69 5233.30 20.82
numa03.sh Real: 746.51 808.24 780.36 23.88
numa03.sh Sys: 97.26 108.48 105.07 4.28
numa03.sh User: 58956.30 61397.05 60162.95 1050.82
numa04.sh Real: 465.97 519.27 484.81 19.62
numa04.sh Sys: 304.43 359.08 334.68 20.64
numa04.sh User: 37544.16 41186.15 39262.44 1314.91
numa05.sh Real: 411.57 457.20 433.29 16.58
numa05.sh Sys: 230.05 435.48 339.95 67.58
numa05.sh User: 33325.54 36896.31 35637.84 1222.64
Testcase Time: Min Max Avg StdDev %Change
numa01.sh Real: 506.35 794.46 599.06 104.26 -10.76%
numa01.sh Sys: 150.37 223.56 195.99 24.94 -25.55%
numa01.sh User: 43450.69 61752.04 49281.50 6635.33 -11.43%
numa02.sh Real: 60.33 62.40 61.31 0.90 -0.195%
numa02.sh Sys: 18.12 31.66 24.28 5.89 -21.49%
numa02.sh User: 5203.91 5325.32 5260.29 49.98 -0.513%
numa03.sh Real: 696.47 853.62 745.80 57.28 4.6339%
numa03.sh Sys: 85.68 123.71 97.89 13.48 7.3347%
numa03.sh User: 55978.45 66418.63 59254.94 3737.97 1.5323%
numa04.sh Real: 444.05 514.83 497.06 26.85 -2.464%
numa04.sh Sys: 230.39 375.79 316.23 48.58 5.8343%
numa04.sh User: 35403.12 41004.10 39720.80 2163.08 -1.153%
numa05.sh Real: 423.09 460.41 439.57 13.92 -1.428%
numa05.sh Sys: 287.38 480.15 369.37 68.52 -7.964%
numa05.sh User: 34732.12 38016.80 36255.85 1070.51 -1.704%
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
Changelog v1->v2:
Fix setting sched_setnuma under !sd pointed by Peter Zijlstra.
Modify commit message to describe the reason for change.
kernel/sched/fair.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 285d7ae..2366fda2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1726,7 +1726,7 @@ static int task_numa_migrate(struct task_struct *p)
* elsewhere, so there is no point in (re)trying.
*/
if (unlikely(!sd)) {
- p->numa_preferred_nid = task_node(p);
+ sched_setnuma(p, task_node(p));
return -EINVAL;
}
@@ -1785,15 +1785,13 @@ static int task_numa_migrate(struct task_struct *p)
* trying for a better one later. Do not set the preferred node here.
*/
if (p->numa_group) {
- struct numa_group *ng = p->numa_group;
-
if (env.best_cpu == -1)
nid = env.src_nid;
else
- nid = env.dst_nid;
+ nid = cpu_to_node(env.best_cpu);
- if (ng->active_nodes > 1 && numa_is_active_node(env.dst_nid, ng))
- sched_setnuma(p, env.dst_nid);
+ if (nid != p->numa_preferred_nid)
+ sched_setnuma(p, nid);
}
/* No better CPU than the current one was found. */
--
1.8.3.1
next prev parent reply other threads:[~2018-06-20 17:03 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-06-20 17:02 [PATCH v2 00/19] Fixes for sched/numa_balancing Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 01/19] sched/numa: Remove redundant field Srikar Dronamraju
2018-07-25 14:23 ` [tip:sched/core] " tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 02/19] sched/numa: Evaluate move once per node Srikar Dronamraju
2018-06-21 9:06 ` Mel Gorman
2018-07-25 14:24 ` [tip:sched/core] " tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 03/19] sched/numa: Simplify load_too_imbalanced Srikar Dronamraju
2018-07-25 14:24 ` [tip:sched/core] sched/numa: Simplify load_too_imbalanced() tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` Srikar Dronamraju [this message]
2018-06-21 9:17 ` [PATCH v2 04/19] sched/numa: Set preferred_node based on best_cpu Mel Gorman
2018-07-25 14:25 ` [tip:sched/core] " tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 05/19] sched/numa: Use task faults only if numa_group is not yet setup Srikar Dronamraju
2018-06-21 9:38 ` Mel Gorman
2018-07-25 14:25 ` [tip:sched/core] sched/numa: Use task faults only if numa_group is not yet set up tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 06/19] sched/debug: Reverse the order of printing faults Srikar Dronamraju
2018-07-25 14:26 ` [tip:sched/core] " tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 07/19] sched/numa: Skip nodes that are at hoplimit Srikar Dronamraju
2018-07-25 14:27 ` [tip:sched/core] sched/numa: Skip nodes that are at 'hoplimit' tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 08/19] sched/numa: Remove unused task_capacity from numa_stats Srikar Dronamraju
2018-07-25 14:27 ` [tip:sched/core] sched/numa: Remove unused task_capacity from 'struct numa_stats' tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 09/19] sched/numa: Modify migrate_swap to accept additional params Srikar Dronamraju
2018-07-25 14:28 ` [tip:sched/core] sched/numa: Modify migrate_swap() to accept additional parameters tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 10/19] sched/numa: Stop multiple tasks from moving to the cpu at the same time Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 11/19] sched/numa: Restrict migrating in parallel to the same node Srikar Dronamraju
2018-07-23 10:38 ` Peter Zijlstra
2018-07-23 11:16 ` Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 12/19] sched/numa: Remove numa_has_capacity Srikar Dronamraju
2018-07-25 14:28 ` [tip:sched/core] sched/numa: Remove numa_has_capacity() tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 13/19] mm/migrate: Use xchg instead of spinlock Srikar Dronamraju
2018-06-21 9:51 ` Mel Gorman
2018-07-23 10:54 ` Peter Zijlstra
2018-07-23 11:20 ` Srikar Dronamraju
2018-07-23 14:04 ` Peter Zijlstra
2018-06-20 17:02 ` [PATCH v2 14/19] sched/numa: Updation of scan period need not be in lock Srikar Dronamraju
2018-06-21 9:51 ` Mel Gorman
2018-07-25 14:29 ` [tip:sched/core] sched/numa: Update the scan period without holding the numa_group lock tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 15/19] sched/numa: Use group_weights to identify if migration degrades locality Srikar Dronamraju
2018-07-25 14:29 ` [tip:sched/core] " tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 16/19] sched/numa: Detect if node actively handling migration Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 17/19] sched/numa: Pass destination cpu as a parameter to migrate_task_rq Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 18/19] sched/numa: Reset scan rate whenever task moves across nodes Srikar Dronamraju
2018-06-21 10:05 ` Mel Gorman
2018-07-04 11:19 ` Srikar Dronamraju
2018-06-20 17:03 ` [PATCH v2 19/19] sched/numa: Move task_placement closer to numa_migrate_preferred Srikar Dronamraju
2018-06-21 10:06 ` Mel Gorman
2018-07-25 14:30 ` [tip:sched/core] sched/numa: Move task_numa_placement() closer to numa_migrate_preferred() tip-bot for Srikar Dronamraju
2018-06-20 17:03 ` [PATCH v2 00/19] Fixes for sched/numa_balancing Srikar Dronamraju
2018-07-23 13:57 ` Peter Zijlstra
2018-07-23 15:09 ` Srikar Dronamraju
2018-07-23 15:21 ` Peter Zijlstra
2018-07-23 16:29 ` Srikar Dronamraju
2018-07-23 16:47 ` Peter Zijlstra
2018-07-23 15:33 ` Rik van Riel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1529514181-9842-5-git-send-email-srikar@linux.vnet.ibm.com \
--to=srikar@linux.vnet.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@techsingularity.net \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=riel@surriel.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).