linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Ingo Molnar <mingo@kernel.org>, Peter Zijlstra <peterz@infradead.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Rik van Riel <riel@surriel.com>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: [PATCH v2 09/19] sched/numa: Modify migrate_swap to accept additional params
Date: Wed, 20 Jun 2018 22:32:50 +0530	[thread overview]
Message-ID: <1529514181-9842-10-git-send-email-srikar@linux.vnet.ibm.com> (raw)
In-Reply-To: <1529514181-9842-1-git-send-email-srikar@linux.vnet.ibm.com>

There are checks in migrate_swap_stop that check if the task/cpu
combination is as per migrate_swap_arg before migrating.

However atleast one of the two tasks to be swapped by migrate_swap could
have migrated to a completely different cpu before updating the
migrate_swap_arg. The new cpu where the task is currently running could
be a different node too. If the task has migrated, numa balancer might
end up placing a task in a wrong node.  Instead of achieving node
consolidation, it may end up spreading the load across nodes.

To avoid that pass the cpus as additional parameters.

While here, place migrate_swap under CONFIG_NUMA_BALANCING.

Running SPECjbb2005 on a 4 node machine and comparing bops/JVM
JVMS  LAST_PATCH  WITH_PATCH  %CHANGE
16    25377.3     25226.6     -0.59
1     72287       73326       1.437

Acked-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 kernel/sched/core.c  | 9 ++++++---
 kernel/sched/fair.c  | 3 ++-
 kernel/sched/sched.h | 3 ++-
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 211890e..36f1c7c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1197,6 +1197,7 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 	__set_task_cpu(p, new_cpu);
 }
 
+#ifdef CONFIG_NUMA_BALANCING
 static void __migrate_swap_task(struct task_struct *p, int cpu)
 {
 	if (task_on_rq_queued(p)) {
@@ -1278,16 +1279,17 @@ static int migrate_swap_stop(void *data)
 /*
  * Cross migrate two tasks
  */
-int migrate_swap(struct task_struct *cur, struct task_struct *p)
+int migrate_swap(struct task_struct *cur, struct task_struct *p,
+		int target_cpu, int curr_cpu)
 {
 	struct migration_swap_arg arg;
 	int ret = -EINVAL;
 
 	arg = (struct migration_swap_arg){
 		.src_task = cur,
-		.src_cpu = task_cpu(cur),
+		.src_cpu = curr_cpu,
 		.dst_task = p,
-		.dst_cpu = task_cpu(p),
+		.dst_cpu = target_cpu,
 	};
 
 	if (arg.src_cpu == arg.dst_cpu)
@@ -1312,6 +1314,7 @@ int migrate_swap(struct task_struct *cur, struct task_struct *p)
 out:
 	return ret;
 }
+#endif /* CONFIG_NUMA_BALANCING */
 
 /*
  * wait_task_inactive - wait for a thread to unschedule.
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0580a27..0d0248b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1809,7 +1809,8 @@ static int task_numa_migrate(struct task_struct *p)
 		return ret;
 	}
 
-	ret = migrate_swap(p, env.best_task);
+	ret = migrate_swap(p, env.best_task, env.best_cpu, env.src_cpu);
+
 	if (ret != 0)
 		trace_sched_stick_numa(p, env.src_cpu, task_cpu(env.best_task));
 	put_task_struct(env.best_task);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index cb467c22..52ba2d6 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1068,7 +1068,8 @@ enum numa_faults_stats {
 };
 extern void sched_setnuma(struct task_struct *p, int node);
 extern int migrate_task_to(struct task_struct *p, int cpu);
-extern int migrate_swap(struct task_struct *, struct task_struct *);
+extern int migrate_swap(struct task_struct *p, struct task_struct *t,
+			int cpu, int scpu);
 #endif /* CONFIG_NUMA_BALANCING */
 
 #ifdef CONFIG_SMP
-- 
1.8.3.1


  parent reply	other threads:[~2018-06-20 17:06 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-20 17:02 [PATCH v2 00/19] Fixes for sched/numa_balancing Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 01/19] sched/numa: Remove redundant field Srikar Dronamraju
2018-07-25 14:23   ` [tip:sched/core] " tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 02/19] sched/numa: Evaluate move once per node Srikar Dronamraju
2018-06-21  9:06   ` Mel Gorman
2018-07-25 14:24   ` [tip:sched/core] " tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 03/19] sched/numa: Simplify load_too_imbalanced Srikar Dronamraju
2018-07-25 14:24   ` [tip:sched/core] sched/numa: Simplify load_too_imbalanced() tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 04/19] sched/numa: Set preferred_node based on best_cpu Srikar Dronamraju
2018-06-21  9:17   ` Mel Gorman
2018-07-25 14:25   ` [tip:sched/core] " tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 05/19] sched/numa: Use task faults only if numa_group is not yet setup Srikar Dronamraju
2018-06-21  9:38   ` Mel Gorman
2018-07-25 14:25   ` [tip:sched/core] sched/numa: Use task faults only if numa_group is not yet set up tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 06/19] sched/debug: Reverse the order of printing faults Srikar Dronamraju
2018-07-25 14:26   ` [tip:sched/core] " tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 07/19] sched/numa: Skip nodes that are at hoplimit Srikar Dronamraju
2018-07-25 14:27   ` [tip:sched/core] sched/numa: Skip nodes that are at 'hoplimit' tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 08/19] sched/numa: Remove unused task_capacity from numa_stats Srikar Dronamraju
2018-07-25 14:27   ` [tip:sched/core] sched/numa: Remove unused task_capacity from 'struct numa_stats' tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` Srikar Dronamraju [this message]
2018-07-25 14:28   ` [tip:sched/core] sched/numa: Modify migrate_swap() to accept additional parameters tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 10/19] sched/numa: Stop multiple tasks from moving to the cpu at the same time Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 11/19] sched/numa: Restrict migrating in parallel to the same node Srikar Dronamraju
2018-07-23 10:38   ` Peter Zijlstra
2018-07-23 11:16     ` Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 12/19] sched/numa: Remove numa_has_capacity Srikar Dronamraju
2018-07-25 14:28   ` [tip:sched/core] sched/numa: Remove numa_has_capacity() tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 13/19] mm/migrate: Use xchg instead of spinlock Srikar Dronamraju
2018-06-21  9:51   ` Mel Gorman
2018-07-23 10:54   ` Peter Zijlstra
2018-07-23 11:20     ` Srikar Dronamraju
2018-07-23 14:04       ` Peter Zijlstra
2018-06-20 17:02 ` [PATCH v2 14/19] sched/numa: Updation of scan period need not be in lock Srikar Dronamraju
2018-06-21  9:51   ` Mel Gorman
2018-07-25 14:29   ` [tip:sched/core] sched/numa: Update the scan period without holding the numa_group lock tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 15/19] sched/numa: Use group_weights to identify if migration degrades locality Srikar Dronamraju
2018-07-25 14:29   ` [tip:sched/core] " tip-bot for Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 16/19] sched/numa: Detect if node actively handling migration Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 17/19] sched/numa: Pass destination cpu as a parameter to migrate_task_rq Srikar Dronamraju
2018-06-20 17:02 ` [PATCH v2 18/19] sched/numa: Reset scan rate whenever task moves across nodes Srikar Dronamraju
2018-06-21 10:05   ` Mel Gorman
2018-07-04 11:19     ` Srikar Dronamraju
2018-06-20 17:03 ` [PATCH v2 19/19] sched/numa: Move task_placement closer to numa_migrate_preferred Srikar Dronamraju
2018-06-21 10:06   ` Mel Gorman
2018-07-25 14:30   ` [tip:sched/core] sched/numa: Move task_numa_placement() closer to numa_migrate_preferred() tip-bot for Srikar Dronamraju
2018-06-20 17:03 ` [PATCH v2 00/19] Fixes for sched/numa_balancing Srikar Dronamraju
2018-07-23 13:57 ` Peter Zijlstra
2018-07-23 15:09   ` Srikar Dronamraju
2018-07-23 15:21     ` Peter Zijlstra
2018-07-23 16:29       ` Srikar Dronamraju
2018-07-23 16:47         ` Peter Zijlstra
2018-07-23 15:33     ` Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1529514181-9842-10-git-send-email-srikar@linux.vnet.ibm.com \
    --to=srikar@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=riel@surriel.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).