* [PATCH v2] sched/numa: do not balance tasks onto isolated cpus
@ 2018-07-26  8:19 Cheng Lin
  2018-07-27  7:48 ` Peter Zijlstra
  0 siblings, 1 reply; 2+ messages in thread
From: Cheng Lin @ 2018-07-26  8:19 UTC (permalink / raw)
  To: mingo, peterz; +Cc: linux-kernel, jiang.biao2, zhong.weidong, tan.hu, Cheng Lin

By default, there is one sched domain covering all CPUs, including
those isolated with the "isolcpus=" boot parameter. However, the
isolated CPUs do not participate in load balancing, and no tasks run
on them unless explicitly assigned through CPU affinity.

However, NUMA balancing does not take isolated CPUs (isolcpus) into
consideration. It may migrate tasks onto isolated CPUs, and because
load balancing never touches those CPUs, the migrated tasks will never
escape from them. This breaks the isolation provided by the
"isolcpus=" boot parameter and introduces various problems. A typical
scenario is as follows.

To use the isolated CPUs in a cgroup (e.g. in a container), the cpuset
must include them. In that case, the CPU affinity of a task in the
cgroup includes the isolated CPUs by default. If we pin a task onto an
isolated CPU, or onto a CPU on the same NUMA node as an isolated CPU,
and another task shares memory with the pinned task, NUMA balancing
will migrate that other task to the same NUMA node for better
performance. In this case, an isolated CPU may be chosen as the target
CPU.

Although load balancing never migrates a task onto an isolated CPU,
NUMA balancing currently does not consider isolated CPUs. This patch
ensures that NUMA balancing does not balance tasks onto isolated CPUs.
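
For illustration only, the check this patch adds to the candidate loop
in task_numa_find_cpu() can be modelled in user space roughly as below.
This is a sketch under stated assumptions, not kernel code: the
toy_cpumask_t type and the toy_* helpers are hypothetical stand-ins for
struct cpumask, cpumask_test_cpu() and housekeeping_test_cpu(), the mask
is limited to 64 CPUs, and the model returns the first eligible CPU
whereas the real function goes on to evaluate each candidate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a cpumask: bit n set means CPU n is in the mask. */
typedef uint64_t toy_cpumask_t;

/* Hypothetical helper: is `cpu` a member of `mask`? */
static bool toy_test_cpu(int cpu, toy_cpumask_t mask)
{
	return cpu >= 0 && cpu < 64 && ((mask >> cpu) & 1);
}

/*
 * Simplified model of the patched loop: walk the CPUs of the destination
 * node and return the first one that is both in the task's affinity mask
 * and in the housekeeping (non-isolated) mask. Returns -1 if none fits.
 */
static int toy_numa_pick_cpu(toy_cpumask_t node_cpus,
			     toy_cpumask_t cpus_allowed,
			     toy_cpumask_t housekeeping)
{
	for (int cpu = 0; cpu < 64; cpu++) {
		if (!toy_test_cpu(cpu, node_cpus))
			continue;
		/* Skip this CPU if the task may not run there or it is isolated. */
		if (!toy_test_cpu(cpu, cpus_allowed) ||
		    !toy_test_cpu(cpu, housekeeping))
			continue;
		return cpu;
	}
	return -1;
}
```

With a destination node spanning CPUs 4-7 (0xF0), a task allowed on
CPUs 0-7 (0xFF) and CPUs 4-5 isolated, the model skips CPUs 4 and 5 and
picks CPU 6. Without the housekeeping test it would pick the isolated
CPU 4.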

Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
Reviewed-by: Tan Hu <tan.hu@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
---
v2: 
* rework and retest on latest kernel
* detail the scenario in the commit log
* fix the SoB chain

 kernel/sched/core.c | 9 ++++++---
 kernel/sched/fair.c | 3 ++-
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fe365c9..170a673 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1302,10 +1302,12 @@ int migrate_swap(struct task_struct *cur, struct task_struct *p)
 	if (!cpu_active(arg.src_cpu) || !cpu_active(arg.dst_cpu))
 		goto out;
 
-	if (!cpumask_test_cpu(arg.dst_cpu, &arg.src_task->cpus_allowed))
+	if ((!cpumask_test_cpu(arg.dst_cpu, &arg.src_task->cpus_allowed))
+		|| !housekeeping_test_cpu(arg.dst_cpu, HK_FLAG_DOMAIN))
 		goto out;
 
-	if (!cpumask_test_cpu(arg.src_cpu, &arg.dst_task->cpus_allowed))
+	if ((!cpumask_test_cpu(arg.src_cpu, &arg.dst_task->cpus_allowed))
+		|| !housekeeping_test_cpu(arg.src_cpu, HK_FLAG_DOMAIN))
 		goto out;
 
 	trace_sched_swap_numa(cur, arg.src_cpu, p, arg.dst_cpu);
@@ -5508,7 +5510,8 @@ int migrate_task_to(struct task_struct *p, int target_cpu)
 	if (curr_cpu == target_cpu)
 		return 0;
 
-	if (!cpumask_test_cpu(target_cpu, &p->cpus_allowed))
+	if ((!cpumask_test_cpu(target_cpu, &p->cpus_allowed))
+		|| !housekeeping_test_cpu(target_cpu, HK_FLAG_DOMAIN))
 		return -EINVAL;
 
 	/* TODO: This is not properly updating schedstats */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2f0a0be..1ea2953 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1724,7 +1724,8 @@ static void task_numa_find_cpu(struct task_numa_env *env,
 
 	for_each_cpu(cpu, cpumask_of_node(env->dst_nid)) {
 		/* Skip this CPU if the source task cannot migrate */
-		if (!cpumask_test_cpu(cpu, &env->p->cpus_allowed))
+		if ((!cpumask_test_cpu(cpu, &env->p->cpus_allowed))
+			|| !housekeeping_test_cpu(cpu, HK_FLAG_DOMAIN))
 			continue;
 
 		env->dst_cpu = cpu;
-- 
1.8.3.1



* Re: [PATCH v2] sched/numa: do not balance tasks onto isolated cpus
  2018-07-26  8:19 [PATCH v2] sched/numa: do not balance tasks onto isolated cpus Cheng Lin
@ 2018-07-27  7:48 ` Peter Zijlstra
  0 siblings, 0 replies; 2+ messages in thread
From: Peter Zijlstra @ 2018-07-27  7:48 UTC (permalink / raw)
  To: Cheng Lin; +Cc: mingo, linux-kernel, jiang.biao2, zhong.weidong, tan.hu

On Thu, Jul 26, 2018 at 04:19:08PM +0800, Cheng Lin wrote:
> -	if (!cpumask_test_cpu(arg.dst_cpu, &arg.src_task->cpus_allowed))
> +	if ((!cpumask_test_cpu(arg.dst_cpu, &arg.src_task->cpus_allowed))
> +		|| !housekeeping_test_cpu(arg.dst_cpu, HK_FLAG_DOMAIN))
>  		goto out;

You did not read the comment I provided last time. Using isolcpus (and
thus its renamed housekeeping thing) is the wrong thing to do. Load
balancing should be limited to its root domain.
