linux-kernel.vger.kernel.org archive mirror
* [PATCH v2] sched/core: Don't mix isolcpus and housekeeping CPUs
@ 2018-10-24  3:02 Srikar Dronamraju
  2018-10-24  8:56 ` Mel Gorman
  2018-10-24 10:03 ` Peter Zijlstra
  0 siblings, 2 replies; 15+ messages in thread
From: Srikar Dronamraju @ 2018-10-24  3:02 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: LKML, Mel Gorman, Rik van Riel, Yi Wang, zhong.weidong, Yi Liu,
	Srikar Dronamraju, Frederic Weisbecker, Thomas Gleixner

The load balancer and the NUMA balancer are not supposed to work on
isolcpus.

Currently, when setting sched affinity, there is no check on whether the
requested cpumask mixes isolcpus and housekeeping CPUs.

If the user passes a mix of isolcpus and housekeeping CPUs, the NUMA
balancer can pick an isolcpu to schedule on. With this change, if a
combination of isolcpus and housekeeping CPUs is provided, we restrict
ourselves to the housekeeping CPUs.

For example: System with 32 CPUs
$ grep -o "isolcpus=[,,1-9]*" /proc/cmdline
isolcpus=1,5,9,13
$ grep -i cpus_allowed /proc/$$/status
Cpus_allowed:   ffffdddd
Cpus_allowed_list:      0,2-4,6-8,10-12,14-31

Running "perf bench numa mem --no-data_rand_walk -p 4 -t 8 -G 0 -P 3072
-T 0 -l 50 -c -s 1000", which calls sched_setaffinity with all CPUs in
the system.

Without patch
------------
$ for i in $(pgrep -f perf); do  grep -i cpus_allowed_list  /proc/$i/task/*/status ; done | head -n 10
Cpus_allowed_list:      0,2-4,6-8,10-12,14-31
/proc/2107/task/2107/status:Cpus_allowed_list:  0-31
/proc/2107/task/2196/status:Cpus_allowed_list:  0-31
/proc/2107/task/2197/status:Cpus_allowed_list:  0-31
/proc/2107/task/2198/status:Cpus_allowed_list:  0-31
/proc/2107/task/2199/status:Cpus_allowed_list:  0-31
/proc/2107/task/2200/status:Cpus_allowed_list:  0-31
/proc/2107/task/2201/status:Cpus_allowed_list:  0-31
/proc/2107/task/2202/status:Cpus_allowed_list:  0-31
/proc/2107/task/2203/status:Cpus_allowed_list:  0-31

With patch
----------
$ for i in $(pgrep -f perf); do  grep -i cpus_allowed_list  /proc/$i/task/*/status ; done | head -n 10
Cpus_allowed_list:      0,2-4,6-8,10-12,14-31
/proc/18591/task/18591/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31
/proc/18591/task/18603/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31
/proc/18591/task/18604/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31
/proc/18591/task/18605/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31
/proc/18591/task/18606/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31
/proc/18591/task/18607/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31
/proc/18591/task/18608/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31
/proc/18591/task/18609/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31
/proc/18591/task/18610/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
Changelog v1->v2:
 constification of hk_mask (reported by kbuild test robot)

 kernel/sched/core.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ad97f3b..54e7207 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4734,6 +4734,7 @@ static int sched_read_attr(struct sched_attr __user *uattr,
 long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 {
 	cpumask_var_t cpus_allowed, new_mask;
+	const struct cpumask *hk_mask;
 	struct task_struct *p;
 	int retval;
 
@@ -4778,6 +4779,19 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 
 	cpuset_cpus_allowed(p, cpus_allowed);
 	cpumask_and(new_mask, in_mask, cpus_allowed);
+	hk_mask = housekeeping_cpumask(HK_FLAG_DOMAIN);
+
+	/*
+	 * If the cpumask provided has CPUs that are part of isolated and
+	 * housekeeping_cpumask, then restrict it to just the CPUs that
+	 * are part of the housekeeping_cpumask.
+	 */
+	if (!cpumask_subset(new_mask, hk_mask) &&
+			cpumask_intersects(new_mask, hk_mask)) {
+		pr_info("pid %d: Mix of isolcpus and non-isolcpus provided\n",
+			       p->pid);
+		cpumask_and(new_mask, new_mask, hk_mask);
+	}
 
 	/*
 	 * Since bandwidth control happens on root_domain basis,
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] sched/core: Don't mix isolcpus and housekeeping CPUs
  2018-10-24  3:02 [PATCH v2] sched/core: Don't mix isolcpus and housekeeping CPUs Srikar Dronamraju
@ 2018-10-24  8:56 ` Mel Gorman
  2018-10-24  9:46   ` Srikar Dronamraju
  2018-10-24 10:03 ` Peter Zijlstra
  1 sibling, 1 reply; 15+ messages in thread
From: Mel Gorman @ 2018-10-24  8:56 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Rik van Riel, Yi Wang,
	zhong.weidong, Yi Liu, Frederic Weisbecker, Thomas Gleixner

On Wed, Oct 24, 2018 at 08:32:49AM +0530, Srikar Dronamraju wrote:
> The load balancer and the NUMA balancer are not supposed to work on
> isolcpus.
> 
> Currently, when setting sched affinity, there is no check on whether the
> requested cpumask mixes isolcpus and housekeeping CPUs.
> 
> If the user passes a mix of isolcpus and housekeeping CPUs, the NUMA
> balancer can pick an isolcpu to schedule on. With this change, if a
> combination of isolcpus and housekeeping CPUs is provided, we restrict
> ourselves to the housekeeping CPUs.
> 
> For example: System with 32 CPUs
> $ grep -o "isolcpus=[,,1-9]*" /proc/cmdline
> isolcpus=1,5,9,13
> $ grep -i cpus_allowed /proc/$$/status
> Cpus_allowed:   ffffdddd
> Cpus_allowed_list:      0,2-4,6-8,10-12,14-31
> 
> Running "perf bench numa mem --no-data_rand_walk -p 4 -t 8 -G 0 -P 3072
> -T 0 -l 50 -c -s 1000", which calls sched_setaffinity with all CPUs in
> the system.
> 

Forgive my naivety, but is it wrong for a process to bind to both isolated
CPUs and housekeeping CPUs? It would certainly be a bit odd because the
application is asking for some protection but no guarantees are given
and the application is not made aware via an error code that there is a
problem. Asking the application to parse dmesg hoping to find the right
error message is going to be fragile.

Would it be more appropriate to fail sched_setaffinity when there is a
mix of isolated and housekeeping CPUs? In that case, an info message in
dmesg may be appropriate as it'll likely be a once-off configuration
error that's obvious due to an application failure. Alternatively,
should NUMA balancing ignore isolated CPUs? The latter seems unusual as
the application has specified a mask that allows those CPUs and it's not
clear why NUMA balancing should ignore them. If anything, an application
that wants to avoid all interference should also be using memory policies
to bind to nodes so it behaves predictably with respect to access latencies
(presumably if an application cannot tolerate kernel threads interfering
then it also cannot tolerate remote access latencies) or disabling NUMA
balancing entirely to avoid incurring minor faults.

Thanks.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH v2] sched/core: Don't mix isolcpus and housekeeping CPUs
  2018-10-24  8:56 ` Mel Gorman
@ 2018-10-24  9:46   ` Srikar Dronamraju
  2018-10-24 10:15     ` Peter Zijlstra
  2018-10-24 10:31     ` Mel Gorman
  0 siblings, 2 replies; 15+ messages in thread
From: Srikar Dronamraju @ 2018-10-24  9:46 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Rik van Riel, Yi Wang,
	zhong.weidong, Yi Liu, Frederic Weisbecker, Thomas Gleixner

* Mel Gorman <mgorman@techsingularity.net> [2018-10-24 09:56:36]:

> On Wed, Oct 24, 2018 at 08:32:49AM +0530, Srikar Dronamraju wrote:
> It would certainly be a bit odd because the
> application is asking for some protection but no guarantees are given
> and the application is not made aware via an error code that there is a
> problem. Asking the application to parse dmesg hoping to find the right
> error message is going to be fragile.

It's actually a good question: what should we be doing if a mix of
isolcpus and housekeeping (aka non-isolcpus) CPUs is given in the mask?

Right now, as you pointed out, there is no easy way for the application
to know which CPUs are the non-isolcpus when setting its affinity. The
cpuset effective_cpus and cpus_allowed files will both contain isolcpus
too.

> Would it be more appropriate to fail sched_setaffinity when there is a
> mix of isolated and housekeeping CPUs? In that case, an info message in
> dmesg may be appropriate as it'll likely be a once-off configuration
> error that's obvious due to an application failure.

I was looking at the option of returning an error in that case. However,
our own perf bench numa can't handle it, and we can be sure there will
be other tools coded the same way that may start failing. So I am not
sure that's a good option. Previously they would *seem* to be working.

That's why I have taken the option of correcting it within the kernel.

> Alternatively,
> should NUMA balancing ignore isolated CPUs? The latter seems unusual as
> the application has specified a mask that allows those CPUs and it's not
> clear why NUMA balancing should ignore them. If anything, an application
> that wants to avoid all interference should also be using memory policies
> to bind to nodes so it behaves predictably with respect to access latencies
> (presumably if an application cannot tolerate kernel threads interfering
> then it also cannot tolerate remote access latencies) or disabling NUMA
> balancing entirely to avoid incurring minor faults.
> 

The numa balancing is doing the right thing because if the application has
the mask specified, then we should allow the application to run.

The problem happens when the unbound application starts to run on an
isolcpu: regular load balancing will no longer work. If another task is
bound to the isolcpu, and other cpus in the node are free, this task
will still contend with the task bound to the isolcpu (unless the numa
affinity of the unbound thread changes).

Also, isolcpus are supposed to run only those tasks that are bound to
them, so we are kind of breaking that rule.

-- 
Thanks and Regards
Srikar Dronamraju



* Re: [PATCH v2] sched/core: Don't mix isolcpus and housekeeping CPUs
  2018-10-24  3:02 [PATCH v2] sched/core: Don't mix isolcpus and housekeeping CPUs Srikar Dronamraju
  2018-10-24  8:56 ` Mel Gorman
@ 2018-10-24 10:03 ` Peter Zijlstra
  2018-10-24 10:30   ` Srikar Dronamraju
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2018-10-24 10:03 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, LKML, Mel Gorman, Rik van Riel, Yi Wang,
	zhong.weidong, Yi Liu, Frederic Weisbecker, Thomas Gleixner

On Wed, Oct 24, 2018 at 08:32:49AM +0530, Srikar Dronamraju wrote:
> The load balancer and the NUMA balancer are not supposed to work on
> isolcpus.
> 
> Currently, when setting sched affinity, there is no check on whether the
> requested cpumask mixes isolcpus and housekeeping CPUs.
> 
> If the user passes a mix of isolcpus and housekeeping CPUs, the NUMA
> balancer can pick an isolcpu to schedule on. With this change, if a
> combination of isolcpus and housekeeping CPUs is provided, we restrict
> ourselves to the housekeeping CPUs.

I'm still not liking this much. This adds more special cases for
isolcpus. Also, I don't believe in correcting silly users; give 'em rope
and show them how to tie the knot.

Where does the numa balancer pick the 'wrong' CPU?

task_numa_migrate() checks to see if the task is currently part of a
SD_NUMA domain, otherwise it doesn't do anything. This means your
housekeeping mask spans multiple nodes to begin with, right?

But after that we seem to ignore the sched domains entirely;
task_numa_find_cpu() only tests cpus_allowed.

It appears to me the for_each_online_node() iteration in
task_numa_migrate() needs an additional test to see whether the selected
node has any CPUs in the relevant sched_domain _at_all_.

A little something like the below -- except we also need to do something
about cpus_active_mask. Not been near a compiler.

---
 kernel/sched/fair.c | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ee271bb661cc..287ef7f0203b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1497,6 +1497,8 @@ struct task_numa_env {
 	struct task_struct *best_task;
 	long best_imp;
 	int best_cpu;
+
+	cpumask_var_t cpus;
 };
 
 static void task_numa_assign(struct task_numa_env *env,
@@ -1704,7 +1706,7 @@ static void task_numa_find_cpu(struct task_numa_env *env,
 	 */
 	maymove = !load_too_imbalanced(src_load, dst_load, env);
 
-	for_each_cpu(cpu, cpumask_of_node(env->dst_nid)) {
+	for_each_cpu_and(cpu, cpumask_of_node(env->dst_nid), env->cpus) {
 		/* Skip this CPU if the source task cannot migrate */
 		if (!cpumask_test_cpu(cpu, &env->p->cpus_allowed))
 			continue;
@@ -1734,6 +1736,9 @@ static int task_numa_migrate(struct task_struct *p)
 	int nid, ret, dist;
 	long taskimp, groupimp;
 
+	if (!alloc_cpumask_var(&env.cpus, GFP_KERNEL))
+		return -ENOMEM;
+
 	/*
 	 * Pick the lowest SD_NUMA domain, as that would have the smallest
 	 * imbalance and would be the first to start moving tasks about.
@@ -1744,20 +1749,23 @@ static int task_numa_migrate(struct task_struct *p)
 	 */
 	rcu_read_lock();
 	sd = rcu_dereference(per_cpu(sd_numa, env.src_cpu));
-	if (sd)
-		env.imbalance_pct = 100 + (sd->imbalance_pct - 100) / 2;
-	rcu_read_unlock();
-
 	/*
 	 * Cpusets can break the scheduler domain tree into smaller
 	 * balance domains, some of which do not cross NUMA boundaries.
 	 * Tasks that are "trapped" in such domains cannot be migrated
 	 * elsewhere, so there is no point in (re)trying.
 	 */
-	if (unlikely(!sd)) {
+	if (!sd) {
 		sched_setnuma(p, task_node(p));
-		return -EINVAL;
+		rcu_read_unlock();
+		ret = -EINVAL;
+		goto out;
 	}
+	env.imbalance_pct = 100 + (sd->imbalance_pct - 100) / 2;
+	while (sd->parent)
+		sd = sd->parent;
+	cpumask_copy(env.cpus, sched_domain_span(sd));
+	rcu_read_unlock();
 
 	env.dst_nid = p->numa_preferred_nid;
 	dist = env.dist = node_distance(env.src_nid, env.dst_nid);
@@ -1783,6 +1791,9 @@ static int task_numa_migrate(struct task_struct *p)
 			if (nid == env.src_nid || nid == p->numa_preferred_nid)
 				continue;
 
+			if (!cpumask_intersects(cpumask_of_node(nid), env.cpus))
+				continue;
+
 			dist = node_distance(env.src_nid, env.dst_nid);
 			if (sched_numa_topology_type == NUMA_BACKPLANE &&
 						dist != env.dist) {
@@ -1822,8 +1833,10 @@ static int task_numa_migrate(struct task_struct *p)
 	}
 
 	/* No better CPU than the current one was found. */
-	if (env.best_cpu == -1)
-		return -EAGAIN;
+	if (env.best_cpu == -1) {
+		ret = -EAGAIN;
+		goto out;
+	}
 
 	best_rq = cpu_rq(env.best_cpu);
 	if (env.best_task == NULL) {
@@ -1831,7 +1844,7 @@ static int task_numa_migrate(struct task_struct *p)
 		WRITE_ONCE(best_rq->numa_migrate_on, 0);
 		if (ret != 0)
 			trace_sched_stick_numa(p, env.src_cpu, env.best_cpu);
-		return ret;
+		goto out;
 	}
 
 	ret = migrate_swap(p, env.best_task, env.best_cpu, env.src_cpu);
@@ -1840,6 +1853,9 @@ static int task_numa_migrate(struct task_struct *p)
 	if (ret != 0)
 		trace_sched_stick_numa(p, env.src_cpu, task_cpu(env.best_task));
 	put_task_struct(env.best_task);
+
+out:
+	free_cpumask_var(&env.cpus);
 	return ret;
 }
 


* Re: [PATCH v2] sched/core: Don't mix isolcpus and housekeeping CPUs
  2018-10-24  9:46   ` Srikar Dronamraju
@ 2018-10-24 10:15     ` Peter Zijlstra
  2018-10-24 10:41       ` Srikar Dronamraju
  2018-10-24 10:31     ` Mel Gorman
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2018-10-24 10:15 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Mel Gorman, Ingo Molnar, LKML, Rik van Riel, Yi Wang,
	zhong.weidong, Yi Liu, Frederic Weisbecker, Thomas Gleixner

On Wed, Oct 24, 2018 at 03:16:46PM +0530, Srikar Dronamraju wrote:
> * Mel Gorman <mgorman@techsingularity.net> [2018-10-24 09:56:36]:
> 
> > On Wed, Oct 24, 2018 at 08:32:49AM +0530, Srikar Dronamraju wrote:
> > It would certainly be a bit odd because the
> > application is asking for some protection but no guarantees are given
> > and the application is not made aware via an error code that there is a
> > problem. Asking the application to parse dmesg hoping to find the right
> > error message is going to be fragile.
> 
> It's actually a good question: what should we be doing if a mix of
> isolcpus and housekeeping (aka non-isolcpus) CPUs is given in the mask?
> 
> Right now, as you pointed out, there is no easy way for the application
> to know which CPUs are the non-isolcpus when setting its affinity. The
> cpuset effective_cpus and cpus_allowed files will both contain isolcpus
> too.

The easy option is to not use isolcpus :-) It is a horrifically bad
interface.


* Re: [PATCH v2] sched/core: Don't mix isolcpus and housekeeping CPUs
  2018-10-24 10:03 ` Peter Zijlstra
@ 2018-10-24 10:30   ` Srikar Dronamraju
  2018-10-25  0:07     ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Srikar Dronamraju @ 2018-10-24 10:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Mel Gorman, Rik van Riel, Yi Wang,
	zhong.weidong, Yi Liu, Frederic Weisbecker, Thomas Gleixner

* Peter Zijlstra <peterz@infradead.org> [2018-10-24 12:03:23]:

> It appears to me the for_each_online_node() iteration in
> task_numa_migrate() needs an additional test to see whether the selected
> node has any CPUs in the relevant sched_domain _at_all_.
> 

Yes, this should work.
Yi Wang does this extra check a little differently.

http://lkml.kernel.org/r/1540177516-38613-1-git-send-email-wang.yi59@zte.com.cn

However, the last time I posted, you didn't like that approach.
http://lkml.kernel.org/r/20170406073659.y6ubqriyshax4v4m@hirez.programming.kicks-ass.net

Further, I would think the number of times we call sched_setaffinity
would be far less than task_numa_migrate(). In the regular case, where
we never have isolcpus, we would be adding this extra check.

Also, what does it mean for a task to have its cpu affinity set to a mix
of isolcpus and non-isolcpus?

Also, should we be updating update_numa_stats() accordingly? While the
problem may not be apparent there, we are counting the load and compute
capacity of isolcpus.

-- 
Thanks and Regards
Srikar Dronamraju



* Re: [PATCH v2] sched/core: Don't mix isolcpus and housekeeping CPUs
  2018-10-24  9:46   ` Srikar Dronamraju
  2018-10-24 10:15     ` Peter Zijlstra
@ 2018-10-24 10:31     ` Mel Gorman
  1 sibling, 0 replies; 15+ messages in thread
From: Mel Gorman @ 2018-10-24 10:31 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Rik van Riel, Yi Wang,
	zhong.weidong, Yi Liu, Frederic Weisbecker, Thomas Gleixner

On Wed, Oct 24, 2018 at 03:16:46PM +0530, Srikar Dronamraju wrote:
> * Mel Gorman <mgorman@techsingularity.net> [2018-10-24 09:56:36]:
> 
> > On Wed, Oct 24, 2018 at 08:32:49AM +0530, Srikar Dronamraju wrote:
> > It would certainly be a bit odd because the
> > application is asking for some protection but no guarantees are given
> > and the application is not made aware via an error code that there is a
> > problem. Asking the application to parse dmesg hoping to find the right
> > error message is going to be fragile.
> 
> It's actually a good question: what should we be doing if a mix of
> isolcpus and housekeeping (aka non-isolcpus) CPUs is given in the mask?
> 
> Right now, as you pointed out, there is no easy way for the application
> to know which CPUs are the non-isolcpus when setting its affinity. The
> cpuset effective_cpus and cpus_allowed files will both contain isolcpus
> too.
> 

While it's tangential to your patch, that sounds like a fairly big hole
in the API. Now, sure enough, it just puts the burden on the
administrator to figure it out, but the silent "appears to work" is
unfortunate. It's a question of whether application owners or sysadmins
are disciplined enough to ensure that the kernel command line and
scheduler affinities always match properly.

> > Would it be more appropriate to fail sched_setaffinity when there is a
> > mix of isolated and housekeeping CPUs? In that case, an info message in
> > dmesg may be appropriate as it'll likely be a once-off configuration
> > error that's obvious due to an application failure.
> 
> I was looking at the option of returning an error in that case. However,
> our own perf bench numa can't handle it, and we can be sure there will
> be other tools coded the same way that may start failing. So I am not
> sure that's a good option. Previously they would *seem* to be working.
> 

I'm going to throw it out there -- should we do it anyway as it's a
sensible configuration and see what breaks? At the end of the day, it
would be a revert if we get caught by the "you never break userspace"
rule and get overruled on the defense "what the application asks for is
only being partially granted and it is vulnerable to interference when it
thinks the kernel has promised isolation". It could even be a patch on top
of the one you have so that even if it does get reverted, there still is
a warning left over. Of course, it would instantly break the perf bench
numa case and open the question on whether that should be fixed too.

> That's why I have taken the option of correcting it within the kernel.
> 

FWIW, I'm not going to nak the patch on that basis as the warning is
beneficial in itself and an improvement of the current state. There are
others on the cc that have a greater awareness of how much interference
an isolated process is willing to tolerate and what sort of promises
we've given them in the past.

> > Alternatively,
> > should NUMA balancing ignore isolated CPUs? The latter seems unusual as
> > the application has specified a mask that allows those CPUs and it's not
> > clear why NUMA balancing should ignore them. If anything, an application
> > that wants to avoid all interference should also be using memory policies
> > to bind to nodes so it behaves predictably with respect to access latencies
> > (presumably if an application cannot tolerate kernel threads interfering
> > then it also cannot tolerate remote access latencies) or disabling NUMA
> > balancing entirely to avoid incurring minor faults.
> > 
> 
> The numa balancing is doing the right thing because if the application has
> the mask specified, then we should allow the application to run.
> 

This is what I thought.

> The problem happens when the unbound application starts to run on an
> isolcpu: regular load balancing will no longer work. If another task is
> bound to the isolcpu, and other cpus in the node are free, this task
> will still contend with the task bound to the isolcpu (unless the numa
> affinity of the unbound thread changes).
> 

Which is bad.

> Also, isolcpus are supposed to run only those tasks that are bound to
> them, so we are kind of breaking that rule.
> 

Indeed, so the warning has use in itself, and it's a question of whether
others think we should be more strict about misleading (or broken, if
it's critical enough) configurations.

As the patch at least highlights a problem and my suggestions are additional
work that may or may not be useful;

Acked-by: Mel Gorman <mgorman@techsingularity.net>

But further opinions on whether there should be a patch on top that are
more strict would be welcome.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH v2] sched/core: Don't mix isolcpus and housekeeping CPUs
  2018-10-24 10:15     ` Peter Zijlstra
@ 2018-10-24 10:41       ` Srikar Dronamraju
  2018-10-24 11:21         ` Mel Gorman
  0 siblings, 1 reply; 15+ messages in thread
From: Srikar Dronamraju @ 2018-10-24 10:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mel Gorman, Ingo Molnar, LKML, Rik van Riel, Yi Wang,
	zhong.weidong, Yi Liu, Frederic Weisbecker, Thomas Gleixner

* Peter Zijlstra <peterz@infradead.org> [2018-10-24 12:15:08]:

> On Wed, Oct 24, 2018 at 03:16:46PM +0530, Srikar Dronamraju wrote:
> > * Mel Gorman <mgorman@techsingularity.net> [2018-10-24 09:56:36]:
> > 
> > > On Wed, Oct 24, 2018 at 08:32:49AM +0530, Srikar Dronamraju wrote:
> > > It would certainly be a bit odd because the
> > > application is asking for some protection but no guarantees are given
> > > and the application is not made aware via an error code that there is a
> > > problem. Asking the application to parse dmesg hoping to find the right
> > > error message is going to be fragile.
> > 
> > It's actually a good question: what should we be doing if a mix of
> > isolcpus and housekeeping (aka non-isolcpus) CPUs is given in the mask?
> > 
> > Right now, as you pointed out, there is no easy way for the application
> > to know which CPUs are the non-isolcpus when setting its affinity. The
> > cpuset effective_cpus and cpus_allowed files will both contain isolcpus
> > too.
> 
> The easy option is to not use isolcpus :-) It is a horrifically bad
> interface.

Agreed, but that's something that's been exposed a long time back.
Do we have an option to remove it?  Hopefully nobody is using it.

-- 
Thanks and Regards
Srikar Dronamraju



* Re: [PATCH v2] sched/core: Don't mix isolcpus and housekeeping CPUs
  2018-10-24 10:41       ` Srikar Dronamraju
@ 2018-10-24 11:21         ` Mel Gorman
  0 siblings, 0 replies; 15+ messages in thread
From: Mel Gorman @ 2018-10-24 11:21 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Peter Zijlstra, Ingo Molnar, LKML, Rik van Riel, Yi Wang,
	zhong.weidong, Yi Liu, Frederic Weisbecker, Thomas Gleixner

On Wed, Oct 24, 2018 at 04:11:24PM +0530, Srikar Dronamraju wrote:
> * Peter Zijlstra <peterz@infradead.org> [2018-10-24 12:15:08]:
> 
> > On Wed, Oct 24, 2018 at 03:16:46PM +0530, Srikar Dronamraju wrote:
> > > * Mel Gorman <mgorman@techsingularity.net> [2018-10-24 09:56:36]:
> > > 
> > > > On Wed, Oct 24, 2018 at 08:32:49AM +0530, Srikar Dronamraju wrote:
> > > > It would certainly be a bit odd because the
> > > > application is asking for some protection but no guarantees are given
> > > > and the application is not made aware via an error code that there is a
> > > > problem. Asking the application to parse dmesg hoping to find the right
> > > > error message is going to be fragile.
> > > 
> > > It's actually a good question: what should we be doing if a mix of
> > > isolcpus and housekeeping (aka non-isolcpus) CPUs is given in the mask?
> > > 
> > > Right now, as you pointed out, there is no easy way for the application
> > > to know which CPUs are the non-isolcpus when setting its affinity. The
> > > cpuset effective_cpus and cpus_allowed files will both contain isolcpus
> > > too.
> > 
> > The easy option is to not use isolcpus :-) It is a horrifically bad
> > interface.
> 
> Agreed, but that's something that's been exposed a long time back.
> Do we have an option to remove it?  Hopefully nobody is using it.
> 

I occasionally see bugs asking questions about interference from the kernel
when isolcpus are in use. The last one was related to a timer interrupt
every HZ (not a mainline kernel) but still, it's some evidence that it
has users :(

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH v2] sched/core: Don't mix isolcpus and housekeeping CPUs
  2018-10-24 10:30   ` Srikar Dronamraju
@ 2018-10-25  0:07     ` Peter Zijlstra
  2018-10-25 17:30       ` Srikar Dronamraju
  2018-10-25 18:23       ` Srikar Dronamraju
  0 siblings, 2 replies; 15+ messages in thread
From: Peter Zijlstra @ 2018-10-25  0:07 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, LKML, Mel Gorman, Rik van Riel, Yi Wang,
	zhong.weidong, Yi Liu, Frederic Weisbecker, Thomas Gleixner

On Wed, Oct 24, 2018 at 04:00:02PM +0530, Srikar Dronamraju wrote:
> * Peter Zijlstra <peterz@infradead.org> [2018-10-24 12:03:23]:
> 
> > It appears to me the for_each_online_node() iteration in
> > task_numa_migrate() needs an additional test to see whether the selected
> > node has any CPUs in the relevant sched_domain _at_all_.
> > 
> 
> Yes, this should work.
> Yi Wang does this extra check a little differently.
> 
> http://lkml.kernel.org/r/1540177516-38613-1-git-send-email-wang.yi59@zte.com.cn

That's completely broken. Nothing in the numa balancing path uses that
variable and afaict preemption is actually enabled where that's used, so
using that per-cpu variable at all is broken.

> However the last time I had posted you didn't like that approach.
> http://lkml.kernel.org/r/20170406073659.y6ubqriyshax4v4m@hirez.programming.kicks-ass.net

You again checked against isolated_map there, which is not the immediate
problem.

Both of you are fixing symptoms, not the cause.

> Further, I would think the number of times we call sched_setaffinity
> would be far less than task_numa_migrate(). In the regular case, where
> we never have isolcpus, we would be adding this extra check.

But it doesn't solve the problem.

You can create multiple partitions with cpusets but still have an
unbound task in the root cgroup. That would suffer the exact same
problems.

Thing is, load-balancing, of any kind, should respect sched_domains, and
currently numa balancing barely looks at it.

The proposed patch puts the minimal constraints on the numa balancer to
respect sched_domains; but doesn't yet correctly deal with hotplug.

isolcpus is just one case that goes wrong.


* Re: [PATCH v2] sched/core: Don't mix isolcpus and housekeeping CPUs
  2018-10-25  0:07     ` Peter Zijlstra
@ 2018-10-25 17:30       ` Srikar Dronamraju
  2018-10-26  8:41         ` Peter Zijlstra
  2018-10-25 18:23       ` Srikar Dronamraju
  1 sibling, 1 reply; 15+ messages in thread
From: Srikar Dronamraju @ 2018-10-25 17:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Mel Gorman, Rik van Riel, Yi Wang,
	zhong.weidong, Yi Liu, Frederic Weisbecker, Thomas Gleixner

> 
> That's completely broken. Nothing in the numa balancing path uses that
> variable and afaict preemption is actually enabled where that's used, so
> using that per-cpu variable at all is broken.
> 

I can demonstrate that even without numa balancing, there is
inconsistent behaviour with isolcpus on.

> 
> Both of you are fixing symptoms, not the cause.
> 

Okay.

> But it doesn't solve the problem.
> 
> You can create multiple partitions with cpusets but still have an
> unbound task in the root cgroup. That would suffer the exact same
> problems.
> 
> Thing is, load-balancing, of any kind, should respect sched_domains, and
> currently numa balancing barely looks at it.

Agreed that we should have looked at sched_domains. However, I still
believe we can't have task->cpus_allowed with a mix of isolcpus and
non-isolcpus. Won't it lead to inconsistent behaviour?

> 
> The proposed patch puts the minimal constraints on the numa balancer to
> respect sched_domains; but doesn't yet correctly deal with hotplug.

I was also thinking about hotplug. Also, your proposed patch, and even
my proposed patch, don't seem to work well with the scenario below.

# cat /sys/devices/system/cpu/possible
0-31
# cat /sys/devices/system/cpu/isolated
1,5,9,13
# cat hist.sh
echo 0 > /proc/sys/kernel/numa_balancing
cd /sys/fs/cgroup/cpuset
mkdir -p student
cp cpuset.mems student/
cd student
echo "0-31" > cpuset.cpus
echo $$ > cgroup.procs 
echo "1-8" > cpuset.cpus
/home/srikar/work/ebizzy-0.3/ebizzy -S 1000 &
PID=$!
sleep 10
pidstat -p $! -t |tail -n +3 |head -n 10
pidstat -p $$ -t |tail -n +3
pkill ebizzy
#
# ./hist.sh
10:35:21  IST   UID      TGID       TID    %usr %system  %guest    %CPU   CPU  Command
10:35:21  IST     0      2645         -    8.70    0.01    0.00    8.71     1  ebizzy
10:35:21  IST     0         -      2645    0.01    0.00    0.00    0.01     1  |__ebizzy
10:35:21  IST     0         -      2647    0.14    0.00    0.00    0.14     1  |__ebizzy
10:35:21  IST     0         -      2648    0.13    0.00    0.00    0.13     1  |__ebizzy
10:35:21  IST     0         -      2649    0.13    0.00    0.00    0.13     1  |__ebizzy
10:35:21  IST     0         -      2650    0.13    0.00    0.00    0.13     1  |__ebizzy
10:35:21  IST     0         -      2651    0.13    0.00    0.00    0.13     1  |__ebizzy
10:35:21  IST     0         -      2652    0.13    0.00    0.00    0.13     1  |__ebizzy
10:35:21  IST     0         -      2653    0.13    0.00    0.00    0.13     1  |__ebizzy
10:35:23  IST   UID      TGID       TID    %usr %system  %guest    %CPU   CPU  Command
10:35:23  IST     0      2642         -    0.00    0.00    0.00    0.00     1  hist.sh
10:35:23  IST     0         -      2642    0.00    0.00    0.00    0.00     1  |__hist.sh
#

Note that all the ebizzy threads and the bash task that started them are on
CPU 1. This happens when the cpuset starts with an isolcpu: all tasks in that
cpuset may end up running only on that CPU. With a smaller cpuset, ebizzy
always runs on CPU 1. However, if I enlarge the cpuset, the chance of ebizzy
spreading increases, but it does not always spread.
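For anyone wanting to script checks like the ones above, here is a small
helper (my own sketch, not a kernel interface) that parses the standard
kernel cpulist format used by Cpus_allowed_list in /proc/<pid>/status and
by /sys/devices/system/cpu/isolated, e.g. "0,2-4,6-8,10-12,14-31":

```python
def parse_cpulist(s):
    """Parse a kernel cpulist string ("1,5,9,13" or "0,2-4,14-31")
    into a set of CPU numbers."""
    cpus = set()
    for part in s.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

# The two lists from this thread: isolated CPUs, and the allowed list
# from /proc/$$/status on the 32-CPU test system.
assert parse_cpulist("1,5,9,13") == {1, 5, 9, 13}
assert parse_cpulist("0,2-4,6-8,10-12,14-31") == \
    set(range(32)) - {1, 5, 9, 13}
```

With that, checking whether a task's allowed mask intersects the isolated
set is just `parse_cpulist(allowed) & parse_cpulist(isolated)`.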

I have only tried this on a powerpc KVM guest, but I don't think it has
anything to do with the arch or the guest/host setup.

I have something that seems to help out. Will post soon.

> isolcpus is just one case that goes wrong.
Apart from isolcpus, are there other cases that we need to worry about?

-- 
Thanks and Regards
Srikar Dronamraju


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] sched/core: Don't mix isolcpus and housekeeping CPUs
  2018-10-25  0:07     ` Peter Zijlstra
  2018-10-25 17:30       ` Srikar Dronamraju
@ 2018-10-25 18:23       ` Srikar Dronamraju
  2018-10-26  8:42         ` Peter Zijlstra
  1 sibling, 1 reply; 15+ messages in thread
From: Srikar Dronamraju @ 2018-10-25 18:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Mel Gorman, Rik van Riel, Yi Wang,
	zhong.weidong, Yi Liu, Frederic Weisbecker, Thomas Gleixner

> 
> You can create multiple partitions with cpusets but still have an
> unbound task in the root cgroup. That would suffer the exact same
> problems.
> 

I probably don't understand this. Even if the child cgroups have
cpu_exclusive set or sched_load_balance reset, the tasks in the root cgroup
still have access to all CPUs in the system. Right?

-- 
Thanks and Regards
Srikar Dronamraju



* Re: [PATCH v2] sched/core: Don't mix isolcpus and housekeeping CPUs
  2018-10-25 17:30       ` Srikar Dronamraju
@ 2018-10-26  8:41         ` Peter Zijlstra
  2018-10-26  9:30           ` Srikar Dronamraju
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2018-10-26  8:41 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, LKML, Mel Gorman, Rik van Riel, Yi Wang,
	zhong.weidong, Yi Liu, Frederic Weisbecker, Thomas Gleixner

On Thu, Oct 25, 2018 at 11:00:58PM +0530, Srikar Dronamraju wrote:
> > But it doesn't solve the problem.
> > 
> > You can create multiple partitions with cpusets but still have an
> > unbound task in the root cgroup. That would suffer the exact same
> > problems.
> > 
> > Thing is, load-balancing, of any kind, should respect sched_domains, and
> > currently numa balancing barely looks at it.
> 
> Agreed that we should have looked at sched_domains. However, I still believe
> we can't have task->cpus_allowed with a mix of isolcpus and non-isolcpus.
> Won't it lead to inconsistent behaviour?

We currently can; and it depends on what you call inconsistent.

> > The proposed patch puts the minimal constraints on the numa balancer to
> > respect sched_domains; but doesn't yet correctly deal with hotplug.
> 
> I was also thinking about hotplug. Also your proposed patch and even my
> proposed patch don't seem to work well with the below scenario.
> 
> # cat /sys/devices/system/cpu/possible
> 0-31
> # cat /sys/devices/system/cpu/isolated
> 1,5,9,13
> # cat hist.sh
> echo 0 > /proc/sys/kernel/numa_balancing
> cd /sys/fs/cgroup/cpuset
> mkdir -p student
> cp cpuset.mems student/
> cd student
> echo "0-31" > cpuset.cpus
> echo $$ > cgroup.procs 
> echo "1-8" > cpuset.cpus
> /home/srikar/work/ebizzy-0.3/ebizzy -S 1000 &
> PID=$!
> sleep 10
> pidstat -p $! -t |tail -n +3 |head -n 10
> pidstat -p $$ -t |tail -n +3
> pkill ebizzy
> #
> # ./hist.sh
> 10:35:21  IST   UID      TGID       TID    %usr %system  %guest    %CPU   CPU  Command
> 10:35:21  IST     0      2645         -    8.70    0.01    0.00    8.71     1  ebizzy
> 10:35:21  IST     0         -      2645    0.01    0.00    0.00    0.01     1  |__ebizzy
> 10:35:21  IST     0         -      2647    0.14    0.00    0.00    0.14     1  |__ebizzy
> 10:35:21  IST     0         -      2648    0.13    0.00    0.00    0.13     1  |__ebizzy
> 10:35:21  IST     0         -      2649    0.13    0.00    0.00    0.13     1  |__ebizzy
> 10:35:21  IST     0         -      2650    0.13    0.00    0.00    0.13     1  |__ebizzy
> 10:35:21  IST     0         -      2651    0.13    0.00    0.00    0.13     1  |__ebizzy
> 10:35:21  IST     0         -      2652    0.13    0.00    0.00    0.13     1  |__ebizzy
> 10:35:21  IST     0         -      2653    0.13    0.00    0.00    0.13     1  |__ebizzy
> 10:35:23  IST   UID      TGID       TID    %usr %system  %guest    %CPU   CPU  Command
> 10:35:23  IST     0      2642         -    0.00    0.00    0.00    0.00     1  hist.sh
> 10:35:23  IST     0         -      2642    0.00    0.00    0.00    0.00     1  |__hist.sh
> #

That's correct and specified behaviour.

cpu1 has no sched domain (or the singleton domain, which is the same)
and thus its tasks will not be migrated.

If there is a problem, it is with cpuset and its interaction with
isolcpus. I still have to look at the last version of cpuset-v2, but
iirc that was better in this regard.

I'm still saying we should kill isolcpus entirely; it's rubbish, and always
has been.


* Re: [PATCH v2] sched/core: Don't mix isolcpus and housekeeping CPUs
  2018-10-25 18:23       ` Srikar Dronamraju
@ 2018-10-26  8:42         ` Peter Zijlstra
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2018-10-26  8:42 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, LKML, Mel Gorman, Rik van Riel, Yi Wang,
	zhong.weidong, Yi Liu, Frederic Weisbecker, Thomas Gleixner

On Thu, Oct 25, 2018 at 11:53:17PM +0530, Srikar Dronamraju wrote:
> > 
> > You can create multiple partitions with cpusets but still have an
> > unbound task in the root cgroup. That would suffer the exact same
> > problems.
> > 
> 
> I probably don't understand this. Even if the child cgroups have
> cpu_exclusive set or sched_load_balance reset, the tasks in the root cgroup
> still have access to all CPUs in the system. Right?

The crucial word being 'access'. The root group will not impose a limit
on the affinity of tasks. So you can indeed have affinities that span
load balancing domains.

And yes, that leads to 'interesting' but specified behaviour.


* Re: [PATCH v2] sched/core: Don't mix isolcpus and housekeeping CPUs
  2018-10-26  8:41         ` Peter Zijlstra
@ 2018-10-26  9:30           ` Srikar Dronamraju
  0 siblings, 0 replies; 15+ messages in thread
From: Srikar Dronamraju @ 2018-10-26  9:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Mel Gorman, Rik van Riel, Yi Wang,
	zhong.weidong, Yi Liu, Frederic Weisbecker, Thomas Gleixner

* Peter Zijlstra <peterz@infradead.org> [2018-10-26 10:41:05]:

> > #
> > # ./hist.sh
> > 10:35:21  IST   UID      TGID       TID    %usr %system  %guest    %CPU   CPU  Command
> > 10:35:21  IST     0      2645         -    8.70    0.01    0.00    8.71     1  ebizzy
> > 10:35:21  IST     0         -      2645    0.01    0.00    0.00    0.01     1  |__ebizzy
> > 10:35:21  IST     0         -      2647    0.14    0.00    0.00    0.14     1  |__ebizzy
> > 10:35:21  IST     0         -      2648    0.13    0.00    0.00    0.13     1  |__ebizzy
> > 10:35:21  IST     0         -      2649    0.13    0.00    0.00    0.13     1  |__ebizzy
> > 10:35:21  IST     0         -      2650    0.13    0.00    0.00    0.13     1  |__ebizzy
> > 10:35:21  IST     0         -      2651    0.13    0.00    0.00    0.13     1  |__ebizzy
> > 10:35:21  IST     0         -      2652    0.13    0.00    0.00    0.13     1  |__ebizzy
> > 10:35:21  IST     0         -      2653    0.13    0.00    0.00    0.13     1  |__ebizzy
> > 10:35:23  IST   UID      TGID       TID    %usr %system  %guest    %CPU   CPU  Command
> > 10:35:23  IST     0      2642         -    0.00    0.00    0.00    0.00     1  hist.sh
> > 10:35:23  IST     0         -      2642    0.00    0.00    0.00    0.00     1  |__hist.sh
> > #
> 
> That's correct and specified behaviour.
> 
> cpu1 has no sched domain (or the singleton domain, which is the same)
> and thus its tasks will not be migrated.
> 

As I mentioned, if every run behaved the same, it would have been fine.
However, if we increase the cpuset from 1-8 to 1-31, we see spreading more
often, but not always.

./hist.sh
02:59:10  IST   UID      TGID       TID    %usr %system  %guest    %CPU   CPU  Command
02:59:10  IST     0      2244         -  100.00    0.13    0.00  100.00    30  ebizzy
02:59:10  IST     0         -      2244    0.00    0.02    0.00    0.02    30  |__ebizzy
02:59:10  IST     0         -      2246    4.37    0.00    0.00    4.37    21  |__ebizzy
02:59:10  IST     0         -      2247    2.14    0.00    0.00    2.14     8  |__ebizzy
02:59:10  IST     0         -      2248    3.13    0.00    0.00    3.13    28  |__ebizzy
02:59:10  IST     0         -      2249    5.57    0.00    0.00    5.57    18  |__ebizzy
02:59:10  IST     0         -      2250    4.29    0.00    0.00    4.29    11  |__ebizzy
02:59:10  IST     0         -      2251    5.32    0.00    0.00    5.32    19  |__ebizzy
02:59:10  IST     0         -      2252    5.62    0.00    0.00    5.62    17  |__ebizzy
02:59:11  IST   UID      TGID       TID    %usr %system  %guest    %CPU   CPU  Command
02:59:11  IST     0      2241         -    0.00    0.00    0.00    0.00    29  hist.sh
02:59:11  IST     0         -      2241    0.00    0.00    0.00    0.00    29  |__hist.sh


> If there is a problem, it is with cpuset and its interaction with
> isolcpus. I still have to look at the last version of cpuset-v2, but
> iirc that was better in this regard.

Okay.

-- 
Thanks and Regards
Srikar Dronamraju



end of thread, other threads:[~2018-10-26  9:31 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-24  3:02 [PATCH v2] sched/core: Don't mix isolcpus and housekeeping CPUs Srikar Dronamraju
2018-10-24  8:56 ` Mel Gorman
2018-10-24  9:46   ` Srikar Dronamraju
2018-10-24 10:15     ` Peter Zijlstra
2018-10-24 10:41       ` Srikar Dronamraju
2018-10-24 11:21         ` Mel Gorman
2018-10-24 10:31     ` Mel Gorman
2018-10-24 10:03 ` Peter Zijlstra
2018-10-24 10:30   ` Srikar Dronamraju
2018-10-25  0:07     ` Peter Zijlstra
2018-10-25 17:30       ` Srikar Dronamraju
2018-10-26  8:41         ` Peter Zijlstra
2018-10-26  9:30           ` Srikar Dronamraju
2018-10-25 18:23       ` Srikar Dronamraju
2018-10-26  8:42         ` Peter Zijlstra
