* [PATCH v3 1/3] sched/core: Warn if cpumask has a mix of isolcpus and housekeeping CPUs
2018-10-25 18:42 [PATCH v3 0/3] isolcpus Srikar Dronamraju
@ 2018-10-25 18:42 ` Srikar Dronamraju
2018-10-26 8:32 ` Peter Zijlstra
2018-10-25 18:42 ` [PATCH v3 2/3] sched/core: Don't mix " Srikar Dronamraju
2018-10-25 18:42 ` [PATCH v3 3/3] sched/core: Error out if cpumask has a mix of " Srikar Dronamraju
2 siblings, 1 reply; 6+ messages in thread
From: Srikar Dronamraju @ 2018-10-25 18:42 UTC
To: Ingo Molnar, Peter Zijlstra
Cc: LKML, Mel Gorman, Rik van Riel, Srikar Dronamraju,
Thomas Gleixner, Wang, zhong.weidong, Yi Liu,
Frederic Weisbecker
Currently, when setting sched affinity, there are no checks to see if the
requested cpumask has CPUs from both isolcpus and housekeeping CPUs.
Mixing isolcpus and housekeeping CPUs may lead to inconsistent
behaviour, such as tasks running on isolcpus with no load balancing.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
Changelog v2->v3:
Only a warning in sched_setaffinity. The actual detection is moved to
set_cpus_allowed_common.
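The "mix" test used in this patch can be modelled in user space as a minimal
sketch. It assumes a system with at most 32 CPUs so that a mask fits in a
uint32_t; mask_subset(), mask_intersects() and mask_is_mixed() are
hypothetical stand-ins for the kernel cpumask helpers, not kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for cpumask_subset(): every bit of a is also in b. */
static bool mask_subset(uint32_t a, uint32_t b)
{
	return (a & ~b) == 0;
}

/* Hypothetical stand-in for cpumask_intersects(): a and b share a bit. */
static bool mask_intersects(uint32_t a, uint32_t b)
{
	return (a & b) != 0;
}

/*
 * A mask is "mixed" when it has at least one housekeeping CPU and at
 * least one CPU outside the housekeeping mask.
 */
static bool mask_is_mixed(uint32_t new_mask, uint32_t hk_mask)
{
	return !mask_subset(new_mask, hk_mask) &&
	       mask_intersects(new_mask, hk_mask);
}
```

With isolcpus=1,5,9,13 on 32 CPUs, the housekeeping mask would be 0xffffdddd.
A request for all CPUs (0xffffffff) is mixed, while a request for only the
isolated CPUs (0x00002222) or only housekeeping CPUs is not.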
kernel/sched/core.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ad97f3b..3064e0f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4734,6 +4734,7 @@ static int sched_read_attr(struct sched_attr __user *uattr,
 long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 {
 	cpumask_var_t cpus_allowed, new_mask;
+	const struct cpumask *hk_mask;
 	struct task_struct *p;
 	int retval;
@@ -4778,6 +4779,16 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 	cpuset_cpus_allowed(p, cpus_allowed);
 	cpumask_and(new_mask, in_mask, cpus_allowed);
+	hk_mask = housekeeping_cpumask(HK_FLAG_DOMAIN);
+
+	/*
+	 * Warn if the cpumask provided has CPUs that are part of isolated and
+	 * housekeeping_cpumask
+	 */
+	if (!cpumask_subset(new_mask, hk_mask) &&
+	    cpumask_intersects(new_mask, hk_mask))
+		pr_warn("pid %d: Mix of isolcpus and non-isolcpus provided\n",
+			p->pid);
 	/*
 	 * Since bandwidth control happens on root_domain basis,
--
1.8.3.1
* [PATCH v3 2/3] sched/core: Don't mix isolcpus and housekeeping CPUs
2018-10-25 18:42 [PATCH v3 0/3] isolcpus Srikar Dronamraju
2018-10-25 18:42 ` [PATCH v3 1/3] sched/core: Warn if cpumask has a mix of isolcpus and housekeeping CPUs Srikar Dronamraju
@ 2018-10-25 18:42 ` Srikar Dronamraju
2018-10-26 8:33 ` Peter Zijlstra
2018-10-25 18:42 ` [PATCH v3 3/3] sched/core: Error out if cpumask has a mix of " Srikar Dronamraju
2 siblings, 1 reply; 6+ messages in thread
From: Srikar Dronamraju @ 2018-10-25 18:42 UTC
To: Ingo Molnar, Peter Zijlstra
Cc: LKML, Mel Gorman, Rik van Riel, Srikar Dronamraju,
Thomas Gleixner, Wang, zhong.weidong, Yi Liu,
Frederic Weisbecker
The load balancer and the NUMA balancer are not supposed to work on isolcpus.
Currently, when setting cpus_allowed for a task, there are no checks to see
if the requested cpumask has CPUs from both isolcpus and housekeeping CPUs.
If the user passes a mix of isolcpus and housekeeping CPUs, then the NUMA
balancer can pick an isolcpu to schedule on. With this change, if a
combination of isolcpus and housekeeping CPUs is provided, then we restrict
it to housekeeping CPUs only.
For example: System with 32 CPUs
$ grep -o "isolcpus=[,,1-9]*" /proc/cmdline
isolcpus=1,5,9,13
$ grep -i cpus_allowed /proc/$$/status
Cpus_allowed: ffffdddd
Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
Running "perf bench numa mem --no-data_rand_walk -p 4 -t 8 -G 0 -P 3072
-T 0 -l 50 -c -s 1000", which calls sched_setaffinity() with all CPUs in
the system.
Without patch
------------
$ for i in $(pgrep -f perf); do grep -i cpus_allowed_list /proc/$i/task/*/status ; done | head -n 10
Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/2107/task/2107/status:Cpus_allowed_list: 0-31
/proc/2107/task/2196/status:Cpus_allowed_list: 0-31
/proc/2107/task/2197/status:Cpus_allowed_list: 0-31
/proc/2107/task/2198/status:Cpus_allowed_list: 0-31
/proc/2107/task/2199/status:Cpus_allowed_list: 0-31
/proc/2107/task/2200/status:Cpus_allowed_list: 0-31
/proc/2107/task/2201/status:Cpus_allowed_list: 0-31
/proc/2107/task/2202/status:Cpus_allowed_list: 0-31
/proc/2107/task/2203/status:Cpus_allowed_list: 0-31
With patch
----------
$ for i in $(pgrep -f perf); do grep -i cpus_allowed_list /proc/$i/task/*/status ; done | head -n 10
Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18591/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18603/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18604/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18605/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18606/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18607/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18608/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18609/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18610/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
Changelog v2->v3:
The actual detection is moved from sched_setaffinity to
set_cpus_allowed_common. This covers all cases where a task's cpus_allowed
is set.
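The restriction described above can be sketched in user space as follows.
This is only a model under the assumption of at most 32 CPUs;
effective_mask() and mask_weight() are hypothetical helpers, not the kernel
functions touched by the diff.

```c
#include <stdint.h>

/*
 * Hypothetical model of the mask a task ends up with: a mixed request
 * is narrowed to its housekeeping CPUs, anything else is kept as-is.
 */
static uint32_t effective_mask(uint32_t new_mask, uint32_t hk_mask)
{
	uint32_t common = new_mask & hk_mask;

	/* Mixed: some requested CPUs are in hk_mask and some are not. */
	if (common != 0 && common != new_mask)
		return common;

	return new_mask;
}

/* Count the set bits in a mask, clearing the lowest set bit each pass. */
static unsigned int mask_weight(uint32_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;

	return n;
}
```

Using the example above (housekeeping mask 0xffffdddd), a request for all 32
CPUs yields 0xffffdddd, which has 28 CPUs set. A request for only the
isolated CPUs (0x00002222) is left untouched. Note that the hunk below still
computes nr_cpus_allowed from new_mask rather than from the stored mask, so
in the mixed case the count and the mask may disagree; the sketch counts the
effective mask, which is an assumption about the intended result.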
kernel/sched/core.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3064e0f..37e62b8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1003,7 +1003,19 @@ static int migration_cpu_stop(void *data)
  */
 void set_cpus_allowed_common(struct task_struct *p, const struct cpumask *new_mask)
 {
-	cpumask_copy(&p->cpus_allowed, new_mask);
+	const struct cpumask *hk_mask = housekeeping_cpumask(HK_FLAG_DOMAIN);
+
+	/*
+	 * If the cpumask provided has CPUs that are part of isolated and
+	 * housekeeping_cpumask, then restrict it to just the CPUs that
+	 * are part of the housekeeping_cpumask.
+	 */
+	if (!cpumask_subset(new_mask, hk_mask) &&
+	    cpumask_intersects(new_mask, hk_mask))
+		cpumask_and(&p->cpus_allowed, new_mask, hk_mask);
+	else
+		cpumask_copy(&p->cpus_allowed, new_mask);
+
 	p->nr_cpus_allowed = cpumask_weight(new_mask);
 }
--
1.8.3.1
* [PATCH v3 3/3] sched/core: Error out if cpumask has a mix of isolcpus and housekeeping CPUs
2018-10-25 18:42 [PATCH v3 0/3] isolcpus Srikar Dronamraju
2018-10-25 18:42 ` [PATCH v3 1/3] sched/core: Warn if cpumask has a mix of isolcpus and housekeeping CPUs Srikar Dronamraju
2018-10-25 18:42 ` [PATCH v3 2/3] sched/core: Don't mix " Srikar Dronamraju
@ 2018-10-25 18:42 ` Srikar Dronamraju
2 siblings, 0 replies; 6+ messages in thread
From: Srikar Dronamraju @ 2018-10-25 18:42 UTC
To: Ingo Molnar, Peter Zijlstra
Cc: LKML, Mel Gorman, Rik van Riel, Srikar Dronamraju,
Thomas Gleixner, Wang, zhong.weidong, Yi Liu,
Frederic Weisbecker
Return -EINVAL if the user passes a mix of isolcpus and housekeeping
CPUs in the cpumask to sched_setaffinity(). This ensures that users are
notified so that they can take corrective action to get consistent
behaviour. This may change sched_setaffinity() behaviour when
isolcpus is set.
Suggested-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
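From the caller's side, one possible way to cope with the new error is
sketched below. It assumes at most 32 CPUs and that the caller already knows
the housekeeping mask (for example, derived from the isolcpus= value on the
kernel command line); cpu_set_from_bits() and set_affinity_with_fallback()
are hypothetical helpers. sched_setaffinity() can also return EINVAL for
other reasons, such as a mask with no usable CPUs, so the retry is only a
heuristic.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical helper: build a cpu_set_t from the low 32 bits of a mask. */
static void cpu_set_from_bits(uint32_t bits, cpu_set_t *set)
{
	CPU_ZERO(set);
	for (int cpu = 0; cpu < 32; cpu++)
		if (bits & (UINT32_C(1) << cpu))
			CPU_SET(cpu, set);
}

/*
 * Try the requested mask first. If the kernel rejects it with EINVAL,
 * retry with only the CPUs that are also in hk_bits.
 * Returns 0 on success or a negative errno value.
 */
static int set_affinity_with_fallback(pid_t pid, uint32_t want_bits,
				      uint32_t hk_bits)
{
	cpu_set_t set;

	cpu_set_from_bits(want_bits, &set);
	if (sched_setaffinity(pid, sizeof(set), &set) == 0)
		return 0;
	if (errno != EINVAL)
		return -errno;

	/* Nothing left to retry with if no housekeeping CPU was requested. */
	if ((want_bits & hk_bits) == 0)
		return -EINVAL;

	cpu_set_from_bits(want_bits & hk_bits, &set);
	if (sched_setaffinity(pid, sizeof(set), &set) == 0)
		return 0;

	return -errno;
}
```

A caller that wants isolated CPUs on purpose would instead pass a mask that
contains only isolated CPUs, which is not treated as a mix.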
kernel/sched/core.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 37e62b8..3842471 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4798,9 +4798,12 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 	 * housekeeping_cpumask
 	 */
 	if (!cpumask_subset(new_mask, hk_mask) &&
-	    cpumask_intersects(new_mask, hk_mask))
+	    cpumask_intersects(new_mask, hk_mask)) {
 		pr_warn("pid %d: Mix of isolcpus and non-isolcpus provided\n",
 			p->pid);
+		retval = -EINVAL;
+		goto out_free_new_mask;
+	}
 	/*
 	 * Since bandwidth control happens on root_domain basis,
--
1.8.3.1