linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/3] isolcpus
@ 2018-10-25 18:42 Srikar Dronamraju
  2018-10-25 18:42 ` [PATCH v3 1/3] sched/core: Warn if cpumask has a mix of isolcpus and housekeeping CPUs Srikar Dronamraju
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Srikar Dronamraju @ 2018-10-25 18:42 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: LKML, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Thomas Gleixner, Wang, zhong.weidong, Yi Liu,
	Frederic Weisbecker

It looks like cpus_allowed can have a mix of isolcpus and non-isolcpus.
However, that seems to cause some inconsistent behaviour, especially with
NUMA balancing.
The first patch only adds a warning whenever a user tries to pass a mask
that has a mix of both isolcpus and non-isolcpus.

The second patch detects and corrects a mixed cpumask, but silently.
Since set_cpus_allowed_common runs under a spinlock, it doesn't print any
hint when it corrects the cpumask.

The third patch returns an error if a user passes a mixed cpumask.  It is an
addition to the first patch. However, separating it out helps if we ever have
to revert to the earlier behaviour. This might cause a change in
sched_setaffinity behaviour when isolcpus is set.

Srikar Dronamraju (3):
  sched/core: Warn if cpumask has a mix of isolcpus and housekeeping
    CPUs
  sched/core: Don't mix isolcpus and housekeeping CPUs
  sched/core: Error out if cpumask has a mix of isolcpus and
    housekeeping CPUs

 kernel/sched/core.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

-- 
1.8.3.1



* [PATCH v3 1/3] sched/core: Warn if cpumask has a mix of isolcpus and housekeeping CPUs
  2018-10-25 18:42 [PATCH v3 0/3] isolcpus Srikar Dronamraju
@ 2018-10-25 18:42 ` Srikar Dronamraju
  2018-10-26  8:32   ` Peter Zijlstra
  2018-10-25 18:42 ` [PATCH v3 2/3] sched/core: Don't mix " Srikar Dronamraju
  2018-10-25 18:42 ` [PATCH v3 3/3] sched/core: Error out if cpumask has a mix of " Srikar Dronamraju
  2 siblings, 1 reply; 6+ messages in thread
From: Srikar Dronamraju @ 2018-10-25 18:42 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: LKML, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Thomas Gleixner, Wang, zhong.weidong, Yi Liu,
	Frederic Weisbecker

Currently when setting sched affinity, there are no checks to see if the
requested cpumask has CPUs from both isolcpus and housekeeping CPUs.
Mixing of isolcpus and housekeeping CPUs may lead to inconsistent
behaviours like tasks running on isolcpus with no load balancing.
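
The "mix" condition tested by this series can be modelled in user space with
plain bitmasks. The following is only a sketch under stated assumptions: the
type cpumask_model_t and the helpers mask_subset, mask_intersects and
mask_is_mixed are hypothetical names that stand in for the kernel's cpumask
API, and the masks are limited to 64 CPUs. The self-test uses the
isolcpus=1,5,9,13 layout from the example in patch 2/3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model: bit i set means CPU i is in the mask. */
typedef uint64_t cpumask_model_t;

/* True if every CPU in a is also in b (models cpumask_subset()). */
static bool mask_subset(cpumask_model_t a, cpumask_model_t b)
{
	return (a & ~b) == 0;
}

/* True if a and b share at least one CPU (models cpumask_intersects()). */
static bool mask_intersects(cpumask_model_t a, cpumask_model_t b)
{
	return (a & b) != 0;
}

/* Mixed: some requested CPUs are housekeeping CPUs and some are not. */
static bool mask_is_mixed(cpumask_model_t req, cpumask_model_t hk)
{
	return !mask_subset(req, hk) && mask_intersects(req, hk);
}

/* Self-test: 32 CPUs, CPUs 1,5,9,13 isolated, so hk = 0xffffdddd. */
static void mask_model_selftest(void)
{
	const cpumask_model_t hk = 0xffffddddULL;

	assert(mask_is_mixed(0xffffffffULL, hk));  /* all CPUs: mixed */
	assert(!mask_is_mixed(0x2222ULL, hk));     /* isolated CPUs only */
	assert(!mask_is_mixed(0x0005ULL, hk));     /* housekeeping CPUs only */
}
```

Only a request that straddles both sets trips the check; a mask made up purely
of isolated CPUs or purely of housekeeping CPUs passes through unchanged.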

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
Changelog v2->v3:
Only a warning in sched_setaffinity. The actual detection is moved to
set_cpus_allowed_common.

 kernel/sched/core.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ad97f3b..3064e0f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4734,6 +4734,7 @@ static int sched_read_attr(struct sched_attr __user *uattr,
 long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 {
 	cpumask_var_t cpus_allowed, new_mask;
+	const struct cpumask *hk_mask;
 	struct task_struct *p;
 	int retval;
 
@@ -4778,6 +4779,16 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 
 	cpuset_cpus_allowed(p, cpus_allowed);
 	cpumask_and(new_mask, in_mask, cpus_allowed);
+	hk_mask = housekeeping_cpumask(HK_FLAG_DOMAIN);
+
+	/*
+	 * Warn if the cpumask provided has CPUs that are part of isolated and
+	 * housekeeping_cpumask
+	 */
+	if (!cpumask_subset(new_mask, hk_mask) &&
+			cpumask_intersects(new_mask, hk_mask))
+		pr_warn("pid %d: Mix of isolcpus and non-isolcpus provided\n",
+			p->pid);
 
 	/*
 	 * Since bandwidth control happens on root_domain basis,
-- 
1.8.3.1



* [PATCH v3 2/3] sched/core: Don't mix isolcpus and housekeeping CPUs
  2018-10-25 18:42 [PATCH v3 0/3] isolcpus Srikar Dronamraju
  2018-10-25 18:42 ` [PATCH v3 1/3] sched/core: Warn if cpumask has a mix of isolcpus and housekeeping CPUs Srikar Dronamraju
@ 2018-10-25 18:42 ` Srikar Dronamraju
  2018-10-26  8:33   ` Peter Zijlstra
  2018-10-25 18:42 ` [PATCH v3 3/3] sched/core: Error out if cpumask has a mix of " Srikar Dronamraju
  2 siblings, 1 reply; 6+ messages in thread
From: Srikar Dronamraju @ 2018-10-25 18:42 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: LKML, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Thomas Gleixner, Wang, zhong.weidong, Yi Liu,
	Frederic Weisbecker

The load balancer and NUMA balancer are not supposed to work on isolcpus.

Currently when setting cpus_allowed for a task, there are no checks to see
if the requested cpumask has CPUs from both isolcpus and housekeeping CPUs.

If a user passes a mix of isolcpus and housekeeping CPUs, then the NUMA
balancer can pick an isolcpu to schedule on.  With this change, if a
combination of isolcpus and housekeeping CPUs is provided, then we restrict
it to housekeeping CPUs only.
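
The restriction described above can be sketched in user space as follows.
This is an illustration under assumptions, not the kernel code: struct
task_model, mask_weight and set_allowed_model are hypothetical names, masks
are plain 64-bit integers, and the values in the self-test come from the
32-CPU example in this message. Note that the sketch derives the CPU count
from the mask it actually stores, whereas the diff in this patch keeps
cpumask_weight(new_mask).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two task fields the patch touches. */
struct task_model {
	uint64_t cpus_allowed;         /* bit i set means CPU i is allowed */
	unsigned int nr_cpus_allowed;  /* number of bits set in cpus_allowed */
};

/* Count the set bits in m (models cpumask_weight()). */
static unsigned int mask_weight(uint64_t m)
{
	unsigned int n = 0;

	for (; m; m &= m - 1)  /* clear the lowest set bit each round */
		n++;
	return n;
}

/* Store req, restricted to housekeeping CPUs when req is mixed. */
static void set_allowed_model(struct task_model *t, uint64_t req, uint64_t hk)
{
	int mixed = (req & ~hk) != 0 && (req & hk) != 0;

	assert(t != NULL);
	t->cpus_allowed = mixed ? (req & hk) : req;
	/* Assumption of this sketch: the count follows the stored mask. */
	t->nr_cpus_allowed = mask_weight(t->cpus_allowed);
}

/* Self-test: isolcpus=1,5,9,13 on 32 CPUs gives hk = 0xffffdddd. */
static void set_allowed_selftest(void)
{
	struct task_model t;

	set_allowed_model(&t, 0xffffffffULL, 0xffffddddULL);
	assert(t.cpus_allowed == 0xffffddddULL);
	assert(t.nr_cpus_allowed == 28);
}
```

A request for all 32 CPUs ends up as 0xffffdddd, which is the
0,2-4,6-8,10-12,14-31 list shown in the "With patch" output.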

For example: System with 32 CPUs
$ grep -o "isolcpus=[,,1-9]*" /proc/cmdline
isolcpus=1,5,9,13
$ grep -i cpus_allowed /proc/$$/status
Cpus_allowed:   ffffdddd
Cpus_allowed_list:      0,2-4,6-8,10-12,14-31

Run "perf bench numa mem --no-data_rand_walk -p 4 -t 8 -G 0 -P 3072
-T 0 -l 50 -c -s 1000", which calls sched_setaffinity with all CPUs in the
system.

Without patch
------------
$ for i in $(pgrep -f perf); do  grep -i cpus_allowed_list  /proc/$i/task/*/status ; done | head -n 10
Cpus_allowed_list:      0,2-4,6-8,10-12,14-31
/proc/2107/task/2107/status:Cpus_allowed_list:  0-31
/proc/2107/task/2196/status:Cpus_allowed_list:  0-31
/proc/2107/task/2197/status:Cpus_allowed_list:  0-31
/proc/2107/task/2198/status:Cpus_allowed_list:  0-31
/proc/2107/task/2199/status:Cpus_allowed_list:  0-31
/proc/2107/task/2200/status:Cpus_allowed_list:  0-31
/proc/2107/task/2201/status:Cpus_allowed_list:  0-31
/proc/2107/task/2202/status:Cpus_allowed_list:  0-31
/proc/2107/task/2203/status:Cpus_allowed_list:  0-31

With patch
----------
$ for i in $(pgrep -f perf); do  grep -i cpus_allowed_list  /proc/$i/task/*/status ; done | head -n 10
Cpus_allowed_list:      0,2-4,6-8,10-12,14-31
/proc/18591/task/18591/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31
/proc/18591/task/18603/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31
/proc/18591/task/18604/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31
/proc/18591/task/18605/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31
/proc/18591/task/18606/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31
/proc/18591/task/18607/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31
/proc/18591/task/18608/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31
/proc/18591/task/18609/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31
/proc/18591/task/18610/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
Changelog v2->v3:
The actual detection is moved to set_cpus_allowed_common from
sched_setaffinity. This helps to solve all cases where task cpus_allowed is
set.

 kernel/sched/core.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3064e0f..37e62b8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1003,7 +1003,19 @@ static int migration_cpu_stop(void *data)
  */
 void set_cpus_allowed_common(struct task_struct *p, const struct cpumask *new_mask)
 {
-	cpumask_copy(&p->cpus_allowed, new_mask);
+	const struct cpumask *hk_mask = housekeeping_cpumask(HK_FLAG_DOMAIN);
+
+	/*
+	 * If the cpumask provided has CPUs that are part of isolated and
+	 * housekeeping_cpumask, then restrict it to just the CPUs that
+	 * are part of the housekeeping_cpumask.
+	 */
+	if (!cpumask_subset(new_mask, hk_mask) &&
+			cpumask_intersects(new_mask, hk_mask))
+		cpumask_and(&p->cpus_allowed, new_mask, hk_mask);
+	else
+		cpumask_copy(&p->cpus_allowed, new_mask);
+
 	p->nr_cpus_allowed = cpumask_weight(new_mask);
 }
 
-- 
1.8.3.1



* [PATCH v3 3/3] sched/core: Error out if cpumask has a mix of isolcpus and housekeeping CPUs
  2018-10-25 18:42 [PATCH v3 0/3] isolcpus Srikar Dronamraju
  2018-10-25 18:42 ` [PATCH v3 1/3] sched/core: Warn if cpumask has a mix of isolcpus and housekeeping CPUs Srikar Dronamraju
  2018-10-25 18:42 ` [PATCH v3 2/3] sched/core: Don't mix " Srikar Dronamraju
@ 2018-10-25 18:42 ` Srikar Dronamraju
  2 siblings, 0 replies; 6+ messages in thread
From: Srikar Dronamraju @ 2018-10-25 18:42 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: LKML, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Thomas Gleixner, Wang, zhong.weidong, Yi Liu,
	Frederic Weisbecker

Return -EINVAL if the user has passed a mix of isolcpus and housekeeping
CPUs in the cpumask to sched_setaffinity(). This ensures that users are
notified so that they can take corrective action to get consistent
behaviour. This might cause a change in sched_setaffinity() behaviour when
isolcpus is set.
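
From user space, the behaviour change would show up as a failed
sched_setaffinity() call with errno set to EINVAL. Below is a minimal sketch
of a caller that propagates that error instead of ignoring it. It assumes
glibc's cpu_set_t interface; pin_to_cpus is a hypothetical helper name and is
not part of this series.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>

/*
 * Pin the calling thread to the n CPUs listed in cpus[].
 * Returns 0 on success or a negative errno value, e.g. -EINVAL.
 */
static int pin_to_cpus(const int *cpus, size_t n)
{
	cpu_set_t set;

	assert(cpus != NULL && n > 0);
	CPU_ZERO(&set);
	for (size_t i = 0; i < n; i++)
		CPU_SET(cpus[i], &set);

	/* pid 0 means "the calling thread". */
	if (sched_setaffinity(0, sizeof(set), &set) == -1)
		return -errno;
	return 0;
}
```

A caller that today passes "all CPUs" and never checks the return value would
keep its old affinity after this patch, so checking for -EINVAL and retrying
with a housekeeping-only or isolated-only list is the corrective action the
changelog refers to.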

Suggested-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 kernel/sched/core.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 37e62b8..3842471 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4798,9 +4798,12 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 	 * housekeeping_cpumask
 	 */
 	if (!cpumask_subset(new_mask, hk_mask) &&
-			cpumask_intersects(new_mask, hk_mask))
+			cpumask_intersects(new_mask, hk_mask)) {
 		pr_warn("pid %d: Mix of isolcpus and non-isolcpus provided\n",
 			p->pid);
+		retval = -EINVAL;
+		goto out_free_new_mask;
+	}
 
 	/*
 	 * Since bandwidth control happens on root_domain basis,
-- 
1.8.3.1



* Re: [PATCH v3 1/3] sched/core: Warn if cpumask has a mix of isolcpus and housekeeping CPUs
  2018-10-25 18:42 ` [PATCH v3 1/3] sched/core: Warn if cpumask has a mix of isolcpus and housekeeping CPUs Srikar Dronamraju
@ 2018-10-26  8:32   ` Peter Zijlstra
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Zijlstra @ 2018-10-26  8:32 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, LKML, Mel Gorman, Rik van Riel, Thomas Gleixner,
	Wang, zhong.weidong, Yi Liu, Frederic Weisbecker

On Fri, Oct 26, 2018 at 12:12:21AM +0530, Srikar Dronamraju wrote:
> @@ -4778,6 +4779,16 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
>  
>  	cpuset_cpus_allowed(p, cpus_allowed);
>  	cpumask_and(new_mask, in_mask, cpus_allowed);
> +	hk_mask = housekeeping_cpumask(HK_FLAG_DOMAIN);
> +
> +	/*
> +	 * Warn if the cpumask provided has CPUs that are part of isolated and
> +	 * housekeeping_cpumask
> +	 */
> +	if (!cpumask_subset(new_mask, hk_mask) &&
> +			cpumask_intersects(new_mask, hk_mask))
> +		pr_warn("pid %d: Mix of isolcpus and non-isolcpus provided\n",
> +			p->pid);
>  
>  	/*
>  	 * Since bandwidth control happens on root_domain basis,

That is horrible coding style, and completely pointless. Also, a
user-triggerable printk like that should be rate limited at the very least.
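
For reference, the kernel provides rate-limited printk variants such as
pr_warn_ratelimited(). The idea behind rate limiting can be sketched in user
space as an interval/burst counter. This is an illustrative model only, not
the kernel's implementation; struct ratelimit_model and ratelimit_allow are
hypothetical names, and time is passed in as caller-defined ticks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of an interval/burst rate limiter. */
struct ratelimit_model {
	uint64_t interval;     /* window length, in caller-defined ticks */
	unsigned int burst;    /* messages allowed per window */
	uint64_t begin;        /* start of the current window */
	unsigned int printed;  /* messages let through in this window */
	bool started;          /* false until the first call */
};

/* Return true if a message may be emitted at time now. */
static bool ratelimit_allow(struct ratelimit_model *rs, uint64_t now)
{
	assert(rs != NULL);

	/* Open a new window on first use or once the interval has passed. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->started = true;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;  /* over budget: the caller drops the message */
}
```

With a limiter like this in front of the warning, a task that calls
sched_setaffinity() in a loop with a mixed mask can only emit a bounded number
of log lines per interval.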


* Re: [PATCH v3 2/3] sched/core: Don't mix isolcpus and housekeeping CPUs
  2018-10-25 18:42 ` [PATCH v3 2/3] sched/core: Don't mix " Srikar Dronamraju
@ 2018-10-26  8:33   ` Peter Zijlstra
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Zijlstra @ 2018-10-26  8:33 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, LKML, Mel Gorman, Rik van Riel, Thomas Gleixner,
	Wang, zhong.weidong, Yi Liu, Frederic Weisbecker

On Fri, Oct 26, 2018 at 12:12:22AM +0530, Srikar Dronamraju wrote:
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 3064e0f..37e62b8 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1003,7 +1003,19 @@ static int migration_cpu_stop(void *data)
>   */
>  void set_cpus_allowed_common(struct task_struct *p, const struct cpumask *new_mask)
>  {
> -	cpumask_copy(&p->cpus_allowed, new_mask);
> +	const struct cpumask *hk_mask = housekeeping_cpumask(HK_FLAG_DOMAIN);
> +
> +	/*
> +	 * If the cpumask provided has CPUs that are part of isolated and
> +	 * housekeeping_cpumask, then restrict it to just the CPUs that
> +	 * are part of the housekeeping_cpumask.
> +	 */
> +	if (!cpumask_subset(new_mask, hk_mask) &&
> +			cpumask_intersects(new_mask, hk_mask))
> +		cpumask_and(&p->cpus_allowed, new_mask, hk_mask);
> +	else
> +		cpumask_copy(&p->cpus_allowed, new_mask);
> +
>  	p->nr_cpus_allowed = cpumask_weight(new_mask);

NAK, I already explained that.

