* [PATCH] sched/core: Don't mix isolcpus and housekeeping CPUs
@ 2018-10-23 17:54 Srikar Dronamraju
  2018-10-23 19:12 ` kbuild test robot
  0 siblings, 1 reply; 3+ messages in thread
From: Srikar Dronamraju @ 2018-10-23 17:54 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: LKML, Mel Gorman, Rik van Riel, Yi Wang, zhong.weidong, Yi Liu,
	Srikar Dronamraju, Frederic Weisbecker, Thomas Gleixner

The load balancer and the NUMA balancer are not supposed to operate on isolcpus.

Currently, when sched affinity is set, nothing checks whether the
requested cpumask contains CPUs from both isolcpus and housekeeping CPUs.

If a user passes a mix of isolcpus and housekeeping CPUs, the NUMA
balancer can pick an isolated CPU to schedule the task on.
With this change, if a combination of isolcpus and housekeeping CPUs is
provided, the mask is restricted to the housekeeping CPUs only.
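The rule can be modelled outside the kernel with plain bitmasks. This is a minimal sketch, assuming a 32-bit integer stands in for struct cpumask (enough for the 32-CPU example); mask_subset(), mask_intersects() and restrict_to_housekeeping() are hypothetical helpers that mimic cpumask_subset(), cpumask_intersects() and cpumask_and() as used in the patch. Note that a mask containing only isolated CPUs is left untouched, because it does not intersect the housekeeping mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative model only: bit n set means CPU n is in the mask.
 * A uint32_t replaces struct cpumask for a 32-CPU system.
 */
static bool mask_subset(uint32_t a, uint32_t b)
{
	return (a & ~b) == 0;		/* like cpumask_subset(a, b) */
}

static bool mask_intersects(uint32_t a, uint32_t b)
{
	return (a & b) != 0;		/* like cpumask_intersects(a, b) */
}

/*
 * Mirror the check in the patch: if the requested mask mixes isolated
 * and housekeeping CPUs, keep only the housekeeping ones. A mask made
 * up solely of isolated (or solely of housekeeping) CPUs is unchanged.
 */
uint32_t restrict_to_housekeeping(uint32_t new_mask, uint32_t hk_mask)
{
	if (!mask_subset(new_mask, hk_mask) &&
	    mask_intersects(new_mask, hk_mask)) {
		new_mask &= hk_mask;	/* like cpumask_and() */
		assert(mask_subset(new_mask, hk_mask));
	}
	return new_mask;
}
```

With isolcpus=1,5,9,13 the housekeeping mask is 0xffffdddd, so a request for all 32 CPUs (0xffffffff) becomes 0xffffdddd, while a request for CPU 1 alone (0x2) stays 0x2.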

For example, on a system with 32 CPUs:
$ grep -o "isolcpus=[0-9,-]*" /proc/cmdline
isolcpus=1,5,9,13
$ grep -i cpus_allowed /proc/$$/status
Cpus_allowed:   ffffdddd
Cpus_allowed_list:      0,2-4,6-8,10-12,14-31
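The two status fields above describe the same set: Cpus_allowed is a hex bitmap and Cpus_allowed_list is its range form. A small sketch of that relationship, assuming a fixed 32-CPU system; housekeeping_from_isolated() and mask_to_list() are hypothetical helpers written for illustration, not kernel or procfs APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NR_DEMO_CPUS 32	/* matches the 32-CPU example; a simplification */

/* Build the housekeeping mask: every CPU except the isolated ones. */
uint32_t housekeeping_from_isolated(const int *isolated, size_t n)
{
	uint32_t mask = UINT32_MAX;

	for (size_t i = 0; i < n; i++) {
		assert(isolated[i] >= 0 && isolated[i] < NR_DEMO_CPUS);
		mask &= ~(UINT32_C(1) << isolated[i]);
	}
	return mask;
}

/* Write a mask as a range list in the style of Cpus_allowed_list. */
void mask_to_list(uint32_t mask, char *buf, size_t len)
{
	size_t pos = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (int cpu = 0; cpu < NR_DEMO_CPUS; cpu++) {
		if (!(mask & (UINT32_C(1) << cpu)))
			continue;

		/* Extend the run while the next CPU is also set. */
		int start = cpu;
		while (cpu + 1 < NR_DEMO_CPUS &&
		       (mask & (UINT32_C(1) << (cpu + 1))))
			cpu++;

		int n = (start == cpu)
			? snprintf(buf + pos, len - pos, "%s%d",
				   pos ? "," : "", start)
			: snprintf(buf + pos, len - pos, "%s%d-%d",
				   pos ? "," : "", start, cpu);
		if (n < 0 || (size_t)n >= len - pos)
			return;	/* buffer too small: output is truncated */
		pos += (size_t)n;
	}
}
```

Isolating CPUs 1, 5, 9 and 13 clears bits 0x2222, which leaves 0xffffdddd; mask_to_list() renders that as 0,2-4,6-8,10-12,14-31, matching the shell's status output above.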

Run "perf bench numa mem --no-data_rand_walk -p 4 -t 8 -G 0 -P 3072
-T 0 -l 50 -c -s 1000", which calls sched_setaffinity() with a mask of
all CPUs in the system.

Without patch
------------
$ for i in $(pgrep -f perf); do  grep -i cpus_allowed_list  /proc/$i/task/*/status ; done | head -n 10
Cpus_allowed_list:      0,2-4,6-8,10-12,14-31
/proc/2107/task/2107/status:Cpus_allowed_list:  0-31
/proc/2107/task/2196/status:Cpus_allowed_list:  0-31
/proc/2107/task/2197/status:Cpus_allowed_list:  0-31
/proc/2107/task/2198/status:Cpus_allowed_list:  0-31
/proc/2107/task/2199/status:Cpus_allowed_list:  0-31
/proc/2107/task/2200/status:Cpus_allowed_list:  0-31
/proc/2107/task/2201/status:Cpus_allowed_list:  0-31
/proc/2107/task/2202/status:Cpus_allowed_list:  0-31
/proc/2107/task/2203/status:Cpus_allowed_list:  0-31


With patch
----------
$ for i in $(pgrep -f perf); do  grep -i cpus_allowed_list  /proc/$i/task/*/status ; done | head -n 10
Cpus_allowed_list:      0,2-4,6-8,10-12,14-31
/proc/18591/task/18591/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31
/proc/18591/task/18603/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31
/proc/18591/task/18604/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31
/proc/18591/task/18605/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31
/proc/18591/task/18606/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31
/proc/18591/task/18607/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31
/proc/18591/task/18608/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31
/proc/18591/task/18609/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31
/proc/18591/task/18610/status:Cpus_allowed_list:        0,2-4,6-8,10-12,14-31

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 kernel/sched/core.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ad97f3b..fbb1f7c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4735,6 +4735,7 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 {
 	cpumask_var_t cpus_allowed, new_mask;
 	struct task_struct *p;
+	struct cpumask *hk_mask;
 	int retval;
 
 	rcu_read_lock();
@@ -4778,6 +4779,19 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 
 	cpuset_cpus_allowed(p, cpus_allowed);
 	cpumask_and(new_mask, in_mask, cpus_allowed);
+	hk_mask = housekeeping_cpumask(HK_FLAG_DOMAIN);
+
+	/*
+	 * If the cpumask provided has CPUs that are part of isolated and
+	 * housekeeping_cpumask, then restrict it to just the CPUs that
+	 * are part of the housekeeping_cpumask.
+	 */
+	if (!cpumask_subset(new_mask, hk_mask) &&
+			cpumask_intersects(new_mask, hk_mask)) {
+		pr_info("pid %d: Mix of isolcpus and non-isolcpus provided\n",
+			       p->pid);
+		cpumask_and(new_mask, new_mask, hk_mask);
+	}
 
 	/*
 	 * Since bandwidth control happens on root_domain basis,
-- 
1.8.3.1



* Re: [PATCH] sched/core: Don't mix isolcpus and housekeeping CPUs
  2018-10-23 17:54 [PATCH] sched/core: Don't mix isolcpus and housekeeping CPUs Srikar Dronamraju
@ 2018-10-23 19:12 ` kbuild test robot
  2018-10-24  3:23   ` Srikar Dronamraju
  0 siblings, 1 reply; 3+ messages in thread
From: kbuild test robot @ 2018-10-23 19:12 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: kbuild-all, Ingo Molnar, Peter Zijlstra, LKML, Mel Gorman,
	Rik van Riel, Yi Wang, zhong.weidong, Yi Liu, Srikar Dronamraju,
	Frederic Weisbecker, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 4225 bytes --]

Hi Srikar,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tip/sched/core]
[also build test WARNING on v4.19 next-20181019]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Srikar-Dronamraju/sched-core-Don-t-mix-isolcpus-and-housekeeping-CPUs/20181024-025019
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   kernel/sched/core.c: In function 'sched_setaffinity':
>> kernel/sched/core.c:4783:10: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
     hk_mask = housekeeping_cpumask(HK_FLAG_DOMAIN);
             ^
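The warning means housekeeping_cpumask() returns a pointer to const data, while hk_mask is declared as a plain struct cpumask pointer. One way to resolve it is to declare the local as a pointer-to-const; this is not necessarily what v2 does. The cpumask_subset(), cpumask_intersects() and cpumask_and() calls that read hk_mask take const source pointers, so the change should not ripple further. The sketch below reproduces the pattern in isolation; struct cpumask here is a simplified stand-in, and demo_housekeeping_cpumask() and demo_hk_first_word() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct cpumask (illustration only). */
struct cpumask {
	unsigned long bits[1];
};

/* Housekeeping CPUs from the 32-CPU example: all but 1, 5, 9 and 13. */
static const struct cpumask demo_hk_cpus = { { 0xffffddddUL } };

/*
 * Hypothetical helper: like housekeeping_cpumask(), it returns a
 * pointer to data the caller must not modify.
 */
static const struct cpumask *demo_housekeeping_cpumask(void)
{
	return &demo_hk_cpus;
}

unsigned long demo_hk_first_word(void)
{
	/*
	 * "struct cpumask *hk_mask = demo_housekeeping_cpumask();" would
	 * discard the const qualifier and trigger -Wdiscarded-qualifiers.
	 * A pointer-to-const local matches the return type instead.
	 */
	const struct cpumask *hk_mask = demo_housekeeping_cpumask();

	assert(hk_mask != NULL);
	return hk_mask->bits[0];
}
```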

vim +/const +4783 kernel/sched/core.c

  4734	
  4735	long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
  4736	{
  4737		cpumask_var_t cpus_allowed, new_mask;
  4738		struct task_struct *p;
  4739		struct cpumask *hk_mask;
  4740		int retval;
  4741	
  4742		rcu_read_lock();
  4743	
  4744		p = find_process_by_pid(pid);
  4745		if (!p) {
  4746			rcu_read_unlock();
  4747			return -ESRCH;
  4748		}
  4749	
  4750		/* Prevent p going away */
  4751		get_task_struct(p);
  4752		rcu_read_unlock();
  4753	
  4754		if (p->flags & PF_NO_SETAFFINITY) {
  4755			retval = -EINVAL;
  4756			goto out_put_task;
  4757		}
  4758		if (!alloc_cpumask_var(&cpus_allowed, GFP_KERNEL)) {
  4759			retval = -ENOMEM;
  4760			goto out_put_task;
  4761		}
  4762		if (!alloc_cpumask_var(&new_mask, GFP_KERNEL)) {
  4763			retval = -ENOMEM;
  4764			goto out_free_cpus_allowed;
  4765		}
  4766		retval = -EPERM;
  4767		if (!check_same_owner(p)) {
  4768			rcu_read_lock();
  4769			if (!ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE)) {
  4770				rcu_read_unlock();
  4771				goto out_free_new_mask;
  4772			}
  4773			rcu_read_unlock();
  4774		}
  4775	
  4776		retval = security_task_setscheduler(p);
  4777		if (retval)
  4778			goto out_free_new_mask;
  4779	
  4780	
  4781		cpuset_cpus_allowed(p, cpus_allowed);
  4782		cpumask_and(new_mask, in_mask, cpus_allowed);
> 4783		hk_mask = housekeeping_cpumask(HK_FLAG_DOMAIN);
  4784	
  4785		/*
  4786		 * If the cpumask provided has CPUs that are part of isolated and
  4787		 * housekeeping_cpumask, then restrict it to just the CPUs that
  4788		 * are part of the housekeeping_cpumask.
  4789		 */
  4790		if (!cpumask_subset(new_mask, hk_mask) &&
  4791				cpumask_intersects(new_mask, hk_mask)) {
  4792			pr_info("pid %d: Mix of isolcpus and non-isolcpus provided\n",
  4793				       p->pid);
  4794			cpumask_and(new_mask, new_mask, hk_mask);
  4795		}
  4796	
  4797		/*
  4798		 * Since bandwidth control happens on root_domain basis,
  4799		 * if admission test is enabled, we only admit -deadline
  4800		 * tasks allowed to run on all the CPUs in the task's
  4801		 * root_domain.
  4802		 */
  4803	#ifdef CONFIG_SMP
  4804		if (task_has_dl_policy(p) && dl_bandwidth_enabled()) {
  4805			rcu_read_lock();
  4806			if (!cpumask_subset(task_rq(p)->rd->span, new_mask)) {
  4807				retval = -EBUSY;
  4808				rcu_read_unlock();
  4809				goto out_free_new_mask;
  4810			}
  4811			rcu_read_unlock();
  4812		}
  4813	#endif
  4814	again:
  4815		retval = __set_cpus_allowed_ptr(p, new_mask, true);
  4816	
  4817		if (!retval) {
  4818			cpuset_cpus_allowed(p, cpus_allowed);
  4819			if (!cpumask_subset(new_mask, cpus_allowed)) {
  4820				/*
  4821				 * We must have raced with a concurrent cpuset
  4822				 * update. Just reset the cpus_allowed to the
  4823				 * cpuset's cpus_allowed
  4824				 */
  4825				cpumask_copy(new_mask, cpus_allowed);
  4826				goto again;
  4827			}
  4828		}
  4829	out_free_new_mask:
  4830		free_cpumask_var(new_mask);
  4831	out_free_cpus_allowed:
  4832		free_cpumask_var(cpus_allowed);
  4833	out_put_task:
  4834		put_task_struct(p);
  4835		return retval;
  4836	}
  4837	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6493 bytes --]


* Re: [PATCH] sched/core: Don't mix isolcpus and housekeeping CPUs
  2018-10-23 19:12 ` kbuild test robot
@ 2018-10-24  3:23   ` Srikar Dronamraju
  0 siblings, 0 replies; 3+ messages in thread
From: Srikar Dronamraju @ 2018-10-24  3:23 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, Ingo Molnar, Peter Zijlstra, LKML, Mel Gorman,
	Rik van Riel, Yi Wang, zhong.weidong, Yi Liu,
	Frederic Weisbecker, Thomas Gleixner

> url:    https://github.com/0day-ci/linux/commits/Srikar-Dronamraju/sched-core-Don-t-mix-isolcpus-and-housekeeping-CPUs/20181024-025019
> config: i386-tinyconfig (attached as .config)
> compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=i386 
> 
> All warnings (new ones prefixed by >>):
> 
>    kernel/sched/core.c: In function 'sched_setaffinity':
> >> kernel/sched/core.c:4783:10: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
>      hk_mask = housekeeping_cpumask(HK_FLAG_DOMAIN);
>              ^

Thanks, I have posted a v2 fixing this.

http://lkml.kernel.org/r/1540350169-18581-1-git-send-email-srikar@linux.vnet.ibm.com

-- 
Thanks and Regards
Srikar Dronamraju

