* [PATCH] sched/core: Don't mix isolcpus and housekeeping CPUs
From: Srikar Dronamraju @ 2018-10-23 17:54 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra
Cc: LKML, Mel Gorman, Rik van Riel, Yi Wang, zhong.weidong, Yi Liu,
Srikar Dronamraju, Frederic Weisbecker, Thomas Gleixner
The load balancer and the NUMA balancer are not supposed to work on
isolcpus.

Currently, when setting sched affinity, there are no checks to see
whether the requested cpumask contains CPUs from both isolcpus and
housekeeping CPUs. If the user passes a mix of isolcpus and housekeeping
CPUs, the NUMA balancer can pick an isolated CPU to schedule on.

With this change, if a combination of isolcpus and housekeeping CPUs is
provided, we restrict ourselves to the housekeeping CPUs.
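
The rule can be modelled outside the kernel with plain bitmasks. This is
only a sketch under stated assumptions: mask32_t, mask_subset(),
mask_intersects() and restrict_mixed_mask() are hypothetical names that
stand in for struct cpumask, cpumask_subset(), cpumask_intersects() and
the new check in sched_setaffinity(); none of them is kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask on a system with at most 32 CPUs. */
typedef uint32_t mask32_t;

/* True if every CPU in a is also in b (models cpumask_subset()). */
static bool mask_subset(mask32_t a, mask32_t b)
{
	return (a & ~b) == 0;
}

/* True if a and b share at least one CPU (models cpumask_intersects()). */
static bool mask_intersects(mask32_t a, mask32_t b)
{
	return (a & b) != 0;
}

/*
 * Models the proposed rule: a request that mixes isolated and
 * housekeeping CPUs is trimmed to its housekeeping part; any other
 * request is returned unchanged.
 */
static mask32_t restrict_mixed_mask(mask32_t requested, mask32_t housekeeping)
{
	if (!mask_subset(requested, housekeeping) &&
	    mask_intersects(requested, housekeeping))
		return requested & housekeeping;
	return requested;
}
```

With isolcpus=1,5,9,13 on a 32-CPU system the housekeeping mask is
0xffffdddd. In this model a request for all 32 CPUs (0xffffffff) comes
back as 0xffffdddd, while a request that names only isolated CPUs
(0x2222) is left alone.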
For example: System with 32 CPUs
$ grep -o "isolcpus=[0-9,-]*" /proc/cmdline
isolcpus=1,5,9,13
$ grep -i cpus_allowed /proc/$$/status
Cpus_allowed: ffffdddd
Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
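
The hex Cpus_allowed value and the Cpus_allowed_list string describe the
same set of CPUs. A small userspace helper can confirm that, shown here
only as an illustration: cpulist_to_mask() is a hypothetical name, not a
kernel or libc function, and it assumes well-formed input with CPU
numbers below 32.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Convert a CPU list such as "0,2-4,6-8" into a 32-bit mask.
 * Illustrative helper only: CPUs numbered 32 or higher are ignored and
 * parsing stops at the first character that is not part of a list.
 */
static uint32_t cpulist_to_mask(const char *list)
{
	uint32_t mask = 0;
	const char *p = list;

	while (*p) {
		char *end;
		unsigned long first = strtoul(p, &end, 10);
		unsigned long last = first;

		if (end == p)		/* no digits consumed: stop parsing */
			break;
		if (*end == '-')	/* range "first-last" */
			last = strtoul(end + 1, &end, 10);
		for (unsigned long cpu = first; cpu <= last && cpu < 32; cpu++)
			mask |= UINT32_C(1) << cpu;
		p = (*end == ',') ? end + 1 : end;
	}
	return mask;
}
```

For the list 0,2-4,6-8,10-12,14-31 this yields 0xffffdddd, and for the
isolated set 1,5,9,13 it yields 0x2222, the complement within 32 bits.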
Running "perf bench numa mem --no-data_rand_walk -p 4 -t 8 -G 0 -P 3072
-T 0 -l 50 -c -s 1000", which calls sched_setaffinity() with all the
CPUs in the system:
Without patch
------------
$ for i in $(pgrep -f perf); do grep -i cpus_allowed_list /proc/$i/task/*/status ; done | head -n 10
Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/2107/task/2107/status:Cpus_allowed_list: 0-31
/proc/2107/task/2196/status:Cpus_allowed_list: 0-31
/proc/2107/task/2197/status:Cpus_allowed_list: 0-31
/proc/2107/task/2198/status:Cpus_allowed_list: 0-31
/proc/2107/task/2199/status:Cpus_allowed_list: 0-31
/proc/2107/task/2200/status:Cpus_allowed_list: 0-31
/proc/2107/task/2201/status:Cpus_allowed_list: 0-31
/proc/2107/task/2202/status:Cpus_allowed_list: 0-31
/proc/2107/task/2203/status:Cpus_allowed_list: 0-31
With patch
----------
$ for i in $(pgrep -f perf); do grep -i cpus_allowed_list /proc/$i/task/*/status ; done | head -n 10
Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18591/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18603/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18604/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18605/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18606/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18607/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18608/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18609/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18610/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
kernel/sched/core.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ad97f3b..fbb1f7c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4735,6 +4735,7 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 {
 	cpumask_var_t cpus_allowed, new_mask;
 	struct task_struct *p;
+	struct cpumask *hk_mask;
 	int retval;
 
 	rcu_read_lock();
@@ -4778,6 +4779,19 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 
 	cpuset_cpus_allowed(p, cpus_allowed);
 	cpumask_and(new_mask, in_mask, cpus_allowed);
+	hk_mask = housekeeping_cpumask(HK_FLAG_DOMAIN);
+
+	/*
+	 * If the cpumask provided has CPUs that are part of isolated and
+	 * housekeeping_cpumask, then restrict it to just the CPUs that
+	 * are part of the housekeeping_cpumask.
+	 */
+	if (!cpumask_subset(new_mask, hk_mask) &&
+	    cpumask_intersects(new_mask, hk_mask)) {
+		pr_info("pid %d: Mix of isolcpus and non-isolcpus provided\n",
+			p->pid);
+		cpumask_and(new_mask, new_mask, hk_mask);
+	}
 
 	/*
 	 * Since bandwidth control happens on root_domain basis,
--
1.8.3.1
* Re: [PATCH] sched/core: Don't mix isolcpus and housekeeping CPUs
From: kbuild test robot @ 2018-10-23 19:12 UTC (permalink / raw)
To: Srikar Dronamraju
Cc: kbuild-all, Ingo Molnar, Peter Zijlstra, LKML, Mel Gorman,
Rik van Riel, Yi Wang, zhong.weidong, Yi Liu, Srikar Dronamraju,
Frederic Weisbecker, Thomas Gleixner
Hi Srikar,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on tip/sched/core]
[also build test WARNING on v4.19 next-20181019]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Srikar-Dronamraju/sched-core-Don-t-mix-isolcpus-and-housekeeping-CPUs/20181024-025019
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All warnings (new ones prefixed by >>):
   kernel/sched/core.c: In function 'sched_setaffinity':
>> kernel/sched/core.c:4783:10: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
     hk_mask = housekeeping_cpumask(HK_FLAG_DOMAIN);
             ^
vim +/const +4783 kernel/sched/core.c
  4734	
  4735	long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
  4736	{
  4737		cpumask_var_t cpus_allowed, new_mask;
  4738		struct task_struct *p;
  4739		struct cpumask *hk_mask;
  4740		int retval;
  4741	
  4742		rcu_read_lock();
  4743	
  4744		p = find_process_by_pid(pid);
  4745		if (!p) {
  4746			rcu_read_unlock();
  4747			return -ESRCH;
  4748		}
  4749	
  4750		/* Prevent p going away */
  4751		get_task_struct(p);
  4752		rcu_read_unlock();
  4753	
  4754		if (p->flags & PF_NO_SETAFFINITY) {
  4755			retval = -EINVAL;
  4756			goto out_put_task;
  4757		}
  4758		if (!alloc_cpumask_var(&cpus_allowed, GFP_KERNEL)) {
  4759			retval = -ENOMEM;
  4760			goto out_put_task;
  4761		}
  4762		if (!alloc_cpumask_var(&new_mask, GFP_KERNEL)) {
  4763			retval = -ENOMEM;
  4764			goto out_free_cpus_allowed;
  4765		}
  4766		retval = -EPERM;
  4767		if (!check_same_owner(p)) {
  4768			rcu_read_lock();
  4769			if (!ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE)) {
  4770				rcu_read_unlock();
  4771				goto out_free_new_mask;
  4772			}
  4773			rcu_read_unlock();
  4774		}
  4775	
  4776		retval = security_task_setscheduler(p);
  4777		if (retval)
  4778			goto out_free_new_mask;
  4779	
  4780	
  4781		cpuset_cpus_allowed(p, cpus_allowed);
  4782		cpumask_and(new_mask, in_mask, cpus_allowed);
> 4783		hk_mask = housekeeping_cpumask(HK_FLAG_DOMAIN);
  4784	
  4785		/*
  4786		 * If the cpumask provided has CPUs that are part of isolated and
  4787		 * housekeeping_cpumask, then restrict it to just the CPUs that
  4788		 * are part of the housekeeping_cpumask.
  4789		 */
  4790		if (!cpumask_subset(new_mask, hk_mask) &&
  4791		    cpumask_intersects(new_mask, hk_mask)) {
  4792			pr_info("pid %d: Mix of isolcpus and non-isolcpus provided\n",
  4793				p->pid);
  4794			cpumask_and(new_mask, new_mask, hk_mask);
  4795		}
  4796	
  4797		/*
  4798		 * Since bandwidth control happens on root_domain basis,
  4799		 * if admission test is enabled, we only admit -deadline
  4800		 * tasks allowed to run on all the CPUs in the task's
  4801		 * root_domain.
  4802		 */
  4803	#ifdef CONFIG_SMP
  4804		if (task_has_dl_policy(p) && dl_bandwidth_enabled()) {
  4805			rcu_read_lock();
  4806			if (!cpumask_subset(task_rq(p)->rd->span, new_mask)) {
  4807				retval = -EBUSY;
  4808				rcu_read_unlock();
  4809				goto out_free_new_mask;
  4810			}
  4811			rcu_read_unlock();
  4812		}
  4813	#endif
  4814	again:
  4815		retval = __set_cpus_allowed_ptr(p, new_mask, true);
  4816	
  4817		if (!retval) {
  4818			cpuset_cpus_allowed(p, cpus_allowed);
  4819			if (!cpumask_subset(new_mask, cpus_allowed)) {
  4820				/*
  4821				 * We must have raced with a concurrent cpuset
  4822				 * update. Just reset the cpus_allowed to the
  4823				 * cpuset's cpus_allowed
  4824				 */
  4825				cpumask_copy(new_mask, cpus_allowed);
  4826				goto again;
  4827			}
  4828		}
  4829	out_free_new_mask:
  4830		free_cpumask_var(new_mask);
  4831	out_free_cpus_allowed:
  4832		free_cpumask_var(cpus_allowed);
  4833	out_put_task:
  4834		put_task_struct(p);
  4835		return retval;
  4836	}
  4837	
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6493 bytes --]
* Re: [PATCH] sched/core: Don't mix isolcpus and housekeeping CPUs
From: Srikar Dronamraju @ 2018-10-24 3:23 UTC (permalink / raw)
To: kbuild test robot
Cc: kbuild-all, Ingo Molnar, Peter Zijlstra, LKML, Mel Gorman,
Rik van Riel, Yi Wang, zhong.weidong, Yi Liu,
Frederic Weisbecker, Thomas Gleixner
> url: https://github.com/0day-ci/linux/commits/Srikar-Dronamraju/sched-core-Don-t-mix-isolcpus-and-housekeeping-CPUs/20181024-025019
> config: i386-tinyconfig (attached as .config)
> compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=i386
>
> All warnings (new ones prefixed by >>):
>
>    kernel/sched/core.c: In function 'sched_setaffinity':
> >> kernel/sched/core.c:4783:10: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
>      hk_mask = housekeeping_cpumask(HK_FLAG_DOMAIN);
>              ^
Thanks, I have posted a v2 fixing this.
http://lkml.kernel.org/r/1540350169-18581-1-git-send-email-srikar@linux.vnet.ibm.com
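
The v2 is not quoted in this thread, so the following is only a sketch
of the usual fix for this class of warning, assuming the local pointer
is simply const-qualified to match the return type of
housekeeping_cpumask(). struct mask, get_housekeeping_mask() and
read_housekeeping_bits() are hypothetical stand-ins, not kernel code.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask. */
struct mask {
	uint32_t bits;
};

static const struct mask hk = { .bits = 0xffffddddu };

/* Like housekeeping_cpumask(), this returns a pointer to const data. */
static const struct mask *get_housekeeping_mask(void)
{
	return &hk;
}

static uint32_t read_housekeeping_bits(void)
{
	/*
	 * Declaring the local as "struct mask *" would trigger
	 * -Wdiscarded-qualifiers on this assignment; const-qualifying
	 * it keeps the pointer type in sync with the return type.
	 */
	const struct mask *hk_mask = get_housekeeping_mask();

	return hk_mask->bits;
}
```

The mask is only read by the new check, so a pointer to const is enough
and no cast is needed.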
--
Thanks and Regards
Srikar Dronamraju