* [RFC PATCH 0/2] sched: Introduce CPU soft affinity for processes
@ 2017-09-19 22:37 Rohit Jain
  2017-09-19 22:37 ` [PATCH 1/2] sched: Introduce new flags to sched_setaffinity to support soft affinity Rohit Jain
  2017-09-19 22:37 ` [PATCH 2/2] sched: Actual changes after adding SCHED_SOFT_AFFINITY to make it work with the scheduler Rohit Jain
  0 siblings, 2 replies; 7+ messages in thread
From: Rohit Jain @ 2017-09-19 22:37 UTC (permalink / raw)
  To: linux-kernel, eas-dev; +Cc: peterz, mingo, joelaf

For multi-tenancy, the current mechanisms for sharing the system's CPUs
are time-sharing (e.g. CFS) and partitioning the system into 'rigid'
containers with system calls like sched_setaffinity. The Linux kernel
today has no way to express flexible workloads which should be allowed
to use the whole system while still maintaining a notion of CPU
preference.

This patch series introduces a new CPU mask, 'cpus_preferred', in
task_struct and gives applications a way to specify the set of CPUs
they would like to run on. The scheduler tries to honor that request as
best it can; however, if it finds no idle CPU within the preferred set,
it will run the application anywhere in the system.

This can be used to build soft containers, which allow a tenant to use
more capacity than it is entitled to while other tenants aren't fully
using theirs. The advantage of space-sharing the system, as opposed to
time-sharing it, is better cache locality while the soft containers are
in use.

Since this behavior is applied on every scheduling decision, the
application keeps running on its preferred CPUs as long as it does not
overuse its specified resources.

Building full soft containers still requires more user-space code;
this series provides the kernel-side support that is needed.
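
To illustrate the intended usage, here is a minimal user-space sketch
of a call into the proposed interface. The syscall number (333 on
x86-64) and the SCHED_SOFT_AFFINITY value are taken from this series
and are not part of any released kernel, so treat the snippet as
illustrative only:

        #define _GNU_SOURCE
        #include <sched.h>
        #include <stdio.h>
        #include <sys/syscall.h>
        #include <unistd.h>

        #define __NR_sched_setaffinity_flags 333  /* from patch 1/2, x86-64 */
        #define SCHED_SOFT_AFFINITY          1    /* from patch 1/2 */

        int main(void)
        {
                cpu_set_t soft;

                CPU_ZERO(&soft);
                CPU_SET(0, &soft);   /* prefer CPUs 0 and 1 ...          */
                CPU_SET(1, &soft);   /* ... but allow spilling elsewhere */

                /* pid 0 means the calling task, as with sched_setaffinity() */
                if (syscall(__NR_sched_setaffinity_flags, 0, sizeof(soft),
                            &soft, SCHED_SOFT_AFFINITY))
                        perror("sched_setaffinity_flags");

                return 0;
        }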

FAQs:

Q) What if I set "hard" affinity after setting a preference with soft
affinity?

A: Hard affinity overrides any previously set soft affinity.

Q) What if my application has already specified a "hard" affinity? Can I
still provide a set of CPUs for soft affinity?

A: Yes, it will work as long as the new soft affinity is a subset of the
"hard" affinity.

Q) Can I have mutually exclusive hard and soft affinities?

A: No, soft affinity is always a subset of hard affinity.
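
Continuing the sketch above (with the same caveat that the constants
come from this series only), setting a hard affinity first and then a
soft preference that is a subset of it would look roughly like this:

        cpu_set_t hard, soft;
        int i;

        CPU_ZERO(&hard);
        for (i = 0; i < 4; i++)
                CPU_SET(i, &hard);      /* hard affinity: CPUs 0-3 */

        CPU_ZERO(&soft);
        CPU_SET(0, &soft);
        CPU_SET(1, &soft);              /* soft preference: CPUs 0-1, a subset */

        sched_setaffinity(0, sizeof(hard), &hard);
        syscall(__NR_sched_setaffinity_flags, 0, sizeof(soft), &soft,
                SCHED_SOFT_AFFINITY);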


Note:
Ignore the kernel/time/tick-sched.c change; it only fixes a build
error on Peter's tree.

Rohit Jain (2):
  sched: Introduce new flags to sched_setaffinity to support soft
    affinity.
  sched: Actual changes after adding SCHED_SOFT_AFFINITY to make it work
    with the scheduler

 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 include/linux/init_task.h              |   1 +
 include/linux/sched.h                  |   4 +-
 include/linux/syscalls.h               |   3 +
 include/uapi/asm-generic/unistd.h      |   4 +-
 include/uapi/linux/sched.h             |   3 +
 kernel/compat.c                        |   2 +-
 kernel/sched/core.c                    | 167 ++++++++++++++++++++++++++++-----
 kernel/sched/cpudeadline.c             |   4 +-
 kernel/sched/cpupri.c                  |   4 +-
 kernel/sched/fair.c                    | 116 +++++++++++++++++------
 kernel/time/tick-sched.c               |   1 +
 12 files changed, 250 insertions(+), 60 deletions(-)

-- 
2.7.4

* [PATCH 1/2] sched: Introduce new flags to sched_setaffinity to support soft affinity.
  2017-09-19 22:37 [RFC PATCH 0/2] sched: Introduce CPU soft affinity for processes Rohit Jain
@ 2017-09-19 22:37 ` Rohit Jain
  2017-09-21  6:34   ` kbuild test robot
  2017-09-21 10:38   ` kbuild test robot
  2017-09-19 22:37 ` [PATCH 2/2] sched: Actual changes after adding SCHED_SOFT_AFFINITY to make it work with the scheduler Rohit Jain
  1 sibling, 2 replies; 7+ messages in thread
From: Rohit Jain @ 2017-09-19 22:37 UTC (permalink / raw)
  To: linux-kernel, eas-dev; +Cc: peterz, mingo, joelaf

This patch adds support for the new system call and sets the
cpus_preferred mask as requested by the application. It does not yet
make cpus_preferred influence any scheduling decisions.

Signed-off-by: Rohit Jain <rohit.k.jain@oracle.com>
---
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 include/linux/init_task.h              |   1 +
 include/linux/sched.h                  |   4 +-
 include/linux/syscalls.h               |   3 +
 include/uapi/asm-generic/unistd.h      |   4 +-
 include/uapi/linux/sched.h             |   3 +
 kernel/compat.c                        |   2 +-
 kernel/sched/core.c                    | 167 ++++++++++++++++++++++++++++-----
 kernel/time/tick-sched.c               |   1 +
 9 files changed, 159 insertions(+), 27 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 5aef183..bd5f346 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -339,6 +339,7 @@
 330	common	pkey_alloc		sys_pkey_alloc
 331	common	pkey_free		sys_pkey_free
 332	common	statx			sys_statx
+333	common	sched_setaffinity_flags	sys_sched_setaffinity_flags
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 0e84971..bb8a8e1 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -235,6 +235,7 @@ extern struct cred init_cred;
 	.normal_prio	= MAX_PRIO-20,					\
 	.policy		= SCHED_NORMAL,					\
 	.cpus_allowed	= CPU_MASK_ALL,					\
+	.cpus_preferred = CPU_MASK_ALL,					\
 	.nr_cpus_allowed= NR_CPUS,					\
 	.mm		= NULL,						\
 	.active_mm	= &init_mm,					\
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 534542d..7e08ae8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -582,6 +582,7 @@ struct task_struct {
 	unsigned int			policy;
 	int				nr_cpus_allowed;
 	cpumask_t			cpus_allowed;
+	cpumask_t			cpus_preferred;
 
 #ifdef CONFIG_PREEMPT_RCU
 	int				rcu_read_lock_nesting;
@@ -1647,7 +1648,8 @@ static inline void set_task_cpu(struct task_struct *p, unsigned int cpu)
 # define vcpu_is_preempted(cpu)	false
 #endif
 
-extern long sched_setaffinity(pid_t pid, const struct cpumask *new_mask);
+extern long sched_setaffinity(pid_t pid, const struct cpumask *new_mask,
+			      int flags);
 extern long sched_getaffinity(pid_t pid, struct cpumask *mask);
 
 #ifndef TASK_SIZE_OF
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index d4dfac8..83d04da 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -326,6 +326,9 @@ asmlinkage long sys_sched_get_priority_max(int policy);
 asmlinkage long sys_sched_get_priority_min(int policy);
 asmlinkage long sys_sched_rr_get_interval(pid_t pid,
 					struct timespec __user *interval);
+asmlinkage long sys_sched_setaffinity_flags(pid_t pid, unsigned int len,
+					    unsigned long __user *user_mask_ptr,
+					    int flags);
 asmlinkage long sys_setpriority(int which, int who, int niceval);
 asmlinkage long sys_getpriority(int which, int who);
 
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 061185a..5e88941 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -376,6 +376,8 @@ __SYSCALL(__NR_sched_getparam, sys_sched_getparam)
 #define __NR_sched_setaffinity 122
 __SC_COMP(__NR_sched_setaffinity, sys_sched_setaffinity, \
 	  compat_sys_sched_setaffinity)
+#define __NR_sched_setaffinity_flags 293
+__SYSCALL(__NR_sched_setaffinity_flags, sys_sched_setaffinity_flags)
 #define __NR_sched_getaffinity 123
 __SC_COMP(__NR_sched_getaffinity, sys_sched_getaffinity, \
 	  compat_sys_sched_getaffinity)
@@ -733,7 +735,7 @@ __SYSCALL(__NR_pkey_free,     sys_pkey_free)
 __SYSCALL(__NR_statx,     sys_statx)
 
 #undef __NR_syscalls
-#define __NR_syscalls 292
+#define __NR_syscalls 293
 
 /*
  * All syscalls below here should go away really,
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index e2a6c7b..81c17f5 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -49,4 +49,7 @@
 #define SCHED_FLAG_RESET_ON_FORK	0x01
 #define SCHED_FLAG_RECLAIM		0x02
 
+#define SCHED_HARD_AFFINITY	0
+#define SCHED_SOFT_AFFINITY	1
+
 #endif /* _UAPI_LINUX_SCHED_H */
diff --git a/kernel/compat.c b/kernel/compat.c
index 6f0a0e7..0ec60ea 100644
--- a/kernel/compat.c
+++ b/kernel/compat.c
@@ -356,7 +356,7 @@ COMPAT_SYSCALL_DEFINE3(sched_setaffinity, compat_pid_t, pid,
 	if (retval)
 		goto out;
 
-	retval = sched_setaffinity(pid, new_mask);
+	retval = sched_setaffinity(pid, new_mask, SCHED_HARD_AFFINITY);
 out:
 	free_cpumask_var(new_mask);
 	return retval;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ec80d2f..2e8d392 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1031,6 +1031,11 @@ void set_cpus_allowed_common(struct task_struct *p, const struct cpumask *new_ma
 	p->nr_cpus_allowed = cpumask_weight(new_mask);
 }
 
+void set_cpus_preferred_common(struct task_struct *p, const struct cpumask *new_mask)
+{
+	cpumask_copy(&p->cpus_preferred, new_mask);
+}
+
 void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 {
 	struct rq *rq = task_rq(p);
@@ -1053,6 +1058,36 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 		put_prev_task(rq, p);
 
 	p->sched_class->set_cpus_allowed(p, new_mask);
+	set_cpus_preferred_common(p, new_mask);
+
+	if (queued)
+		enqueue_task(rq, p, ENQUEUE_RESTORE | ENQUEUE_NOCLOCK);
+	if (running)
+		set_curr_task(rq, p);
+}
+
+void do_set_cpus_preferred(struct task_struct *p, const struct cpumask *new_mask)
+{
+	struct rq *rq = task_rq(p);
+	bool queued, running;
+
+	lockdep_assert_held(&p->pi_lock);
+
+	queued = task_on_rq_queued(p);
+	running = task_current(rq, p);
+
+	if (queued) {
+		/*
+		 * Because __kthread_bind() calls this on blocked tasks without
+		 * holding rq->lock.
+		 */
+		lockdep_assert_held(&rq->lock);
+		dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK);
+	}
+	if (running)
+		put_prev_task(rq, p);
+
+	set_cpus_preferred_common(p, new_mask);
 
 	if (queued)
 		enqueue_task(rq, p, ENQUEUE_RESTORE | ENQUEUE_NOCLOCK);
@@ -1142,6 +1177,63 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 	return ret;
 }
 
+static int
+__set_cpus_preferred_ptr(struct task_struct *p, const struct cpumask *new_mask)
+{
+	const struct cpumask *cpu_valid_mask = cpu_active_mask;
+	unsigned int dest_cpu;
+	struct rq_flags rf;
+	struct rq *rq;
+	int ret = 0;
+
+	rq = task_rq_lock(p, &rf);
+	update_rq_clock(rq);
+
+	if (p->flags & PF_KTHREAD) {
+		/*
+		 * Kernel threads are allowed on online && !active CPUs
+		 */
+		cpu_valid_mask = cpu_online_mask;
+	}
+
+	if (cpumask_equal(&p->cpus_preferred, new_mask))
+		goto out;
+
+	if (!cpumask_intersects(new_mask, cpu_valid_mask)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	do_set_cpus_preferred(p, new_mask);
+
+	if (p->flags & PF_KTHREAD) {
+		/*
+		 * For kernel threads that do indeed end up on online &&
+		 * !active we want to ensure they are strict per-CPU threads.
+		 */
+		WARN_ON(cpumask_intersects(new_mask, cpu_online_mask) &&
+			!cpumask_intersects(new_mask, cpu_active_mask) &&
+			p->nr_cpus_allowed != 1);
+	}
+
+	/* Can the task run on the task's current CPU? If so, we're done */
+	if (cpumask_test_cpu(task_cpu(p), new_mask))
+		goto out;
+
+	dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask);
+	if (task_on_rq_queued(p)) {
+		/*
+		 * OK, since we're going to drop the lock immediately
+		 * afterwards anyway.
+		 */
+		rq = move_queued_task(rq, &rf, p, dest_cpu);
+	}
+out:
+	task_rq_unlock(rq, p, &rf);
+
+	return ret;
+}
+
 int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 {
 	return __set_cpus_allowed_ptr(p, new_mask, false);
@@ -4620,7 +4712,7 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
 	return retval;
 }
 
-long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
+long sched_setaffinity(pid_t pid, const struct cpumask *in_mask, int flags)
 {
 	cpumask_var_t cpus_allowed, new_mask;
 	struct task_struct *p;
@@ -4686,19 +4778,23 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 	}
 #endif
 again:
-	retval = __set_cpus_allowed_ptr(p, new_mask, true);
-
-	if (!retval) {
-		cpuset_cpus_allowed(p, cpus_allowed);
-		if (!cpumask_subset(new_mask, cpus_allowed)) {
-			/*
-			 * We must have raced with a concurrent cpuset
-			 * update. Just reset the cpus_allowed to the
-			 * cpuset's cpus_allowed
-			 */
-			cpumask_copy(new_mask, cpus_allowed);
-			goto again;
+	if (flags == SCHED_HARD_AFFINITY) {
+		retval = __set_cpus_allowed_ptr(p, new_mask, true);
+
+		if (!retval) {
+			cpuset_cpus_allowed(p, cpus_allowed);
+			if (!cpumask_subset(new_mask, cpus_allowed)) {
+				/*
+				 * We must have raced with a concurrent cpuset
+				 * update. Just reset the cpus_allowed to the
+				 * cpuset's cpus_allowed
+				 */
+				cpumask_copy(new_mask, cpus_allowed);
+				goto again;
+			}
 		}
+	} else if (flags == SCHED_SOFT_AFFINITY) {
+		retval = __set_cpus_preferred_ptr(p, new_mask);
 	}
 out_free_new_mask:
 	free_cpumask_var(new_mask);
@@ -4720,30 +4816,53 @@ static int get_user_cpu_mask(unsigned long __user *user_mask_ptr, unsigned len,
 	return copy_from_user(new_mask, user_mask_ptr, len) ? -EFAULT : 0;
 }
 
-/**
- * sys_sched_setaffinity - set the CPU affinity of a process
- * @pid: pid of the process
- * @len: length in bytes of the bitmask pointed to by user_mask_ptr
- * @user_mask_ptr: user-space pointer to the new CPU mask
- *
- * Return: 0 on success. An error code otherwise.
- */
-SYSCALL_DEFINE3(sched_setaffinity, pid_t, pid, unsigned int, len,
-		unsigned long __user *, user_mask_ptr)
+static bool
+valid_affinity_flags(int flags)
+{
+	return flags == SCHED_HARD_AFFINITY || flags == SCHED_SOFT_AFFINITY;
+}
+
+static int
+sched_setaffinity_common(pid_t pid, unsigned int len,
+			 unsigned long __user *user_mask_ptr, int flags)
 {
 	cpumask_var_t new_mask;
 	int retval;
 
+	if (!valid_affinity_flags(flags))
+		return -EINVAL;
+
 	if (!alloc_cpumask_var(&new_mask, GFP_KERNEL))
 		return -ENOMEM;
 
 	retval = get_user_cpu_mask(user_mask_ptr, len, new_mask);
 	if (retval == 0)
-		retval = sched_setaffinity(pid, new_mask);
+		retval = sched_setaffinity(pid, new_mask, flags);
 	free_cpumask_var(new_mask);
 	return retval;
 }
 
+SYSCALL_DEFINE4(sched_setaffinity_flags, pid_t, pid, unsigned int, len,
+		unsigned long __user *, user_mask_ptr, int, flags)
+{
+	return sched_setaffinity_common(pid, len, user_mask_ptr, flags);
+}
+
+/**
+ * sys_sched_setaffinity - set the CPU affinity of a process
+ * @pid: pid of the process
+ * @len: length in bytes of the bitmask pointed to by user_mask_ptr
+ * @user_mask_ptr: user-space pointer to the new CPU mask
+ *
+ * Return: 0 on success. An error code otherwise.
+ */
+SYSCALL_DEFINE3(sched_setaffinity, pid_t, pid, unsigned int, len,
+		unsigned long __user *, user_mask_ptr)
+{
+	return sched_setaffinity_common(pid, len, user_mask_ptr,
+					SCHED_HARD_AFFINITY);
+}
+
 long sched_getaffinity(pid_t pid, struct cpumask *mask)
 {
 	struct task_struct *p;
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index eb0e975..ede1add 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -19,6 +19,7 @@
 #include <linux/percpu.h>
 #include <linux/nmi.h>
 #include <linux/profile.h>
+#include <linux/vmstat.h>
 #include <linux/sched/signal.h>
 #include <linux/sched/clock.h>
 #include <linux/sched/stat.h>
-- 
2.7.4

* [PATCH 2/2] sched: Actual changes after adding SCHED_SOFT_AFFINITY to make it work with the scheduler
  2017-09-19 22:37 [RFC PATCH 0/2] sched: Introduce CPU soft affinity for processes Rohit Jain
  2017-09-19 22:37 ` [PATCH 1/2] sched: Introduce new flags to sched_setaffinity to support soft affinity Rohit Jain
@ 2017-09-19 22:37 ` Rohit Jain
  2017-09-21 11:52   ` Peter Zijlstra
  1 sibling, 1 reply; 7+ messages in thread
From: Rohit Jain @ 2017-09-19 22:37 UTC (permalink / raw)
  To: linux-kernel, eas-dev; +Cc: peterz, mingo, joelaf

This patch changes the scheduler's behavior based on the cpus_preferred
mask. Keep in mind that when the system call changes the cpus_allowed
mask, cpus_preferred and cpus_allowed become the same.
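
For clarity, the search ordering the fair-path changes aim for can be
summarized by the simplified sketch below. scan_for_idle() is a
placeholder rather than a function from this patch, and the real code
works on per-CPU scratch masks instead of an on-stack cpumask:

        /* Simplified sketch of the intended two-pass idle search. */
        static int select_idle_sketch(struct task_struct *p, int target)
        {
                cpumask_t fallback;
                int cpu;

                /* First pass: only the CPUs the task prefers. */
                cpu = scan_for_idle(&p->cpus_preferred, target);
                if (cpu >= 0)
                        return cpu;

                /* No preferred CPU was idle; fall back to the rest of cpus_allowed. */
                if (cpumask_equal(&p->cpus_preferred, &p->cpus_allowed))
                        return -1;

                cpumask_andnot(&fallback, &p->cpus_allowed, &p->cpus_preferred);
                return scan_for_idle(&fallback, target);
        }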

Signed-off-by: Rohit Jain <rohit.k.jain@oracle.com>
---
 kernel/sched/cpudeadline.c |   4 +-
 kernel/sched/cpupri.c      |   4 +-
 kernel/sched/fair.c        | 116 +++++++++++++++++++++++++++++++++------------
 3 files changed, 91 insertions(+), 33 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 8d9562d..32135b9 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -127,13 +127,13 @@ int cpudl_find(struct cpudl *cp, struct task_struct *p,
 	const struct sched_dl_entity *dl_se = &p->dl;
 
 	if (later_mask &&
-	    cpumask_and(later_mask, cp->free_cpus, &p->cpus_allowed)) {
+	    cpumask_and(later_mask, cp->free_cpus, &p->cpus_preferred)) {
 		return 1;
 	} else {
 		int best_cpu = cpudl_maximum(cp);
 		WARN_ON(best_cpu != -1 && !cpu_present(best_cpu));
 
-		if (cpumask_test_cpu(best_cpu, &p->cpus_allowed) &&
+		if (cpumask_test_cpu(best_cpu, &p->cpus_preferred) &&
 		    dl_time_before(dl_se->deadline, cp->elements[0].dl)) {
 			if (later_mask)
 				cpumask_set_cpu(best_cpu, later_mask);
diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
index 2511aba..9641b8d 100644
--- a/kernel/sched/cpupri.c
+++ b/kernel/sched/cpupri.c
@@ -103,11 +103,11 @@ int cpupri_find(struct cpupri *cp, struct task_struct *p,
 		if (skip)
 			continue;
 
-		if (cpumask_any_and(&p->cpus_allowed, vec->mask) >= nr_cpu_ids)
+		if (cpumask_any_and(&p->cpus_preferred, vec->mask) >= nr_cpu_ids)
 			continue;
 
 		if (lowest_mask) {
-			cpumask_and(lowest_mask, &p->cpus_allowed, vec->mask);
+			cpumask_and(lowest_mask, &p->cpus_preferred, vec->mask);
 
 			/*
 			 * We have to ensure that we have at least one bit
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index eca6a57..35e73c7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5805,7 +5805,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 
 		/* Skip over this group if it has no CPUs allowed */
 		if (!cpumask_intersects(sched_group_span(group),
-					&p->cpus_allowed))
+					&p->cpus_preferred))
 			continue;
 
 		local_group = cpumask_test_cpu(this_cpu,
@@ -5925,7 +5925,7 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 		return cpumask_first(sched_group_span(group));
 
 	/* Traverse only the allowed CPUs */
-	for_each_cpu_and(i, sched_group_span(group), &p->cpus_allowed) {
+	for_each_cpu_and(i, sched_group_span(group), &p->cpus_preferred) {
 		if (idle_cpu(i)) {
 			struct rq *rq = cpu_rq(i);
 			struct cpuidle_state *idle = idle_get_state(rq);
@@ -6011,6 +6011,27 @@ void __update_idle_core(struct rq *rq)
 	rcu_read_unlock();
 }
 
+static inline int
+scan_cpu_mask_for_idle_cores(struct cpumask *cpus, int target)
+{
+	int core, cpu;
+
+	for_each_cpu_wrap(core, cpus, target) {
+		bool idle = true;
+
+		for_each_cpu(cpu, cpu_smt_mask(core)) {
+			cpumask_clear_cpu(cpu, cpus);
+			if (!idle_cpu(cpu))
+				idle = false;
+		}
+
+		if (idle)
+			return core;
+	}
+
+	return -1;
+}
+
 /*
  * Scan the entire LLC domain for idle cores; this dynamically switches off if
  * there are no idle cores left in the system; tracked through
@@ -6019,7 +6040,8 @@ void __update_idle_core(struct rq *rq)
 static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
 {
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
-	int core, cpu;
+	struct cpumask *pcpus = this_cpu_cpumask_var_ptr(select_idle_mask);
+	int core;
 
 	if (!static_branch_likely(&sched_smt_present))
 		return -1;
@@ -6028,20 +6050,21 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
 		return -1;
 
 	cpumask_and(cpus, sched_domain_span(sd), &p->cpus_allowed);
+	cpumask_and(pcpus, cpus, &p->cpus_preferred);
+	core = scan_cpu_mask_for_idle_cores(pcpus, target);
 
-	for_each_cpu_wrap(core, cpus, target) {
-		bool idle = true;
+	if (core >= 0)
+		return core;
 
-		for_each_cpu(cpu, cpu_smt_mask(core)) {
-			cpumask_clear_cpu(cpu, cpus);
-			if (!idle_cpu(cpu))
-				idle = false;
-		}
+	if (cpumask_equal(cpus, pcpus))
+		goto out;
 
-		if (idle)
-			return core;
-	}
+	cpumask_andnot(cpus, cpus, pcpus);
+	core = scan_cpu_mask_for_idle_cores(cpus, target);
 
+	if (core >= 0)
+		return core;
+out:
 	/*
 	 * Failed to find an idle core; stop looking for one.
 	 */
@@ -6050,24 +6073,40 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
 	return -1;
 }
 
+static inline int
+scan_cpu_mask_for_idle_smt(struct cpumask *cpus, int target)
+{
+	int cpu;
+
+	for_each_cpu(cpu, cpu_smt_mask(target)) {
+		if (!cpumask_test_cpu(cpu, cpus))
+			continue;
+		if (idle_cpu(cpu))
+			return cpu;
+	}
+
+	return -1;
+}
+
 /*
  * Scan the local SMT mask for idle CPUs.
  */
 static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
 {
+	struct cpumask *cpus = &p->cpus_allowed;
 	int cpu;
 
 	if (!static_branch_likely(&sched_smt_present))
 		return -1;
 
-	for_each_cpu(cpu, cpu_smt_mask(target)) {
-		if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
-			continue;
-		if (idle_cpu(cpu))
-			return cpu;
-	}
+	cpu = scan_cpu_mask_for_idle_smt(&p->cpus_preferred, target);
 
-	return -1;
+	if (cpu >= 0 || cpumask_equal(&p->cpus_preferred, cpus))
+		return cpu;
+
+	cpumask_andnot(cpus, cpus, &p->cpus_preferred);
+
+	return scan_cpu_mask_for_idle_smt(cpus, target);
 }
 
 #else /* CONFIG_SCHED_SMT */
@@ -6084,6 +6123,24 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd
 
 #endif /* CONFIG_SCHED_SMT */
 
+static inline int
+scan_cpu_mask_for_idle_cpu(struct cpumask *cpus, int target,
+			   struct sched_domain *sd, int nr)
+{
+	int cpu;
+
+	for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
+		if (!--nr)
+			return -1;
+		if (!cpumask_test_cpu(cpu, cpus))
+			continue;
+		if (idle_cpu(cpu))
+			break;
+	}
+
+	return cpu;
+}
+
 /*
  * Scan the LLC domain for idle CPUs; this is dynamically regulated by
  * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
@@ -6092,6 +6149,7 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd
 static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
 {
 	struct sched_domain *this_sd;
+	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
 	u64 avg_cost, avg_idle;
 	u64 time, cost;
 	s64 delta;
@@ -6121,15 +6179,15 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 
 	time = local_clock();
 
-	for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
-		if (!--nr)
-			return -1;
-		if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
-			continue;
-		if (idle_cpu(cpu))
-			break;
-	}
+	cpu = scan_cpu_mask_for_idle_cpu(&p->cpus_preferred, target, sd, nr);
+
+	if (cpu >= 0 || cpumask_equal(&p->cpus_preferred, &p->cpus_allowed))
+		goto out;
 
+	cpumask_andnot(cpus, &p->cpus_allowed, &p->cpus_preferred);
+
+	cpu = scan_cpu_mask_for_idle_cpu(cpus, target, sd, nr);
+out:
 	time = local_clock() - time;
 	cost = this_sd->avg_scan_cost;
 	delta = (s64)(time - cost) / 8;
@@ -6279,7 +6337,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 	if (sd_flag & SD_BALANCE_WAKE) {
 		record_wakee(p);
 		want_affine = !wake_wide(p) && !wake_cap(p, cpu, prev_cpu)
-			      && cpumask_test_cpu(cpu, &p->cpus_allowed);
+			      && cpumask_test_cpu(cpu, &p->cpus_preferred);
 	}
 
 	rcu_read_lock();
-- 
2.7.4

* Re: [PATCH 1/2] sched: Introduce new flags to sched_setaffinity to support soft affinity.
  2017-09-19 22:37 ` [PATCH 1/2] sched: Introduce new flags to sched_setaffinity to support soft affinity Rohit Jain
@ 2017-09-21  6:34   ` kbuild test robot
  2017-09-21 10:38   ` kbuild test robot
  1 sibling, 0 replies; 7+ messages in thread
From: kbuild test robot @ 2017-09-21  6:34 UTC (permalink / raw)
  To: Rohit Jain; +Cc: kbuild-all, linux-kernel, eas-dev, peterz, mingo, joelaf

[-- Attachment #1: Type: text/plain, Size: 4804 bytes --]

Hi Rohit,

[auto build test ERROR on tip/sched/core]
[also build test ERROR on v4.14-rc1 next-20170920]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Rohit-Jain/sched-Introduce-new-flags-to-sched_setaffinity-to-support-soft-affinity/20170921-140313
config: i386-randconfig-x018-201738 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   kernel//trace/trace_hwlat.c: In function 'move_to_next_cpu':
>> kernel//trace/trace_hwlat.c:299:2: error: too few arguments to function 'sched_setaffinity'
     sched_setaffinity(0, current_mask);
     ^~~~~~~~~~~~~~~~~
   In file included from include/linux/kthread.h:5:0,
                    from kernel//trace/trace_hwlat.c:42:
   include/linux/sched.h:1636:13: note: declared here
    extern long sched_setaffinity(pid_t pid, const struct cpumask *new_mask,
                ^~~~~~~~~~~~~~~~~
   kernel//trace/trace_hwlat.c: In function 'start_kthread':
   kernel//trace/trace_hwlat.c:372:2: error: too few arguments to function 'sched_setaffinity'
     sched_setaffinity(kthread->pid, current_mask);
     ^~~~~~~~~~~~~~~~~
   In file included from include/linux/kthread.h:5:0,
                    from kernel//trace/trace_hwlat.c:42:
   include/linux/sched.h:1636:13: note: declared here
    extern long sched_setaffinity(pid_t pid, const struct cpumask *new_mask,
                ^~~~~~~~~~~~~~~~~

vim +/sched_setaffinity +299 kernel//trace/trace_hwlat.c

0330f7aa Steven Rostedt (Red Hat  2016-07-15  269) 
f447c196 Steven Rostedt (VMware   2017-01-31  270) static void move_to_next_cpu(void)
0330f7aa Steven Rostedt (Red Hat  2016-07-15  271) {
f447c196 Steven Rostedt (VMware   2017-01-31  272) 	struct cpumask *current_mask = &save_cpumask;
0330f7aa Steven Rostedt (Red Hat  2016-07-15  273) 	int next_cpu;
0330f7aa Steven Rostedt (Red Hat  2016-07-15  274) 
0330f7aa Steven Rostedt (Red Hat  2016-07-15  275) 	if (disable_migrate)
0330f7aa Steven Rostedt (Red Hat  2016-07-15  276) 		return;
0330f7aa Steven Rostedt (Red Hat  2016-07-15  277) 	/*
0330f7aa Steven Rostedt (Red Hat  2016-07-15  278) 	 * If for some reason the user modifies the CPU affinity
0330f7aa Steven Rostedt (Red Hat  2016-07-15  279) 	 * of this thread, than stop migrating for the duration
0330f7aa Steven Rostedt (Red Hat  2016-07-15  280) 	 * of the current test.
0330f7aa Steven Rostedt (Red Hat  2016-07-15  281) 	 */
0330f7aa Steven Rostedt (Red Hat  2016-07-15  282) 	if (!cpumask_equal(current_mask, &current->cpus_allowed))
0330f7aa Steven Rostedt (Red Hat  2016-07-15  283) 		goto disable;
0330f7aa Steven Rostedt (Red Hat  2016-07-15  284) 
0330f7aa Steven Rostedt (Red Hat  2016-07-15  285) 	get_online_cpus();
0330f7aa Steven Rostedt (Red Hat  2016-07-15  286) 	cpumask_and(current_mask, cpu_online_mask, tracing_buffer_mask);
0330f7aa Steven Rostedt (Red Hat  2016-07-15  287) 	next_cpu = cpumask_next(smp_processor_id(), current_mask);
0330f7aa Steven Rostedt (Red Hat  2016-07-15  288) 	put_online_cpus();
0330f7aa Steven Rostedt (Red Hat  2016-07-15  289) 
0330f7aa Steven Rostedt (Red Hat  2016-07-15  290) 	if (next_cpu >= nr_cpu_ids)
0330f7aa Steven Rostedt (Red Hat  2016-07-15  291) 		next_cpu = cpumask_first(current_mask);
0330f7aa Steven Rostedt (Red Hat  2016-07-15  292) 
0330f7aa Steven Rostedt (Red Hat  2016-07-15  293) 	if (next_cpu >= nr_cpu_ids) /* Shouldn't happen! */
0330f7aa Steven Rostedt (Red Hat  2016-07-15  294) 		goto disable;
0330f7aa Steven Rostedt (Red Hat  2016-07-15  295) 
0330f7aa Steven Rostedt (Red Hat  2016-07-15  296) 	cpumask_clear(current_mask);
0330f7aa Steven Rostedt (Red Hat  2016-07-15  297) 	cpumask_set_cpu(next_cpu, current_mask);
0330f7aa Steven Rostedt (Red Hat  2016-07-15  298) 
0330f7aa Steven Rostedt (Red Hat  2016-07-15 @299) 	sched_setaffinity(0, current_mask);
0330f7aa Steven Rostedt (Red Hat  2016-07-15  300) 	return;
0330f7aa Steven Rostedt (Red Hat  2016-07-15  301) 
0330f7aa Steven Rostedt (Red Hat  2016-07-15  302)  disable:
0330f7aa Steven Rostedt (Red Hat  2016-07-15  303) 	disable_migrate = true;
0330f7aa Steven Rostedt (Red Hat  2016-07-15  304) }
0330f7aa Steven Rostedt (Red Hat  2016-07-15  305) 

:::::: The code at line 299 was first introduced by commit
:::::: 0330f7aa8ee63d0c435c0cb4e47ea06235ee4b7f tracing: Have hwlat trace migrate across tracing_cpumask CPUs

:::::: TO: Steven Rostedt (Red Hat) <rostedt@goodmis.org>
:::::: CC: Steven Rostedt <rostedt@goodmis.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27719 bytes --]

* Re: [PATCH 1/2] sched: Introduce new flags to sched_setaffinity to support soft affinity.
  2017-09-19 22:37 ` [PATCH 1/2] sched: Introduce new flags to sched_setaffinity to support soft affinity Rohit Jain
  2017-09-21  6:34   ` kbuild test robot
@ 2017-09-21 10:38   ` kbuild test robot
  1 sibling, 0 replies; 7+ messages in thread
From: kbuild test robot @ 2017-09-21 10:38 UTC (permalink / raw)
  To: Rohit Jain; +Cc: kbuild-all, linux-kernel, eas-dev, peterz, mingo, joelaf

Hi Rohit,

[auto build test WARNING on tip/sched/core]
[also build test WARNING on v4.14-rc1 next-20170921]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Rohit-Jain/sched-Introduce-new-flags-to-sched_setaffinity-to-support-soft-affinity/20170921-140313
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)


vim +299 kernel/trace/trace_hwlat.c

0330f7aa Steven Rostedt (Red Hat  2016-07-15  269) 
f447c196 Steven Rostedt (VMware   2017-01-31  270) static void move_to_next_cpu(void)
0330f7aa Steven Rostedt (Red Hat  2016-07-15  271) {
f447c196 Steven Rostedt (VMware   2017-01-31  272) 	struct cpumask *current_mask = &save_cpumask;
0330f7aa Steven Rostedt (Red Hat  2016-07-15  273) 	int next_cpu;
0330f7aa Steven Rostedt (Red Hat  2016-07-15  274) 
0330f7aa Steven Rostedt (Red Hat  2016-07-15  275) 	if (disable_migrate)
0330f7aa Steven Rostedt (Red Hat  2016-07-15  276) 		return;
0330f7aa Steven Rostedt (Red Hat  2016-07-15  277) 	/*
0330f7aa Steven Rostedt (Red Hat  2016-07-15  278) 	 * If for some reason the user modifies the CPU affinity
0330f7aa Steven Rostedt (Red Hat  2016-07-15  279) 	 * of this thread, than stop migrating for the duration
0330f7aa Steven Rostedt (Red Hat  2016-07-15  280) 	 * of the current test.
0330f7aa Steven Rostedt (Red Hat  2016-07-15  281) 	 */
0330f7aa Steven Rostedt (Red Hat  2016-07-15  282) 	if (!cpumask_equal(current_mask, &current->cpus_allowed))
0330f7aa Steven Rostedt (Red Hat  2016-07-15  283) 		goto disable;
0330f7aa Steven Rostedt (Red Hat  2016-07-15  284) 
0330f7aa Steven Rostedt (Red Hat  2016-07-15  285) 	get_online_cpus();
0330f7aa Steven Rostedt (Red Hat  2016-07-15  286) 	cpumask_and(current_mask, cpu_online_mask, tracing_buffer_mask);
0330f7aa Steven Rostedt (Red Hat  2016-07-15  287) 	next_cpu = cpumask_next(smp_processor_id(), current_mask);
0330f7aa Steven Rostedt (Red Hat  2016-07-15  288) 	put_online_cpus();
0330f7aa Steven Rostedt (Red Hat  2016-07-15  289) 
0330f7aa Steven Rostedt (Red Hat  2016-07-15  290) 	if (next_cpu >= nr_cpu_ids)
0330f7aa Steven Rostedt (Red Hat  2016-07-15  291) 		next_cpu = cpumask_first(current_mask);
0330f7aa Steven Rostedt (Red Hat  2016-07-15  292) 
0330f7aa Steven Rostedt (Red Hat  2016-07-15  293) 	if (next_cpu >= nr_cpu_ids) /* Shouldn't happen! */
0330f7aa Steven Rostedt (Red Hat  2016-07-15  294) 		goto disable;
0330f7aa Steven Rostedt (Red Hat  2016-07-15  295) 
0330f7aa Steven Rostedt (Red Hat  2016-07-15  296) 	cpumask_clear(current_mask);
0330f7aa Steven Rostedt (Red Hat  2016-07-15  297) 	cpumask_set_cpu(next_cpu, current_mask);
0330f7aa Steven Rostedt (Red Hat  2016-07-15  298) 
0330f7aa Steven Rostedt (Red Hat  2016-07-15 @299) 	sched_setaffinity(0, current_mask);
0330f7aa Steven Rostedt (Red Hat  2016-07-15  300) 	return;
0330f7aa Steven Rostedt (Red Hat  2016-07-15  301) 
0330f7aa Steven Rostedt (Red Hat  2016-07-15  302)  disable:
0330f7aa Steven Rostedt (Red Hat  2016-07-15  303) 	disable_migrate = true;
0330f7aa Steven Rostedt (Red Hat  2016-07-15  304) }
0330f7aa Steven Rostedt (Red Hat  2016-07-15  305) 

:::::: The code at line 299 was first introduced by commit
:::::: 0330f7aa8ee63d0c435c0cb4e47ea06235ee4b7f tracing: Have hwlat trace migrate across tracing_cpumask CPUs

:::::: TO: Steven Rostedt (Red Hat) <rostedt@goodmis.org>
:::::: CC: Steven Rostedt <rostedt@goodmis.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

* Re: [PATCH 2/2] sched: Actual changes after adding SCHED_SOFT_AFFINITY to make it work with the scheduler
  2017-09-19 22:37 ` [PATCH 2/2] sched: Actual changes after adding SCHED_SOFT_AFFINITY to make it work with the scheduler Rohit Jain
@ 2017-09-21 11:52   ` Peter Zijlstra
  2017-09-21 17:30     ` Rohit Jain
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2017-09-21 11:52 UTC (permalink / raw)
  To: Rohit Jain; +Cc: linux-kernel, eas-dev, mingo, joelaf

On Tue, Sep 19, 2017 at 03:37:12PM -0700, Rohit Jain wrote:
> @@ -6019,7 +6040,8 @@ void __update_idle_core(struct rq *rq)
>  static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
>  {
>  	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
> +	struct cpumask *pcpus = this_cpu_cpumask_var_ptr(select_idle_mask);

This is broken... they're the exact _same_ variable.

> +	int core;
>  
>  	if (!static_branch_likely(&sched_smt_present))
>  		return -1;
> @@ -6028,20 +6050,21 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
>  		return -1;
>  
>  	cpumask_and(cpus, sched_domain_span(sd), &p->cpus_allowed);
> +	cpumask_and(pcpus, cpus, &p->cpus_preferred);
> +	core = scan_cpu_mask_for_idle_cores(pcpus, target);
>  
> +	if (core >= 0)
> +		return core;
>  
> +	if (cpumask_equal(cpus, pcpus))
> +		goto out;

Therefore this _must_ be true.


Also, you're touching one of the hottest paths in the whole scheduler
for this half arsed feature, not going to happen.

You further failed to teach the actual load-balancer of this new mask,
so it will still happily move tasks around.

* Re: [PATCH 2/2] sched: Actual changes after adding SCHED_SOFT_AFFINITY to make it work with the scheduler
  2017-09-21 11:52   ` Peter Zijlstra
@ 2017-09-21 17:30     ` Rohit Jain
  0 siblings, 0 replies; 7+ messages in thread
From: Rohit Jain @ 2017-09-21 17:30 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, eas-dev, mingo, joelaf, Atish Patra

On 09/21/2017 04:52 AM, Peter Zijlstra wrote:
> On Tue, Sep 19, 2017 at 03:37:12PM -0700, Rohit Jain wrote:
>> @@ -6019,7 +6040,8 @@ void __update_idle_core(struct rq *rq)
>>   static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
>>   {
>>   	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
>> +	struct cpumask *pcpus = this_cpu_cpumask_var_ptr(select_idle_mask);
> This is broken... they're the exact _same_ variable.
>
>> +	int core;
>>   
>>   	if (!static_branch_likely(&sched_smt_present))
>>   		return -1;
>> @@ -6028,20 +6050,21 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
>>   		return -1;
>>   
>>   	cpumask_and(cpus, sched_domain_span(sd), &p->cpus_allowed);
>> +	cpumask_and(pcpus, cpus, &p->cpus_preferred);
>> +	core = scan_cpu_mask_for_idle_cores(pcpus, target);
>>   
>> +	if (core >= 0)
>> +		return core;
>>   
>> +	if (cpumask_equal(cpus, pcpus))
>> +		goto out;
> Therefore this _must_ be true.

You are right, totally screwed it up here :(

>
> Also, you're touching one of the hottest paths in the whole scheduler
>

Yes, I am touching that code path, but the number of CPUs scanned
should remain the same (by intent at least), because we scan the
preferred mask first and then the allowed mask. I have only added an
ordering to the search. Please correct me if you meant something else
entirely.

>
> You further failed to teach the actual load-balancer of this new mask,
> so it will still happily move tasks around.

We were thinking that, because on every wakeup the idle-CPU search is
'forced' to look at the preferred CPUs first, it may be OK to let the
task be stolen during idle load balancing, since it will eventually
come back to the preferred CPUs. Then again, my understanding may not
be right.

