* [PATCH 0/7] scheduler tinification
@ 2017-05-29 21:02 Nicolas Pitre
  2017-05-29 21:02 ` [PATCH 1/7] cpuset/sched: cpuset makes sense for SMP only Nicolas Pitre
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Nicolas Pitre @ 2017-05-29 21:02 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra; +Cc: linux-kernel

Many embedded systems don't need full scheduler support. Most of the time,
user space is tightly controlled and many of the scheduler facilities simply
go unused.

This patch series makes it possible to configure out some parts of the
scheduler, such as the deadline and realtime scheduler classes. The savings
in kernel footprint are non-negligible.

Small ARM kernel config before this series:

   text    data     bss     dec     hex filename
  28623    3404     128   32155    7d9b kernel/sched/built-in.o

With this series and dl and rt classes disabled:

   text    data     bss     dec     hex filename
  20734    3334      40   24108    5e2c kernel/sched/built-in.o
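
That is a saving of 8047 bytes (32155 - 24108), or roughly 25% of the
scheduler core, with this particular configuration.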

And for the record, my Fedora workstation still boots and apparently runs
fine with those patches applied. I didn't test it at length, though.

A significant part of the remaining code is support for various system calls
that could be removed automatically when user space doesn't use them, but that
is a topic for another day.

diffstat for this series:

 include/linux/init_task.h      |  15 +-
 include/linux/rtmutex.h        |  69 ++++
 include/linux/sched.h          |   4 +
 include/linux/sched/deadline.h |   2 +-
 include/linux/sched/rt.h       |   4 +-
 init/Kconfig                   |  23 +-
 kernel/locking/Makefile        |   2 +
 kernel/locking/rtmutex.c       |   9 +
 kernel/sched/Makefile          |   7 +-
 kernel/sched/core.c            | 777 +++++------------------------------
 kernel/sched/deadline.c        | 336 +++++++++++++++
 kernel/sched/debug.c           |   6 +
 kernel/sched/rt.c              | 320 ++++++++++++++-
 kernel/sched/sched.h           |  35 +-
 kernel/sched/stop_task.c       |   6 +
 kernel/sched/topology.c        |   6 +
 kernel/sysctl.c                |   4 +-
 kernel/time/posix-cpu-timers.c |   6 +-
 lib/Kconfig.debug              |   2 +-
 19 files changed, 930 insertions(+), 703 deletions(-)


* [PATCH 1/7] cpuset/sched: cpuset makes sense for SMP only
  2017-05-29 21:02 [PATCH 0/7] scheduler tinification Nicolas Pitre
@ 2017-05-29 21:02 ` Nicolas Pitre
  2017-05-29 21:02 ` [PATCH 2/7] sched: omit stop_sched_class when !SMP Nicolas Pitre
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Nicolas Pitre @ 2017-05-29 21:02 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra; +Cc: linux-kernel

Make CONFIG_CPUSETS depend on SMP, as this feature makes no sense
on UP. This allows cpuset_cpumask_can_shrink() and task_can_attach()
to be configured out entirely.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
---
 init/Kconfig        | 1 +
 kernel/sched/core.c | 7 +++----
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 4ef946b466..b9aed60cac 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1156,6 +1156,7 @@ config CGROUP_HUGETLB
 
 config CPUSETS
 	bool "Cpuset controller"
+	depends on SMP
 	help
 	  This option will let you create and manage CPUSETs which
 	  allow dynamically partitioning a system into sets of CPUs and
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 803c3bc274..de274b1bd2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5463,6 +5463,8 @@ void init_idle(struct task_struct *idle, int cpu)
 #endif
 }
 
+#ifdef CONFIG_SMP
+
 int cpuset_cpumask_can_shrink(const struct cpumask *cur,
 			      const struct cpumask *trial)
 {
@@ -5506,7 +5508,6 @@ int task_can_attach(struct task_struct *p,
 		goto out;
 	}
 
-#ifdef CONFIG_SMP
 	if (dl_task(p) && !cpumask_intersects(task_rq(p)->rd->span,
 					      cs_cpus_allowed)) {
 		unsigned int dest_cpu = cpumask_any_and(cpu_active_mask,
@@ -5536,13 +5537,11 @@ int task_can_attach(struct task_struct *p,
 		rcu_read_unlock_sched();
 
 	}
-#endif
+
 out:
 	return ret;
 }
 
-#ifdef CONFIG_SMP
-
 bool sched_smp_initialized __read_mostly;
 
 #ifdef CONFIG_NUMA_BALANCING
-- 
2.9.4


* [PATCH 2/7] sched: omit stop_sched_class when !SMP
  2017-05-29 21:02 [PATCH 0/7] scheduler tinification Nicolas Pitre
  2017-05-29 21:02 ` [PATCH 1/7] cpuset/sched: cpuset makes sense for SMP only Nicolas Pitre
@ 2017-05-29 21:02 ` Nicolas Pitre
  2017-06-08  9:30   ` [tip:sched/core] sched/core: Omit building " tip-bot for Nicolas Pitre
  2017-05-29 21:02 ` [PATCH 3/7] sched/deadline: move dl related code out of sched/core.c Nicolas Pitre
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Nicolas Pitre @ 2017-05-29 21:02 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra; +Cc: linux-kernel

The stop class is invoked through stop_machine only, which makes it
dead code on UP builds.
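
For reference, a minimal sketch (not part of the patch) of how the scheduler
walks the class chain: the classes are linked in priority order and the
pick-next-task path iterates over them starting at sched_class_highest,
which is why dropping stop_task.o on UP also requires pointing
sched_class_highest at the next class down.

  /*
   * Illustrative sketch only -- the real code is in kernel/sched/core.c
   * and kernel/sched/sched.h.  The classes are chained in priority order:
   *
   *   stop_sched_class -> dl_sched_class -> rt_sched_class
   *                    -> fair_sched_class -> idle_sched_class -> NULL
   *
   * (with this patch, UP builds start the walk at dl_sched_class)
   */
  static struct task_struct *
  pick_next_task_sketch(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
  {
          const struct sched_class *class;
          struct task_struct *p;

          for_each_class(class) {
                  p = class->pick_next_task(rq, prev, rf);
                  if (p)
                          return p;       /* simplified: RETRY_TASK handling omitted */
          }
          BUG();  /* the idle class always returns a task */
  }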

Signed-off-by: Nicolas Pitre <nico@linaro.org>
---
 kernel/sched/Makefile |  4 ++--
 kernel/sched/core.c   | 60 +++++++++++++++++++++++++--------------------------
 kernel/sched/sched.h  |  4 ++++
 3 files changed, 36 insertions(+), 32 deletions(-)

diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 89ab675866..5e4c2e7a63 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -16,9 +16,9 @@ CFLAGS_core.o := $(PROFILING) -fno-omit-frame-pointer
 endif
 
 obj-y += core.o loadavg.o clock.o cputime.o
-obj-y += idle_task.o fair.o rt.o deadline.o stop_task.o
+obj-y += idle_task.o fair.o rt.o deadline.o
 obj-y += wait.o swait.o completion.o idle.o
-obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o
+obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o
 obj-$(CONFIG_SCHED_AUTOGROUP) += autogroup.o
 obj-$(CONFIG_SCHEDSTATS) += stats.o
 obj-$(CONFIG_SCHED_DEBUG) += debug.o
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index de274b1bd2..94fa712791 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -788,36 +788,6 @@ void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
 	dequeue_task(rq, p, flags);
 }
 
-void sched_set_stop_task(int cpu, struct task_struct *stop)
-{
-	struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
-	struct task_struct *old_stop = cpu_rq(cpu)->stop;
-
-	if (stop) {
-		/*
-		 * Make it appear like a SCHED_FIFO task, its something
-		 * userspace knows about and won't get confused about.
-		 *
-		 * Also, it will make PI more or less work without too
-		 * much confusion -- but then, stop work should not
-		 * rely on PI working anyway.
-		 */
-		sched_setscheduler_nocheck(stop, SCHED_FIFO, &param);
-
-		stop->sched_class = &stop_sched_class;
-	}
-
-	cpu_rq(cpu)->stop = stop;
-
-	if (old_stop) {
-		/*
-		 * Reset it back to a normal scheduling class so that
-		 * it can die in pieces.
-		 */
-		old_stop->sched_class = &rt_sched_class;
-	}
-}
-
 /*
  * __normal_prio - return the priority that is based on the static prio
  */
@@ -1588,6 +1558,36 @@ static void update_avg(u64 *avg, u64 sample)
 	*avg += diff >> 3;
 }
 
+void sched_set_stop_task(int cpu, struct task_struct *stop)
+{
+	struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
+	struct task_struct *old_stop = cpu_rq(cpu)->stop;
+
+	if (stop) {
+		/*
+		 * Make it appear like a SCHED_FIFO task, its something
+		 * userspace knows about and won't get confused about.
+		 *
+		 * Also, it will make PI more or less work without too
+		 * much confusion -- but then, stop work should not
+		 * rely on PI working anyway.
+		 */
+		sched_setscheduler_nocheck(stop, SCHED_FIFO, &param);
+
+		stop->sched_class = &stop_sched_class;
+	}
+
+	cpu_rq(cpu)->stop = stop;
+
+	if (old_stop) {
+		/*
+		 * Reset it back to a normal scheduling class so that
+		 * it can die in pieces.
+		 */
+		old_stop->sched_class = &rt_sched_class;
+	}
+}
+
 #else
 
 static inline int __set_cpus_allowed_ptr(struct task_struct *p,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 6dda2aab73..053f60afb7 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1422,7 +1422,11 @@ static inline void set_curr_task(struct rq *rq, struct task_struct *curr)
 	curr->sched_class->set_curr_task(rq);
 }
 
+#ifdef CONFIG_SMP
 #define sched_class_highest (&stop_sched_class)
+#else
+#define sched_class_highest (&dl_sched_class)
+#endif
 #define for_each_class(class) \
    for (class = sched_class_highest; class; class = class->next)
 
-- 
2.9.4


* [PATCH 3/7] sched/deadline: move dl related code out of sched/core.c
  2017-05-29 21:02 [PATCH 0/7] scheduler tinification Nicolas Pitre
  2017-05-29 21:02 ` [PATCH 1/7] cpuset/sched: cpuset makes sense for SMP only Nicolas Pitre
  2017-05-29 21:02 ` [PATCH 2/7] sched: omit stop_sched_class when !SMP Nicolas Pitre
@ 2017-05-29 21:02 ` Nicolas Pitre
  2017-05-29 21:02 ` [PATCH 4/7] sched/deadline: make it configurable Nicolas Pitre
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Nicolas Pitre @ 2017-05-29 21:02 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra; +Cc: linux-kernel

... to sched/deadline.c. This will make it easier to configure the
deadline scheduling class out of the kernel build.
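
For reference, the admission test all of this code revolves around uses a
20-bit fixed-point utilization ratio. A simplified rendition of the existing
helpers (the real ones live in kernel/sched/core.c and kernel/sched/sched.h,
and to_ratio() also handles period == 0):

  unsigned long to_ratio(u64 period, u64 runtime)
  {
          if (runtime == RUNTIME_INF)
                  return 1ULL << 20;              /* i.e. 100% */
          return div64_u64(runtime << 20, period);
  }

  /* true if adding new_bw (replacing old_bw) would exceed the root domain capacity */
  bool __dl_overflow(struct dl_bw *dl_b, int cpus, u64 old_bw, u64 new_bw)
  {
          return dl_b->bw != -1 &&
                 dl_b->bw * cpus < dl_b->total_bw - old_bw + new_bw;
  }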

Signed-off-by: Nicolas Pitre <nico@linaro.org>
---
 kernel/sched/core.c     | 335 +----------------------------------------------
 kernel/sched/deadline.c | 336 ++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h    |  14 ++
 3 files changed, 356 insertions(+), 329 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 94fa712791..93ce28ea34 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2148,23 +2148,6 @@ int wake_up_state(struct task_struct *p, unsigned int state)
 }
 
 /*
- * This function clears the sched_dl_entity static params.
- */
-void __dl_clear_params(struct task_struct *p)
-{
-	struct sched_dl_entity *dl_se = &p->dl;
-
-	dl_se->dl_runtime = 0;
-	dl_se->dl_deadline = 0;
-	dl_se->dl_period = 0;
-	dl_se->flags = 0;
-	dl_se->dl_bw = 0;
-
-	dl_se->dl_throttled = 0;
-	dl_se->dl_yielded = 0;
-}
-
-/*
  * Perform scheduler related setup for a newly forked process p.
  * p is forked by current.
  *
@@ -2443,90 +2426,6 @@ unsigned long to_ratio(u64 period, u64 runtime)
 	return div64_u64(runtime << 20, period);
 }
 
-#ifdef CONFIG_SMP
-inline struct dl_bw *dl_bw_of(int i)
-{
-	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held(),
-			 "sched RCU must be held");
-	return &cpu_rq(i)->rd->dl_bw;
-}
-
-static inline int dl_bw_cpus(int i)
-{
-	struct root_domain *rd = cpu_rq(i)->rd;
-	int cpus = 0;
-
-	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held(),
-			 "sched RCU must be held");
-	for_each_cpu_and(i, rd->span, cpu_active_mask)
-		cpus++;
-
-	return cpus;
-}
-#else
-inline struct dl_bw *dl_bw_of(int i)
-{
-	return &cpu_rq(i)->dl.dl_bw;
-}
-
-static inline int dl_bw_cpus(int i)
-{
-	return 1;
-}
-#endif
-
-/*
- * We must be sure that accepting a new task (or allowing changing the
- * parameters of an existing one) is consistent with the bandwidth
- * constraints. If yes, this function also accordingly updates the currently
- * allocated bandwidth to reflect the new situation.
- *
- * This function is called while holding p's rq->lock.
- *
- * XXX we should delay bw change until the task's 0-lag point, see
- * __setparam_dl().
- */
-static int dl_overflow(struct task_struct *p, int policy,
-		       const struct sched_attr *attr)
-{
-
-	struct dl_bw *dl_b = dl_bw_of(task_cpu(p));
-	u64 period = attr->sched_period ?: attr->sched_deadline;
-	u64 runtime = attr->sched_runtime;
-	u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0;
-	int cpus, err = -1;
-
-	/* !deadline task may carry old deadline bandwidth */
-	if (new_bw == p->dl.dl_bw && task_has_dl_policy(p))
-		return 0;
-
-	/*
-	 * Either if a task, enters, leave, or stays -deadline but changes
-	 * its parameters, we may need to update accordingly the total
-	 * allocated bandwidth of the container.
-	 */
-	raw_spin_lock(&dl_b->lock);
-	cpus = dl_bw_cpus(task_cpu(p));
-	if (dl_policy(policy) && !task_has_dl_policy(p) &&
-	    !__dl_overflow(dl_b, cpus, 0, new_bw)) {
-		__dl_add(dl_b, new_bw);
-		err = 0;
-	} else if (dl_policy(policy) && task_has_dl_policy(p) &&
-		   !__dl_overflow(dl_b, cpus, p->dl.dl_bw, new_bw)) {
-		__dl_clear(dl_b, p->dl.dl_bw);
-		__dl_add(dl_b, new_bw);
-		err = 0;
-	} else if (!dl_policy(policy) && task_has_dl_policy(p)) {
-		__dl_clear(dl_b, p->dl.dl_bw);
-		err = 0;
-	}
-	raw_spin_unlock(&dl_b->lock);
-
-	return err;
-}
-
-extern void init_dl_bw(struct dl_bw *dl_b);
-
 /*
  * wake_up_new_task - wake up a newly created task for the first time.
  *
@@ -4009,46 +3908,6 @@ static struct task_struct *find_process_by_pid(pid_t pid)
 }
 
 /*
- * This function initializes the sched_dl_entity of a newly becoming
- * SCHED_DEADLINE task.
- *
- * Only the static values are considered here, the actual runtime and the
- * absolute deadline will be properly calculated when the task is enqueued
- * for the first time with its new policy.
- */
-static void
-__setparam_dl(struct task_struct *p, const struct sched_attr *attr)
-{
-	struct sched_dl_entity *dl_se = &p->dl;
-
-	dl_se->dl_runtime = attr->sched_runtime;
-	dl_se->dl_deadline = attr->sched_deadline;
-	dl_se->dl_period = attr->sched_period ?: dl_se->dl_deadline;
-	dl_se->flags = attr->sched_flags;
-	dl_se->dl_bw = to_ratio(dl_se->dl_period, dl_se->dl_runtime);
-
-	/*
-	 * Changing the parameters of a task is 'tricky' and we're not doing
-	 * the correct thing -- also see task_dead_dl() and switched_from_dl().
-	 *
-	 * What we SHOULD do is delay the bandwidth release until the 0-lag
-	 * point. This would include retaining the task_struct until that time
-	 * and change dl_overflow() to not immediately decrement the current
-	 * amount.
-	 *
-	 * Instead we retain the current runtime/deadline and let the new
-	 * parameters take effect after the current reservation period lapses.
-	 * This is safe (albeit pessimistic) because the 0-lag point is always
-	 * before the current scheduling deadline.
-	 *
-	 * We can still have temporary overloads because we do not delay the
-	 * change in bandwidth until that time; so admission control is
-	 * not on the safe side. It does however guarantee tasks will never
-	 * consume more than promised.
-	 */
-}
-
-/*
  * sched_setparam() passes in -1 for its policy, to let the functions
  * it calls know not to change it.
  */
@@ -4101,59 +3960,6 @@ static void __setscheduler(struct rq *rq, struct task_struct *p,
 		p->sched_class = &fair_sched_class;
 }
 
-static void
-__getparam_dl(struct task_struct *p, struct sched_attr *attr)
-{
-	struct sched_dl_entity *dl_se = &p->dl;
-
-	attr->sched_priority = p->rt_priority;
-	attr->sched_runtime = dl_se->dl_runtime;
-	attr->sched_deadline = dl_se->dl_deadline;
-	attr->sched_period = dl_se->dl_period;
-	attr->sched_flags = dl_se->flags;
-}
-
-/*
- * This function validates the new parameters of a -deadline task.
- * We ask for the deadline not being zero, and greater or equal
- * than the runtime, as well as the period of being zero or
- * greater than deadline. Furthermore, we have to be sure that
- * user parameters are above the internal resolution of 1us (we
- * check sched_runtime only since it is always the smaller one) and
- * below 2^63 ns (we have to check both sched_deadline and
- * sched_period, as the latter can be zero).
- */
-static bool
-__checkparam_dl(const struct sched_attr *attr)
-{
-	/* deadline != 0 */
-	if (attr->sched_deadline == 0)
-		return false;
-
-	/*
-	 * Since we truncate DL_SCALE bits, make sure we're at least
-	 * that big.
-	 */
-	if (attr->sched_runtime < (1ULL << DL_SCALE))
-		return false;
-
-	/*
-	 * Since we use the MSB for wrap-around and sign issues, make
-	 * sure it's not set (mind that period can be equal to zero).
-	 */
-	if (attr->sched_deadline & (1ULL << 63) ||
-	    attr->sched_period & (1ULL << 63))
-		return false;
-
-	/* runtime <= deadline <= period (if period != 0) */
-	if ((attr->sched_period != 0 &&
-	     attr->sched_period < attr->sched_deadline) ||
-	    attr->sched_deadline < attr->sched_runtime)
-		return false;
-
-	return true;
-}
-
 /*
  * Check the target process has a UID that matches the current process's:
  */
@@ -4170,19 +3976,6 @@ static bool check_same_owner(struct task_struct *p)
 	return match;
 }
 
-static bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr)
-{
-	struct sched_dl_entity *dl_se = &p->dl;
-
-	if (dl_se->dl_runtime != attr->sched_runtime ||
-		dl_se->dl_deadline != attr->sched_deadline ||
-		dl_se->dl_period != attr->sched_period ||
-		dl_se->flags != attr->sched_flags)
-		return true;
-
-	return false;
-}
-
 static int __sched_setscheduler(struct task_struct *p,
 				const struct sched_attr *attr,
 				bool user, bool pi)
@@ -4362,7 +4155,7 @@ static int __sched_setscheduler(struct task_struct *p,
 	 * of a SCHED_DEADLINE task) we need to check if enough bandwidth
 	 * is available.
 	 */
-	if ((dl_policy(policy) || dl_task(p)) && dl_overflow(p, policy, attr)) {
+	if ((dl_policy(policy) || dl_task(p)) && sched_dl_overflow(p, policy, attr)) {
 		task_rq_unlock(rq, p, &rf);
 		return -EBUSY;
 	}
@@ -5468,23 +5261,12 @@ void init_idle(struct task_struct *idle, int cpu)
 int cpuset_cpumask_can_shrink(const struct cpumask *cur,
 			      const struct cpumask *trial)
 {
-	int ret = 1, trial_cpus;
-	struct dl_bw *cur_dl_b;
-	unsigned long flags;
+	int ret = 1;
 
 	if (!cpumask_weight(cur))
 		return ret;
 
-	rcu_read_lock_sched();
-	cur_dl_b = dl_bw_of(cpumask_any(cur));
-	trial_cpus = cpumask_weight(trial);
-
-	raw_spin_lock_irqsave(&cur_dl_b->lock, flags);
-	if (cur_dl_b->bw != -1 &&
-	    cur_dl_b->bw * trial_cpus < cur_dl_b->total_bw)
-		ret = 0;
-	raw_spin_unlock_irqrestore(&cur_dl_b->lock, flags);
-	rcu_read_unlock_sched();
+	ret = dl_cpuset_cpumask_can_shrink(cur, trial);
 
 	return ret;
 }
@@ -5509,34 +5291,8 @@ int task_can_attach(struct task_struct *p,
 	}
 
 	if (dl_task(p) && !cpumask_intersects(task_rq(p)->rd->span,
-					      cs_cpus_allowed)) {
-		unsigned int dest_cpu = cpumask_any_and(cpu_active_mask,
-							cs_cpus_allowed);
-		struct dl_bw *dl_b;
-		bool overflow;
-		int cpus;
-		unsigned long flags;
-
-		rcu_read_lock_sched();
-		dl_b = dl_bw_of(dest_cpu);
-		raw_spin_lock_irqsave(&dl_b->lock, flags);
-		cpus = dl_bw_cpus(dest_cpu);
-		overflow = __dl_overflow(dl_b, cpus, 0, p->dl.dl_bw);
-		if (overflow)
-			ret = -EBUSY;
-		else {
-			/*
-			 * We reserve space for this task in the destination
-			 * root_domain, as we can't fail after this point.
-			 * We will free resources in the source root_domain
-			 * later on (see set_cpus_allowed_dl()).
-			 */
-			__dl_add(dl_b, p->dl.dl_bw);
-		}
-		raw_spin_unlock_irqrestore(&dl_b->lock, flags);
-		rcu_read_unlock_sched();
-
-	}
+					      cs_cpus_allowed))
+		ret = dl_task_can_attach(p, cs_cpus_allowed);
 
 out:
 	return ret;
@@ -5804,23 +5560,8 @@ static void cpuset_cpu_active(void)
 
 static int cpuset_cpu_inactive(unsigned int cpu)
 {
-	unsigned long flags;
-	struct dl_bw *dl_b;
-	bool overflow;
-	int cpus;
-
 	if (!cpuhp_tasks_frozen) {
-		rcu_read_lock_sched();
-		dl_b = dl_bw_of(cpu);
-
-		raw_spin_lock_irqsave(&dl_b->lock, flags);
-		cpus = dl_bw_cpus(cpu);
-		overflow = __dl_overflow(dl_b, cpus, 0, 0);
-		raw_spin_unlock_irqrestore(&dl_b->lock, flags);
-
-		rcu_read_unlock_sched();
-
-		if (overflow)
+		if (dl_cpu_busy(cpu))
 			return -EBUSY;
 		cpuset_update_active_cpus();
 	} else {
@@ -6740,70 +6481,6 @@ static int sched_rt_global_constraints(void)
 }
 #endif /* CONFIG_RT_GROUP_SCHED */
 
-static int sched_dl_global_validate(void)
-{
-	u64 runtime = global_rt_runtime();
-	u64 period = global_rt_period();
-	u64 new_bw = to_ratio(period, runtime);
-	struct dl_bw *dl_b;
-	int cpu, ret = 0;
-	unsigned long flags;
-
-	/*
-	 * Here we want to check the bandwidth not being set to some
-	 * value smaller than the currently allocated bandwidth in
-	 * any of the root_domains.
-	 *
-	 * FIXME: Cycling on all the CPUs is overdoing, but simpler than
-	 * cycling on root_domains... Discussion on different/better
-	 * solutions is welcome!
-	 */
-	for_each_possible_cpu(cpu) {
-		rcu_read_lock_sched();
-		dl_b = dl_bw_of(cpu);
-
-		raw_spin_lock_irqsave(&dl_b->lock, flags);
-		if (new_bw < dl_b->total_bw)
-			ret = -EBUSY;
-		raw_spin_unlock_irqrestore(&dl_b->lock, flags);
-
-		rcu_read_unlock_sched();
-
-		if (ret)
-			break;
-	}
-
-	return ret;
-}
-
-static void sched_dl_do_global(void)
-{
-	u64 new_bw = -1;
-	struct dl_bw *dl_b;
-	int cpu;
-	unsigned long flags;
-
-	def_dl_bandwidth.dl_period = global_rt_period();
-	def_dl_bandwidth.dl_runtime = global_rt_runtime();
-
-	if (global_rt_runtime() != RUNTIME_INF)
-		new_bw = to_ratio(global_rt_period(), global_rt_runtime());
-
-	/*
-	 * FIXME: As above...
-	 */
-	for_each_possible_cpu(cpu) {
-		rcu_read_lock_sched();
-		dl_b = dl_bw_of(cpu);
-
-		raw_spin_lock_irqsave(&dl_b->lock, flags);
-		dl_b->bw = new_bw;
-		raw_spin_unlock_irqrestore(&dl_b->lock, flags);
-
-		rcu_read_unlock_sched();
-	}
-}
-
 static int sched_rt_global_validate(void)
 {
 	if (sysctl_sched_rt_period <= 0)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index a2ce590156..e879feae5f 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -17,6 +17,7 @@
 #include "sched.h"
 
 #include <linux/slab.h>
+#include <uapi/linux/sched/types.h>
 
 struct dl_bandwidth def_dl_bandwidth;
 
@@ -1854,6 +1855,341 @@ const struct sched_class dl_sched_class = {
 	.update_curr		= update_curr_dl,
 };
 
+#ifdef CONFIG_SMP
+struct dl_bw *dl_bw_of(int i)
+{
+	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held(),
+			 "sched RCU must be held");
+	return &cpu_rq(i)->rd->dl_bw;
+}
+
+static inline int dl_bw_cpus(int i)
+{
+	struct root_domain *rd = cpu_rq(i)->rd;
+	int cpus = 0;
+
+	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held(),
+			 "sched RCU must be held");
+	for_each_cpu_and(i, rd->span, cpu_active_mask)
+		cpus++;
+
+	return cpus;
+}
+#else
+struct dl_bw *dl_bw_of(int i)
+{
+	return &cpu_rq(i)->dl.dl_bw;
+}
+
+static inline int dl_bw_cpus(int i)
+{
+	return 1;
+}
+#endif
+
+int sched_dl_global_validate(void)
+{
+	u64 runtime = global_rt_runtime();
+	u64 period = global_rt_period();
+	u64 new_bw = to_ratio(period, runtime);
+	struct dl_bw *dl_b;
+	int cpu, ret = 0;
+	unsigned long flags;
+
+	/*
+	 * Here we want to check the bandwidth not being set to some
+	 * value smaller than the currently allocated bandwidth in
+	 * any of the root_domains.
+	 *
+	 * FIXME: Cycling on all the CPUs is overdoing, but simpler than
+	 * cycling on root_domains... Discussion on different/better
+	 * solutions is welcome!
+	 */
+	for_each_possible_cpu(cpu) {
+		rcu_read_lock_sched();
+		dl_b = dl_bw_of(cpu);
+
+		raw_spin_lock_irqsave(&dl_b->lock, flags);
+		if (new_bw < dl_b->total_bw)
+			ret = -EBUSY;
+		raw_spin_unlock_irqrestore(&dl_b->lock, flags);
+
+		rcu_read_unlock_sched();
+
+		if (ret)
+			break;
+	}
+
+	return ret;
+}
+
+void sched_dl_do_global(void)
+{
+	u64 new_bw = -1;
+	struct dl_bw *dl_b;
+	int cpu;
+	unsigned long flags;
+
+	def_dl_bandwidth.dl_period = global_rt_period();
+	def_dl_bandwidth.dl_runtime = global_rt_runtime();
+
+	if (global_rt_runtime() != RUNTIME_INF)
+		new_bw = to_ratio(global_rt_period(), global_rt_runtime());
+
+	/*
+	 * FIXME: As above...
+	 */
+	for_each_possible_cpu(cpu) {
+		rcu_read_lock_sched();
+		dl_b = dl_bw_of(cpu);
+
+		raw_spin_lock_irqsave(&dl_b->lock, flags);
+		dl_b->bw = new_bw;
+		raw_spin_unlock_irqrestore(&dl_b->lock, flags);
+
+		rcu_read_unlock_sched();
+	}
+}
+
+/*
+ * We must be sure that accepting a new task (or allowing changing the
+ * parameters of an existing one) is consistent with the bandwidth
+ * constraints. If yes, this function also accordingly updates the currently
+ * allocated bandwidth to reflect the new situation.
+ *
+ * This function is called while holding p's rq->lock.
+ *
+ * XXX we should delay bw change until the task's 0-lag point, see
+ * __setparam_dl().
+ */
+int sched_dl_overflow(struct task_struct *p, int policy,
+		      const struct sched_attr *attr)
+{
+	struct dl_bw *dl_b = dl_bw_of(task_cpu(p));
+	u64 period = attr->sched_period ?: attr->sched_deadline;
+	u64 runtime = attr->sched_runtime;
+	u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0;
+	int cpus, err = -1;
+
+	/* !deadline task may carry old deadline bandwidth */
+	if (new_bw == p->dl.dl_bw && task_has_dl_policy(p))
+		return 0;
+
+	/*
+	 * Either if a task, enters, leave, or stays -deadline but changes
+	 * its parameters, we may need to update accordingly the total
+	 * allocated bandwidth of the container.
+	 */
+	raw_spin_lock(&dl_b->lock);
+	cpus = dl_bw_cpus(task_cpu(p));
+	if (dl_policy(policy) && !task_has_dl_policy(p) &&
+	    !__dl_overflow(dl_b, cpus, 0, new_bw)) {
+		__dl_add(dl_b, new_bw);
+		err = 0;
+	} else if (dl_policy(policy) && task_has_dl_policy(p) &&
+		   !__dl_overflow(dl_b, cpus, p->dl.dl_bw, new_bw)) {
+		__dl_clear(dl_b, p->dl.dl_bw);
+		__dl_add(dl_b, new_bw);
+		err = 0;
+	} else if (!dl_policy(policy) && task_has_dl_policy(p)) {
+		__dl_clear(dl_b, p->dl.dl_bw);
+		err = 0;
+	}
+	raw_spin_unlock(&dl_b->lock);
+
+	return err;
+}
+
+/*
+ * This function initializes the sched_dl_entity of a newly becoming
+ * SCHED_DEADLINE task.
+ *
+ * Only the static values are considered here, the actual runtime and the
+ * absolute deadline will be properly calculated when the task is enqueued
+ * for the first time with its new policy.
+ */
+void __setparam_dl(struct task_struct *p, const struct sched_attr *attr)
+{
+	struct sched_dl_entity *dl_se = &p->dl;
+
+	dl_se->dl_runtime = attr->sched_runtime;
+	dl_se->dl_deadline = attr->sched_deadline;
+	dl_se->dl_period = attr->sched_period ?: dl_se->dl_deadline;
+	dl_se->flags = attr->sched_flags;
+	dl_se->dl_bw = to_ratio(dl_se->dl_period, dl_se->dl_runtime);
+
+	/*
+	 * Changing the parameters of a task is 'tricky' and we're not doing
+	 * the correct thing -- also see task_dead_dl() and switched_from_dl().
+	 *
+	 * What we SHOULD do is delay the bandwidth release until the 0-lag
+	 * point. This would include retaining the task_struct until that time
+	 * and change sched_dl_overflow() to not immediately decrement the
+	 * current amount.
+	 *
+	 * Instead we retain the current runtime/deadline and let the new
+	 * parameters take effect after the current reservation period lapses.
+	 * This is safe (albeit pessimistic) because the 0-lag point is always
+	 * before the current scheduling deadline.
+	 *
+	 * We can still have temporary overloads because we do not delay the
+	 * change in bandwidth until that time; so admission control is
+	 * not on the safe side. It does however guarantee tasks will never
+	 * consume more than promised.
+	 */
+}
+
+void __getparam_dl(struct task_struct *p, struct sched_attr *attr)
+{
+	struct sched_dl_entity *dl_se = &p->dl;
+
+	attr->sched_priority = p->rt_priority;
+	attr->sched_runtime = dl_se->dl_runtime;
+	attr->sched_deadline = dl_se->dl_deadline;
+	attr->sched_period = dl_se->dl_period;
+	attr->sched_flags = dl_se->flags;
+}
+
+/*
+ * This function validates the new parameters of a -deadline task.
+ * We ask for the deadline not being zero, and greater or equal
+ * than the runtime, as well as the period of being zero or
+ * greater than deadline. Furthermore, we have to be sure that
+ * user parameters are above the internal resolution of 1us (we
+ * check sched_runtime only since it is always the smaller one) and
+ * below 2^63 ns (we have to check both sched_deadline and
+ * sched_period, as the latter can be zero).
+ */
+bool __checkparam_dl(const struct sched_attr *attr)
+{
+	/* deadline != 0 */
+	if (attr->sched_deadline == 0)
+		return false;
+
+	/*
+	 * Since we truncate DL_SCALE bits, make sure we're at least
+	 * that big.
+	 */
+	if (attr->sched_runtime < (1ULL << DL_SCALE))
+		return false;
+
+	/*
+	 * Since we use the MSB for wrap-around and sign issues, make
+	 * sure it's not set (mind that period can be equal to zero).
+	 */
+	if (attr->sched_deadline & (1ULL << 63) ||
+	    attr->sched_period & (1ULL << 63))
+		return false;
+
+	/* runtime <= deadline <= period (if period != 0) */
+	if ((attr->sched_period != 0 &&
+	     attr->sched_period < attr->sched_deadline) ||
+	    attr->sched_deadline < attr->sched_runtime)
+		return false;
+
+	return true;
+}
+
+/*
+ * This function clears the sched_dl_entity static params.
+ */
+void __dl_clear_params(struct task_struct *p)
+{
+	struct sched_dl_entity *dl_se = &p->dl;
+
+	dl_se->dl_runtime = 0;
+	dl_se->dl_deadline = 0;
+	dl_se->dl_period = 0;
+	dl_se->flags = 0;
+	dl_se->dl_bw = 0;
+
+	dl_se->dl_throttled = 0;
+	dl_se->dl_yielded = 0;
+}
+
+bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr)
+{
+	struct sched_dl_entity *dl_se = &p->dl;
+
+	if (dl_se->dl_runtime != attr->sched_runtime ||
+	    dl_se->dl_deadline != attr->sched_deadline ||
+	    dl_se->dl_period != attr->sched_period ||
+	    dl_se->flags != attr->sched_flags)
+		return true;
+
+	return false;
+}
+
+#ifdef CONFIG_SMP
+int dl_task_can_attach(struct task_struct *p, const struct cpumask *cs_cpus_allowed)
+{
+	unsigned int dest_cpu = cpumask_any_and(cpu_active_mask,
+							cs_cpus_allowed);
+	struct dl_bw *dl_b;
+	bool overflow;
+	int cpus, ret;
+	unsigned long flags;
+
+	rcu_read_lock_sched();
+	dl_b = dl_bw_of(dest_cpu);
+	raw_spin_lock_irqsave(&dl_b->lock, flags);
+	cpus = dl_bw_cpus(dest_cpu);
+	overflow = __dl_overflow(dl_b, cpus, 0, p->dl.dl_bw);
+	if (overflow)
+		ret = -EBUSY;
+	else {
+		/*
+		 * We reserve space for this task in the destination
+		 * root_domain, as we can't fail after this point.
+		 * We will free resources in the source root_domain
+		 * later on (see set_cpus_allowed_dl()).
+		 */
+		__dl_add(dl_b, p->dl.dl_bw);
+		ret = 0;
+	}
+	raw_spin_unlock_irqrestore(&dl_b->lock, flags);
+	rcu_read_unlock_sched();
+	return ret;
+}
+
+int dl_cpuset_cpumask_can_shrink(const struct cpumask *cur,
+				 const struct cpumask *trial)
+{
+	int ret = 1, trial_cpus;
+	struct dl_bw *cur_dl_b;
+	unsigned long flags;
+
+	rcu_read_lock_sched();
+	cur_dl_b = dl_bw_of(cpumask_any(cur));
+	trial_cpus = cpumask_weight(trial);
+
+	raw_spin_lock_irqsave(&cur_dl_b->lock, flags);
+	if (cur_dl_b->bw != -1 &&
+	    cur_dl_b->bw * trial_cpus < cur_dl_b->total_bw)
+		ret = 0;
+	raw_spin_unlock_irqrestore(&cur_dl_b->lock, flags);
+	rcu_read_unlock_sched();
+	return ret;
+}
+
+bool dl_cpu_busy(unsigned int cpu)
+{
+	unsigned long flags;
+	struct dl_bw *dl_b;
+	bool overflow;
+	int cpus;
+
+	rcu_read_lock_sched();
+	dl_b = dl_bw_of(cpu);
+	raw_spin_lock_irqsave(&dl_b->lock, flags);
+	cpus = dl_bw_cpus(cpu);
+	overflow = __dl_overflow(dl_b, cpus, 0, 0);
+	raw_spin_unlock_irqrestore(&dl_b->lock, flags);
+	rcu_read_unlock_sched();
+	return overflow;
+}
+#endif
+
 #ifdef CONFIG_SCHED_DEBUG
 extern void print_dl_rq(struct seq_file *m, int cpu, struct dl_rq *dl_rq);
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 053f60afb7..4a845c19b8 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -245,6 +245,20 @@ bool __dl_overflow(struct dl_bw *dl_b, int cpus, u64 old_bw, u64 new_bw)
 }
 
 extern void init_dl_bw(struct dl_bw *dl_b);
+extern int sched_dl_global_validate(void);
+extern void sched_dl_do_global(void);
+extern int sched_dl_overflow(struct task_struct *p, int policy,
+			     const struct sched_attr *attr);
+extern void __setparam_dl(struct task_struct *p, const struct sched_attr *attr);
+extern void __getparam_dl(struct task_struct *p, struct sched_attr *attr);
+extern bool __checkparam_dl(const struct sched_attr *attr);
+extern void __dl_clear_params(struct task_struct *p);
+extern bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr);
+extern int dl_task_can_attach(struct task_struct *p,
+			      const struct cpumask *cs_cpus_allowed);
+extern int dl_cpuset_cpumask_can_shrink(const struct cpumask *cur,
+					const struct cpumask *trial);
+extern bool dl_cpu_busy(unsigned int cpu);
 
 #ifdef CONFIG_CGROUP_SCHED
 
-- 
2.9.4


* [PATCH 4/7] sched/deadline: make it configurable
  2017-05-29 21:02 [PATCH 0/7] scheduler tinification Nicolas Pitre
                   ` (2 preceding siblings ...)
  2017-05-29 21:02 ` [PATCH 3/7] sched/deadline: move dl related code out of sched/core.c Nicolas Pitre
@ 2017-05-29 21:02 ` Nicolas Pitre
  2017-05-29 21:03 ` [PATCH 5/7] sched/rt: move rt related code out of sched/core.c Nicolas Pitre
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Nicolas Pitre @ 2017-05-29 21:02 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra; +Cc: linux-kernel

On most small systems, the deadline scheduler class is a luxury that is
rarely used, if at all. In that case it is preferable to be able to
configure it out to reduce the kernel size.
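
The approach, sketched below from the hunks that follow, is to rely on
IS_ENABLED() wherever a compile-time constant condition lets the optimizer
drop the deadline-specific code, and on plain #ifdef where a structure
member or the sched_class chain is involved:

  /* include/linux/sched/deadline.h: */
  static inline int dl_prio(int prio)
  {
          if (IS_ENABLED(CONFIG_SCHED_DL) && unlikely(prio < MAX_DL_PRIO))
                  return 1;
          return 0;
  }

  /* ... so a caller like this one in __sched_setscheduler() becomes constant
   * false when CONFIG_SCHED_DL=n, the call is eliminated, and the now-absent
   * deadline.o is never referenced at link time: */
  if ((dl_policy(policy) || dl_task(p)) && sched_dl_overflow(p, policy, attr)) {
          task_rq_unlock(rq, p, &rf);
          return -EBUSY;
  }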

Signed-off-by: Nicolas Pitre <nico@linaro.org>
---
 include/linux/sched.h          |  2 ++
 include/linux/sched/deadline.h |  2 +-
 init/Kconfig                   |  8 ++++++++
 kernel/locking/rtmutex.c       |  9 +++++++++
 kernel/sched/Makefile          |  5 +++--
 kernel/sched/core.c            | 37 ++++++++++++++++++++++++++-----------
 kernel/sched/debug.c           |  4 ++++
 kernel/sched/rt.c              |  7 +++++--
 kernel/sched/sched.h           |  9 +++++++--
 kernel/sched/stop_task.c       |  4 ++++
 kernel/sched/topology.c        |  6 ++++++
 11 files changed, 75 insertions(+), 18 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2b69fc6502..ba0c203669 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -522,7 +522,9 @@ struct task_struct {
 #ifdef CONFIG_CGROUP_SCHED
 	struct task_group		*sched_task_group;
 #endif
+#ifdef CONFIG_SCHED_DL
 	struct sched_dl_entity		dl;
+#endif
 
 #ifdef CONFIG_PREEMPT_NOTIFIERS
 	/* List of struct preempt_notifier: */
diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
index 975be862e0..308ca2482a 100644
--- a/include/linux/sched/deadline.h
+++ b/include/linux/sched/deadline.h
@@ -13,7 +13,7 @@
 
 static inline int dl_prio(int prio)
 {
-	if (unlikely(prio < MAX_DL_PRIO))
+	if (IS_ENABLED(CONFIG_SCHED_DL) && unlikely(prio < MAX_DL_PRIO))
 		return 1;
 	return 0;
 }
diff --git a/init/Kconfig b/init/Kconfig
index b9aed60cac..f73e3f0940 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1303,6 +1303,14 @@ config SCHED_AUTOGROUP
 	  desktop applications.  Task group autogeneration is currently based
 	  upon task session.
 
+config SCHED_DL
+	bool "Deadline Task Scheduling" if EXPERT
+	default y
+	help
+	  This adds the sched_dl scheduling class to the kernel providing
+	  support for the SCHED_DEADLINE policy. You might want to disable
+	  this to reduce the kernel size. If unsure say y.
+
 config SYSFS_DEPRECATED
 	bool "Enable deprecated sysfs features to support old userspace tools"
 	depends on SYSFS
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 28cd09e635..f42c1b1e52 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -227,8 +227,13 @@ static inline bool unlock_rt_mutex_safe(struct rt_mutex *lock,
 /*
  * Only use with rt_mutex_waiter_{less,equal}()
  */
+#ifdef CONFIG_SCHED_DL
 #define task_to_waiter(p)	\
 	&(struct rt_mutex_waiter){ .prio = (p)->prio, .deadline = (p)->dl.deadline }
+#else
+#define task_to_waiter(p)	\
+	&(struct rt_mutex_waiter){ .prio = (p)->prio, .deadline = 0 }
+#endif
 
 static inline int
 rt_mutex_waiter_less(struct rt_mutex_waiter *left,
@@ -692,7 +697,9 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task,
 	 * the values of the node being removed.
 	 */
 	waiter->prio = task->prio;
+#ifdef CONFIG_SCHED_DL
 	waiter->deadline = task->dl.deadline;
+#endif
 
 	rt_mutex_enqueue(lock, waiter);
 
@@ -967,7 +974,9 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
 	waiter->task = task;
 	waiter->lock = lock;
 	waiter->prio = task->prio;
+#ifdef CONFIG_SCHED_DL
 	waiter->deadline = task->dl.deadline;
+#endif
 
 	/* Get the top priority waiter on the lock */
 	if (rt_mutex_has_waiters(lock))
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 5e4c2e7a63..3bd6a7c1cc 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -16,9 +16,10 @@ CFLAGS_core.o := $(PROFILING) -fno-omit-frame-pointer
 endif
 
 obj-y += core.o loadavg.o clock.o cputime.o
-obj-y += idle_task.o fair.o rt.o deadline.o
 obj-y += wait.o swait.o completion.o idle.o
-obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o
+obj-y += idle_task.o fair.o rt.o
+obj-$(CONFIG_SCHED_DL) += deadline.o $(if $(CONFIG_SMP),cpudeadline.o)
+obj-$(CONFIG_SMP) += cpupri.o topology.o stop_task.o
 obj-$(CONFIG_SCHED_AUTOGROUP) += autogroup.o
 obj-$(CONFIG_SCHEDSTATS) += stats.o
 obj-$(CONFIG_SCHED_DEBUG) += debug.o
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 93ce28ea34..d2d2791f32 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -634,9 +634,11 @@ bool sched_can_stop_tick(struct rq *rq)
 {
 	int fifo_nr_running;
 
+#ifdef CONFIG_SCHED_DL
 	/* Deadline tasks, even if single, need the tick */
 	if (rq->dl.dl_nr_running)
 		return false;
+#endif
 
 	/*
 	 * If there are more than one RR tasks, we need the tick to effect the
@@ -2174,9 +2176,11 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 	memset(&p->se.statistics, 0, sizeof(p->se.statistics));
 #endif
 
+#ifdef CONFIG_SCHED_DL
 	RB_CLEAR_NODE(&p->dl.rb_node);
 	init_dl_task_timer(&p->dl);
 	__dl_clear_params(p);
+#endif
 
 	INIT_LIST_HEAD(&p->rt.run_list);
 	p->rt.timeout		= 0;
@@ -3699,6 +3703,9 @@ void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task)
 	 *      --> -dl task blocks on mutex A and could preempt the
 	 *          running task
 	 */
+#ifdef CONFIG_SCHED_DL
+	if (dl_prio(oldprio))
+		p->dl.dl_boosted = 0;
 	if (dl_prio(prio)) {
 		if (!dl_prio(p->normal_prio) ||
 		    (pi_task && dl_entity_preempt(&pi_task->dl, &p->dl))) {
@@ -3707,15 +3714,13 @@ void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task)
 		} else
 			p->dl.dl_boosted = 0;
 		p->sched_class = &dl_sched_class;
-	} else if (rt_prio(prio)) {
-		if (dl_prio(oldprio))
-			p->dl.dl_boosted = 0;
+	} else
+#endif
+	if (rt_prio(prio)) {
 		if (oldprio < prio)
 			queue_flag |= ENQUEUE_HEAD;
 		p->sched_class = &rt_sched_class;
 	} else {
-		if (dl_prio(oldprio))
-			p->dl.dl_boosted = 0;
 		if (rt_prio(oldprio))
 			p->rt.timeout = 0;
 		p->sched_class = &fair_sched_class;
@@ -5266,7 +5271,8 @@ int cpuset_cpumask_can_shrink(const struct cpumask *cur,
 	if (!cpumask_weight(cur))
 		return ret;
 
-	ret = dl_cpuset_cpumask_can_shrink(cur, trial);
+	if (IS_ENABLED(CONFIG_SCHED_DL))
+		ret = dl_cpuset_cpumask_can_shrink(cur, trial);
 
 	return ret;
 }
@@ -5561,7 +5567,7 @@ static void cpuset_cpu_active(void)
 static int cpuset_cpu_inactive(unsigned int cpu)
 {
 	if (!cpuhp_tasks_frozen) {
-		if (dl_cpu_busy(cpu))
+		if (IS_ENABLED(CONFIG_SCHED_DL) && dl_cpu_busy(cpu))
 			return -EBUSY;
 		cpuset_update_active_cpus();
 	} else {
@@ -5721,7 +5727,9 @@ void __init sched_init_smp(void)
 	free_cpumask_var(non_isolated_cpus);
 
 	init_sched_rt_class();
+#ifdef CONFIG_SCHED_DL
 	init_sched_dl_class();
+#endif
 
 	sched_init_smt();
 	sched_clock_init_late();
@@ -5825,7 +5833,9 @@ void __init sched_init(void)
 #endif /* CONFIG_CPUMASK_OFFSTACK */
 
 	init_rt_bandwidth(&def_rt_bandwidth, global_rt_period(), global_rt_runtime());
+#ifdef CONFIG_SCHED_DL
 	init_dl_bandwidth(&def_dl_bandwidth, global_rt_period(), global_rt_runtime());
+#endif
 
 #ifdef CONFIG_SMP
 	init_defrootdomain();
@@ -5855,7 +5865,9 @@ void __init sched_init(void)
 		rq->calc_load_update = jiffies + LOAD_FREQ;
 		init_cfs_rq(&rq->cfs);
 		init_rt_rq(&rq->rt);
+#ifdef CONFIG_SCHED_DL
 		init_dl_rq(&rq->dl);
+#endif
 #ifdef CONFIG_FAIR_GROUP_SCHED
 		root_task_group.shares = ROOT_TASK_GROUP_LOAD;
 		INIT_LIST_HEAD(&rq->leaf_cfs_rq_list);
@@ -6518,16 +6530,19 @@ int sched_rt_handler(struct ctl_table *table, int write,
 		if (ret)
 			goto undo;
 
-		ret = sched_dl_global_validate();
-		if (ret)
-			goto undo;
+		if (IS_ENABLED(CONFIG_SCHED_DL)) {
+			ret = sched_dl_global_validate();
+			if (ret)
+				goto undo;
+		}
 
 		ret = sched_rt_global_constraints();
 		if (ret)
 			goto undo;
 
 		sched_rt_do_global();
-		sched_dl_do_global();
+		if (IS_ENABLED(CONFIG_SCHED_DL))
+			sched_dl_do_global();
 	}
 	if (0) {
 undo:
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 38f019324f..84f80a81ab 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -646,7 +646,9 @@ do {									\
 	spin_lock_irqsave(&sched_debug_lock, flags);
 	print_cfs_stats(m, cpu);
 	print_rt_stats(m, cpu);
+#ifdef CONFIG_SCHED_DL
 	print_dl_stats(m, cpu);
+#endif
 
 	print_rq(m, rq, cpu);
 	spin_unlock_irqrestore(&sched_debug_lock, flags);
@@ -954,10 +956,12 @@ void proc_sched_show_task(struct task_struct *p, struct seq_file *m)
 #endif
 	P(policy);
 	P(prio);
+#ifdef CONFIG_SCHED_DL
 	if (p->policy == SCHED_DEADLINE) {
 		P(dl.runtime);
 		P(dl.deadline);
 	}
+#endif
 #undef PN_SCHEDSTAT
 #undef PN
 #undef __PN
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 979b734100..a3206ef3e8 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1544,9 +1544,12 @@ pick_next_task_rt(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 		 * means a dl or stop task can slip in, in which case we need
 		 * to re-start task selection.
 		 */
-		if (unlikely((rq->stop && task_on_rq_queued(rq->stop)) ||
-			     rq->dl.dl_nr_running))
+		if (unlikely((rq->stop && task_on_rq_queued(rq->stop))))
 			return RETRY_TASK;
+#ifdef CONFIG_SCHED_DL
+		if (unlikely(rq->dl.dl_nr_running))
+			return RETRY_TASK;
+#endif
 	}
 
 	/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 4a845c19b8..ec9a84aad4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -137,7 +137,7 @@ static inline int rt_policy(int policy)
 
 static inline int dl_policy(int policy)
 {
-	return policy == SCHED_DEADLINE;
+	return IS_ENABLED(CONFIG_SCHED_DL) && policy == SCHED_DEADLINE;
 }
 static inline bool valid_policy(int policy)
 {
@@ -667,7 +667,9 @@ struct rq {
 
 	struct cfs_rq cfs;
 	struct rt_rq rt;
+#ifdef CONFIG_SCHED_DL
 	struct dl_rq dl;
+#endif
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	/* list of leaf cfs_rq on this cpu: */
@@ -1438,9 +1440,12 @@ static inline void set_curr_task(struct rq *rq, struct task_struct *curr)
 
 #ifdef CONFIG_SMP
 #define sched_class_highest (&stop_sched_class)
-#else
+#elif defined(CONFIG_SCHED_DL)
 #define sched_class_highest (&dl_sched_class)
+#else
+#define sched_class_highest (&rt_sched_class)
 #endif
+
 #define for_each_class(class) \
    for (class = sched_class_highest; class; class = class->next)
 
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 9f69fb6308..5632dc3e63 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -110,7 +110,11 @@ static void update_curr_stop(struct rq *rq)
  * Simple, special scheduling class for the per-CPU stop tasks:
  */
 const struct sched_class stop_sched_class = {
+#ifdef CONFIG_SCHED_DL
 	.next			= &dl_sched_class,
+#else
+	.next			= &rt_sched_class,
+#endif
 
 	.enqueue_task		= enqueue_task_stop,
 	.dequeue_task		= dequeue_task_stop,
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 1b0b4fb128..25328bfca6 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -195,7 +195,9 @@ static void free_rootdomain(struct rcu_head *rcu)
 	struct root_domain *rd = container_of(rcu, struct root_domain, rcu);
 
 	cpupri_cleanup(&rd->cpupri);
+#ifdef CONFIG_SCHED_DL
 	cpudl_cleanup(&rd->cpudl);
+#endif
 	free_cpumask_var(rd->dlo_mask);
 	free_cpumask_var(rd->rto_mask);
 	free_cpumask_var(rd->online);
@@ -253,16 +255,20 @@ static int init_rootdomain(struct root_domain *rd)
 	if (!zalloc_cpumask_var(&rd->rto_mask, GFP_KERNEL))
 		goto free_dlo_mask;
 
+#ifdef CONFIG_SCHED_DL
 	init_dl_bw(&rd->dl_bw);
 	if (cpudl_init(&rd->cpudl) != 0)
 		goto free_rto_mask;
+#endif
 
 	if (cpupri_init(&rd->cpupri) != 0)
 		goto free_cpudl;
 	return 0;
 
 free_cpudl:
+#ifdef CONFIG_SCHED_DL
 	cpudl_cleanup(&rd->cpudl);
+#endif
 free_rto_mask:
 	free_cpumask_var(rd->rto_mask);
 free_dlo_mask:
-- 
2.9.4


* [PATCH 5/7] sched/rt: move rt related code out of sched/core.c
  2017-05-29 21:02 [PATCH 0/7] scheduler tinification Nicolas Pitre
                   ` (3 preceding siblings ...)
  2017-05-29 21:02 ` [PATCH 4/7] sched/deadline: make it configurable Nicolas Pitre
@ 2017-05-29 21:03 ` Nicolas Pitre
  2017-05-29 21:03 ` [PATCH 6/7] sched/rt: make it configurable Nicolas Pitre
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Nicolas Pitre @ 2017-05-29 21:03 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra; +Cc: linux-kernel

... to sched/rt.c. This will make it easier to configure the realtime
scheduling class out of the kernel build.
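
For reference (a simplified summary, not part of the patch), the constraints
enforced by the moved tg_rt_schedulable() code, expressed with the same
20-bit fixed-point ratio used by the deadline code:

  /*
   *   runtime <= period                     (unless runtime == RUNTIME_INF)
   *   to_ratio(period, runtime) <= to_ratio(global_rt_period(), global_rt_runtime())
   *   sum of the children's to_ratio() <= the group's own to_ratio()
   *
   * With the default 950000us runtime out of a 1000000us period, the global
   * ratio is about 0.95 * 2^20, which is the budget the top-level groups
   * have to share.
   */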

Signed-off-by: Nicolas Pitre <nico@linaro.org>
---
 kernel/sched/core.c  | 318 ---------------------------------------------------
 kernel/sched/rt.c    | 313 ++++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h |   5 +
 3 files changed, 318 insertions(+), 318 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d2d2791f32..a7b004e440 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6259,324 +6259,6 @@ void sched_move_task(struct task_struct *tsk)
 
 	task_rq_unlock(rq, tsk, &rf);
 }
-#endif /* CONFIG_CGROUP_SCHED */
-
-#ifdef CONFIG_RT_GROUP_SCHED
-/*
- * Ensure that the real time constraints are schedulable.
- */
-static DEFINE_MUTEX(rt_constraints_mutex);
-
-/* Must be called with tasklist_lock held */
-static inline int tg_has_rt_tasks(struct task_group *tg)
-{
-	struct task_struct *g, *p;
-
-	/*
-	 * Autogroups do not have RT tasks; see autogroup_create().
-	 */
-	if (task_group_is_autogroup(tg))
-		return 0;
-
-	for_each_process_thread(g, p) {
-		if (rt_task(p) && task_group(p) == tg)
-			return 1;
-	}
-
-	return 0;
-}
-
-struct rt_schedulable_data {
-	struct task_group *tg;
-	u64 rt_period;
-	u64 rt_runtime;
-};
-
-static int tg_rt_schedulable(struct task_group *tg, void *data)
-{
-	struct rt_schedulable_data *d = data;
-	struct task_group *child;
-	unsigned long total, sum = 0;
-	u64 period, runtime;
-
-	period = ktime_to_ns(tg->rt_bandwidth.rt_period);
-	runtime = tg->rt_bandwidth.rt_runtime;
-
-	if (tg == d->tg) {
-		period = d->rt_period;
-		runtime = d->rt_runtime;
-	}
-
-	/*
-	 * Cannot have more runtime than the period.
-	 */
-	if (runtime > period && runtime != RUNTIME_INF)
-		return -EINVAL;
-
-	/*
-	 * Ensure we don't starve existing RT tasks.
-	 */
-	if (rt_bandwidth_enabled() && !runtime && tg_has_rt_tasks(tg))
-		return -EBUSY;
-
-	total = to_ratio(period, runtime);
-
-	/*
-	 * Nobody can have more than the global setting allows.
-	 */
-	if (total > to_ratio(global_rt_period(), global_rt_runtime()))
-		return -EINVAL;
-
-	/*
-	 * The sum of our children's runtime should not exceed our own.
-	 */
-	list_for_each_entry_rcu(child, &tg->children, siblings) {
-		period = ktime_to_ns(child->rt_bandwidth.rt_period);
-		runtime = child->rt_bandwidth.rt_runtime;
-
-		if (child == d->tg) {
-			period = d->rt_period;
-			runtime = d->rt_runtime;
-		}
-
-		sum += to_ratio(period, runtime);
-	}
-
-	if (sum > total)
-		return -EINVAL;
-
-	return 0;
-}
-
-static int __rt_schedulable(struct task_group *tg, u64 period, u64 runtime)
-{
-	int ret;
-
-	struct rt_schedulable_data data = {
-		.tg = tg,
-		.rt_period = period,
-		.rt_runtime = runtime,
-	};
-
-	rcu_read_lock();
-	ret = walk_tg_tree(tg_rt_schedulable, tg_nop, &data);
-	rcu_read_unlock();
-
-	return ret;
-}
-
-static int tg_set_rt_bandwidth(struct task_group *tg,
-		u64 rt_period, u64 rt_runtime)
-{
-	int i, err = 0;
-
-	/*
-	 * Disallowing the root group RT runtime is BAD, it would disallow the
-	 * kernel creating (and or operating) RT threads.
-	 */
-	if (tg == &root_task_group && rt_runtime == 0)
-		return -EINVAL;
-
-	/* No period doesn't make any sense. */
-	if (rt_period == 0)
-		return -EINVAL;
-
-	mutex_lock(&rt_constraints_mutex);
-	read_lock(&tasklist_lock);
-	err = __rt_schedulable(tg, rt_period, rt_runtime);
-	if (err)
-		goto unlock;
-
-	raw_spin_lock_irq(&tg->rt_bandwidth.rt_runtime_lock);
-	tg->rt_bandwidth.rt_period = ns_to_ktime(rt_period);
-	tg->rt_bandwidth.rt_runtime = rt_runtime;
-
-	for_each_possible_cpu(i) {
-		struct rt_rq *rt_rq = tg->rt_rq[i];
-
-		raw_spin_lock(&rt_rq->rt_runtime_lock);
-		rt_rq->rt_runtime = rt_runtime;
-		raw_spin_unlock(&rt_rq->rt_runtime_lock);
-	}
-	raw_spin_unlock_irq(&tg->rt_bandwidth.rt_runtime_lock);
-unlock:
-	read_unlock(&tasklist_lock);
-	mutex_unlock(&rt_constraints_mutex);
-
-	return err;
-}
-
-static int sched_group_set_rt_runtime(struct task_group *tg, long rt_runtime_us)
-{
-	u64 rt_runtime, rt_period;
-
-	rt_period = ktime_to_ns(tg->rt_bandwidth.rt_period);
-	rt_runtime = (u64)rt_runtime_us * NSEC_PER_USEC;
-	if (rt_runtime_us < 0)
-		rt_runtime = RUNTIME_INF;
-
-	return tg_set_rt_bandwidth(tg, rt_period, rt_runtime);
-}
-
-static long sched_group_rt_runtime(struct task_group *tg)
-{
-	u64 rt_runtime_us;
-
-	if (tg->rt_bandwidth.rt_runtime == RUNTIME_INF)
-		return -1;
-
-	rt_runtime_us = tg->rt_bandwidth.rt_runtime;
-	do_div(rt_runtime_us, NSEC_PER_USEC);
-	return rt_runtime_us;
-}
-
-static int sched_group_set_rt_period(struct task_group *tg, u64 rt_period_us)
-{
-	u64 rt_runtime, rt_period;
-
-	rt_period = rt_period_us * NSEC_PER_USEC;
-	rt_runtime = tg->rt_bandwidth.rt_runtime;
-
-	return tg_set_rt_bandwidth(tg, rt_period, rt_runtime);
-}
-
-static long sched_group_rt_period(struct task_group *tg)
-{
-	u64 rt_period_us;
-
-	rt_period_us = ktime_to_ns(tg->rt_bandwidth.rt_period);
-	do_div(rt_period_us, NSEC_PER_USEC);
-	return rt_period_us;
-}
-#endif /* CONFIG_RT_GROUP_SCHED */
-
-#ifdef CONFIG_RT_GROUP_SCHED
-static int sched_rt_global_constraints(void)
-{
-	int ret = 0;
-
-	mutex_lock(&rt_constraints_mutex);
-	read_lock(&tasklist_lock);
-	ret = __rt_schedulable(NULL, 0, 0);
-	read_unlock(&tasklist_lock);
-	mutex_unlock(&rt_constraints_mutex);
-
-	return ret;
-}
-
-static int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk)
-{
-	/* Don't accept realtime tasks when there is no way for them to run */
-	if (rt_task(tsk) && tg->rt_bandwidth.rt_runtime == 0)
-		return 0;
-
-	return 1;
-}
-
-#else /* !CONFIG_RT_GROUP_SCHED */
-static int sched_rt_global_constraints(void)
-{
-	unsigned long flags;
-	int i;
-
-	raw_spin_lock_irqsave(&def_rt_bandwidth.rt_runtime_lock, flags);
-	for_each_possible_cpu(i) {
-		struct rt_rq *rt_rq = &cpu_rq(i)->rt;
-
-		raw_spin_lock(&rt_rq->rt_runtime_lock);
-		rt_rq->rt_runtime = global_rt_runtime();
-		raw_spin_unlock(&rt_rq->rt_runtime_lock);
-	}
-	raw_spin_unlock_irqrestore(&def_rt_bandwidth.rt_runtime_lock, flags);
-
-	return 0;
-}
-#endif /* CONFIG_RT_GROUP_SCHED */
-
-static int sched_rt_global_validate(void)
-{
-	if (sysctl_sched_rt_period <= 0)
-		return -EINVAL;
-
-	if ((sysctl_sched_rt_runtime != RUNTIME_INF) &&
-		(sysctl_sched_rt_runtime > sysctl_sched_rt_period))
-		return -EINVAL;
-
-	return 0;
-}
-
-static void sched_rt_do_global(void)
-{
-	def_rt_bandwidth.rt_runtime = global_rt_runtime();
-	def_rt_bandwidth.rt_period = ns_to_ktime(global_rt_period());
-}
-
-int sched_rt_handler(struct ctl_table *table, int write,
-		void __user *buffer, size_t *lenp,
-		loff_t *ppos)
-{
-	int old_period, old_runtime;
-	static DEFINE_MUTEX(mutex);
-	int ret;
-
-	mutex_lock(&mutex);
-	old_period = sysctl_sched_rt_period;
-	old_runtime = sysctl_sched_rt_runtime;
-
-	ret = proc_dointvec(table, write, buffer, lenp, ppos);
-
-	if (!ret && write) {
-		ret = sched_rt_global_validate();
-		if (ret)
-			goto undo;
-
-		if (IS_ENABLED(CONFIG_SCHED_DL)) {
-			ret = sched_dl_global_validate();
-			if (ret)
-				goto undo;
-		}
-
-		ret = sched_rt_global_constraints();
-		if (ret)
-			goto undo;
-
-		sched_rt_do_global();
-		if (IS_ENABLED(CONFIG_SCHED_DL))
-			sched_dl_do_global();
-	}
-	if (0) {
-undo:
-		sysctl_sched_rt_period = old_period;
-		sysctl_sched_rt_runtime = old_runtime;
-	}
-	mutex_unlock(&mutex);
-
-	return ret;
-}
-
-int sched_rr_handler(struct ctl_table *table, int write,
-		void __user *buffer, size_t *lenp,
-		loff_t *ppos)
-{
-	int ret;
-	static DEFINE_MUTEX(mutex);
-
-	mutex_lock(&mutex);
-	ret = proc_dointvec(table, write, buffer, lenp, ppos);
-	/*
-	 * Make sure that internally we keep jiffies.
-	 * Also, writing zero resets the timeslice to default:
-	 */
-	if (!ret && write) {
-		sched_rr_timeslice =
-			sysctl_sched_rr_timeslice <= 0 ? RR_TIMESLICE :
-			msecs_to_jiffies(sysctl_sched_rr_timeslice);
-	}
-	mutex_unlock(&mutex);
-	return ret;
-}
-
-#ifdef CONFIG_CGROUP_SCHED
 
 static inline struct task_group *css_tg(struct cgroup_subsys_state *css)
 {
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index a3206ef3e8..a615c795ee 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2441,6 +2441,319 @@ const struct sched_class rt_sched_class = {
 	.update_curr		= update_curr_rt,
 };
 
+#ifdef CONFIG_RT_GROUP_SCHED
+/*
+ * Ensure that the real time constraints are schedulable.
+ */
+static DEFINE_MUTEX(rt_constraints_mutex);
+
+/* Must be called with tasklist_lock held */
+static inline int tg_has_rt_tasks(struct task_group *tg)
+{
+	struct task_struct *g, *p;
+
+	/*
+	 * Autogroups do not have RT tasks; see autogroup_create().
+	 */
+	if (task_group_is_autogroup(tg))
+		return 0;
+
+	for_each_process_thread(g, p) {
+		if (rt_task(p) && task_group(p) == tg)
+			return 1;
+	}
+
+	return 0;
+}
+
+struct rt_schedulable_data {
+	struct task_group *tg;
+	u64 rt_period;
+	u64 rt_runtime;
+};
+
+static int tg_rt_schedulable(struct task_group *tg, void *data)
+{
+	struct rt_schedulable_data *d = data;
+	struct task_group *child;
+	unsigned long total, sum = 0;
+	u64 period, runtime;
+
+	period = ktime_to_ns(tg->rt_bandwidth.rt_period);
+	runtime = tg->rt_bandwidth.rt_runtime;
+
+	if (tg == d->tg) {
+		period = d->rt_period;
+		runtime = d->rt_runtime;
+	}
+
+	/*
+	 * Cannot have more runtime than the period.
+	 */
+	if (runtime > period && runtime != RUNTIME_INF)
+		return -EINVAL;
+
+	/*
+	 * Ensure we don't starve existing RT tasks.
+	 */
+	if (rt_bandwidth_enabled() && !runtime && tg_has_rt_tasks(tg))
+		return -EBUSY;
+
+	total = to_ratio(period, runtime);
+
+	/*
+	 * Nobody can have more than the global setting allows.
+	 */
+	if (total > to_ratio(global_rt_period(), global_rt_runtime()))
+		return -EINVAL;
+
+	/*
+	 * The sum of our children's runtime should not exceed our own.
+	 */
+	list_for_each_entry_rcu(child, &tg->children, siblings) {
+		period = ktime_to_ns(child->rt_bandwidth.rt_period);
+		runtime = child->rt_bandwidth.rt_runtime;
+
+		if (child == d->tg) {
+			period = d->rt_period;
+			runtime = d->rt_runtime;
+		}
+
+		sum += to_ratio(period, runtime);
+	}
+
+	if (sum > total)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int __rt_schedulable(struct task_group *tg, u64 period, u64 runtime)
+{
+	int ret;
+
+	struct rt_schedulable_data data = {
+		.tg = tg,
+		.rt_period = period,
+		.rt_runtime = runtime,
+	};
+
+	rcu_read_lock();
+	ret = walk_tg_tree(tg_rt_schedulable, tg_nop, &data);
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static int tg_set_rt_bandwidth(struct task_group *tg,
+		u64 rt_period, u64 rt_runtime)
+{
+	int i, err = 0;
+
+	/*
+	 * Disallowing the root group RT runtime is BAD, it would disallow the
+	 * kernel creating (and or operating) RT threads.
+	 */
+	if (tg == &root_task_group && rt_runtime == 0)
+		return -EINVAL;
+
+	/* No period doesn't make any sense. */
+	if (rt_period == 0)
+		return -EINVAL;
+
+	mutex_lock(&rt_constraints_mutex);
+	read_lock(&tasklist_lock);
+	err = __rt_schedulable(tg, rt_period, rt_runtime);
+	if (err)
+		goto unlock;
+
+	raw_spin_lock_irq(&tg->rt_bandwidth.rt_runtime_lock);
+	tg->rt_bandwidth.rt_period = ns_to_ktime(rt_period);
+	tg->rt_bandwidth.rt_runtime = rt_runtime;
+
+	for_each_possible_cpu(i) {
+		struct rt_rq *rt_rq = tg->rt_rq[i];
+
+		raw_spin_lock(&rt_rq->rt_runtime_lock);
+		rt_rq->rt_runtime = rt_runtime;
+		raw_spin_unlock(&rt_rq->rt_runtime_lock);
+	}
+	raw_spin_unlock_irq(&tg->rt_bandwidth.rt_runtime_lock);
+unlock:
+	read_unlock(&tasklist_lock);
+	mutex_unlock(&rt_constraints_mutex);
+
+	return err;
+}
+
+int sched_group_set_rt_runtime(struct task_group *tg, long rt_runtime_us)
+{
+	u64 rt_runtime, rt_period;
+
+	rt_period = ktime_to_ns(tg->rt_bandwidth.rt_period);
+	rt_runtime = (u64)rt_runtime_us * NSEC_PER_USEC;
+	if (rt_runtime_us < 0)
+		rt_runtime = RUNTIME_INF;
+
+	return tg_set_rt_bandwidth(tg, rt_period, rt_runtime);
+}
+
+long sched_group_rt_runtime(struct task_group *tg)
+{
+	u64 rt_runtime_us;
+
+	if (tg->rt_bandwidth.rt_runtime == RUNTIME_INF)
+		return -1;
+
+	rt_runtime_us = tg->rt_bandwidth.rt_runtime;
+	do_div(rt_runtime_us, NSEC_PER_USEC);
+	return rt_runtime_us;
+}
+
+int sched_group_set_rt_period(struct task_group *tg, u64 rt_period_us)
+{
+	u64 rt_runtime, rt_period;
+
+	rt_period = rt_period_us * NSEC_PER_USEC;
+	rt_runtime = tg->rt_bandwidth.rt_runtime;
+
+	return tg_set_rt_bandwidth(tg, rt_period, rt_runtime);
+}
+
+long sched_group_rt_period(struct task_group *tg)
+{
+	u64 rt_period_us;
+
+	rt_period_us = ktime_to_ns(tg->rt_bandwidth.rt_period);
+	do_div(rt_period_us, NSEC_PER_USEC);
+	return rt_period_us;
+}
+
+static int sched_rt_global_constraints(void)
+{
+	int ret = 0;
+
+	mutex_lock(&rt_constraints_mutex);
+	read_lock(&tasklist_lock);
+	ret = __rt_schedulable(NULL, 0, 0);
+	read_unlock(&tasklist_lock);
+	mutex_unlock(&rt_constraints_mutex);
+
+	return ret;
+}
+
+int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk)
+{
+	/* Don't accept realtime tasks when there is no way for them to run */
+	if (rt_task(tsk) && tg->rt_bandwidth.rt_runtime == 0)
+		return 0;
+
+	return 1;
+}
+
+#else /* !CONFIG_RT_GROUP_SCHED */
+static int sched_rt_global_constraints(void)
+{
+	unsigned long flags;
+	int i;
+
+	raw_spin_lock_irqsave(&def_rt_bandwidth.rt_runtime_lock, flags);
+	for_each_possible_cpu(i) {
+		struct rt_rq *rt_rq = &cpu_rq(i)->rt;
+
+		raw_spin_lock(&rt_rq->rt_runtime_lock);
+		rt_rq->rt_runtime = global_rt_runtime();
+		raw_spin_unlock(&rt_rq->rt_runtime_lock);
+	}
+	raw_spin_unlock_irqrestore(&def_rt_bandwidth.rt_runtime_lock, flags);
+
+	return 0;
+}
+#endif /* CONFIG_RT_GROUP_SCHED */
+
+static int sched_rt_global_validate(void)
+{
+	if (sysctl_sched_rt_period <= 0)
+		return -EINVAL;
+
+	if ((sysctl_sched_rt_runtime != RUNTIME_INF) &&
+		(sysctl_sched_rt_runtime > sysctl_sched_rt_period))
+		return -EINVAL;
+
+	return 0;
+}
+
+static void sched_rt_do_global(void)
+{
+	def_rt_bandwidth.rt_runtime = global_rt_runtime();
+	def_rt_bandwidth.rt_period = ns_to_ktime(global_rt_period());
+}
+
+int sched_rt_handler(struct ctl_table *table, int write,
+		void __user *buffer, size_t *lenp,
+		loff_t *ppos)
+{
+	int old_period, old_runtime;
+	static DEFINE_MUTEX(mutex);
+	int ret;
+
+	mutex_lock(&mutex);
+	old_period = sysctl_sched_rt_period;
+	old_runtime = sysctl_sched_rt_runtime;
+
+	ret = proc_dointvec(table, write, buffer, lenp, ppos);
+
+	if (!ret && write) {
+		ret = sched_rt_global_validate();
+		if (ret)
+			goto undo;
+
+		if (IS_ENABLED(CONFIG_SCHED_DL)) {
+			ret = sched_dl_global_validate();
+			if (ret)
+				goto undo;
+		}
+
+		ret = sched_rt_global_constraints();
+		if (ret)
+			goto undo;
+
+		sched_rt_do_global();
+		if (IS_ENABLED(CONFIG_SCHED_DL))
+			sched_dl_do_global();
+	}
+	if (0) {
+undo:
+		sysctl_sched_rt_period = old_period;
+		sysctl_sched_rt_runtime = old_runtime;
+	}
+	mutex_unlock(&mutex);
+
+	return ret;
+}
+
+int sched_rr_handler(struct ctl_table *table, int write,
+		void __user *buffer, size_t *lenp,
+		loff_t *ppos)
+{
+	int ret;
+	static DEFINE_MUTEX(mutex);
+
+	mutex_lock(&mutex);
+	ret = proc_dointvec(table, write, buffer, lenp, ppos);
+	/*
+	 * Make sure that internally we keep jiffies.
+	 * Also, writing zero resets the timeslice to default:
+	 */
+	if (!ret && write) {
+		sched_rr_timeslice =
+			sysctl_sched_rr_timeslice <= 0 ? RR_TIMESLICE :
+			msecs_to_jiffies(sysctl_sched_rr_timeslice);
+	}
+	mutex_unlock(&mutex);
+	return ret;
+}
+
 #ifdef CONFIG_SCHED_DEBUG
 extern void print_rt_rq(struct seq_file *m, int cpu, struct rt_rq *rt_rq);
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ec9a84aad4..41dc10b707 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -380,6 +380,11 @@ extern int alloc_rt_sched_group(struct task_group *tg, struct task_group *parent
 extern void init_tg_rt_entry(struct task_group *tg, struct rt_rq *rt_rq,
 		struct sched_rt_entity *rt_se, int cpu,
 		struct sched_rt_entity *parent);
+extern int sched_group_set_rt_runtime(struct task_group *tg, long rt_runtime_us);
+extern int sched_group_set_rt_period(struct task_group *tg, u64 rt_period_us);
+extern long sched_group_rt_runtime(struct task_group *tg);
+extern long sched_group_rt_period(struct task_group *tg);
+extern int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk);
 
 extern struct task_group *sched_create_group(struct task_group *parent);
 extern void sched_online_group(struct task_group *tg,
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 6/7] sched/rt: make it configurable
  2017-05-29 21:02 [PATCH 0/7] scheduler tinification Nicolas Pitre
                   ` (4 preceding siblings ...)
  2017-05-29 21:03 ` [PATCH 5/7] sched/rt: move rt related code out of sched/core.c Nicolas Pitre
@ 2017-05-29 21:03 ` Nicolas Pitre
  2017-05-30  8:41   ` Peter Zijlstra
  2017-05-31 16:06   ` [6/7] " Rob Herring
  2017-05-29 21:03 ` [PATCH 7/7] rtmutex: compatibility with CONFIG_SCHED_RT=n Nicolas Pitre
  2017-05-30  8:32 ` [PATCH 0/7] scheduler tinification Peter Zijlstra
  7 siblings, 2 replies; 18+ messages in thread
From: Nicolas Pitre @ 2017-05-29 21:03 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra; +Cc: linux-kernel

On most small systems where user space is tightly controlled, the realtime
scheduling class can often be dispensed with to reduce the kernel footprint.
Let's make it configurable.
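
To illustrate the intent (a sketch only, not part of this patch; example_caller()
and the do_*() helpers are made-up names): rt_prio() now tests
IS_ENABLED(CONFIG_SCHED_RT), so with the option disabled it folds to a constant 0
and the compiler can discard callers' RT-only branches:

  /* from the rt.h hunk below: constant 0 when CONFIG_SCHED_RT=n */
  static inline int rt_prio(int prio)
  {
  	if (IS_ENABLED(CONFIG_SCHED_RT) && unlikely(prio < MAX_RT_PRIO))
  		return 1;
  	return 0;
  }

  /* hypothetical caller: the RT branch dies because rt_prio() is constant 0 */
  static void example_caller(struct task_struct *p)
  {
  	if (rt_prio(p->prio))
  		do_rt_work(p);
  	else
  		do_other_work(p);
  }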

Signed-off-by: Nicolas Pitre <nico@linaro.org>
---
 include/linux/init_task.h      | 15 +++++++++++----
 include/linux/sched.h          |  2 ++
 include/linux/sched/rt.h       |  4 ++--
 init/Kconfig                   | 14 ++++++++++++--
 kernel/sched/Makefile          |  4 ++--
 kernel/sched/core.c            | 42 +++++++++++++++++++++++++++++++++++++++---
 kernel/sched/debug.c           |  2 ++
 kernel/sched/sched.h           |  7 +++++--
 kernel/sched/stop_task.c       |  4 +++-
 kernel/sysctl.c                |  4 +++-
 kernel/time/posix-cpu-timers.c |  6 +++++-
 11 files changed, 86 insertions(+), 18 deletions(-)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index e049526bc1..6befc0aa61 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -225,6 +225,16 @@ extern struct cred init_cred;
 #define INIT_TASK_SECURITY
 #endif
 
+#ifdef CONFIG_SCHED_RT
+#define INIT_TASK_RT(tsk)						\
+	.rt		= {						\
+		.run_list	= LIST_HEAD_INIT(tsk.rt.run_list),	\
+		.time_slice	= RR_TIMESLICE,				\
+	},
+#else
+#define INIT_TASK_RT(tsk)
+#endif
+
 /*
  *  INIT_TASK is used to set up the first task table, touch at
  * your own risk!. Base=0, limit=0x1fffff (=2MB)
@@ -250,10 +260,7 @@ extern struct cred init_cred;
 	.se		= {						\
 		.group_node 	= LIST_HEAD_INIT(tsk.se.group_node),	\
 	},								\
-	.rt		= {						\
-		.run_list	= LIST_HEAD_INIT(tsk.rt.run_list),	\
-		.time_slice	= RR_TIMESLICE,				\
-	},								\
+	INIT_TASK_RT(tsk)						\
 	.tasks		= LIST_HEAD_INIT(tsk.tasks),			\
 	INIT_PUSHABLE_TASKS(tsk)					\
 	INIT_CGROUP_SCHED(tsk)						\
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ba0c203669..71a43480ed 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -518,7 +518,9 @@ struct task_struct {
 
 	const struct sched_class	*sched_class;
 	struct sched_entity		se;
+#ifdef CONFIG_SCHED_RT
 	struct sched_rt_entity		rt;
+#endif
 #ifdef CONFIG_CGROUP_SCHED
 	struct task_group		*sched_task_group;
 #endif
diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index f93329aba3..f2d636582d 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -7,7 +7,7 @@ struct task_struct;
 
 static inline int rt_prio(int prio)
 {
-	if (unlikely(prio < MAX_RT_PRIO))
+	if (IS_ENABLED(CONFIG_SCHED_RT) && unlikely(prio < MAX_RT_PRIO))
 		return 1;
 	return 0;
 }
@@ -17,7 +17,7 @@ static inline int rt_task(struct task_struct *p)
 	return rt_prio(p->prio);
 }
 
-#ifdef CONFIG_RT_MUTEXES
+#if defined(CONFIG_RT_MUTEXES) && defined(CONFIG_SCHED_RT)
 /*
  * Must hold either p->pi_lock or task_rq(p)->lock.
  */
diff --git a/init/Kconfig b/init/Kconfig
index f73e3f0940..3bcd49f576 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -687,7 +687,7 @@ config TREE_RCU_TRACE
 
 config RCU_BOOST
 	bool "Enable RCU priority boosting"
-	depends on RT_MUTEXES && PREEMPT_RCU && RCU_EXPERT
+	depends on SCHED_RT && RT_MUTEXES && PREEMPT_RCU && RCU_EXPERT
 	default n
 	help
 	  This option boosts the priority of preempted RCU readers that
@@ -1090,7 +1090,7 @@ config CFS_BANDWIDTH
 
 config RT_GROUP_SCHED
 	bool "Group scheduling for SCHED_RR/FIFO"
-	depends on CGROUP_SCHED
+	depends on CGROUP_SCHED && SCHED_RT
 	default n
 	help
 	  This feature lets you explicitly allocate real CPU bandwidth
@@ -1303,8 +1303,17 @@ config SCHED_AUTOGROUP
 	  desktop applications.  Task group autogeneration is currently based
 	  upon task session.
 
+config SCHED_RT
+	bool "Real Time Task Scheduling" if EXPERT
+	default y
+	help
+	  This adds the sched_rt scheduling class to the kernel providing
+ 	  support for the SCHED_FIFO and SCHED_RR policies. You might want
+	  to disable this to reduce the kernel size. If unsure say y.
+
 config SCHED_DL
 	bool "Deadline Task Scheduling" if EXPERT
+	depends on SCHED_RT
 	default y
 	help
 	  This adds the sched_dl scheduling class to the kernel providing
@@ -1632,6 +1641,7 @@ config BASE_FULL
 config FUTEX
 	bool "Enable futex support" if EXPERT
 	default y
+	depends on SCHED_RT
 	select RT_MUTEXES
 	help
 	  Disabling this option will cause the kernel to be built without
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 3bd6a7c1cc..bccbef85e5 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -16,8 +16,8 @@ CFLAGS_core.o := $(PROFILING) -fno-omit-frame-pointer
 endif
 
 obj-y += core.o loadavg.o clock.o cputime.o
-obj-y += wait.o swait.o completion.o idle.o
-obj-y += idle_task.o fair.o rt.o
+obj-y += wait.o swait.o completion.o idle.o idle_task.o fair.o
+obj-$(CONFIG_SCHED_RT) += rt.o
 obj-$(CONFIG_SCHED_DL) += deadline.o $(if $(CONFIG_SMP),cpudeadline.o)
 obj-$(CONFIG_SMP) += cpupri.o topology.o stop_task.o
 obj-$(CONFIG_SCHED_AUTOGROUP) += autogroup.o
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a7b004e440..3dd6fce750 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -640,6 +640,7 @@ bool sched_can_stop_tick(struct rq *rq)
 		return false;
 #endif
 
+#ifdef CONFIG_SCHED_RT
 	/*
 	 * If there are more than one RR tasks, we need the tick to effect the
 	 * actual RR behaviour.
@@ -658,6 +659,7 @@ bool sched_can_stop_tick(struct rq *rq)
 	fifo_nr_running = rq->rt.rt_nr_running - rq->rt.rr_nr_running;
 	if (fifo_nr_running)
 		return true;
+#endif
 
 	/*
 	 * If there are no DL,RR/FIFO tasks, there must only be CFS tasks left;
@@ -1586,7 +1588,7 @@ void sched_set_stop_task(int cpu, struct task_struct *stop)
 		 * Reset it back to a normal scheduling class so that
 		 * it can die in pieces.
 		 */
-		old_stop->sched_class = &rt_sched_class;
+		old_stop->sched_class = stop_sched_class.next;
 	}
 }
 
@@ -2182,11 +2184,13 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 	__dl_clear_params(p);
 #endif
 
+#ifdef CONFIG_SCHED_RT
 	INIT_LIST_HEAD(&p->rt.run_list);
 	p->rt.timeout		= 0;
 	p->rt.time_slice	= sched_rr_timeslice;
 	p->rt.on_rq		= 0;
 	p->rt.on_list		= 0;
+#endif
 
 #ifdef CONFIG_PREEMPT_NOTIFIERS
 	INIT_HLIST_HEAD(&p->preempt_notifiers);
@@ -3716,13 +3720,18 @@ void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task)
 		p->sched_class = &dl_sched_class;
 	} else
 #endif
+#ifdef CONFIG_SCHED_RT
 	if (rt_prio(prio)) {
 		if (oldprio < prio)
 			queue_flag |= ENQUEUE_HEAD;
 		p->sched_class = &rt_sched_class;
-	} else {
+	} else
+#endif
+	{
+#ifdef CONFIG_SCHED_RT
 		if (rt_prio(oldprio))
 			p->rt.timeout = 0;
+#endif
 		p->sched_class = &fair_sched_class;
 	}
 
@@ -3997,6 +4006,23 @@ static int __sched_setscheduler(struct task_struct *p,
 
 	/* May grab non-irq protected spin_locks: */
 	BUG_ON(in_interrupt());
+
+	/*
+	 * When the RT scheduling class is disabled, let's make sure kernel threads
+	 * wanting RT still get lowest nice value to give them highest available
+	 * priority rather than simply returning an error. Obviously we can't test
+	 * rt_policy() here as it is always false in that case.
+	 */
+	if (!IS_ENABLED(CONFIG_SCHED_RT) && !user &&
+	    (policy == SCHED_FIFO || policy == SCHED_RR)) {
+		static const struct sched_attr k_attr = {
+			.sched_policy = SCHED_NORMAL,
+			.sched_nice = MIN_NICE,
+		};
+		attr = &k_attr;
+		policy = SCHED_NORMAL;
+	}
+
 recheck:
 	/* Double check policy once rq lock held: */
 	if (policy < 0) {
@@ -5726,7 +5752,9 @@ void __init sched_init_smp(void)
 	sched_init_granularity();
 	free_cpumask_var(non_isolated_cpus);
 
+#ifdef CONFIG_SCHED_RT
 	init_sched_rt_class();
+#endif
 #ifdef CONFIG_SCHED_DL
 	init_sched_dl_class();
 #endif
@@ -5832,7 +5860,9 @@ void __init sched_init(void)
 	}
 #endif /* CONFIG_CPUMASK_OFFSTACK */
 
+#ifdef CONFIG_SCHED_RT
 	init_rt_bandwidth(&def_rt_bandwidth, global_rt_period(), global_rt_runtime());
+#endif
 #ifdef CONFIG_SCHED_DL
 	init_dl_bandwidth(&def_dl_bandwidth, global_rt_period(), global_rt_runtime());
 #endif
@@ -5864,7 +5894,10 @@ void __init sched_init(void)
 		rq->calc_load_active = 0;
 		rq->calc_load_update = jiffies + LOAD_FREQ;
 		init_cfs_rq(&rq->cfs);
+#ifdef CONFIG_SCHED_RT
 		init_rt_rq(&rq->rt);
+		rq->rt.rt_runtime = def_rt_bandwidth.rt_runtime;
+#endif
 #ifdef CONFIG_SCHED_DL
 		init_dl_rq(&rq->dl);
 #endif
@@ -5895,7 +5928,6 @@ void __init sched_init(void)
 		init_tg_cfs_entry(&root_task_group, &rq->cfs, NULL, i, NULL);
 #endif /* CONFIG_FAIR_GROUP_SCHED */
 
-		rq->rt.rt_runtime = def_rt_bandwidth.rt_runtime;
 #ifdef CONFIG_RT_GROUP_SCHED
 		init_tg_rt_entry(&root_task_group, &rq->rt, NULL, i, NULL);
 #endif
@@ -6132,7 +6164,9 @@ static DEFINE_SPINLOCK(task_group_lock);
 static void sched_free_group(struct task_group *tg)
 {
 	free_fair_sched_group(tg);
+#ifdef CONFIG_SCHED_RT
 	free_rt_sched_group(tg);
+#endif
 	autogroup_free(tg);
 	kmem_cache_free(task_group_cache, tg);
 }
@@ -6149,8 +6183,10 @@ struct task_group *sched_create_group(struct task_group *parent)
 	if (!alloc_fair_sched_group(tg, parent))
 		goto err;
 
+#ifdef CONFIG_SCHED_RT
 	if (!alloc_rt_sched_group(tg, parent))
 		goto err;
+#endif
 
 	return tg;
 
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 84f80a81ab..c550723ce9 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -645,7 +645,9 @@ do {									\
 
 	spin_lock_irqsave(&sched_debug_lock, flags);
 	print_cfs_stats(m, cpu);
+#ifdef CONFIG_SCHED_RT
 	print_rt_stats(m, cpu);
+#endif
 #ifdef CONFIG_SCHED_DL
 	print_dl_stats(m, cpu);
 #endif
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 41dc10b707..38439eefd3 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -132,7 +132,8 @@ static inline int fair_policy(int policy)
 
 static inline int rt_policy(int policy)
 {
-	return policy == SCHED_FIFO || policy == SCHED_RR;
+	return IS_ENABLED(CONFIG_SCHED_RT) &&
+	       (policy == SCHED_FIFO || policy == SCHED_RR);
 }
 
 static inline int dl_policy(int policy)
@@ -1447,8 +1448,10 @@ static inline void set_curr_task(struct rq *rq, struct task_struct *curr)
 #define sched_class_highest (&stop_sched_class)
 #elif defined(CONFIG_SCHED_DL)
 #define sched_class_highest (&dl_sched_class)
-#else
+#elif defined(CONFIG_SCHED_RT)
 #define sched_class_highest (&rt_sched_class)
+#else
+#define sched_class_highest (&fair_sched_class)
 #endif
 
 #define for_each_class(class) \
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 5632dc3e63..7cad8c1540 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -112,8 +112,10 @@ static void update_curr_stop(struct rq *rq)
 const struct sched_class stop_sched_class = {
 #ifdef CONFIG_SCHED_DL
 	.next			= &dl_sched_class,
-#else
+#elif defined(CONFIG_SCHED_RT)
 	.next			= &rt_sched_class,
+#else
+	.next			= &fair_sched_class,
 #endif
 
 	.enqueue_task		= enqueue_task_stop,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 4dfba1a76c..1c670f4053 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -401,6 +401,7 @@ static struct ctl_table kern_table[] = {
 	},
 #endif /* CONFIG_NUMA_BALANCING */
 #endif /* CONFIG_SCHED_DEBUG */
+#ifdef CONFIG_SCHED_RT
 	{
 		.procname	= "sched_rt_period_us",
 		.data		= &sysctl_sched_rt_period,
@@ -422,6 +423,7 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= sched_rr_handler,
 	},
+#endif
 #ifdef CONFIG_SCHED_AUTOGROUP
 	{
 		.procname	= "sched_autogroup_enabled",
@@ -1071,7 +1073,7 @@ static struct ctl_table kern_table[] = {
 		.extra1		= &neg_one,
 	},
 #endif
-#ifdef CONFIG_RT_MUTEXES
+#if defined(CONFIG_RT_MUTEXES) && defined(CONFIG_SCHED_RT)
 	{
 		.procname	= "max_lock_depth",
 		.data		= &max_lock_depth,
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index d2a1e6dd02..010efb0e91 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -790,10 +790,12 @@ static void check_thread_timers(struct task_struct *tsk,
 				struct list_head *firing)
 {
 	struct list_head *timers = tsk->cpu_timers;
-	struct signal_struct *const sig = tsk->signal;
 	struct task_cputime *tsk_expires = &tsk->cputime_expires;
 	u64 expires;
+#ifdef CONFIG_SCHED_RT
+	struct signal_struct *const sig = tsk->signal;
 	unsigned long soft;
+#endif
 
 	/*
 	 * If cputime_expires is zero, then there are no active
@@ -811,6 +813,7 @@ static void check_thread_timers(struct task_struct *tsk,
 	tsk_expires->sched_exp = check_timers_list(++timers, firing,
 						   tsk->se.sum_exec_runtime);
 
+#ifdef CONFIG_SCHED_RT
 	/*
 	 * Check for the special case thread timers.
 	 */
@@ -847,6 +850,7 @@ static void check_thread_timers(struct task_struct *tsk,
 			__group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
 		}
 	}
+#endif
 	if (task_cputime_zero(tsk_expires))
 		tick_dep_clear_task(tsk, TICK_DEP_BIT_POSIX_TIMER);
 }
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 7/7] rtmutex: compatibility with CONFIG_SCHED_RT=n
  2017-05-29 21:02 [PATCH 0/7] scheduler tinification Nicolas Pitre
                   ` (5 preceding siblings ...)
  2017-05-29 21:03 ` [PATCH 6/7] sched/rt: make it configurable Nicolas Pitre
@ 2017-05-29 21:03 ` Nicolas Pitre
  2017-05-30  8:32 ` [PATCH 0/7] scheduler tinification Peter Zijlstra
  7 siblings, 0 replies; 18+ messages in thread
From: Nicolas Pitre @ 2017-05-29 21:03 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra; +Cc: linux-kernel

With no actual RT tasks, there are no priority inversion issues to care about.
We can therefore map RT mutexes to regular mutexes in that case and remain
compatible with most users.
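
For illustration only (not part of the patch; my_lock and critical_section() are
made-up names): since the fallback keeps the rt_mutex API names and signatures,
a caller like this builds unchanged either way, and with CONFIG_SCHED_RT=n it
simply ends up on a regular mutex:

  static DEFINE_RT_MUTEX(my_lock);

  static void critical_section(void)
  {
  	rt_mutex_lock(&my_lock);
  	/* protected work: PI-aware with CONFIG_SCHED_RT=y,
  	 * plain mutex semantics with CONFIG_SCHED_RT=n */
  	rt_mutex_unlock(&my_lock);
  }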

Signed-off-by: Nicolas Pitre <nico@linaro.org>
---
 include/linux/rtmutex.h | 69 +++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/locking/Makefile |  2 ++
 lib/Kconfig.debug       |  2 +-
 3 files changed, 72 insertions(+), 1 deletion(-)

diff --git a/include/linux/rtmutex.h b/include/linux/rtmutex.h
index 1abba5ce2a..05c444f930 100644
--- a/include/linux/rtmutex.h
+++ b/include/linux/rtmutex.h
@@ -12,6 +12,8 @@
 #ifndef __LINUX_RT_MUTEX_H
 #define __LINUX_RT_MUTEX_H
 
+#ifdef CONFIG_SCHED_RT
+
 #include <linux/linkage.h>
 #include <linux/rbtree.h>
 #include <linux/spinlock_types.h>
@@ -98,4 +100,71 @@ extern int rt_mutex_trylock(struct rt_mutex *lock);
 
 extern void rt_mutex_unlock(struct rt_mutex *lock);
 
+#else /* CONFIG_SCHED_RT */
+
+/*
+ * We have no realtime task support and therefore no priority inversion
+ * may occur. Let's map RT mutexes using regular mutexes.
+ */
+
+#include <linux/mutex.h>
+
+struct rt_mutex {
+	struct mutex m;
+};
+
+#define __RT_MUTEX_INITIALIZER(m) \
+	{ .m = __MUTEX_INITIALIZER(m) }
+
+#define DEFINE_RT_MUTEX(mutexname) \
+	struct rt_mutex mutexname = __RT_MUTEX_INITIALIZER(mutexname)
+
+static inline void __rt_mutex_init(struct rt_mutex *lock, const char *name)
+{
+	static struct lock_class_key __key;
+	__mutex_init(&lock->m, name, &__key);
+}
+
+#define rt_mutex_init(mutex)	__rt_mutex_init(mutex, #mutex)
+
+static inline int rt_mutex_is_locked(struct rt_mutex *lock)
+{
+	return mutex_is_locked(&lock->m);
+}
+
+static inline void rt_mutex_destroy(struct rt_mutex *lock)
+{
+	mutex_destroy(&lock->m);
+}
+
+static inline void rt_mutex_lock(struct rt_mutex *lock)
+{
+	mutex_lock(&lock->m);
+}
+
+static inline int rt_mutex_lock_interruptible(struct rt_mutex *lock)
+{
+	return mutex_lock_interruptible(&lock->m);
+}
+
+static inline int rt_mutex_trylock(struct rt_mutex *lock)
+{
+	return mutex_trylock(&lock->m);
+}
+
+static inline void rt_mutex_unlock(struct rt_mutex *lock)
+{
+	mutex_unlock(&lock->m);
+}
+
+static inline int rt_mutex_debug_check_no_locks_freed(const void *from,
+						      unsigned long len)
+{
+	return 0;
+}
+#define rt_mutex_debug_check_no_locks_held(task)	do { } while (0)
+#define rt_mutex_debug_task_free(t)			do { } while (0)
+
+#endif /* CONFIG_SCHED_RT */
+
 #endif
diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
index 760158d9d9..7a076be456 100644
--- a/kernel/locking/Makefile
+++ b/kernel/locking/Makefile
@@ -20,8 +20,10 @@ obj-$(CONFIG_SMP) += spinlock.o
 obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o
 obj-$(CONFIG_PROVE_LOCKING) += spinlock.o
 obj-$(CONFIG_QUEUED_SPINLOCKS) += qspinlock.o
+ifeq ($(CONFIG_SCHED_RT),y)
 obj-$(CONFIG_RT_MUTEXES) += rtmutex.o
 obj-$(CONFIG_DEBUG_RT_MUTEXES) += rtmutex-debug.o
+endif
 obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock.o
 obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock_debug.o
 obj-$(CONFIG_RWSEM_GENERIC_SPINLOCK) += rwsem-spinlock.o
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index e4587ebe52..0ecc7eb9dc 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1008,7 +1008,7 @@ menu "Lock Debugging (spinlocks, mutexes, etc...)"
 
 config DEBUG_RT_MUTEXES
 	bool "RT Mutex debugging, deadlock detection"
-	depends on DEBUG_KERNEL && RT_MUTEXES
+	depends on DEBUG_KERNEL && RT_MUTEXES && SCHED_RT
 	help
 	 This allows rt mutex semantics violations and rt mutex related
 	 deadlocks (lockups) to be detected and reported automatically.
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/7] scheduler tinification
  2017-05-29 21:02 [PATCH 0/7] scheduler tinification Nicolas Pitre
                   ` (6 preceding siblings ...)
  2017-05-29 21:03 ` [PATCH 7/7] rtmutex: compatibility with CONFIG_SCHED_RT=n Nicolas Pitre
@ 2017-05-30  8:32 ` Peter Zijlstra
  7 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2017-05-30  8:32 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Ingo Molnar, linux-kernel

On Mon, May 29, 2017 at 05:02:55PM -0400, Nicolas Pitre wrote:
> Many embedded systems don't need the full scheduler support. Most of the
> time, user space is tightly controlled and many of the scheduler facilities
> are simply unused.
> 
> This patch series makes it possible to configure out some parts of the
> scheduler such as the deadline and realtime scheduler classes. The saving
> in kernel footprint is non negligible.

Not happy with this. We already have too many CONFIG_ knobs, I really
don't want more.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/7] sched/rt: make it configurable
  2017-05-29 21:03 ` [PATCH 6/7] sched/rt: make it configurable Nicolas Pitre
@ 2017-05-30  8:41   ` Peter Zijlstra
  2017-05-30 12:17     ` Nicolas Pitre
  2017-05-31 16:06   ` [6/7] " Rob Herring
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2017-05-30  8:41 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Ingo Molnar, linux-kernel, Thomas Gleixner

On Mon, May 29, 2017 at 05:03:01PM -0400, Nicolas Pitre wrote:

> @@ -1303,8 +1303,17 @@ config SCHED_AUTOGROUP
>  	  desktop applications.  Task group autogeneration is currently based
>  	  upon task session.
>  
> +config SCHED_RT
> +	bool "Real Time Task Scheduling" if EXPERT
> +	default y
> +	help
> +	  This adds the sched_rt scheduling class to the kernel providing
> + 	  support for the SCHED_FIFO and SCHED_RR policies. You might want
> +	  to disable this to reduce the kernel size. If unsure say y.
> +
>  config SCHED_DL
>  	bool "Deadline Task Scheduling" if EXPERT
> +	depends on SCHED_RT
>  	default y
>  	help
>  	  This adds the sched_dl scheduling class to the kernel providing
> @@ -1632,6 +1641,7 @@ config BASE_FULL
>  config FUTEX
>  	bool "Enable futex support" if EXPERT
>  	default y
> +	depends on SCHED_RT
>  	select RT_MUTEXES
>  	help
>  	  Disabling this option will cause the kernel to be built without

Aside from all the other completely non-starter #ifdeffery trainwrecks,
this is just plain wrong.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/7] sched/rt: make it configurable
  2017-05-30  8:41   ` Peter Zijlstra
@ 2017-05-30 12:17     ` Nicolas Pitre
  2017-05-30 12:31       ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Nicolas Pitre @ 2017-05-30 12:17 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel, Thomas Gleixner

On Tue, 30 May 2017, Peter Zijlstra wrote:

> On Mon, May 29, 2017 at 05:03:01PM -0400, Nicolas Pitre wrote:
> 
> > @@ -1303,8 +1303,17 @@ config SCHED_AUTOGROUP
> >  	  desktop applications.  Task group autogeneration is currently based
> >  	  upon task session.
> >  
> > +config SCHED_RT
> > +	bool "Real Time Task Scheduling" if EXPERT
> > +	default y
> > +	help
> > +	  This adds the sched_rt scheduling class to the kernel providing
> > + 	  support for the SCHED_FIFO and SCHED_RR policies. You might want
> > +	  to disable this to reduce the kernel size. If unsure say y.
> > +
> >  config SCHED_DL
> >  	bool "Deadline Task Scheduling" if EXPERT
> > +	depends on SCHED_RT
> >  	default y
> >  	help
> >  	  This adds the sched_dl scheduling class to the kernel providing
> > @@ -1632,6 +1641,7 @@ config BASE_FULL
> >  config FUTEX
> >  	bool "Enable futex support" if EXPERT
> >  	default y
> > +	depends on SCHED_RT
> >  	select RT_MUTEXES
> >  	help
> >  	  Disabling this option will cause the kernel to be built without
> 
> Aside from all the other completely non-starter #ifdeffery trainwrecks,
> this is just plain wrong.

Care to elaborate?

You might not like the approach, but you can't dismiss the goal just 
like that. So please help me do it right.


Nicolas

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/7] sched/rt: make it configurable
  2017-05-30 12:17     ` Nicolas Pitre
@ 2017-05-30 12:31       ` Peter Zijlstra
  2017-05-31  2:18         ` Nicolas Pitre
  2017-05-31  9:57         ` Daniel Bristot de Oliveira
  0 siblings, 2 replies; 18+ messages in thread
From: Peter Zijlstra @ 2017-05-30 12:31 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Ingo Molnar, linux-kernel, Thomas Gleixner

On Tue, May 30, 2017 at 08:17:00AM -0400, Nicolas Pitre wrote:
> On Tue, 30 May 2017, Peter Zijlstra wrote:
> 
> > On Mon, May 29, 2017 at 05:03:01PM -0400, Nicolas Pitre wrote:
> > 
> > > @@ -1303,8 +1303,17 @@ config SCHED_AUTOGROUP
> > >  	  desktop applications.  Task group autogeneration is currently based
> > >  	  upon task session.
> > >  
> > > +config SCHED_RT
> > > +	bool "Real Time Task Scheduling" if EXPERT
> > > +	default y
> > > +	help
> > > +	  This adds the sched_rt scheduling class to the kernel providing
> > > + 	  support for the SCHED_FIFO and SCHED_RR policies. You might want
> > > +	  to disable this to reduce the kernel size. If unsure say y.
> > > +
> > >  config SCHED_DL
> > >  	bool "Deadline Task Scheduling" if EXPERT
> > > +	depends on SCHED_RT
> > >  	default y
> > >  	help
> > >  	  This adds the sched_dl scheduling class to the kernel providing
> > > @@ -1632,6 +1641,7 @@ config BASE_FULL
> > >  config FUTEX
> > >  	bool "Enable futex support" if EXPERT
> > >  	default y
> > > +	depends on SCHED_RT
> > >  	select RT_MUTEXES
> > >  	help
> > >  	  Disabling this option will cause the kernel to be built without
> > 
> > Aside from all the other completely non-starter #ifdeffery trainwrecks,
> > this is just plain wrong.
> 
> Care to elaborate?

SCHED_DL does not in any way depend on SCHED_RT and futexes should not
wholly get axed when we lack SCHED_RT.

> You might not like the approach, but you can't dismiss the goal just 
> like that. So please help me do it right.

Why can't I dismiss it? All I see is ugly that makes maintenance worse
for very little to no benefit.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/7] sched/rt: make it configurable
  2017-05-30 12:31       ` Peter Zijlstra
@ 2017-05-31  2:18         ` Nicolas Pitre
  2017-05-31  9:57         ` Daniel Bristot de Oliveira
  1 sibling, 0 replies; 18+ messages in thread
From: Nicolas Pitre @ 2017-05-31  2:18 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel, Thomas Gleixner

On Tue, 30 May 2017, Peter Zijlstra wrote:

> On Tue, May 30, 2017 at 08:17:00AM -0400, Nicolas Pitre wrote:
> > On Tue, 30 May 2017, Peter Zijlstra wrote:
> > 
> > > On Mon, May 29, 2017 at 05:03:01PM -0400, Nicolas Pitre wrote:
> > > 
> > > > @@ -1303,8 +1303,17 @@ config SCHED_AUTOGROUP
> > > >  	  desktop applications.  Task group autogeneration is currently based
> > > >  	  upon task session.
> > > >  
> > > > +config SCHED_RT
> > > > +	bool "Real Time Task Scheduling" if EXPERT
> > > > +	default y
> > > > +	help
> > > > +	  This adds the sched_rt scheduling class to the kernel providing
> > > > + 	  support for the SCHED_FIFO and SCHED_RR policies. You might want
> > > > +	  to disable this to reduce the kernel size. If unsure say y.
> > > > +
> > > >  config SCHED_DL
> > > >  	bool "Deadline Task Scheduling" if EXPERT
> > > > +	depends on SCHED_RT
> > > >  	default y
> > > >  	help
> > > >  	  This adds the sched_dl scheduling class to the kernel providing
> > > > @@ -1632,6 +1641,7 @@ config BASE_FULL
> > > >  config FUTEX
> > > >  	bool "Enable futex support" if EXPERT
> > > >  	default y
> > > > +	depends on SCHED_RT
> > > >  	  Disabling this option will cause the kernel to be built without
> > > 
> > > Aside from all the other completely non-starter #ifdeffery trainwrecks,
> > > this is just plain wrong.
> > 
> > Care to elaborate?
> 
> SCHED_DL does not in any way depend on SCHED_RT

After a second look, the actual dependencies are very thin, so that's 
easy to care for.

> and futexes should not
> wholly get axed when we lack SCHED_RT.

Indeed, only PI futexes depend on rt_mutexes. Will fix.
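
(For context, a rough sketch of that dependency as I understand it, simplified
from kernel/futex.c rather than taken from this series: only the PI futex state
embeds an rt_mutex, which is why the non-PI futex paths have no reason to drag
in RT_MUTEXES.)

  /* simplified; the real structure carries more state */
  struct futex_pi_state {
  	struct list_head	list;
  	struct rt_mutex		pi_mutex;	/* sole rt_mutex user here */
  	struct task_struct	*owner;
  	atomic_t		refcount;
  	union futex_key		key;
  };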

> > You might not like the approach, but you can't dismiss the goal just 
> > like that. So please help me do it right.
> 
> Why can't I dismiss it? All I see is ugly that makes maintenance worse
> for very little to no benefit.

Maybe there are no benefits for you and your use cases. But don't we want
the embedded crowd to get more involved with mainline?

I didn't like the #ifdefery myself, but I wanted to put those patches 
out early for comments. I'm grateful you provided yours and that they 
highlight things that look relatively easy to address.


Nicolas

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/7] sched/rt: make it configurable
  2017-05-30 12:31       ` Peter Zijlstra
  2017-05-31  2:18         ` Nicolas Pitre
@ 2017-05-31  9:57         ` Daniel Bristot de Oliveira
  2017-05-31 10:40           ` Peter Zijlstra
  1 sibling, 1 reply; 18+ messages in thread
From: Daniel Bristot de Oliveira @ 2017-05-31  9:57 UTC (permalink / raw)
  To: Peter Zijlstra, Nicolas Pitre; +Cc: Ingo Molnar, linux-kernel, Thomas Gleixner

On 05/30/2017 02:31 PM, Peter Zijlstra wrote:
> All I see is ugly that makes maintenance worse

s/maintenance/maintenance & development & understanding & .../

+1

-- Daniel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/7] sched/rt: make it configurable
  2017-05-31  9:57         ` Daniel Bristot de Oliveira
@ 2017-05-31 10:40           ` Peter Zijlstra
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2017-05-31 10:40 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: Nicolas Pitre, Ingo Molnar, linux-kernel, Thomas Gleixner

On Wed, May 31, 2017 at 11:57:26AM +0200, Daniel Bristot de Oliveira wrote:
> On 05/30/2017 02:31 PM, Peter Zijlstra wrote:
> > All I see is ugly that makes maintenance worse
> 
> s/maintenance/maintenance & development & understanding & .../

Just to be a total pain, '&' in the replacement string is the fully
matched regex, so what you're saying is:

"maintenance maintenance development maintenance understanding maintenance ..."

Now I do appreciate the unintended (it was, right?) emphasis on
maintenance though ;-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [6/7] sched/rt: make it configurable
  2017-05-29 21:03 ` [PATCH 6/7] sched/rt: make it configurable Nicolas Pitre
  2017-05-30  8:41   ` Peter Zijlstra
@ 2017-05-31 16:06   ` Rob Herring
  2017-05-31 16:25     ` Nicolas Pitre
  1 sibling, 1 reply; 18+ messages in thread
From: Rob Herring @ 2017-05-31 16:06 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Ingo Molnar, Peter Zijlstra, linux-kernel

On Mon, May 29, 2017 at 05:03:01PM -0400, Nicolas Pitre wrote:
> On most small systems where user space is tightly controlled, the realtime
> scheduling class can often be dispensed with to reduce the kernel footprint.
> Let's make it configurable.
> 
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> ---

>  static inline int rt_prio(int prio)
>  {
> -	if (unlikely(prio < MAX_RT_PRIO))
> +	if (IS_ENABLED(CONFIG_SCHED_RT) && unlikely(prio < MAX_RT_PRIO))
>  		return 1;
>  	return 0;
>  }

>  #ifdef CONFIG_PREEMPT_NOTIFIERS
>  	INIT_HLIST_HEAD(&p->preempt_notifiers);
> @@ -3716,13 +3720,18 @@ void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task)
>  		p->sched_class = &dl_sched_class;
>  	} else
>  #endif
> +#ifdef CONFIG_SCHED_RT
>  	if (rt_prio(prio)) {

This ifdef is not necessary since rt_prio is conditioned on 
CONFIG_SCHED_RT already.

>  		if (oldprio < prio)
>  			queue_flag |= ENQUEUE_HEAD;
>  		p->sched_class = &rt_sched_class;
> -	} else {
> +	} else
> +#endif
> +	{
> +#ifdef CONFIG_SCHED_RT
>  		if (rt_prio(oldprio))
>  			p->rt.timeout = 0;
> +#endif
>  		p->sched_class = &fair_sched_class;
>  	}
>  
> @@ -3997,6 +4006,23 @@ static int __sched_setscheduler(struct task_struct *p,
>  
>  	/* May grab non-irq protected spin_locks: */
>  	BUG_ON(in_interrupt());
> +
> +	/*
> +	 * When the RT scheduling class is disabled, let's make sure kernel threads
> +	 * wanting RT still get lowest nice value to give them highest available
> +	 * priority rather than simply returning an error. Obviously we can't test
> +	 * rt_policy() here as it is always false in that case.
> +	 */
> +	if (!IS_ENABLED(CONFIG_SCHED_RT) && !user &&
> +	    (policy == SCHED_FIFO || policy == SCHED_RR)) {
> +		static const struct sched_attr k_attr = {
> +			.sched_policy = SCHED_NORMAL,
> +			.sched_nice = MIN_NICE,
> +		};
> +		attr = &k_attr;
> +		policy = SCHED_NORMAL;
> +	}
> +
>  recheck:
>  	/* Double check policy once rq lock held: */
>  	if (policy < 0) {
> @@ -5726,7 +5752,9 @@ void __init sched_init_smp(void)
>  	sched_init_granularity();
>  	free_cpumask_var(non_isolated_cpus);
>  
> +#ifdef CONFIG_SCHED_RT
>  	init_sched_rt_class();
> +#endif

You can do an empty inline function for !CONFIG_SCHED_RT.

>  #ifdef CONFIG_SCHED_DL
>  	init_sched_dl_class();
>  #endif

And here in the earlier patch.

> @@ -5832,7 +5860,9 @@ void __init sched_init(void)
>  	}
>  #endif /* CONFIG_CPUMASK_OFFSTACK */
>  
> +#ifdef CONFIG_SCHED_RT
>  	init_rt_bandwidth(&def_rt_bandwidth, global_rt_period(), global_rt_runtime());
> +#endif

And so on...

Rob

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [6/7] sched/rt: make it configurable
  2017-05-31 16:06   ` [6/7] " Rob Herring
@ 2017-05-31 16:25     ` Nicolas Pitre
  0 siblings, 0 replies; 18+ messages in thread
From: Nicolas Pitre @ 2017-05-31 16:25 UTC (permalink / raw)
  To: Rob Herring; +Cc: Ingo Molnar, Peter Zijlstra, linux-kernel

On Wed, 31 May 2017, Rob Herring wrote:

> On Mon, May 29, 2017 at 05:03:01PM -0400, Nicolas Pitre wrote:
> > On most small systems where user space is tightly controlled, the realtime
> > scheduling class can often be dispensed with to reduce the kernel footprint.
> > Let's make it configurable.
> > 
> > Signed-off-by: Nicolas Pitre <nico@linaro.org>
> > ---
> 
> >  static inline int rt_prio(int prio)
> >  {
> > -	if (unlikely(prio < MAX_RT_PRIO))
> > +	if (IS_ENABLED(CONFIG_SCHED_RT) && unlikely(prio < MAX_RT_PRIO))
> >  		return 1;
> >  	return 0;
> >  }
> 
> >  #ifdef CONFIG_PREEMPT_NOTIFIERS
> >  	INIT_HLIST_HEAD(&p->preempt_notifiers);
> > @@ -3716,13 +3720,18 @@ void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task)
> >  		p->sched_class = &dl_sched_class;
> >  	} else
> >  #endif
> > +#ifdef CONFIG_SCHED_RT
> >  	if (rt_prio(prio)) {
> 
> This ifdef is not necessary since rt_prio is conditioned on 
> CONFIG_SCHED_RT already.

Yeah, that was the intent. In many places the conditionally compiled code
dereferences p->rt.* and the compiler complains. So I added a couple of
#ifdefs to make it build until something better comes to mind. This
particular one was unnecessary.
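
A minimal sketch of the build problem in question (reset_rt_timeout() is a
made-up name): IS_ENABLED() keeps the guarded code visible to the compiler, so
any reference to a member that only exists under CONFIG_SCHED_RT still needs an
#ifdef around it:

  struct task_struct {
  	/* ... */
  #ifdef CONFIG_SCHED_RT
  	struct sched_rt_entity	rt;	/* member gone when =n */
  #endif
  	/* ... */
  };

  static void reset_rt_timeout(struct task_struct *p)
  {
  	if (IS_ENABLED(CONFIG_SCHED_RT))
  		p->rt.timeout = 0;	/* still parsed: fails to build when =n */
  }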

I'm working on prettifying the whole thing at the moment. I wanted early
comments, and so far they all say the same thing, which is good.


Nicolas

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [tip:sched/core] sched/core: Omit building stop_sched_class when !SMP
  2017-05-29 21:02 ` [PATCH 2/7] sched: omit stop_sched_class when !SMP Nicolas Pitre
@ 2017-06-08  9:30   ` tip-bot for Nicolas Pitre
  0 siblings, 0 replies; 18+ messages in thread
From: tip-bot for Nicolas Pitre @ 2017-06-08  9:30 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, nicolas.pitre, peterz, linux-kernel, hpa, nico, tglx,
	mingo, efault

Commit-ID:  f5832c1998af2ca8d9947792d1c8e1816ab58e57
Gitweb:     http://git.kernel.org/tip/f5832c1998af2ca8d9947792d1c8e1816ab58e57
Author:     Nicolas Pitre <nicolas.pitre@linaro.org>
AuthorDate: Mon, 29 May 2017 17:02:57 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 8 Jun 2017 10:32:04 +0200

sched/core: Omit building stop_sched_class when !SMP

The stop class is invoked through stop_machine only.
This is dead code on UP builds.
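
For reference (an illustrative summary, not part of the change itself): the
scheduling classes are walked as a simple linked list, so once stop_task.o is
no longer built on UP the head of that list has to move to the next class:

  /* from kernel/sched/sched.h */
  #define for_each_class(class) \
     for (class = sched_class_highest; class; class = class->next)

  /*
   * SMP: stop_sched_class -> dl -> rt -> fair -> idle
   * UP:                      dl -> rt -> fair -> idle
   */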

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170529210302.26868-3-nicolas.pitre@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/Makefile |  4 ++--
 kernel/sched/core.c   | 60 +++++++++++++++++++++++++--------------------------
 kernel/sched/sched.h  |  4 ++++
 3 files changed, 36 insertions(+), 32 deletions(-)

diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 89ab675..5e4c2e7 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -16,9 +16,9 @@ CFLAGS_core.o := $(PROFILING) -fno-omit-frame-pointer
 endif
 
 obj-y += core.o loadavg.o clock.o cputime.o
-obj-y += idle_task.o fair.o rt.o deadline.o stop_task.o
+obj-y += idle_task.o fair.o rt.o deadline.o
 obj-y += wait.o swait.o completion.o idle.o
-obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o
+obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o
 obj-$(CONFIG_SCHED_AUTOGROUP) += autogroup.o
 obj-$(CONFIG_SCHEDSTATS) += stats.o
 obj-$(CONFIG_SCHED_DEBUG) += debug.o
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e5bd587..c343b81 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -788,36 +788,6 @@ void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
 	dequeue_task(rq, p, flags);
 }
 
-void sched_set_stop_task(int cpu, struct task_struct *stop)
-{
-	struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
-	struct task_struct *old_stop = cpu_rq(cpu)->stop;
-
-	if (stop) {
-		/*
-		 * Make it appear like a SCHED_FIFO task, its something
-		 * userspace knows about and won't get confused about.
-		 *
-		 * Also, it will make PI more or less work without too
-		 * much confusion -- but then, stop work should not
-		 * rely on PI working anyway.
-		 */
-		sched_setscheduler_nocheck(stop, SCHED_FIFO, &param);
-
-		stop->sched_class = &stop_sched_class;
-	}
-
-	cpu_rq(cpu)->stop = stop;
-
-	if (old_stop) {
-		/*
-		 * Reset it back to a normal scheduling class so that
-		 * it can die in pieces.
-		 */
-		old_stop->sched_class = &rt_sched_class;
-	}
-}
-
 /*
  * __normal_prio - return the priority that is based on the static prio
  */
@@ -1588,6 +1558,36 @@ static void update_avg(u64 *avg, u64 sample)
 	*avg += diff >> 3;
 }
 
+void sched_set_stop_task(int cpu, struct task_struct *stop)
+{
+	struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
+	struct task_struct *old_stop = cpu_rq(cpu)->stop;
+
+	if (stop) {
+		/*
+		 * Make it appear like a SCHED_FIFO task, its something
+		 * userspace knows about and won't get confused about.
+		 *
+		 * Also, it will make PI more or less work without too
+		 * much confusion -- but then, stop work should not
+		 * rely on PI working anyway.
+		 */
+		sched_setscheduler_nocheck(stop, SCHED_FIFO, &param);
+
+		stop->sched_class = &stop_sched_class;
+	}
+
+	cpu_rq(cpu)->stop = stop;
+
+	if (old_stop) {
+		/*
+		 * Reset it back to a normal scheduling class so that
+		 * it can die in pieces.
+		 */
+		old_stop->sched_class = &rt_sched_class;
+	}
+}
+
 #else
 
 static inline int __set_cpus_allowed_ptr(struct task_struct *p,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f1e400c..f2ef759a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1453,7 +1453,11 @@ static inline void set_curr_task(struct rq *rq, struct task_struct *curr)
 	curr->sched_class->set_curr_task(rq);
 }
 
+#ifdef CONFIG_SMP
 #define sched_class_highest (&stop_sched_class)
+#else
+#define sched_class_highest (&dl_sched_class)
+#endif
 #define for_each_class(class) \
    for (class = sched_class_highest; class; class = class->next)
 

^ permalink raw reply related	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-06-08  9:34 UTC | newest]

Thread overview: 18+ messages
2017-05-29 21:02 [PATCH 0/7] scheduler tinification Nicolas Pitre
2017-05-29 21:02 ` [PATCH 1/7] cpuset/sched: cpuset makes sense for SMP only Nicolas Pitre
2017-05-29 21:02 ` [PATCH 2/7] sched: omit stop_sched_class when !SMP Nicolas Pitre
2017-06-08  9:30   ` [tip:sched/core] sched/core: Omit building " tip-bot for Nicolas Pitre
2017-05-29 21:02 ` [PATCH 3/7] sched/deadline: move dl related code out of sched/core.c Nicolas Pitre
2017-05-29 21:02 ` [PATCH 4/7] sched/deadline: make it configurable Nicolas Pitre
2017-05-29 21:03 ` [PATCH 5/7] sched/rt: move rt related code out of sched/core.c Nicolas Pitre
2017-05-29 21:03 ` [PATCH 6/7] sched/rt: make it configurable Nicolas Pitre
2017-05-30  8:41   ` Peter Zijlstra
2017-05-30 12:17     ` Nicolas Pitre
2017-05-30 12:31       ` Peter Zijlstra
2017-05-31  2:18         ` Nicolas Pitre
2017-05-31  9:57         ` Daniel Bristot de Oliveira
2017-05-31 10:40           ` Peter Zijlstra
2017-05-31 16:06   ` [6/7] " Rob Herring
2017-05-31 16:25     ` Nicolas Pitre
2017-05-29 21:03 ` [PATCH 7/7] rtmutex: compatibility with CONFIG_SCHED_RT=n Nicolas Pitre
2017-05-30  8:32 ` [PATCH 0/7] scheduler tinification Peter Zijlstra
