* A pile of sched patches
@ 2014-02-07 19:58 Sebastian Andrzej Siewior
  2014-02-07 19:58 ` [PATCH 1/6] sched: Init idle->on_rq in init_idle() Sebastian Andrzej Siewior
                   ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-02-07 19:58 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar; +Cc: Thomas Gleixner, linux-kernel

These patches have been sitting in -RT for quite some time. I think they
sneaked in during the v3.10 time frame.
Three of them fix bugs and are marked for stable. The other three
improve might_sleep() debugging / output.

Sebastian



* [PATCH 1/6] sched: Init idle->on_rq in init_idle()
  2014-02-07 19:58 A pile of sched patches Sebastian Andrzej Siewior
@ 2014-02-07 19:58 ` Sebastian Andrzej Siewior
  2014-02-07 21:09   ` Peter Zijlstra
                     ` (2 more replies)
  2014-02-07 19:58 ` [PATCH 2/6] sched: Check for idle task in might_sleep() Sebastian Andrzej Siewior
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 24+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-02-07 19:58 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Thomas Gleixner, linux-kernel, Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/sched/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b46131e..64f75f9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4473,6 +4473,7 @@ void init_idle(struct task_struct *idle, int cpu)
 	rcu_read_unlock();
 
 	rq->curr = rq->idle = idle;
+	idle->on_rq = 1;
 #if defined(CONFIG_SMP)
 	idle->on_cpu = 1;
 #endif
-- 
1.9.rc1



* [PATCH 2/6] sched: Check for idle task in might_sleep()
  2014-02-07 19:58 A pile of sched patches Sebastian Andrzej Siewior
  2014-02-07 19:58 ` [PATCH 1/6] sched: Init idle->on_rq in init_idle() Sebastian Andrzej Siewior
@ 2014-02-07 19:58 ` Sebastian Andrzej Siewior
  2014-02-21 21:31   ` [tip:sched/core] " tip-bot for Thomas Gleixner
  2014-02-22 18:02   ` tip-bot for Thomas Gleixner
  2014-02-07 19:58 ` [PATCH 3/6] sched: Better debug output for might sleep Sebastian Andrzej Siewior
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 24+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-02-07 19:58 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Thomas Gleixner, linux-kernel, Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

Idle is not allowed to call sleeping functions ever!

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/sched/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 64f75f9..fbaec2a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6938,7 +6938,8 @@ void __might_sleep(const char *file, int line, int preempt_offset)
 	static unsigned long prev_jiffy;	/* ratelimiting */
 
 	rcu_sleep_check(); /* WARN_ON_ONCE() by default, no rate limit reqd. */
-	if ((preempt_count_equals(preempt_offset) && !irqs_disabled()) ||
+	if ((preempt_count_equals(preempt_offset) && !irqs_disabled() &&
+	     !is_idle_task(current)) ||
 	    system_state != SYSTEM_RUNNING || oops_in_progress)
 		return;
 	if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
-- 
1.9.rc1
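
The changed condition is easier to read as a standalone predicate. The following is a minimal userspace sketch, not kernel code. `struct ctx` and `might_sleep_is_silent()` are hypothetical names; the fields only stand in for preempt_count_equals(), irqs_disabled(), is_idle_task(current), system_state and oops_in_progress as used in the hunk above.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state that __might_sleep() inspects. */
struct ctx {
	bool preempt_count_ok;	/* preempt_count_equals(preempt_offset) */
	bool irqs_disabled;	/* irqs_disabled() */
	bool is_idle;		/* is_idle_task(current) */
	bool system_running;	/* system_state == SYSTEM_RUNNING */
	bool oops_in_progress;	/* oops_in_progress */
};

/* Returns true when the check would return early without a warning. */
static bool might_sleep_is_silent(const struct ctx *c)
{
	return (c->preempt_count_ok && !c->irqs_disabled && !c->is_idle) ||
	       !c->system_running || c->oops_in_progress;
}
```

With this shape, a context that is otherwise fine (preemption enabled, interrupts on) no longer passes silently when the caller is the idle task, which is the behaviour the patch is after.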



* [PATCH 3/6] sched: Better debug output for might sleep
  2014-02-07 19:58 A pile of sched patches Sebastian Andrzej Siewior
  2014-02-07 19:58 ` [PATCH 1/6] sched: Init idle->on_rq in init_idle() Sebastian Andrzej Siewior
  2014-02-07 19:58 ` [PATCH 2/6] sched: Check for idle task in might_sleep() Sebastian Andrzej Siewior
@ 2014-02-07 19:58 ` Sebastian Andrzej Siewior
  2014-02-21 21:31   ` [tip:sched/core] " tip-bot for Thomas Gleixner
  2014-02-22 18:02   ` [tip:sched/core] sched: Add better debug output for might_sleep() tip-bot for Thomas Gleixner
  2014-02-07 19:58 ` [PATCH 4/6] sched: Adjust sched_reset_on_fork when nothing else changes Sebastian Andrzej Siewior
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 24+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-02-07 19:58 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Thomas Gleixner, linux-kernel, Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

might_sleep() can tell us where interrupts have been disabled, but we
have no idea what disabled preemption. Add some debug infrastructure.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/sched.h |  3 +++
 kernel/sched/core.c   | 23 +++++++++++++++++++++--
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 68a0e84..0e2fb59 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1459,6 +1459,9 @@ struct task_struct {
 	struct mutex perf_event_mutex;
 	struct list_head perf_event_list;
 #endif
+#ifdef CONFIG_DEBUG_PREEMPT
+	unsigned long preempt_disable_ip;
+#endif
 #ifdef CONFIG_NUMA
 	struct mempolicy *mempolicy;	/* Protected by alloc_lock */
 	short il_next;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fbaec2a..3112b28 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2510,8 +2510,13 @@ void __kprobes preempt_count_add(int val)
 	DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >=
 				PREEMPT_MASK - 10);
 #endif
-	if (preempt_count() == val)
-		trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
+	if (preempt_count() == val) {
+		unsigned long ip = get_parent_ip(CALLER_ADDR1);
+#ifdef CONFIG_DEBUG_PREEMPT
+		current->preempt_disable_ip = ip;
+#endif
+		trace_preempt_off(CALLER_ADDR0, ip);
+	}
 }
 EXPORT_SYMBOL(preempt_count_add);
 
@@ -2554,6 +2559,13 @@ static noinline void __schedule_bug(struct task_struct *prev)
 	print_modules();
 	if (irqs_disabled())
 		print_irqtrace_events(prev);
+#ifdef CONFIG_DEBUG_PREEMPT
+	if (in_atomic_preempt_off()) {
+		pr_err("Preemption disabled at:");
+		print_ip_sym(current->preempt_disable_ip);
+		pr_cont("\n");
+	}
+#endif
 	dump_stack();
 	add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
 }
@@ -6957,6 +6969,13 @@ void __might_sleep(const char *file, int line, int preempt_offset)
 	debug_show_held_locks(current);
 	if (irqs_disabled())
 		print_irqtrace_events(current);
+#ifdef CONFIG_DEBUG_PREEMPT
+	if (!preempt_count_equals(preempt_offset)) {
+		pr_err("Preemption disabled at:");
+		print_ip_sym(current->preempt_disable_ip);
+		pr_cont("\n");
+	}
+#endif
 	dump_stack();
 }
 EXPORT_SYMBOL(__might_sleep);
-- 
1.9.rc1
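
The bookkeeping in preempt_count_add() boils down to "remember the caller only when the count goes from zero to val". A minimal userspace sketch under stated assumptions: `struct task`, `sketch_preempt_add()` and `sketch_preempt_sub()` are hypothetical names, and a string label replaces the instruction pointer that the kernel stores in preempt_disable_ip.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-task state touched by the patch. */
struct task {
	int preempt_count;
	const char *preempt_disable_site;	/* kernel: unsigned long preempt_disable_ip */
};

/*
 * Mirrors the patched logic: the count is raised first, and the call
 * site is recorded only if the count now equals val, i.e. this call
 * was the outermost disable.
 */
static void sketch_preempt_add(struct task *t, int val, const char *site)
{
	t->preempt_count += val;
	if (t->preempt_count == val)
		t->preempt_disable_site = site;
}

static void sketch_preempt_sub(struct task *t, int val)
{
	t->preempt_count -= val;
}
```

Nested disables leave the recorded site untouched, so the debug output points at the outermost section, which is the one that actually made the context atomic.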



* [PATCH 4/6] sched: Adjust sched_reset_on_fork when nothing else changes
  2014-02-07 19:58 A pile of sched patches Sebastian Andrzej Siewior
                   ` (2 preceding siblings ...)
  2014-02-07 19:58 ` [PATCH 3/6] sched: Better debug output for might sleep Sebastian Andrzej Siewior
@ 2014-02-07 19:58 ` Sebastian Andrzej Siewior
  2014-02-21 21:32   ` [tip:sched/core] " tip-bot for Thomas Gleixner
  2014-02-22 18:02   ` [tip:sched/core] sched: Adjust p-> " tip-bot for Thomas Gleixner
  2014-02-07 19:58 ` [PATCH 5/6] sched: Queue RT tasks to head when prio drops Sebastian Andrzej Siewior
  2014-02-07 19:58 ` [PATCH 6/6] sched: Consider pi boosting in setscheduler Sebastian Andrzej Siewior
  5 siblings, 2 replies; 24+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-02-07 19:58 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Thomas Gleixner, linux-kernel, stable, Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

If the policy and priority remain unchanged, a possible modification of
sched_reset_on_fork gets lost in the early exit path.

Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: rebase on top of v3.14-rc1]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/sched/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3112b28..a4c06a8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3392,7 +3392,8 @@ static int __sched_setscheduler(struct task_struct *p,
 	}
 
 	/*
-	 * If not changing anything there's no need to proceed further:
+	 * If not changing anything there's no need to proceed further,
+	 * but store a possible modification of reset_on_fork.
 	 */
 	if (unlikely(policy == p->policy)) {
 		if (fair_policy(policy) && attr->sched_nice != TASK_NICE(p))
@@ -3402,6 +3403,7 @@ static int __sched_setscheduler(struct task_struct *p,
 		if (dl_policy(policy))
 			goto change;
 
+		p->sched_reset_on_fork = reset_on_fork;
 		task_rq_unlock(rq, p, &flags);
 		return 0;
 	}
-- 
1.9.rc1
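
The lost update can be shown with a heavily simplified userspace model. This is a sketch, not the kernel function: `struct task` and `setscheduler_sketch()` are hypothetical, and the real early-exit test also looks at nice values and deadline parameters.

```c
#include <stdbool.h>

/* Hypothetical, reduced view of the fields involved. */
struct task {
	int policy;
	int rt_priority;
	bool sched_reset_on_fork;
};

static int setscheduler_sketch(struct task *p, int policy, int prio,
			       bool reset_on_fork)
{
	if (policy == p->policy && prio == p->rt_priority) {
		/*
		 * Early exit: nothing to requeue. Without this store the
		 * caller's reset_on_fork request would be silently dropped.
		 */
		p->sched_reset_on_fork = reset_on_fork;
		return 0;
	}

	/* Slow path: policy or priority really changes. */
	p->policy = policy;
	p->rt_priority = prio;
	p->sched_reset_on_fork = reset_on_fork;
	return 0;
}
```

A caller that only wants to toggle the reset-on-fork flag passes the unchanged policy and priority, so it always hits the early exit; that is why the store has to live there as well.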



* [PATCH 5/6] sched: Queue RT tasks to head when prio drops
  2014-02-07 19:58 A pile of sched patches Sebastian Andrzej Siewior
                   ` (3 preceding siblings ...)
  2014-02-07 19:58 ` [PATCH 4/6] sched: Adjust sched_reset_on_fork when nothing else changes Sebastian Andrzej Siewior
@ 2014-02-07 19:58 ` Sebastian Andrzej Siewior
  2014-02-21 21:32   ` [tip:sched/core] " tip-bot for Thomas Gleixner
  2014-02-22 18:02   ` tip-bot for Thomas Gleixner
  2014-02-07 19:58 ` [PATCH 6/6] sched: Consider pi boosting in setscheduler Sebastian Andrzej Siewior
  5 siblings, 2 replies; 24+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-02-07 19:58 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Thomas Gleixner, linux-kernel, stable, Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

The following scenario does not work correctly:

Runqueue of CPUx contains two runnable and pinned tasks:
 T1: SCHED_FIFO, prio 80
 T2: SCHED_FIFO, prio 80

T1 is on the cpu and executes the following syscalls (classic priority
ceiling scenario):

 sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 90);
 ...
 sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 80);
 ...

Now T1 gets preempted by T3 (SCHED_FIFO, prio 95). After T3 goes back
to sleep the scheduler picks T2. Surprise!

The same happens w/o actual preemption when T1 is forced into the
scheduler due to a sporadic NEED_RESCHED event. The scheduler invokes
pick_next_task() which returns T2. So T1 gets preempted and scheduled
out.

This happens because sched_setscheduler() dequeues T1 from the prio 90
list and then enqueues it on the tail of the prio 80 list behind T2.
This violates the POSIX spec and surprises user space which relies on
the guarantee that SCHED_FIFO tasks are not scheduled out unless they
give the CPU up voluntarily or are preempted by a higher priority
task. In the latter case the preempted task must get back on the CPU
after the preempting task schedules out again.

We fixed a similar issue already in commit 60db48c (sched: Queue a
deboosted task to the head of the RT prio queue). The same treatment
is necessary for sched_setscheduler(). So enqueue to head of the prio
bucket list if the priority of the task is lowered.

It might be possible that existing user space relies on the current
behaviour, but it can be considered highly unlikely due to the corner
case nature of the application scenario.

Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/sched/core.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a4c06a8..d0e7825 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3472,8 +3472,13 @@ static int __sched_setscheduler(struct task_struct *p,
 
 	if (running)
 		p->sched_class->set_curr_task(rq);
-	if (on_rq)
-		enqueue_task(rq, p, 0);
+	if (on_rq) {
+		/*
+		 * We enqueue to tail when the priority of a task is
+		 * increased (user space view).
+		 */
+		enqueue_task(rq, p, oldprio <= p->prio ? ENQUEUE_HEAD : 0);
+	}
 
 	check_class_changed(rq, p, prev_class, oldprio);
 	task_rq_unlock(rq, p, &flags);
-- 
1.9.rc1
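
The T1/T2 scenario from the changelog can be replayed with a small userspace model of one priority bucket. This is a sketch under assumptions: `struct bucket`, `enqueue()` and `requeue_at_head()` are hypothetical names; only the comparison `oldprio <= p->prio` is taken from the patch. Kernel-internal priorities are used, where a lower number means a higher priority, so user priority 90 maps to 9 and 80 maps to 19.

```c
#include <string.h>

#define QLEN 8

/* Hypothetical model of one priority bucket: names[0] runs next. */
struct bucket {
	const char *names[QLEN];
	int len;
};

static void enqueue(struct bucket *b, const char *name, int at_head)
{
	if (b->len >= QLEN)
		return;	/* sketch only: silently drop on overflow */

	if (at_head) {
		/* Shift the existing entries back by one slot. */
		memmove(&b->names[1], &b->names[0],
			b->len * sizeof(b->names[0]));
		b->names[0] = name;
	} else {
		b->names[b->len] = name;
	}
	b->len++;
}

/* Same comparison as the patch: head if the priority did not rise. */
static int requeue_at_head(int oldprio, int newprio)
{
	return oldprio <= newprio;
}
```

Starting from a prio-80 bucket that holds only T2, requeueing T1 with `requeue_at_head(9, 19)` puts T1 in front, so it is picked again after T3 sleeps. With a tail enqueue, T2 would be picked instead, which is the surprise described above.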



* [PATCH 6/6] sched: Consider pi boosting in setscheduler
  2014-02-07 19:58 A pile of sched patches Sebastian Andrzej Siewior
                   ` (4 preceding siblings ...)
  2014-02-07 19:58 ` [PATCH 5/6] sched: Queue RT tasks to head when prio drops Sebastian Andrzej Siewior
@ 2014-02-07 19:58 ` Sebastian Andrzej Siewior
  2014-02-21 21:32   ` [tip:sched/core] " tip-bot for Thomas Gleixner
  2014-02-22 18:02   ` [tip:sched/core] sched: Consider pi boosting in setscheduler() tip-bot for Thomas Gleixner
  5 siblings, 2 replies; 24+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-02-07 19:58 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Thomas Gleixner, linux-kernel, Dario Faggioli, stable,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

If the policy/priority of a PI boosted task is modified by a
setscheduler() call, we unconditionally dequeue and requeue the task if
it is on the runqueue, even if the new priority is lower than the
current effective boosted priority. This can result in undesired
reordering of the priority bucket list.

If the new priority is lower than or equal to the current effective
priority, we just store the new parameters in the task struct and leave
the scheduler class and the runqueue untouched. The deferred change is
applied when the task deboosts itself. We apply the change immediately
only if the new priority is higher than the effective boosted priority.

Cc: Dario Faggioli <raistlin@linux.it>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: rebase on top of v3.14-rc1]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/sched/rt.h |  7 +++++++
 kernel/locking/rtmutex.c | 12 ++++++++++++
 kernel/sched/core.c      | 41 ++++++++++++++++++++++++++++++-----------
 3 files changed, 49 insertions(+), 11 deletions(-)

diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index 34e4ebe..72c9f3a 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -35,6 +35,7 @@ static inline int rt_task(struct task_struct *p)
 #ifdef CONFIG_RT_MUTEXES
 extern int rt_mutex_getprio(struct task_struct *p);
 extern void rt_mutex_setprio(struct task_struct *p, int prio);
+extern int rt_mutex_check_prio(struct task_struct *task, int newprio);
 extern struct task_struct *rt_mutex_get_top_task(struct task_struct *task);
 extern void rt_mutex_adjust_pi(struct task_struct *p);
 static inline bool tsk_is_pi_blocked(struct task_struct *tsk)
@@ -46,6 +47,12 @@ static inline int rt_mutex_getprio(struct task_struct *p)
 {
 	return p->normal_prio;
 }
+
+static inline int rt_mutex_check_prio(struct task_struct *task, int newprio)
+{
+	return 0;
+}
+
 static inline struct task_struct *rt_mutex_get_top_task(struct task_struct *task)
 {
 	return NULL;
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 2e960a2..aa4dff0 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -213,6 +213,18 @@ struct task_struct *rt_mutex_get_top_task(struct task_struct *task)
 }
 
 /*
+ * Called by sched_setscheduler() to check whether the priority change
+ * is overruled by a possible priority boosting.
+ */
+int rt_mutex_check_prio(struct task_struct *task, int newprio)
+{
+	if (!task_has_pi_waiters(task))
+		return 0;
+
+	return task_top_pi_waiter(task)->task->prio <= newprio;
+}
+
+/*
  * Adjust the priority of a task, after its pi_waiters got modified.
  *
  * This can be both boosting and unboosting. task->pi_lock must be held.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d0e7825..6dda083 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2920,7 +2920,8 @@ EXPORT_SYMBOL(sleep_on_timeout);
  * This function changes the 'effective' priority of a task. It does
  * not touch ->normal_prio like __setscheduler().
  *
- * Used by the rt_mutex code to implement priority inheritance logic.
+ * Used by the rt_mutex code to implement priority inheritance
+ * logic. Call site only calls if the priority of the task changed.
  */
 void rt_mutex_setprio(struct task_struct *p, int prio)
 {
@@ -3201,9 +3202,8 @@ __setparam_dl(struct task_struct *p, const struct sched_attr *attr)
 	dl_se->dl_new = 1;
 }
 
-/* Actually do priority change: must hold pi & rq lock. */
-static void __setscheduler(struct rq *rq, struct task_struct *p,
-			   const struct sched_attr *attr)
+static void __setscheduler_params(struct task_struct *p,
+		const struct sched_attr *attr)
 {
 	int policy = attr->sched_policy;
 
@@ -3223,9 +3223,14 @@ static void __setscheduler(struct rq *rq, struct task_struct *p,
 	 * getparam()/getattr() don't report silly values for !rt tasks.
 	 */
 	p->rt_priority = attr->sched_priority;
+	set_load_weight(p);
+}
 
-	p->normal_prio = normal_prio(p);
-	p->prio = rt_mutex_getprio(p);
+/* Actually do priority change: must hold pi & rq lock. */
+static void __setscheduler(struct rq *rq, struct task_struct *p,
+			   const struct sched_attr *attr)
+{
+	__setscheduler_params(p, attr);
 
 	if (dl_prio(p->prio))
 		p->sched_class = &dl_sched_class;
@@ -3233,8 +3238,6 @@ static void __setscheduler(struct rq *rq, struct task_struct *p,
 		p->sched_class = &rt_sched_class;
 	else
 		p->sched_class = &fair_sched_class;
-
-	set_load_weight(p);
 }
 
 static void
@@ -3287,6 +3290,7 @@ static int __sched_setscheduler(struct task_struct *p,
 				const struct sched_attr *attr,
 				bool user)
 {
+	int newprio = MAX_RT_PRIO - 1 - attr->sched_priority;
 	int retval, oldprio, oldpolicy = -1, on_rq, running;
 	int policy = attr->sched_policy;
 	unsigned long flags;
@@ -3457,6 +3461,24 @@ static int __sched_setscheduler(struct task_struct *p,
 		return -EBUSY;
 	}
 
+	p->sched_reset_on_fork = reset_on_fork;
+	oldprio = p->prio;
+
+	/*
+	 * Special case for priority boosted tasks.
+	 *
+	 * If the new priority is lower or equal (user space view)
+	 * than the current (boosted) priority, we just store the new
+	 * normal parameters and do not touch the scheduler class and
+	 * the runqueue. This will be done when the task deboost
+	 * itself.
+	 */
+	if (rt_mutex_check_prio(p, newprio)) {
+		__setscheduler_params(p, attr);
+		task_rq_unlock(rq, p, &flags);
+		return 0;
+	}
+
 	on_rq = p->on_rq;
 	running = task_current(rq, p);
 	if (on_rq)
@@ -3464,9 +3486,6 @@ static int __sched_setscheduler(struct task_struct *p,
 	if (running)
 		p->sched_class->put_prev_task(rq, p);
 
-	p->sched_reset_on_fork = reset_on_fork;
-
-	oldprio = p->prio;
 	prev_class = p->sched_class;
 	__setscheduler(rq, p, attr);
 
-- 
1.9.rc1
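
The decision made by rt_mutex_check_prio() comes down to one comparison in kernel-internal priority space. A minimal userspace sketch, assuming MAX_RT_PRIO is 100 as in the kernel: `kernel_prio()` and `change_is_deferred()` are hypothetical names, and the two arguments `has_pi_waiters` and `top_waiter_prio` replace task_has_pi_waiters() and task_top_pi_waiter(task)->task->prio.

```c
#include <stdbool.h>

#define MAX_RT_PRIO 100

/* Same conversion as the newprio initialiser added by the patch. */
static int kernel_prio(int sched_priority)
{
	return MAX_RT_PRIO - 1 - sched_priority;
}

/*
 * Returns true when the requested priority is overruled by boosting,
 * i.e. only the parameters are stored and the runqueue is left alone.
 * A lower kernel prio value means a higher priority.
 */
static bool change_is_deferred(bool has_pi_waiters, int top_waiter_prio,
			       int newprio)
{
	if (!has_pi_waiters)
		return false;

	return top_waiter_prio <= newprio;
}
```

For example, a task boosted by a waiter at user priority 90 that asks for user priority 80 gets `change_is_deferred(true, kernel_prio(90), kernel_prio(80))`, which is true; asking for user priority 95 instead exceeds the boost and is applied right away.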



* Re: [PATCH 1/6] sched: Init idle->on_rq in init_idle()
  2014-02-07 19:58 ` [PATCH 1/6] sched: Init idle->on_rq in init_idle() Sebastian Andrzej Siewior
@ 2014-02-07 21:09   ` Peter Zijlstra
  2014-02-11  9:17     ` [PATCH 1/6 v2] " Sebastian Andrzej Siewior
  2014-02-21 21:31   ` [tip:sched/core] " tip-bot for Thomas Gleixner
  2014-02-22 18:01   ` tip-bot for Thomas Gleixner
  2 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2014-02-07 21:09 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Fri, Feb 07, 2014 at 08:58:37PM +0100, Sebastian Andrzej Siewior wrote:
> From: Thomas Gleixner <tglx@linutronix.de>

Distinct lack of rationale here.

> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  kernel/sched/core.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b46131e..64f75f9 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4473,6 +4473,7 @@ void init_idle(struct task_struct *idle, int cpu)
>  	rcu_read_unlock();
>  
>  	rq->curr = rq->idle = idle;
> +	idle->on_rq = 1;
>  #if defined(CONFIG_SMP)
>  	idle->on_cpu = 1;
>  #endif
> -- 
> 1.9.rc1
> 


* [PATCH 1/6 v2] sched: Init idle->on_rq in init_idle()
  2014-02-07 21:09   ` Peter Zijlstra
@ 2014-02-11  9:17     ` Sebastian Andrzej Siewior
  2014-02-11  9:21       ` Peter Zijlstra
  0 siblings, 1 reply; 24+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-02-11  9:17 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

The init task is state TASK_RUNNING and on_irq should be set to 1. It won't
be set by scheduler because the idle task is never woken up, it is always the
task we fall back to if there is no other task pending.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: add patch description]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
v1..v2: add patch description

 kernel/sched/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b46131e..64f75f9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4473,6 +4473,7 @@ void init_idle(struct task_struct *idle, int cpu)
 	rcu_read_unlock();
 
 	rq->curr = rq->idle = idle;
+	idle->on_rq = 1;
 #if defined(CONFIG_SMP)
 	idle->on_cpu = 1;
 #endif
-- 
1.9.rc1


* Re: [PATCH 1/6 v2] sched: Init idle->on_rq in init_idle()
  2014-02-11  9:17     ` [PATCH 1/6 v2] " Sebastian Andrzej Siewior
@ 2014-02-11  9:21       ` Peter Zijlstra
  2014-02-11 15:34         ` Thomas Gleixner
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2014-02-11  9:21 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Tue, Feb 11, 2014 at 10:17:58AM +0100, Sebastian Andrzej Siewior wrote:
> The init task is state TASK_RUNNING and on_irq should be set to 1. It won't

					^^^ irq? :-)

> be set by scheduler because the idle task is never woken up, it is always the
> task we fall back to if there is no other task pending.

But why? Who cares? I mean, it's true... but what problem does it solve?
Why did Thomas write this patch?



* Re: [PATCH 1/6 v2] sched: Init idle->on_rq in init_idle()
  2014-02-11  9:21       ` Peter Zijlstra
@ 2014-02-11 15:34         ` Thomas Gleixner
  2014-02-11 15:51           ` Peter Zijlstra
  0 siblings, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2014-02-11 15:34 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Sebastian Andrzej Siewior, Ingo Molnar, linux-kernel

On Tue, 11 Feb 2014, Peter Zijlstra wrote:

> On Tue, Feb 11, 2014 at 10:17:58AM +0100, Sebastian Andrzej Siewior wrote:
> > The init task is state TASK_RUNNING and on_irq should be set to 1. It won't
> 
> 					^^^ irq? :-)
> 
> > be set by scheduler because the idle task is never woken up, it is always the
> > task we fall back to if there is no other task pending.
> 
> But why? Who cares? I mean, it's true... but what problem does it solve?
> Why did Thomas write this patch?

I could slap myself for not writing a proper changelog right away. It
took me some time to figure out why it was added in the first place,
why it's no longer necessary and why I kept it.

We stumbled in RT over an SMP bringup issue on ARM where
idle->on_rq == 0 was causing try_to_wake_up() on the other CPU to run
into nada land.

After adding idle->on_rq = 1, I was able to find the root cause of the
lockup: the idle task on the newly woken up CPU was fiddling with a
sleeping spinlock, which is a no-no.

I kept the init of idle->on_rq to keep the state consistent and to
avoid another long-lasting debug session.

As a side note, the whole debug mess could have been avoided if
might_sleep() had yelled when called from the idle task. That's
fixed with patch 2/6 - and that one actually has a changelog :)

Thanks,

	tglx


* Re: [PATCH 1/6 v2] sched: Init idle->on_rq in init_idle()
  2014-02-11 15:34         ` Thomas Gleixner
@ 2014-02-11 15:51           ` Peter Zijlstra
  0 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2014-02-11 15:51 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Sebastian Andrzej Siewior, Ingo Molnar, linux-kernel

On Tue, Feb 11, 2014 at 04:34:38PM +0100, Thomas Gleixner wrote:
> I could slap myself for not writing a proper changelog right away. It
> took me some time to figure out why it was added in the first place,
> why it's no longer necessary and why I kept it.

:-)

Thanks!

---
Subject: sched: Init idle->on_rq in init_idle()
From: Thomas Gleixner <tglx@linutronix.de>
Date: Fri, 7 Feb 2014 20:58:37 +0100

We stumbled in RT over an SMP bringup issue on ARM where
idle->on_rq == 0 was causing try_to_wake_up() on the other CPU to run
into nada land.

After adding idle->on_rq = 1, I was able to find the root cause of the
lockup: the idle task on the newly woken up CPU was fiddling with a
sleeping spinlock, which is a no-no.

I kept the init of idle->on_rq to keep the state consistent and to
avoid another long-lasting debug session.

As a side note, the whole debug mess could have been avoided if
might_sleep() had yelled when called from the idle task. That's
fixed with patch 2/6 - and that one actually has a changelog :)

Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391803122-4425-2-git-send-email-bigeasy@linutronix.de
---
 kernel/sched/core.c |    1 +
 1 file changed, 1 insertion(+)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4443,6 +4443,7 @@ void init_idle(struct task_struct *idle,
 	rcu_read_unlock();
 
 	rq->curr = rq->idle = idle;
+	idle->on_rq = 1;
 #if defined(CONFIG_SMP)
 	idle->on_cpu = 1;
 #endif


* [tip:sched/core] sched: Init idle->on_rq in init_idle()
  2014-02-07 19:58 ` [PATCH 1/6] sched: Init idle->on_rq in init_idle() Sebastian Andrzej Siewior
  2014-02-07 21:09   ` Peter Zijlstra
@ 2014-02-21 21:31   ` tip-bot for Thomas Gleixner
  2014-02-22 18:01   ` tip-bot for Thomas Gleixner
  2 siblings, 0 replies; 24+ messages in thread
From: tip-bot for Thomas Gleixner @ 2014-02-21 21:31 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, mingo, hpa, mingo, peterz, bigeasy, tglx

Commit-ID:  e037fef5fb7d2c887916576c1a288c821b6268f4
Gitweb:     http://git.kernel.org/tip/e037fef5fb7d2c887916576c1a288c821b6268f4
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 7 Feb 2014 20:58:37 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 21 Feb 2014 21:43:18 +0100

sched: Init idle->on_rq in init_idle()

We stumbled in RT over an SMP bringup issue on ARM where
idle->on_rq == 0 was causing try_to_wake_up() on the other CPU to run
into nada land.

After adding idle->on_rq = 1, I was able to find the root cause of the
lockup: the idle task on the newly woken up CPU was fiddling with a
sleeping spinlock, which is a no-no.

I kept the init of idle->on_rq to keep the state consistent and to
avoid another long-lasting debug session.

As a side note, the whole debug mess could have been avoided if
might_sleep() had yelled when called from the idle task. That's
fixed with patch 2/6 - and that one actually has a changelog :)

Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391803122-4425-2-git-send-email-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4c8aaf0..86e0558 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4451,6 +4451,7 @@ void init_idle(struct task_struct *idle, int cpu)
 	rcu_read_unlock();
 
 	rq->curr = rq->idle = idle;
+	idle->on_rq = 1;
 #if defined(CONFIG_SMP)
 	idle->on_cpu = 1;
 #endif


* [tip:sched/core] sched: Check for idle task in might_sleep()
  2014-02-07 19:58 ` [PATCH 2/6] sched: Check for idle task in might_sleep() Sebastian Andrzej Siewior
@ 2014-02-21 21:31   ` tip-bot for Thomas Gleixner
  2014-02-22 18:02   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 24+ messages in thread
From: tip-bot for Thomas Gleixner @ 2014-02-21 21:31 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, mingo, hpa, mingo, peterz, bigeasy, tglx

Commit-ID:  a2885f6bf67671c83f9162a271bc8898d5062842
Gitweb:     http://git.kernel.org/tip/a2885f6bf67671c83f9162a271bc8898d5062842
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 7 Feb 2014 20:58:38 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 21 Feb 2014 21:43:18 +0100

sched: Check for idle task in might_sleep()

Idle is not allowed to call sleeping functions ever!

Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391803122-4425-3-git-send-email-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 86e0558..37d35e8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6935,7 +6935,8 @@ void __might_sleep(const char *file, int line, int preempt_offset)
 	static unsigned long prev_jiffy;	/* ratelimiting */
 
 	rcu_sleep_check(); /* WARN_ON_ONCE() by default, no rate limit reqd. */
-	if ((preempt_count_equals(preempt_offset) && !irqs_disabled()) ||
+	if ((preempt_count_equals(preempt_offset) && !irqs_disabled() &&
+	     !is_idle_task(current)) ||
 	    system_state != SYSTEM_RUNNING || oops_in_progress)
 		return;
 	if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)


* [tip:sched/core] sched: Better debug output for might sleep
  2014-02-07 19:58 ` [PATCH 3/6] sched: Better debug output for might sleep Sebastian Andrzej Siewior
@ 2014-02-21 21:31   ` tip-bot for Thomas Gleixner
  2014-02-22 18:02   ` [tip:sched/core] sched: Add better debug output for might_sleep() tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 24+ messages in thread
From: tip-bot for Thomas Gleixner @ 2014-02-21 21:31 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, mingo, hpa, mingo, peterz, bigeasy, tglx

Commit-ID:  35062a4a9b448d3367a5ae1edbd09a8d9b881d93
Gitweb:     http://git.kernel.org/tip/35062a4a9b448d3367a5ae1edbd09a8d9b881d93
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 7 Feb 2014 20:58:39 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 21 Feb 2014 21:43:19 +0100

sched: Better debug output for might sleep

might_sleep() can tell us where interrupts have been disabled, but we
have no idea what disabled preemption. Add some debug infrastructure.

Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391803122-4425-4-git-send-email-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/sched.h |  3 +++
 kernel/sched/core.c   | 23 +++++++++++++++++++++--
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index c49a258..825ed83 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1463,6 +1463,9 @@ struct task_struct {
 	struct mutex perf_event_mutex;
 	struct list_head perf_event_list;
 #endif
+#ifdef CONFIG_DEBUG_PREEMPT
+	unsigned long preempt_disable_ip;
+#endif
 #ifdef CONFIG_NUMA
 	struct mempolicy *mempolicy;	/* Protected by alloc_lock */
 	short il_next;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 37d35e8..241f7465 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2501,8 +2501,13 @@ void __kprobes preempt_count_add(int val)
 	DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >=
 				PREEMPT_MASK - 10);
 #endif
-	if (preempt_count() == val)
-		trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
+	if (preempt_count() == val) {
+		unsigned long ip = get_parent_ip(CALLER_ADDR1);
+#ifdef CONFIG_DEBUG_PREEMPT
+		current->preempt_disable_ip = ip;
+#endif
+		trace_preempt_off(CALLER_ADDR0, ip);
+	}
 }
 EXPORT_SYMBOL(preempt_count_add);
 
@@ -2545,6 +2550,13 @@ static noinline void __schedule_bug(struct task_struct *prev)
 	print_modules();
 	if (irqs_disabled())
 		print_irqtrace_events(prev);
+#ifdef CONFIG_DEBUG_PREEMPT
+	if (in_atomic_preempt_off()) {
+		pr_err("Preemption disabled at:");
+		print_ip_sym(current->preempt_disable_ip);
+		pr_cont("\n");
+	}
+#endif
 	dump_stack();
 	add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
 }
@@ -6954,6 +6966,13 @@ void __might_sleep(const char *file, int line, int preempt_offset)
 	debug_show_held_locks(current);
 	if (irqs_disabled())
 		print_irqtrace_events(current);
+#ifdef CONFIG_DEBUG_PREEMPT
+	if (!preempt_count_equals(preempt_offset)) {
+		pr_err("Preemption disabled at:");
+		print_ip_sym(current->preempt_disable_ip);
+		pr_cont("\n");
+	}
+#endif
 	dump_stack();
 }
 EXPORT_SYMBOL(__might_sleep);

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [tip:sched/core] sched: Adjust sched_reset_on_fork when nothing else changes
  2014-02-07 19:58 ` [PATCH 4/6] sched: Adjust sched_reset_on_fork when nothing else changes Sebastian Andrzej Siewior
@ 2014-02-21 21:32   ` tip-bot for Thomas Gleixner
  2014-02-22 18:02   ` [tip:sched/core] sched: Adjust p-> " tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 24+ messages in thread
From: tip-bot for Thomas Gleixner @ 2014-02-21 21:32 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, mingo, hpa, mingo, peterz, bigeasy, tglx

Commit-ID:  a296858625f810c28ceb360d448d4db5758ca48a
Gitweb:     http://git.kernel.org/tip/a296858625f810c28ceb360d448d4db5758ca48a
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 7 Feb 2014 20:58:40 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 21 Feb 2014 21:43:19 +0100

sched: Adjust sched_reset_on_fork when nothing else changes

If the policy and priority remain unchanged a possible modification of
sched_reset_on_fork gets lost in the early exit path.
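
A heavily reduced user-space model of that early exit, for
illustration only. struct task and model_setscheduler() are
hypothetical; the real __sched_setscheduler() also handles nice values,
deadline parameters and locking, all of which are left out here.

```c
#include <assert.h>

/* Hypothetical, reduced task state. */
struct task {
	int policy;
	int rt_priority;
	int sched_reset_on_fork;
};

/*
 * Reduced model of the early-exit path.
 * Returns 1 if the early exit was taken, 0 if a full change ran.
 */
static int model_setscheduler(struct task *p, int policy, int rt_priority,
			      int reset_on_fork)
{
	assert(reset_on_fork == 0 || reset_on_fork == 1);

	if (policy == p->policy && rt_priority == p->rt_priority) {
		/* The fix: store the flag before leaving early. */
		p->sched_reset_on_fork = reset_on_fork;
		return 1;
	}

	p->policy = policy;
	p->rt_priority = rt_priority;
	p->sched_reset_on_fork = reset_on_fork;
	return 0;
}
```

Without the assignment in the early-exit branch, a call that changes
only the flag would return success while leaving the old flag in place.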

[bigeasy: rebase on top of v3.14-rc1]
Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391803122-4425-5-git-send-email-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 241f7465..7527e68 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3370,7 +3370,8 @@ recheck:
 	}
 
 	/*
-	 * If not changing anything there's no need to proceed further:
+	 * If not changing anything there's no need to proceed further,
+	 * but store a possible modification of reset_on_fork.
 	 */
 	if (unlikely(policy == p->policy)) {
 		if (fair_policy(policy) && attr->sched_nice != task_nice(p))
@@ -3380,6 +3381,7 @@ recheck:
 		if (dl_policy(policy))
 			goto change;
 
+		p->sched_reset_on_fork = reset_on_fork;
 		task_rq_unlock(rq, p, &flags);
 		return 0;
 	}

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [tip:sched/core] sched: Queue RT tasks to head when prio drops
  2014-02-07 19:58 ` [PATCH 5/6] sched: Queue RT tasks to head when prio drops Sebastian Andrzej Siewior
@ 2014-02-21 21:32   ` tip-bot for Thomas Gleixner
  2014-02-22 18:02   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 24+ messages in thread
From: tip-bot for Thomas Gleixner @ 2014-02-21 21:32 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, mingo, hpa, mingo, peterz, bigeasy, tglx

Commit-ID:  410dcf7b5670c224f5bb3179b62642b7182e3486
Gitweb:     http://git.kernel.org/tip/410dcf7b5670c224f5bb3179b62642b7182e3486
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 7 Feb 2014 20:58:41 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 21 Feb 2014 21:43:19 +0100

sched: Queue RT tasks to head when prio drops

The following scenario does not work correctly:

Runqueue of CPUx contains two runnable and pinned tasks:
 T1: SCHED_FIFO, prio 80
 T2: SCHED_FIFO, prio 80

T1 is on the cpu and executes the following syscalls (classic priority
ceiling scenario):

 sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 90);
 ...
 sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 80);
 ...

Now T1 gets preempted by T3 (SCHED_FIFO, prio 95). After T3 goes back
to sleep the scheduler picks T2. Surprise!

The same happens w/o actual preemption when T1 is forced into the
scheduler due to a sporadic NEED_RESCHED event. The scheduler invokes
pick_next_task() which returns T2. So T1 gets preempted and scheduled
out.

This happens because sched_setscheduler() dequeues T1 from the prio 90
list and then enqueues it on the tail of the prio 80 list behind T2.
This violates the POSIX spec and surprises user space which relies on
the guarantee that SCHED_FIFO tasks are not scheduled out unless they
give the CPU up voluntarily or are preempted by a higher priority
task. In the latter case the preempted task must get back on the CPU
after the preempting task schedules out again.

We fixed a similar issue already in commit 60db48c (sched: Queue a
deboosted task to the head of the RT prio queue). The same treatment
is necessary for sched_setscheduler(). So enqueue to head of the prio
bucket list if the priority of the task is lowered.

It might be possible that existing user space relies on the current
behaviour, but it can be considered highly unlikely due to the corner
case nature of the application scenario.
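
The queueing difference can be shown with a small user-space model.
This is a sketch under stated assumptions, not scheduler code: struct
bucket, bucket_enqueue(), bucket_pick() and pick_after_lowering() are
hypothetical names, a fixed array stands in for the per-priority list,
and the at_head argument plays the role of ENQUEUE_HEAD.

```c
#include <assert.h>
#include <string.h>

#define MAX_TASKS 8

/* One priority bucket: a FIFO of task ids; index 0 is picked next. */
struct bucket {
	int task[MAX_TASKS];
	int len;
};

/* Hypothetical stand-in for enqueue_task(); at_head mirrors ENQUEUE_HEAD. */
static void bucket_enqueue(struct bucket *b, int id, int at_head)
{
	assert(b->len < MAX_TASKS);
	if (at_head) {
		/* Shift the existing entries back by one slot. */
		memmove(&b->task[1], &b->task[0], b->len * sizeof(b->task[0]));
		b->task[0] = id;
	} else {
		b->task[b->len] = id;
	}
	b->len++;
}

/* Task the scheduler would pick from this bucket, or -1 if it is empty. */
static int bucket_pick(const struct bucket *b)
{
	return b->len ? b->task[0] : -1;
}

/*
 * Model T1 (id 1) dropping from prio 90 back into the prio 80 bucket
 * that already holds T2 (id 2). Returns the id that is picked next.
 */
static int pick_after_lowering(int queue_to_head)
{
	struct bucket prio80 = { .task = { 2 }, .len = 1 };

	bucket_enqueue(&prio80, 1, queue_to_head);
	return bucket_pick(&prio80);
}
```

Calling pick_after_lowering(0) models the old tail enqueue and yields
T2; pick_after_lowering(1) models the head enqueue and yields T1, so
T1 keeps the CPU after lowering its own priority.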

Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391803122-4425-6-git-send-email-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/core.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7527e68..a41d239 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3450,8 +3450,13 @@ change:
 
 	if (running)
 		p->sched_class->set_curr_task(rq);
-	if (on_rq)
-		enqueue_task(rq, p, 0);
+	if (on_rq) {
+		/*
+		 * We enqueue to tail when the priority of a task is
+		 * increased (user space view).
+		 */
+		enqueue_task(rq, p, oldprio <= p->prio ? ENQUEUE_HEAD : 0);
+	}
 
 	check_class_changed(rq, p, prev_class, oldprio);
 	task_rq_unlock(rq, p, &flags);

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [tip:sched/core] sched: Consider pi boosting in setscheduler
  2014-02-07 19:58 ` [PATCH 6/6] sched: Consider pi boosting in setscheduler Sebastian Andrzej Siewior
@ 2014-02-21 21:32   ` tip-bot for Thomas Gleixner
  2014-02-22 18:02   ` [tip:sched/core] sched: Consider pi boosting in setscheduler() tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 24+ messages in thread
From: tip-bot for Thomas Gleixner @ 2014-02-21 21:32 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, mingo, hpa, mingo, peterz, raistlin, tglx, bigeasy

Commit-ID:  58b067579d824034dd1fa28b5fe384cd85d7adbc
Gitweb:     http://git.kernel.org/tip/58b067579d824034dd1fa28b5fe384cd85d7adbc
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 7 Feb 2014 20:58:42 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 21 Feb 2014 21:43:19 +0100

sched: Consider pi boosting in setscheduler

If a PI boosted task policy/priority is modified by a setscheduler()
call we unconditionally dequeue and requeue the task if it is on the
runqueue even if the new priority is lower than the current effective
boosted priority. This can result in undesired reordering of the
priority bucket list.

If the new priority is lower than or equal to the current effective
priority, we just store the new parameters in the task struct and leave
the scheduler class and the runqueue untouched. The stored parameters
take effect when the task deboosts itself. Only if the new priority is
higher than the effective boosted priority do we apply the change
immediately.
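
The comparison is easy to get backwards because kernel priorities are
inverted relative to the user-space view: a numerically lower kernel
prio is a higher priority. The following user-space sketch models the
decision for RT policies only. kernel_prio() and defer_prio_change()
are hypothetical helpers, MAX_RT_PRIO is assumed to be 100, and the
has_waiters and top_waiter_prio arguments stand in for
task_has_pi_waiters() and task_top_pi_waiter(task)->task->prio.

```c
#include <assert.h>

#define MAX_RT_PRIO 100	/* assumed value for this sketch */

/* Map a user-space RT priority (1..99) to the kernel's inverted scale. */
static int kernel_prio(int sched_priority)
{
	assert(sched_priority >= 1 && sched_priority < MAX_RT_PRIO);
	return MAX_RT_PRIO - 1 - sched_priority;
}

/*
 * Hypothetical model of the boost check.
 * Returns 1 when the new parameters should only be stored (the boost
 * still wins), 0 when the change has to be applied immediately.
 */
static int defer_prio_change(int has_waiters, int top_waiter_prio,
			     int new_sched_priority)
{
	if (!has_waiters)
		return 0;

	return top_waiter_prio <= kernel_prio(new_sched_priority);
}
```

For a task boosted by a waiter at user priority 90, a request for
priority 80 or 90 is deferred, while a request for 95 is applied right
away.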

[bigeasy: rebase on top of v3.14-rc1]
Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Dario Faggioli <raistlin@linux.it>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391803122-4425-7-git-send-email-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/sched/rt.h |  7 +++++++
 kernel/locking/rtmutex.c | 12 ++++++++++++
 kernel/sched/core.c      | 41 ++++++++++++++++++++++++++++++-----------
 3 files changed, 49 insertions(+), 11 deletions(-)

diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index f7453d4..6341f5b 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -18,6 +18,7 @@ static inline int rt_task(struct task_struct *p)
 #ifdef CONFIG_RT_MUTEXES
 extern int rt_mutex_getprio(struct task_struct *p);
 extern void rt_mutex_setprio(struct task_struct *p, int prio);
+extern int rt_mutex_check_prio(struct task_struct *task, int newprio);
 extern struct task_struct *rt_mutex_get_top_task(struct task_struct *task);
 extern void rt_mutex_adjust_pi(struct task_struct *p);
 static inline bool tsk_is_pi_blocked(struct task_struct *tsk)
@@ -29,6 +30,12 @@ static inline int rt_mutex_getprio(struct task_struct *p)
 {
 	return p->normal_prio;
 }
+
+static inline int rt_mutex_check_prio(struct task_struct *task, int newprio)
+{
+	return 0;
+}
+
 static inline struct task_struct *rt_mutex_get_top_task(struct task_struct *task)
 {
 	return NULL;
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 2e960a2..aa4dff0 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -213,6 +213,18 @@ struct task_struct *rt_mutex_get_top_task(struct task_struct *task)
 }
 
 /*
+ * Called by sched_setscheduler() to check whether the priority change
+ * is overruled by a possible priority boosting.
+ */
+int rt_mutex_check_prio(struct task_struct *task, int newprio)
+{
+	if (!task_has_pi_waiters(task))
+		return 0;
+
+	return task_top_pi_waiter(task)->task->prio <= newprio;
+}
+
+/*
  * Adjust the priority of a task, after its pi_waiters got modified.
  *
  * This can be both boosting and unboosting. task->pi_lock must be held.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a41d239..c1687de 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2910,7 +2910,8 @@ EXPORT_SYMBOL(sleep_on_timeout);
  * This function changes the 'effective' priority of a task. It does
  * not touch ->normal_prio like __setscheduler().
  *
- * Used by the rt_mutex code to implement priority inheritance logic.
+ * Used by the rt_mutex code to implement priority inheritance
+ * logic. Call site only calls if the priority of the task changed.
  */
 void rt_mutex_setprio(struct task_struct *p, int prio)
 {
@@ -3179,9 +3180,8 @@ __setparam_dl(struct task_struct *p, const struct sched_attr *attr)
 	dl_se->dl_new = 1;
 }
 
-/* Actually do priority change: must hold pi & rq lock. */
-static void __setscheduler(struct rq *rq, struct task_struct *p,
-			   const struct sched_attr *attr)
+static void __setscheduler_params(struct task_struct *p,
+		const struct sched_attr *attr)
 {
 	int policy = attr->sched_policy;
 
@@ -3201,9 +3201,14 @@ static void __setscheduler(struct rq *rq, struct task_struct *p,
 	 * getparam()/getattr() don't report silly values for !rt tasks.
 	 */
 	p->rt_priority = attr->sched_priority;
+	set_load_weight(p);
+}
 
-	p->normal_prio = normal_prio(p);
-	p->prio = rt_mutex_getprio(p);
+/* Actually do priority change: must hold pi & rq lock. */
+static void __setscheduler(struct rq *rq, struct task_struct *p,
+			   const struct sched_attr *attr)
+{
+	__setscheduler_params(p, attr);
 
 	if (dl_prio(p->prio))
 		p->sched_class = &dl_sched_class;
@@ -3211,8 +3216,6 @@ static void __setscheduler(struct rq *rq, struct task_struct *p,
 		p->sched_class = &rt_sched_class;
 	else
 		p->sched_class = &fair_sched_class;
-
-	set_load_weight(p);
 }
 
 static void
@@ -3265,6 +3268,7 @@ static int __sched_setscheduler(struct task_struct *p,
 				const struct sched_attr *attr,
 				bool user)
 {
+	int newprio = MAX_RT_PRIO - 1 - attr->sched_priority;
 	int retval, oldprio, oldpolicy = -1, on_rq, running;
 	int policy = attr->sched_policy;
 	unsigned long flags;
@@ -3435,6 +3439,24 @@ change:
 		return -EBUSY;
 	}
 
+	p->sched_reset_on_fork = reset_on_fork;
+	oldprio = p->prio;
+
+	/*
+	 * Special case for priority boosted tasks.
+	 *
+	 * If the new priority is lower or equal (user space view)
+	 * than the current (boosted) priority, we just store the new
+	 * normal parameters and do not touch the scheduler class and
+	 * the runqueue. This will be done when the task deboost
+	 * itself.
+	 */
+	if (rt_mutex_check_prio(p, newprio)) {
+		__setscheduler_params(p, attr);
+		task_rq_unlock(rq, p, &flags);
+		return 0;
+	}
+
 	on_rq = p->on_rq;
 	running = task_current(rq, p);
 	if (on_rq)
@@ -3442,9 +3464,6 @@ change:
 	if (running)
 		p->sched_class->put_prev_task(rq, p);
 
-	p->sched_reset_on_fork = reset_on_fork;
-
-	oldprio = p->prio;
 	prev_class = p->sched_class;
 	__setscheduler(rq, p, attr);
 

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [tip:sched/core] sched: Init idle->on_rq in init_idle()
  2014-02-07 19:58 ` [PATCH 1/6] sched: Init idle->on_rq in init_idle() Sebastian Andrzej Siewior
  2014-02-07 21:09   ` Peter Zijlstra
  2014-02-21 21:31   ` [tip:sched/core] " tip-bot for Thomas Gleixner
@ 2014-02-22 18:01   ` tip-bot for Thomas Gleixner
  2 siblings, 0 replies; 24+ messages in thread
From: tip-bot for Thomas Gleixner @ 2014-02-22 18:01 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, peterz, bigeasy, tglx

Commit-ID:  77177856e3bf39d435b3ae4bfd164ca3c8cd4577
Gitweb:     http://git.kernel.org/tip/77177856e3bf39d435b3ae4bfd164ca3c8cd4577
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 7 Feb 2014 20:58:37 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 22 Feb 2014 18:07:36 +0100

sched: Init idle->on_rq in init_idle()

We stumbled in RT over an SMP bringup issue on ARM where
idle->on_rq == 0 was causing try_to_wake_up() on the other cpu to run
into nada land.

After adding that idle->on_rq = 1; I was able to find the root cause
of the lockup: the idle task on the newly woken up cpu was fiddling
with a sleeping spinlock, which is a no-no.

I kept the init of idle->on_rq to keep the state consistent and to
avoid another long lasting debug session.

As a side note, the whole debug mess could have been avoided if
might_sleep() had yelled when called from the idle task. That's
fixed with patch 2/6 - and that one actually has a changelog :)

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391803122-4425-2-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 49db434..06da865 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4443,6 +4443,7 @@ void init_idle(struct task_struct *idle, int cpu)
 	rcu_read_unlock();
 
 	rq->curr = rq->idle = idle;
+	idle->on_rq = 1;
 #if defined(CONFIG_SMP)
 	idle->on_cpu = 1;
 #endif

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [tip:sched/core] sched: Check for idle task in might_sleep()
  2014-02-07 19:58 ` [PATCH 2/6] sched: Check for idle task in might_sleep() Sebastian Andrzej Siewior
  2014-02-21 21:31   ` [tip:sched/core] " tip-bot for Thomas Gleixner
@ 2014-02-22 18:02   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 24+ messages in thread
From: tip-bot for Thomas Gleixner @ 2014-02-22 18:02 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, peterz, bigeasy, tglx

Commit-ID:  db273be2a7d42f92b3471e0f717982928214a650
Gitweb:     http://git.kernel.org/tip/db273be2a7d42f92b3471e0f717982928214a650
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 7 Feb 2014 20:58:38 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 22 Feb 2014 18:07:50 +0100

sched: Check for idle task in might_sleep()

Idle is not allowed to call sleeping functions ever!

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391803122-4425-3-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 06da865..a01fe6c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6927,7 +6927,8 @@ void __might_sleep(const char *file, int line, int preempt_offset)
 	static unsigned long prev_jiffy;	/* ratelimiting */
 
 	rcu_sleep_check(); /* WARN_ON_ONCE() by default, no rate limit reqd. */
-	if ((preempt_count_equals(preempt_offset) && !irqs_disabled()) ||
+	if ((preempt_count_equals(preempt_offset) && !irqs_disabled() &&
+	     !is_idle_task(current)) ||
 	    system_state != SYSTEM_RUNNING || oops_in_progress)
 		return;
 	if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [tip:sched/core] sched: Add better debug output for might_sleep()
  2014-02-07 19:58 ` [PATCH 3/6] sched: Better debug output for might sleep Sebastian Andrzej Siewior
  2014-02-21 21:31   ` [tip:sched/core] " tip-bot for Thomas Gleixner
@ 2014-02-22 18:02   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 24+ messages in thread
From: tip-bot for Thomas Gleixner @ 2014-02-22 18:02 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, peterz, bigeasy, tglx

Commit-ID:  8f47b1871b8aac98f1a9d93bc3467fb97b65199a
Gitweb:     http://git.kernel.org/tip/8f47b1871b8aac98f1a9d93bc3467fb97b65199a
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 7 Feb 2014 20:58:39 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 22 Feb 2014 18:08:08 +0100

sched: Add better debug output for might_sleep()

might_sleep() can tell us where interrupts have been disabled, but we
have no idea what disabled preemption. Add some debug infrastructure.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391803122-4425-4-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/sched.h |  3 +++
 kernel/sched/core.c   | 23 +++++++++++++++++++++--
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index c49a258..825ed83 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1463,6 +1463,9 @@ struct task_struct {
 	struct mutex perf_event_mutex;
 	struct list_head perf_event_list;
 #endif
+#ifdef CONFIG_DEBUG_PREEMPT
+	unsigned long preempt_disable_ip;
+#endif
 #ifdef CONFIG_NUMA
 	struct mempolicy *mempolicy;	/* Protected by alloc_lock */
 	short il_next;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a01fe6c..c94e851 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2501,8 +2501,13 @@ void __kprobes preempt_count_add(int val)
 	DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >=
 				PREEMPT_MASK - 10);
 #endif
-	if (preempt_count() == val)
-		trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
+	if (preempt_count() == val) {
+		unsigned long ip = get_parent_ip(CALLER_ADDR1);
+#ifdef CONFIG_DEBUG_PREEMPT
+		current->preempt_disable_ip = ip;
+#endif
+		trace_preempt_off(CALLER_ADDR0, ip);
+	}
 }
 EXPORT_SYMBOL(preempt_count_add);
 
@@ -2545,6 +2550,13 @@ static noinline void __schedule_bug(struct task_struct *prev)
 	print_modules();
 	if (irqs_disabled())
 		print_irqtrace_events(prev);
+#ifdef CONFIG_DEBUG_PREEMPT
+	if (in_atomic_preempt_off()) {
+		pr_err("Preemption disabled at:");
+		print_ip_sym(current->preempt_disable_ip);
+		pr_cont("\n");
+	}
+#endif
 	dump_stack();
 	add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
 }
@@ -6946,6 +6958,13 @@ void __might_sleep(const char *file, int line, int preempt_offset)
 	debug_show_held_locks(current);
 	if (irqs_disabled())
 		print_irqtrace_events(current);
+#ifdef CONFIG_DEBUG_PREEMPT
+	if (!preempt_count_equals(preempt_offset)) {
+		pr_err("Preemption disabled at:");
+		print_ip_sym(current->preempt_disable_ip);
+		pr_cont("\n");
+	}
+#endif
 	dump_stack();
 }
 EXPORT_SYMBOL(__might_sleep);

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [tip:sched/core] sched: Adjust p-> sched_reset_on_fork when nothing else changes
  2014-02-07 19:58 ` [PATCH 4/6] sched: Adjust sched_reset_on_fork when nothing else changes Sebastian Andrzej Siewior
  2014-02-21 21:32   ` [tip:sched/core] " tip-bot for Thomas Gleixner
@ 2014-02-22 18:02   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 24+ messages in thread
From: tip-bot for Thomas Gleixner @ 2014-02-22 18:02 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, peterz, bigeasy, tglx

Commit-ID:  d6b1e9119787fd2e31dcf0f0ce90b71197604206
Gitweb:     http://git.kernel.org/tip/d6b1e9119787fd2e31dcf0f0ce90b71197604206
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 7 Feb 2014 20:58:40 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 22 Feb 2014 18:08:43 +0100

sched: Adjust p->sched_reset_on_fork when nothing else changes

If the policy and priority remain unchanged a possible modification of
p->sched_reset_on_fork gets lost in the early exit path.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Rebase on top of v3.14-rc1. ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391803122-4425-5-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c94e851..771eb87 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3362,7 +3362,8 @@ recheck:
 	}
 
 	/*
-	 * If not changing anything there's no need to proceed further:
+	 * If not changing anything there's no need to proceed further,
+	 * but store a possible modification of reset_on_fork.
 	 */
 	if (unlikely(policy == p->policy)) {
 		if (fair_policy(policy) && attr->sched_nice != task_nice(p))
@@ -3372,6 +3373,7 @@ recheck:
 		if (dl_policy(policy))
 			goto change;
 
+		p->sched_reset_on_fork = reset_on_fork;
 		task_rq_unlock(rq, p, &flags);
 		return 0;
 	}

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [tip:sched/core] sched: Queue RT tasks to head when prio drops
  2014-02-07 19:58 ` [PATCH 5/6] sched: Queue RT tasks to head when prio drops Sebastian Andrzej Siewior
  2014-02-21 21:32   ` [tip:sched/core] " tip-bot for Thomas Gleixner
@ 2014-02-22 18:02   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 24+ messages in thread
From: tip-bot for Thomas Gleixner @ 2014-02-22 18:02 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, peterz, bigeasy, tglx

Commit-ID:  81a44c5441d7f7d2c3dc9105f4d65ad0d5818617
Gitweb:     http://git.kernel.org/tip/81a44c5441d7f7d2c3dc9105f4d65ad0d5818617
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 7 Feb 2014 20:58:41 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 22 Feb 2014 18:09:41 +0100

sched: Queue RT tasks to head when prio drops

The following scenario does not work correctly:

Runqueue of CPUx contains two runnable and pinned tasks:

 T1: SCHED_FIFO, prio 80
 T2: SCHED_FIFO, prio 80

T1 is on the cpu and executes the following syscalls (classic priority
ceiling scenario):

 sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 90);
 ...
 sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 80);
 ...

Now T1 gets preempted by T3 (SCHED_FIFO, prio 95). After T3 goes back
to sleep the scheduler picks T2. Surprise!

The same happens w/o actual preemption when T1 is forced into the
scheduler due to a sporadic NEED_RESCHED event. The scheduler invokes
pick_next_task() which returns T2. So T1 gets preempted and scheduled
out.

This happens because sched_setscheduler() dequeues T1 from the prio 90
list and then enqueues it on the tail of the prio 80 list behind T2.
This violates the POSIX spec and surprises user space which relies on
the guarantee that SCHED_FIFO tasks are not scheduled out unless they
give the CPU up voluntarily or are preempted by a higher priority
task. In the latter case the preempted task must get back on the CPU
after the preempting task schedules out again.

We fixed a similar issue already in commit 60db48c (sched: Queue a
deboosted task to the head of the RT prio queue). The same treatment
is necessary for sched_setscheduler(). So enqueue to head of the prio
bucket list if the priority of the task is lowered.

It might be possible that existing user space relies on the current
behaviour, but it can be considered highly unlikely due to the corner
case nature of the application scenario.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391803122-4425-6-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 771eb87..9c2fcbf 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3442,8 +3442,13 @@ change:
 
 	if (running)
 		p->sched_class->set_curr_task(rq);
-	if (on_rq)
-		enqueue_task(rq, p, 0);
+	if (on_rq) {
+		/*
+		 * We enqueue to tail when the priority of a task is
+		 * increased (user space view).
+		 */
+		enqueue_task(rq, p, oldprio <= p->prio ? ENQUEUE_HEAD : 0);
+	}
 
 	check_class_changed(rq, p, prev_class, oldprio);
 	task_rq_unlock(rq, p, &flags);

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [tip:sched/core] sched: Consider pi boosting in setscheduler()
  2014-02-07 19:58 ` [PATCH 6/6] sched: Consider pi boosting in setscheduler Sebastian Andrzej Siewior
  2014-02-21 21:32   ` [tip:sched/core] " tip-bot for Thomas Gleixner
@ 2014-02-22 18:02   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 24+ messages in thread
From: tip-bot for Thomas Gleixner @ 2014-02-22 18:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, peterz, raistlin, bigeasy, tglx

Commit-ID:  c365c292d05908c6ea6f32708f331e21033fe71d
Gitweb:     http://git.kernel.org/tip/c365c292d05908c6ea6f32708f331e21033fe71d
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 7 Feb 2014 20:58:42 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 22 Feb 2014 18:10:04 +0100

sched: Consider pi boosting in setscheduler()

If a PI boosted task policy/priority is modified by a setscheduler()
call we unconditionally dequeue and requeue the task if it is on the
runqueue even if the new priority is lower than the current effective
boosted priority. This can result in undesired reordering of the
priority bucket list.

If the new priority is lower than or equal to the current effective
priority, we just store the new parameters in the task struct and leave
the scheduler class and the runqueue untouched. The stored parameters
take effect when the task deboosts itself. Only if the new priority is
higher than the effective boosted priority do we apply the change
immediately.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Rebase on top of v3.14-rc1. ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Dario Faggioli <raistlin@linux.it>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391803122-4425-7-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/sched/rt.h |  7 +++++++
 kernel/locking/rtmutex.c | 12 ++++++++++++
 kernel/sched/core.c      | 41 ++++++++++++++++++++++++++++++-----------
 3 files changed, 49 insertions(+), 11 deletions(-)

diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index f7453d4..6341f5b 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -18,6 +18,7 @@ static inline int rt_task(struct task_struct *p)
 #ifdef CONFIG_RT_MUTEXES
 extern int rt_mutex_getprio(struct task_struct *p);
 extern void rt_mutex_setprio(struct task_struct *p, int prio);
+extern int rt_mutex_check_prio(struct task_struct *task, int newprio);
 extern struct task_struct *rt_mutex_get_top_task(struct task_struct *task);
 extern void rt_mutex_adjust_pi(struct task_struct *p);
 static inline bool tsk_is_pi_blocked(struct task_struct *tsk)
@@ -29,6 +30,12 @@ static inline int rt_mutex_getprio(struct task_struct *p)
 {
 	return p->normal_prio;
 }
+
+static inline int rt_mutex_check_prio(struct task_struct *task, int newprio)
+{
+	return 0;
+}
+
 static inline struct task_struct *rt_mutex_get_top_task(struct task_struct *task)
 {
 	return NULL;
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 2e960a2..aa4dff0 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -213,6 +213,18 @@ struct task_struct *rt_mutex_get_top_task(struct task_struct *task)
 }
 
 /*
+ * Called by sched_setscheduler() to check whether the priority change
+ * is overruled by a possible priority boosting.
+ */
+int rt_mutex_check_prio(struct task_struct *task, int newprio)
+{
+	if (!task_has_pi_waiters(task))
+		return 0;
+
+	return task_top_pi_waiter(task)->task->prio <= newprio;
+}
+
+/*
  * Adjust the priority of a task, after its pi_waiters got modified.
  *
  * This can be both boosting and unboosting. task->pi_lock must be held.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9c2fcbf..003263b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2902,7 +2902,8 @@ EXPORT_SYMBOL(sleep_on_timeout);
  * This function changes the 'effective' priority of a task. It does
  * not touch ->normal_prio like __setscheduler().
  *
- * Used by the rt_mutex code to implement priority inheritance logic.
+ * Used by the rt_mutex code to implement priority inheritance
+ * logic. Call site only calls if the priority of the task changed.
  */
 void rt_mutex_setprio(struct task_struct *p, int prio)
 {
@@ -3171,9 +3172,8 @@ __setparam_dl(struct task_struct *p, const struct sched_attr *attr)
 	dl_se->dl_new = 1;
 }
 
-/* Actually do priority change: must hold pi & rq lock. */
-static void __setscheduler(struct rq *rq, struct task_struct *p,
-			   const struct sched_attr *attr)
+static void __setscheduler_params(struct task_struct *p,
+		const struct sched_attr *attr)
 {
 	int policy = attr->sched_policy;
 
@@ -3193,9 +3193,14 @@ static void __setscheduler(struct rq *rq, struct task_struct *p,
 	 * getparam()/getattr() don't report silly values for !rt tasks.
 	 */
 	p->rt_priority = attr->sched_priority;
+	set_load_weight(p);
+}
 
-	p->normal_prio = normal_prio(p);
-	p->prio = rt_mutex_getprio(p);
+/* Actually do priority change: must hold pi & rq lock. */
+static void __setscheduler(struct rq *rq, struct task_struct *p,
+			   const struct sched_attr *attr)
+{
+	__setscheduler_params(p, attr);
 
 	if (dl_prio(p->prio))
 		p->sched_class = &dl_sched_class;
@@ -3203,8 +3208,6 @@ static void __setscheduler(struct rq *rq, struct task_struct *p,
 		p->sched_class = &rt_sched_class;
 	else
 		p->sched_class = &fair_sched_class;
-
-	set_load_weight(p);
 }
 
 static void
@@ -3257,6 +3260,7 @@ static int __sched_setscheduler(struct task_struct *p,
 				const struct sched_attr *attr,
 				bool user)
 {
+	int newprio = MAX_RT_PRIO - 1 - attr->sched_priority;
 	int retval, oldprio, oldpolicy = -1, on_rq, running;
 	int policy = attr->sched_policy;
 	unsigned long flags;
@@ -3427,6 +3431,24 @@ change:
 		return -EBUSY;
 	}
 
+	p->sched_reset_on_fork = reset_on_fork;
+	oldprio = p->prio;
+
+	/*
+	 * Special case for priority boosted tasks.
+	 *
+	 * If the new priority is lower or equal (user space view)
+	 * than the current (boosted) priority, we just store the new
+	 * normal parameters and do not touch the scheduler class and
+	 * the runqueue. This will be done when the task deboost
+	 * itself.
+	 */
+	if (rt_mutex_check_prio(p, newprio)) {
+		__setscheduler_params(p, attr);
+		task_rq_unlock(rq, p, &flags);
+		return 0;
+	}
+
 	on_rq = p->on_rq;
 	running = task_current(rq, p);
 	if (on_rq)
@@ -3434,9 +3456,6 @@ change:
 	if (running)
 		p->sched_class->put_prev_task(rq, p);
 
-	p->sched_reset_on_fork = reset_on_fork;
-
-	oldprio = p->prio;
 	prev_class = p->sched_class;
 	__setscheduler(rq, p, attr);
 

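For readers following the priority arithmetic in the patch above, here is a
minimal user-space sketch of the test that rt_mutex_check_prio() performs.
It is not kernel code: struct model_task, model_newprio() and
model_check_prio() are hypothetical names made up for illustration, and the
sketch assumes an RT policy with a user-space priority in 1..99 and
MAX_RT_PRIO == 100. The point it tries to show is that kernel priorities are
inverted (a lower number is more important), so "top waiter prio <= newprio"
means the boost is at least as strong as the newly requested priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to match the kernel's value; treat as an assumption of this sketch. */
#define MAX_RT_PRIO 100

/* Hypothetical stand-in for the few task_struct fields the check needs. */
struct model_task {
	int prio;                               /* effective kernel priority, lower = more important */
	const struct model_task *top_pi_waiter; /* highest-priority PI waiter, or NULL if none */
};

/*
 * Convert a user-space RT priority (1..99, higher = more important) into
 * the kernel's internal scale, as the patch does for newprio.
 */
static int model_newprio(int sched_priority)
{
	assert(sched_priority >= 1 && sched_priority < MAX_RT_PRIO);
	return MAX_RT_PRIO - 1 - sched_priority;
}

/*
 * Model of the rt_mutex_check_prio() decision: true when the task has a
 * PI waiter whose priority is at least as high as the requested one, i.e.
 * the requested change would be overruled by the boost.
 */
static bool model_check_prio(const struct model_task *task, int newprio)
{
	if (task->top_pi_waiter == NULL)
		return false;

	return task->top_pi_waiter->prio <= newprio;
}
```

With a waiter at user-space priority 80 (kernel prio 19), a request for
priority 50 or 80 on the lock owner is overruled and only the parameters
would be stored, while a request for priority 90 goes through the normal
dequeue/requeue path.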

end of thread (latest message ~2014-02-22 18:03 UTC)

Thread overview: 24+ messages
2014-02-07 19:58 A pile of sched patches Sebastian Andrzej Siewior
2014-02-07 19:58 ` [PATCH 1/6] sched: Init idle->on_rq in init_idle() Sebastian Andrzej Siewior
2014-02-07 21:09   ` Peter Zijlstra
2014-02-11  9:17     ` [PATCH 1/6 v2] " Sebastian Andrzej Siewior
2014-02-11  9:21       ` Peter Zijlstra
2014-02-11 15:34         ` Thomas Gleixner
2014-02-11 15:51           ` Peter Zijlstra
2014-02-21 21:31   ` [tip:sched/core] " tip-bot for Thomas Gleixner
2014-02-22 18:01   ` tip-bot for Thomas Gleixner
2014-02-07 19:58 ` [PATCH 2/6] sched: Check for idle task in might_sleep() Sebastian Andrzej Siewior
2014-02-21 21:31   ` [tip:sched/core] " tip-bot for Thomas Gleixner
2014-02-22 18:02   ` tip-bot for Thomas Gleixner
2014-02-07 19:58 ` [PATCH 3/6] sched: Better debug output for might sleep Sebastian Andrzej Siewior
2014-02-21 21:31   ` [tip:sched/core] " tip-bot for Thomas Gleixner
2014-02-22 18:02   ` [tip:sched/core] sched: Add better debug output for might_sleep() tip-bot for Thomas Gleixner
2014-02-07 19:58 ` [PATCH 4/6] sched: Adjust sched_reset_on_fork when nothing else changes Sebastian Andrzej Siewior
2014-02-21 21:32   ` [tip:sched/core] " tip-bot for Thomas Gleixner
2014-02-22 18:02   ` [tip:sched/core] sched: Adjust p-> " tip-bot for Thomas Gleixner
2014-02-07 19:58 ` [PATCH 5/6] sched: Queue RT tasks to head when prio drops Sebastian Andrzej Siewior
2014-02-21 21:32   ` [tip:sched/core] " tip-bot for Thomas Gleixner
2014-02-22 18:02   ` tip-bot for Thomas Gleixner
2014-02-07 19:58 ` [PATCH 6/6] sched: Consider pi boosting in setscheduler Sebastian Andrzej Siewior
2014-02-21 21:32   ` [tip:sched/core] " tip-bot for Thomas Gleixner
2014-02-22 18:02   ` [tip:sched/core] sched: Consider pi boosting in setscheduler() tip-bot for Thomas Gleixner
