LKML Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH] sched: Handle priority boosted tasks proper in setscheduler()
@ 2015-05-05 16:08 Thomas Gleixner
  2015-05-05 16:29 ` Steven Rostedt
  2015-05-08 13:18 ` [tip:sched/core] " tip-bot for Thomas Gleixner
  0 siblings, 2 replies; 9+ messages in thread
From: Thomas Gleixner @ 2015-05-05 16:08 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Mike Galbraith, Ronny Meeus, LKML, Peter Zijlstra

Ronny reported that the following scenario is not handled correctly:

T1 (prio = 10)
   lock(rtmutex);

T2 (prio = 20)
   lock(rtmutex)
      boost T1

T1 (prio = 20)
   sys_set_scheduler(prio = 30)
   T1 prio = 30
   ....
   sys_set_scheduler(prio = 10)
   T1 prio = 30

The last step is wrong as T1 should now be back at prio 20.

commit c365c292d0590 "sched: Consider pi boosting in setscheduler()"
only handles the case where a boosted tasks tries to lower its
priority.

Fix it by taking the new effective priority into account for the
decision whether a change of the priority is required.

Reported-by: Ronny Meeus <ronny.meeus@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/locking/rtmutex.c |   10 ++++++----
 kernel/sched/core.c      |   11 +++++------
 2 files changed, 11 insertions(+), 10 deletions(-)

Index: tip/kernel/locking/rtmutex.c
===================================================================
--- tip.orig/kernel/locking/rtmutex.c
+++ tip/kernel/locking/rtmutex.c
@@ -265,15 +265,17 @@ struct task_struct *rt_mutex_get_top_tas
 }
 
 /*
- * Called by sched_setscheduler() to check whether the priority change
- * is overruled by a possible priority boosting.
+ * Called by sched_setscheduler() to get the priority which will be
+ * effective after the change.
  */
 int rt_mutex_check_prio(struct task_struct *task, int newprio)
 {
 	if (!task_has_pi_waiters(task))
-		return 0;
+		return newprio;
 
-	return task_top_pi_waiter(task)->task->prio <= newprio;
+	if (task_top_pi_waiter(task)->task->prio <= newprio)
+		return task_top_pi_waiter(task)->task->prio;
+	return newprio;
 }
 
 /*
Index: tip/kernel/sched/core.c
===================================================================
--- tip.orig/kernel/sched/core.c
+++ tip/kernel/sched/core.c
@@ -3414,7 +3414,7 @@ static int __sched_setscheduler(struct t
 	int newprio = dl_policy(attr->sched_policy) ? MAX_DL_PRIO - 1 :
 		      MAX_RT_PRIO - 1 - attr->sched_priority;
 	int retval, oldprio, oldpolicy = -1, queued, running;
-	int policy = attr->sched_policy;
+	int new_effective_prio, policy = attr->sched_policy;
 	unsigned long flags;
 	const struct sched_class *prev_class;
 	struct rq *rq;
@@ -3596,15 +3596,14 @@ change:
 	oldprio = p->prio;
 
 	/*
-	 * Special case for priority boosted tasks.
-	 *
-	 * If the new priority is lower or equal (user space view)
-	 * than the current (boosted) priority, we just store the new
+	 * Take priority boosted tasks into account. If the new
+	 * effective priority is unchanged, we just store the new
 	 * normal parameters and do not touch the scheduler class and
 	 * the runqueue. This will be done when the task deboost
 	 * itself.
 	 */
-	if (rt_mutex_check_prio(p, newprio)) {
+	new_effective_prio = rt_mutex_check_prio(p, newprio);
+	if (new_effective_prio == oldprio) {
 		__setscheduler_params(p, attr);
 		task_rq_unlock(rq, p, &flags);
 		return 0;

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched: Handle priority boosted tasks proper in setscheduler()
  2015-05-05 16:08 [PATCH] sched: Handle priority boosted tasks proper in setscheduler() Thomas Gleixner
@ 2015-05-05 16:29 ` Steven Rostedt
  2015-05-05 16:31   ` Thomas Gleixner
  2015-05-08 13:18 ` [tip:sched/core] " tip-bot for Thomas Gleixner
  1 sibling, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2015-05-05 16:29 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Mike Galbraith, Ronny Meeus, LKML, Peter Zijlstra

On Tue, 5 May 2015 18:08:01 +0200 (CEST)
Thomas Gleixner <tglx@linutronix.de> wrote:

> Reported-by: Ronny Meeus <ronny.meeus@gmail.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/locking/rtmutex.c |   10 ++++++----
>  kernel/sched/core.c      |   11 +++++------
>  2 files changed, 11 insertions(+), 10 deletions(-)
> 
> Index: tip/kernel/locking/rtmutex.c
> ===================================================================
> --- tip.orig/kernel/locking/rtmutex.c
> +++ tip/kernel/locking/rtmutex.c
> @@ -265,15 +265,17 @@ struct task_struct *rt_mutex_get_top_tas
>  }
>  
>  /*
> - * Called by sched_setscheduler() to check whether the priority change
> - * is overruled by a possible priority boosting.
> + * Called by sched_setscheduler() to get the priority which will be
> + * effective after the change.
>   */
>  int rt_mutex_check_prio(struct task_struct *task, int newprio)
>  {
>  	if (!task_has_pi_waiters(task))
> -		return 0;
> +		return newprio;
>  
> -	return task_top_pi_waiter(task)->task->prio <= newprio;
> +	if (task_top_pi_waiter(task)->task->prio <= newprio)
> +		return task_top_pi_waiter(task)->task->prio;
> +	return newprio;
>  }
>  
>  /*
> Index: tip/kernel/sched/core.c
> ===================================================================
> --- tip.orig/kernel/sched/core.c
> +++ tip/kernel/sched/core.c
> @@ -3414,7 +3414,7 @@ static int __sched_setscheduler(struct t
>  	int newprio = dl_policy(attr->sched_policy) ? MAX_DL_PRIO - 1 :
>  		      MAX_RT_PRIO - 1 - attr->sched_priority;
>  	int retval, oldprio, oldpolicy = -1, queued, running;
> -	int policy = attr->sched_policy;
> +	int new_effective_prio, policy = attr->sched_policy;
>  	unsigned long flags;
>  	const struct sched_class *prev_class;
>  	struct rq *rq;
> @@ -3596,15 +3596,14 @@ change:
>  	oldprio = p->prio;
>  
>  	/*
> -	 * Special case for priority boosted tasks.
> -	 *
> -	 * If the new priority is lower or equal (user space view)
> -	 * than the current (boosted) priority, we just store the new
> +	 * Take priority boosted tasks into account. If the new
> +	 * effective priority is unchanged, we just store the new
>  	 * normal parameters and do not touch the scheduler class and
>  	 * the runqueue. This will be done when the task deboost
>  	 * itself.
>  	 */
> -	if (rt_mutex_check_prio(p, newprio)) {
> +	new_effective_prio = rt_mutex_check_prio(p, newprio);
> +	if (new_effective_prio == oldprio) {

When I first heard of this problem, I started writing code to fix this
and came up with pretty much the exact same answer.

I got pulled onto other things so I never finished it, but one thing
that worried me about this fix is this:

	T1 - FIFO policy (prio = 10)
	  lock(rtmutex);

	T2 (prio = 20)
	  lock(rtmutex)
	    boost T1 (prio = 20)

        TI (prio = 20)
	  sys_sched_setscheduler(prio = 30)
	  TI (prio = 30)

	T1 (prio = 30)
	  sys_sched_setscheduler(SCHED_OTHER)
	  new_effective_prio = 20, oldprio = 30

Before the code stopped at the rt_mutex_check_prio(), but now it
continues. Will having the policy change cause problems here?

-- Steve


>  		__setscheduler_params(p, attr);
>  		task_rq_unlock(rq, p, &flags);
>  		return 0;


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched: Handle priority boosted tasks proper in setscheduler()
  2015-05-05 16:29 ` Steven Rostedt
@ 2015-05-05 16:31   ` Thomas Gleixner
  2015-05-05 16:42     ` Steven Rostedt
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2015-05-05 16:31 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Mike Galbraith, Ronny Meeus, LKML, Peter Zijlstra

On Tue, 5 May 2015, Steven Rostedt wrote:
> I got pulled onto other things so I never finished it, but one thing
> that worried me about this fix is this:
> 
> 	T1 - FIFO policy (prio = 10)
> 	  lock(rtmutex);
> 
> 	T2 (prio = 20)
> 	  lock(rtmutex)
> 	    boost T1 (prio = 20)
> 
>         TI (prio = 20)
> 	  sys_sched_setscheduler(prio = 30)
> 	  TI (prio = 30)
> 
> 	T1 (prio = 30)
> 	  sys_sched_setscheduler(SCHED_OTHER)
> 	  new_effective_prio = 20, oldprio = 30
> 
> Before the code stopped at the rt_mutex_check_prio(), but now it
> continues. Will having the policy change cause problems here?

No, because it stays effective in the FIFO domain.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched: Handle priority boosted tasks proper in setscheduler()
  2015-05-05 16:31   ` Thomas Gleixner
@ 2015-05-05 16:42     ` Steven Rostedt
  2015-05-05 16:50       ` Thomas Gleixner
  0 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2015-05-05 16:42 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Mike Galbraith, Ronny Meeus, LKML, Peter Zijlstra

On Tue, 5 May 2015 18:31:20 +0200 (CEST)
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Tue, 5 May 2015, Steven Rostedt wrote:
> > I got pulled onto other things so I never finished it, but one thing
> > that worried me about this fix is this:
> > 
> > 	T1 - FIFO policy (prio = 10)
> > 	  lock(rtmutex);
> > 
> > 	T2 (prio = 20)
> > 	  lock(rtmutex)
> > 	    boost T1 (prio = 20)
> > 
> >         TI (prio = 20)
> > 	  sys_sched_setscheduler(prio = 30)
> > 	  TI (prio = 30)
> > 
> > 	T1 (prio = 30)
> > 	  sys_sched_setscheduler(SCHED_OTHER)
> > 	  new_effective_prio = 20, oldprio = 30
> > 
> > Before the code stopped at the rt_mutex_check_prio(), but now it
> > continues. Will having the policy change cause problems here?
> 
> No, because it stays effective in the FIFO domain.
> 

Ah, the policy passed in isn't used, so we are safe. But, but I also
found that we still call __setscheduler(), which does:

	p->prio = normal_prio();

Isn't that going to null out the boosting?

-- Steve

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched: Handle priority boosted tasks proper in setscheduler()
  2015-05-05 16:42     ` Steven Rostedt
@ 2015-05-05 16:50       ` Thomas Gleixner
  2015-05-05 17:01         ` Steven Rostedt
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2015-05-05 16:50 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Mike Galbraith, Ronny Meeus, LKML, Peter Zijlstra

On Tue, 5 May 2015, Steven Rostedt wrote:
> On Tue, 5 May 2015 18:31:20 +0200 (CEST)
> Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> > On Tue, 5 May 2015, Steven Rostedt wrote:
> > > I got pulled onto other things so I never finished it, but one thing
> > > that worried me about this fix is this:
> > > 
> > > 	T1 - FIFO policy (prio = 10)
> > > 	  lock(rtmutex);
> > > 
> > > 	T2 (prio = 20)
> > > 	  lock(rtmutex)
> > > 	    boost T1 (prio = 20)
> > > 
> > >         TI (prio = 20)
> > > 	  sys_sched_setscheduler(prio = 30)
> > > 	  TI (prio = 30)
> > > 
> > > 	T1 (prio = 30)
> > > 	  sys_sched_setscheduler(SCHED_OTHER)
> > > 	  new_effective_prio = 20, oldprio = 30
> > > 
> > > Before the code stopped at the rt_mutex_check_prio(), but now it
> > > continues. Will having the policy change cause problems here?
> > 
> > No, because it stays effective in the FIFO domain.
> > 
> 
> Ah, the policy passed in isn't used, so we are safe. But, but I also
> found that we still call __setscheduler(), which does:
> 
> 	p->prio = normal_prio();
> 
> Isn't that going to null out the boosting?

Crap. Yes, I missed that. So __setscheduler() assumes that there is no
boosting going on. So we need:

   p->prio = effective_prio(p);

there instead.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched: Handle priority boosted tasks proper in setscheduler()
  2015-05-05 16:50       ` Thomas Gleixner
@ 2015-05-05 17:01         ` Steven Rostedt
  2015-05-05 17:49           ` [PATCH V2] " Thomas Gleixner
  0 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2015-05-05 17:01 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Mike Galbraith, Ronny Meeus, LKML, Peter Zijlstra

On Tue, 5 May 2015 18:50:17 +0200 (CEST)
Thomas Gleixner <tglx@linutronix.de> wrote:

> Crap. Yes, I missed that. So __setscheduler() assumes that there is no
> boosting going on. So we need:
> 
>    p->prio = effective_prio(p);
> 
> there instead.

Of course then we need to do something about normalize_task() (for
sysrq), which depends on it being normal_prio().

-- Steve

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH V2] sched: Handle priority boosted tasks proper in setscheduler()
  2015-05-05 17:01         ` Steven Rostedt
@ 2015-05-05 17:49           ` Thomas Gleixner
  2015-05-05 20:20             ` Steven Rostedt
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2015-05-05 17:49 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Mike Galbraith, Ronny Meeus, LKML, Peter Zijlstra

Ronny reported that the following scenario is not handled correctly:

T1 (prio = 10)
   lock(rtmutex);

T2 (prio = 20)
   lock(rtmutex)
      boost T1

T1 (prio = 20)
   sys_set_scheduler(prio = 30)
   T1 prio = 30
   ....
   sys_set_scheduler(prio = 10)
   T1 prio = 30

The last step is wrong as T1 should now be back at prio 20.

commit c365c292d0590 "sched: Consider pi boosting in setscheduler()"
only handles the case where a boosted tasks tries to lower its
priority.

Fix it by taking the new effective priority into account for the
decision whether a change of the priority is required.

Reported-by: Ronny Meeus <ronny.meeus@gmail.com>
Fixes: commit c365c292d0590 "sched: Consider pi boosting in setscheduler()"
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1505051806060.4225@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---

V2: Prevent __set_scheduler() from stomping over p->prio (pointed out
    by Steven) and fix the !RT_MUTEX case

 include/linux/sched/rt.h |    7 ++++---
 kernel/locking/rtmutex.c |   12 +++++++-----
 kernel/sched/core.c      |   26 ++++++++++++++------------
 3 files changed, 25 insertions(+), 20 deletions(-)

Index: linux/include/linux/sched/rt.h
===================================================================
--- linux.orig/include/linux/sched/rt.h
+++ linux/include/linux/sched/rt.h
@@ -18,7 +18,7 @@ static inline int rt_task(struct task_st
 #ifdef CONFIG_RT_MUTEXES
 extern int rt_mutex_getprio(struct task_struct *p);
 extern void rt_mutex_setprio(struct task_struct *p, int prio);
-extern int rt_mutex_check_prio(struct task_struct *task, int newprio);
+extern int rt_mutex_get_effective_prio(struct task_struct *task, int newprio);
 extern struct task_struct *rt_mutex_get_top_task(struct task_struct *task);
 extern void rt_mutex_adjust_pi(struct task_struct *p);
 static inline bool tsk_is_pi_blocked(struct task_struct *tsk)
@@ -31,9 +31,10 @@ static inline int rt_mutex_getprio(struc
 	return p->normal_prio;
 }
 
-static inline int rt_mutex_check_prio(struct task_struct *task, int newprio)
+static inline int rt_mutex_get_effective_prio(struct task_struct *task,
+					      int newprio)
 {
-	return 0;
+	return newprio;
 }
 
 static inline struct task_struct *rt_mutex_get_top_task(struct task_struct *task)
Index: linux/kernel/locking/rtmutex.c
===================================================================
--- linux.orig/kernel/locking/rtmutex.c
+++ linux/kernel/locking/rtmutex.c
@@ -265,15 +265,17 @@ struct task_struct *rt_mutex_get_top_tas
 }
 
 /*
- * Called by sched_setscheduler() to check whether the priority change
- * is overruled by a possible priority boosting.
+ * Called by sched_setscheduler() to get the priority which will be
+ * effective after the change.
  */
-int rt_mutex_check_prio(struct task_struct *task, int newprio)
+int rt_mutex_get_effective_prio(struct task_struct *task, int newprio)
 {
 	if (!task_has_pi_waiters(task))
-		return 0;
+		return newprio;
 
-	return task_top_pi_waiter(task)->task->prio <= newprio;
+	if (task_top_pi_waiter(task)->task->prio <= newprio)
+		return task_top_pi_waiter(task)->task->prio;
+	return newprio;
 }
 
 /*
Index: linux/kernel/sched/core.c
===================================================================
--- linux.orig/kernel/sched/core.c
+++ linux/kernel/sched/core.c
@@ -3300,15 +3300,18 @@ static void __setscheduler_params(struct
 
 /* Actually do priority change: must hold pi & rq lock. */
 static void __setscheduler(struct rq *rq, struct task_struct *p,
-			   const struct sched_attr *attr)
+			   const struct sched_attr *attr, bool keep_boost)
 {
 	__setscheduler_params(p, attr);
 
 	/*
-	 * If we get here, there was no pi waiters boosting the
-	 * task. It is safe to use the normal prio.
+	 * Keep a potential priority boosting if called from
+	 * sched_setscheduler().
 	 */
-	p->prio = normal_prio(p);
+	if (keep_boost)
+		p->prio = rt_mutex_get_effective_prio(p, normal_prio(p));
+	else
+		p->prio = normal_prio(p);
 
 	if (dl_prio(p->prio))
 		p->sched_class = &dl_sched_class;
@@ -3408,7 +3411,7 @@ static int __sched_setscheduler(struct t
 	int newprio = dl_policy(attr->sched_policy) ? MAX_DL_PRIO - 1 :
 		      MAX_RT_PRIO - 1 - attr->sched_priority;
 	int retval, oldprio, oldpolicy = -1, queued, running;
-	int policy = attr->sched_policy;
+	int new_effective_prio, policy = attr->sched_policy;
 	unsigned long flags;
 	const struct sched_class *prev_class;
 	struct rq *rq;
@@ -3590,15 +3593,14 @@ change:
 	oldprio = p->prio;
 
 	/*
-	 * Special case for priority boosted tasks.
-	 *
-	 * If the new priority is lower or equal (user space view)
-	 * than the current (boosted) priority, we just store the new
+	 * Take priority boosted tasks into account. If the new
+	 * effective priority is unchanged, we just store the new
 	 * normal parameters and do not touch the scheduler class and
 	 * the runqueue. This will be done when the task deboost
 	 * itself.
 	 */
-	if (rt_mutex_check_prio(p, newprio)) {
+	new_effective_prio = rt_mutex_get_effective_prio(p, newprio);
+	if (new_effective_prio == oldprio) {
 		__setscheduler_params(p, attr);
 		task_rq_unlock(rq, p, &flags);
 		return 0;
@@ -3612,7 +3614,7 @@ change:
 		put_prev_task(rq, p);
 
 	prev_class = p->sched_class;
-	__setscheduler(rq, p, attr);
+	__setscheduler(rq, p, attr, true);
 
 	if (running)
 		p->sched_class->set_curr_task(rq);
@@ -7346,7 +7348,7 @@ static void normalize_task(struct rq *rq
 	queued = task_on_rq_queued(p);
 	if (queued)
 		dequeue_task(rq, p, 0);
-	__setscheduler(rq, p, &attr);
+	__setscheduler(rq, p, &attr, false);
 	if (queued) {
 		enqueue_task(rq, p, 0);
 		resched_curr(rq);

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH V2] sched: Handle priority boosted tasks proper in setscheduler()
  2015-05-05 17:49           ` [PATCH V2] " Thomas Gleixner
@ 2015-05-05 20:20             ` Steven Rostedt
  0 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2015-05-05 20:20 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Mike Galbraith, Ronny Meeus, LKML, Peter Zijlstra

On Tue, 5 May 2015 19:49:49 +0200 (CEST)
Thomas Gleixner <tglx@linutronix.de> wrote:

> Ronny reported that the following scenario is not handled correctly:
> 
> T1 (prio = 10)
>    lock(rtmutex);
> 
> T2 (prio = 20)
>    lock(rtmutex)
>       boost T1
> 
> T1 (prio = 20)
>    sys_set_scheduler(prio = 30)
>    T1 prio = 30
>    ....
>    sys_set_scheduler(prio = 10)
>    T1 prio = 30
> 
> The last step is wrong as T1 should now be back at prio 20.
> 
> commit c365c292d0590 "sched: Consider pi boosting in setscheduler()"
> only handles the case where a boosted tasks tries to lower its
> priority.
> 
> Fix it by taking the new effective priority into account for the
> decision whether a change of the priority is required.
> 
> Reported-by: Ronny Meeus <ronny.meeus@gmail.com>
> Fixes: commit c365c292d0590 "sched: Consider pi boosting in setscheduler()"
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1505051806060.4225@nanos
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>

Note, this should be marked stable. I can get a nasty kernel spat with
playing around with RT priorities without this patch.

[   42.226836] ------------[ cut here ]------------
[   42.227495] WARNING: CPU: 0 PID: 2271 at /home/rostedt/work/git/linux-trace.git/kernel/sched/rt.c:1114 dequeue_rt_stack+0x220/0x22b()
[   42.227495] Modules linked in: [..]
[   42.227495] CPU: 0 PID: 2271 Comm: check_pi_deboos Not tainted 4.1.0-rc1-test+ #415
[   42.227495] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
[   42.227495]  0000000000000009 ffff880078a5fd28 ffffffff815fcc4c 0000000080000002
[   42.227495]  0000000000000000 ffff880078a5fd68 ffffffff810496ea 0000000000000000
[   42.227495]  ffffffff81075d5a ffff88007669e208 ffff88007d416460 0000000000000054
[   42.227495] Call Trace:
[   42.227495]  [<ffffffff815fcc4c>] dump_stack+0x4f/0x7b
[   42.227495]  [<ffffffff810496ea>] warn_slowpath_common+0xa1/0xbb
[   42.227495]  [<ffffffff81075d5a>] ? dequeue_rt_stack+0x220/0x22b
[   42.227495]  [<ffffffff810497a7>] warn_slowpath_null+0x1a/0x1c
[   42.227495]  [<ffffffff81075d5a>] dequeue_rt_stack+0x220/0x22b
[   42.227495]  [<ffffffff810762dc>] dequeue_rt_entity+0x1f/0x58
[   42.227495]  [<ffffffff810767a1>] dequeue_task_rt+0x24/0x34
[   42.227495]  [<ffffffff81068f03>] dequeue_task+0x69/0x70
[   42.227495]  [<ffffffff8106dfd7>] sched_move_task+0x4e/0xdb
[   42.227495]  [<ffffffff8106e07b>] cpu_cgroup_exit+0x17/0x19
[   42.227495]  [<ffffffff810b8065>] cgroup_exit+0x9f/0xbe
[   42.227495]  [<ffffffff8104abc7>] do_exit+0x429/0x92f
[   42.227495]  [<ffffffff8104bdb1>] SyS_exit+0x17/0x17
[   42.227495]  [<ffffffff81604c97>] system_call_fastpath+0x12/0x6a
[   42.227495] ---[ end trace 14460ffaa77bf181 ]---
[   42.412352] ------------[ cut here ]------------

-- Steve


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [tip:sched/core] sched: Handle priority boosted tasks proper in setscheduler()
  2015-05-05 16:08 [PATCH] sched: Handle priority boosted tasks proper in setscheduler() Thomas Gleixner
  2015-05-05 16:29 ` Steven Rostedt
@ 2015-05-08 13:18 ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 9+ messages in thread
From: tip-bot for Thomas Gleixner @ 2015-05-08 13:18 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, bp, rostedt, ronny.meeus, stable, tglx, hpa, mingo,
	umgwanakikbuti, peterz

Commit-ID:  0782e63bc6fe7e2d3408d250df11d388b7799c6b
Gitweb:     http://git.kernel.org/tip/0782e63bc6fe7e2d3408d250df11d388b7799c6b
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Tue, 5 May 2015 19:49:49 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 8 May 2015 11:53:55 +0200

sched: Handle priority boosted tasks proper in setscheduler()

Ronny reported that the following scenario is not handled correctly:

	T1 (prio = 10)
	   lock(rtmutex);

	T2 (prio = 20)
	   lock(rtmutex)
	      boost T1

	T1 (prio = 20)
	   sys_set_scheduler(prio = 30)
	   T1 prio = 30
	   ....
	   sys_set_scheduler(prio = 10)
	   T1 prio = 30

The last step is wrong as T1 should now be back at prio 20.

Commit c365c292d059 ("sched: Consider pi boosting in setscheduler()")
only handles the case where a boosted tasks tries to lower its
priority.

Fix it by taking the new effective priority into account for the
decision whether a change of the priority is required.

Reported-by: Ronny Meeus <ronny.meeus@gmail.com>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Fixes: c365c292d059 ("sched: Consider pi boosting in setscheduler()")
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1505051806060.4225@nanos
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/sched/rt.h |  7 ++++---
 kernel/locking/rtmutex.c | 12 +++++++-----
 kernel/sched/core.c      | 26 ++++++++++++++------------
 3 files changed, 25 insertions(+), 20 deletions(-)

diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index 6341f5b..a30b172 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -18,7 +18,7 @@ static inline int rt_task(struct task_struct *p)
 #ifdef CONFIG_RT_MUTEXES
 extern int rt_mutex_getprio(struct task_struct *p);
 extern void rt_mutex_setprio(struct task_struct *p, int prio);
-extern int rt_mutex_check_prio(struct task_struct *task, int newprio);
+extern int rt_mutex_get_effective_prio(struct task_struct *task, int newprio);
 extern struct task_struct *rt_mutex_get_top_task(struct task_struct *task);
 extern void rt_mutex_adjust_pi(struct task_struct *p);
 static inline bool tsk_is_pi_blocked(struct task_struct *tsk)
@@ -31,9 +31,10 @@ static inline int rt_mutex_getprio(struct task_struct *p)
 	return p->normal_prio;
 }
 
-static inline int rt_mutex_check_prio(struct task_struct *task, int newprio)
+static inline int rt_mutex_get_effective_prio(struct task_struct *task,
+					      int newprio)
 {
-	return 0;
+	return newprio;
 }
 
 static inline struct task_struct *rt_mutex_get_top_task(struct task_struct *task)
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index b732793..b025295 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -265,15 +265,17 @@ struct task_struct *rt_mutex_get_top_task(struct task_struct *task)
 }
 
 /*
- * Called by sched_setscheduler() to check whether the priority change
- * is overruled by a possible priority boosting.
+ * Called by sched_setscheduler() to get the priority which will be
+ * effective after the change.
  */
-int rt_mutex_check_prio(struct task_struct *task, int newprio)
+int rt_mutex_get_effective_prio(struct task_struct *task, int newprio)
 {
 	if (!task_has_pi_waiters(task))
-		return 0;
+		return newprio;
 
-	return task_top_pi_waiter(task)->task->prio <= newprio;
+	if (task_top_pi_waiter(task)->task->prio <= newprio)
+		return task_top_pi_waiter(task)->task->prio;
+	return newprio;
 }
 
 /*
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fe22f75..34db9bf 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3300,15 +3300,18 @@ static void __setscheduler_params(struct task_struct *p,
 
 /* Actually do priority change: must hold pi & rq lock. */
 static void __setscheduler(struct rq *rq, struct task_struct *p,
-			   const struct sched_attr *attr)
+			   const struct sched_attr *attr, bool keep_boost)
 {
 	__setscheduler_params(p, attr);
 
 	/*
-	 * If we get here, there was no pi waiters boosting the
-	 * task. It is safe to use the normal prio.
+	 * Keep a potential priority boosting if called from
+	 * sched_setscheduler().
 	 */
-	p->prio = normal_prio(p);
+	if (keep_boost)
+		p->prio = rt_mutex_get_effective_prio(p, normal_prio(p));
+	else
+		p->prio = normal_prio(p);
 
 	if (dl_prio(p->prio))
 		p->sched_class = &dl_sched_class;
@@ -3408,7 +3411,7 @@ static int __sched_setscheduler(struct task_struct *p,
 	int newprio = dl_policy(attr->sched_policy) ? MAX_DL_PRIO - 1 :
 		      MAX_RT_PRIO - 1 - attr->sched_priority;
 	int retval, oldprio, oldpolicy = -1, queued, running;
-	int policy = attr->sched_policy;
+	int new_effective_prio, policy = attr->sched_policy;
 	unsigned long flags;
 	const struct sched_class *prev_class;
 	struct rq *rq;
@@ -3590,15 +3593,14 @@ change:
 	oldprio = p->prio;
 
 	/*
-	 * Special case for priority boosted tasks.
-	 *
-	 * If the new priority is lower or equal (user space view)
-	 * than the current (boosted) priority, we just store the new
+	 * Take priority boosted tasks into account. If the new
+	 * effective priority is unchanged, we just store the new
 	 * normal parameters and do not touch the scheduler class and
 	 * the runqueue. This will be done when the task deboost
 	 * itself.
 	 */
-	if (rt_mutex_check_prio(p, newprio)) {
+	new_effective_prio = rt_mutex_get_effective_prio(p, newprio);
+	if (new_effective_prio == oldprio) {
 		__setscheduler_params(p, attr);
 		task_rq_unlock(rq, p, &flags);
 		return 0;
@@ -3612,7 +3614,7 @@ change:
 		put_prev_task(rq, p);
 
 	prev_class = p->sched_class;
-	__setscheduler(rq, p, attr);
+	__setscheduler(rq, p, attr, true);
 
 	if (running)
 		p->sched_class->set_curr_task(rq);
@@ -7346,7 +7348,7 @@ static void normalize_task(struct rq *rq, struct task_struct *p)
 	queued = task_on_rq_queued(p);
 	if (queued)
 		dequeue_task(rq, p, 0);
-	__setscheduler(rq, p, &attr);
+	__setscheduler(rq, p, &attr, false);
 	if (queued) {
 		enqueue_task(rq, p, 0);
 		resched_curr(rq);

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, back to index

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-05 16:08 [PATCH] sched: Handle priority boosted tasks proper in setscheduler() Thomas Gleixner
2015-05-05 16:29 ` Steven Rostedt
2015-05-05 16:31   ` Thomas Gleixner
2015-05-05 16:42     ` Steven Rostedt
2015-05-05 16:50       ` Thomas Gleixner
2015-05-05 17:01         ` Steven Rostedt
2015-05-05 17:49           ` [PATCH V2] " Thomas Gleixner
2015-05-05 20:20             ` Steven Rostedt
2015-05-08 13:18 ` [tip:sched/core] " tip-bot for Thomas Gleixner

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git
	git clone --mirror https://lore.kernel.org/lkml/10 lkml/git/10.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git