* Question about prio_changed_dl()
@ 2016-02-19 12:43 luca abeni
  2016-02-25 14:01 ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: luca abeni @ 2016-02-19 12:43 UTC (permalink / raw)
  To: Juri Lelli, linux-kernel; +Cc: Peter Zijlstra

Hi,

when playing with the __dl_{add,sub}_ac() stuff recently posted by
Juri, I found something that looks strange in prio_changed_dl():

static void prio_changed_dl(struct rq *rq, struct task_struct *p,
			    int oldprio)
{
	if (task_on_rq_queued(p) || rq->curr == p) {
		[...]
	} else
		switched_to_dl(rq, p);
}
but switched_to_dl() does:
static void switched_to_dl(struct rq *rq, struct task_struct *p)
{
	if (task_on_rq_queued(p) && rq->curr != p) {
		[...]
	}
}

so, prio_changed_dl() invokes switched_to_dl() if task_on_rq_queued()
is false, but in this case switched_to_dl() does nothing... Am I
missing something, or the
	} else
		switched_to_dl(rq, p);
is useless?
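
Spelling the two conditions out as plain booleans makes this visible; a
stand-alone sketch, not kernel code, where "queued" and "is_curr" stand
in for task_on_rq_queued(p) and rq->curr == p:

#include <stdbool.h>
#include <stdio.h>

int main(void)
{
	/* Enumerate every combination of the two predicates. */
	for (int queued = 0; queued <= 1; queued++) {
		for (int is_curr = 0; is_curr <= 1; is_curr++) {
			/* the else branch of prio_changed_dl() */
			bool else_taken = !(queued || is_curr);
			/* the guarded body of switched_to_dl() */
			bool body_runs = queued && !is_curr;

			printf("queued=%d is_curr=%d else=%d body=%d\n",
			       queued, is_curr, else_taken,
			       else_taken && body_runs);
		}
	}
	/* "else=1" only happens with queued=0, and then body is 0: the
	 * call in the else branch can never do anything. */
	return 0;
}
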
(BTW, it seems to me that switched_to_dl() is never invoked, for some
reason...)


			Thanks,
				Luca

If you wonder how this is related to __dl_{add,sub}_ac(), I am going to
write an email about it in a short time :)


* Re: Question about prio_changed_dl()
  2016-02-19 12:43 Question about prio_changed_dl() luca abeni
@ 2016-02-25 14:01 ` Peter Zijlstra
  2016-02-25 14:16   ` Juri Lelli
                     ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Peter Zijlstra @ 2016-02-25 14:01 UTC (permalink / raw)
  To: luca abeni; +Cc: Juri Lelli, linux-kernel

On Fri, Feb 19, 2016 at 01:43:45PM +0100, luca abeni wrote:
> Hi,
> 
> when playing with the __dl_{add,sub}_ac() stuff recently posted by
> Juri, I found something that looks strange in prio_changed_dl():
> 
> static void prio_changed_dl(struct rq *rq, struct task_struct *p,
> 			    int oldprio)
> {
> 	if (task_on_rq_queued(p) || rq->curr == p) {
> 		[...]
> 	} else
> 		switched_to_dl(rq, p);
> }
> but switched_to_dl() does:
> static void switched_to_dl(struct rq *rq, struct task_struct *p)
> {
> 	if (task_on_rq_queued(p) && rq->curr != p) {
> 		[...]
> 	}
> }
> 
> so, prio_changed_dl() invokes switched_to_dl() if task_on_rq_queued()
> is false, but in this case switched_to_dl() does nothing... Am I
> missing something, or the
> 	} else
> 		switched_to_dl(rq, p);
> is useless?

Agreed, see below.

> (BTW, it seems to me that switched_to_dl() is never invoked, for some
> reason...)

Hmm, it should be invoked if you do sched_setattr() to get
SCHED_DEADLINE.

---
Subject: sched/deadline: Remove superfluous call to switched_to_dl()

	if (A || B) {

	} else if (A && !B) {

	}

If A we'll take the first branch, if !A we will not satisfy the second.
Therefore the second branch will never be taken.

Cc: Juri Lelli <juri.lelli@arm.com>
Reported-by: luca abeni <luca.abeni@unitn.it>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/deadline.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 57b939c81bce..c161c53d9424 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1768,8 +1768,7 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
 		 */
 		resched_curr(rq);
 #endif /* CONFIG_SMP */
-	} else
-		switched_to_dl(rq, p);
+	}
 }
 
 const struct sched_class dl_sched_class = {


* Re: Question about prio_changed_dl()
  2016-02-25 14:01 ` Peter Zijlstra
@ 2016-02-25 14:16   ` Juri Lelli
  2016-02-25 14:25   ` luca abeni
  2016-02-29 11:19   ` [tip:sched/core] sched/deadline: Remove superfluous call to switched_to_dl() tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 11+ messages in thread
From: Juri Lelli @ 2016-02-25 14:16 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: luca abeni, linux-kernel

On 25/02/16 15:01, Peter Zijlstra wrote:
> On Fri, Feb 19, 2016 at 01:43:45PM +0100, luca abeni wrote:
> > Hi,
> > 
> > when playing with the __dl_{add,sub}_ac() stuff recently posted by
> > Juri, I found something that looks strange in prio_changed_dl():
> > 
> > static void prio_changed_dl(struct rq *rq, struct task_struct *p,
> > 			    int oldprio)
> > {
> > 	if (task_on_rq_queued(p) || rq->curr == p) {
> > 		[...]
> > 	} else
> > 		switched_to_dl(rq, p);
> > }
> > but switched_to_dl() does:
> > static void switched_to_dl(struct rq *rq, struct task_struct *p)
> > {
> > 	if (task_on_rq_queued(p) && rq->curr != p) {
> > 		[...]
> > 	}
> > }
> > 
> > so, prio_changed_dl() invokes switched_to_dl() if task_on_rq_queued()
> > is false, but in this case switched_to_dl() does nothing... Am I
> > missing something, or the
> > 	} else
> > 		switched_to_dl(rq, p);
> > is useless?
> 
> Agreed, see below.
> 
> > (BTW, it seems to me that switched_to_dl() is never invoked, for some
> > reason...)
> 
> Hmm, it should be invoked if you do sched_setattr() to get
> SCHED_DEADLINE.
> 
> ---
> Subject: sched/deadline: Remove superfluous call to switched_to_dl()
> 
> 	if (A || B) {
> 
> 	} else if (A && !B) {
> 
> 	}
> 
> If A we'll take the first branch, if !A we will not satisfy the second.
> Therefore the second branch will never be taken.
> 
> Cc: Juri Lelli <juri.lelli@arm.com>
> Reported-by: luca abeni <luca.abeni@unitn.it>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Indeed!

Acked-by: Juri Lelli <juri.lelli@arm.com>

Thanks,

- Juri


* Re: Question about prio_changed_dl()
  2016-02-25 14:01 ` Peter Zijlstra
  2016-02-25 14:16   ` Juri Lelli
@ 2016-02-25 14:25   ` luca abeni
  2016-02-25 14:52     ` Peter Zijlstra
  2016-02-29 11:19   ` [tip:sched/core] sched/deadline: Remove superfluous call to switched_to_dl() tip-bot for Peter Zijlstra
  2 siblings, 1 reply; 11+ messages in thread
From: luca abeni @ 2016-02-25 14:25 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Juri Lelli, linux-kernel

Hi Peter,

On Thu, 25 Feb 2016 15:01:49 +0100
Peter Zijlstra <peterz@infradead.org> wrote:
[...]
> > so, prio_changed_dl() invokes switched_to_dl() if
> > task_on_rq_queued() is false, but in this case switched_to_dl()
> > does nothing... Am I missing something, or the
> > 	} else
> > 		switched_to_dl(rq, p);
> > is useless?
> 
> Agreed, see below.
> 
> > (BTW, it seems to me that switched_to_dl() is never invoked, for
> > some reason...)
> 
> Hmm, it should be invoked if you do sched_setattr() to get
> SCHED_DEADLINE.

Sorry, that was me being confused...
It is prio_changed_dl() that is not invoked when the deadline
parameters are changed (I am testing a change to fix this - it actually
is included in the "Move the remaining __dl_{sub,add}_ac() calls from
core.c to deadline.c" patch I posted on Monday; I am going to extract
it in a separate patch).
switched_to_dl() is correctly invoked.
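
As an illustration of the scenario (a minimal sketch, not the actual
test program; struct sched_attr and SCHED_DEADLINE are re-declared by
hand because glibc has no wrapper, __NR_sched_setattr is assumed to be
provided by the kernel headers, the values are arbitrary and error
handling is omitted): a task that is already SCHED_DEADLINE calls
sched_setattr() again with different runtime/deadline/period, so its
priority does not change and the request takes the
new_effective_prio == oldprio path.

#include <linux/types.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE	6
#endif

struct sched_attr {
	__u32 size;
	__u32 sched_policy;
	__u64 sched_flags;
	__s32 sched_nice;
	__u32 sched_priority;
	/* SCHED_DEADLINE parameters, in nanoseconds */
	__u64 sched_runtime;
	__u64 sched_deadline;
	__u64 sched_period;
};

static int sched_setattr(pid_t pid, const struct sched_attr *attr,
			 unsigned int flags)
{
	return syscall(__NR_sched_setattr, pid, attr, flags);
}

int main(void)
{
	struct sched_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.sched_policy   = SCHED_DEADLINE;
	attr.sched_runtime  =  10 * 1000 * 1000;	/*  10 ms */
	attr.sched_deadline = 100 * 1000 * 1000;	/* 100 ms */
	attr.sched_period   = 100 * 1000 * 1000;	/* 100 ms */

	/* first call: the task switches class, switched_to_dl() runs */
	sched_setattr(0, &attr, 0);

	/* second call: same policy, different parameters; p->prio does
	 * not change, and this is the case where prio_changed_dl() is
	 * not being invoked */
	attr.sched_runtime = 20 * 1000 * 1000;		/*  20 ms */
	sched_setattr(0, &attr, 0);

	return 0;
}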

> 
> ---
> Subject: sched/deadline: Remove superfluous call to switched_to_dl()

This is what I was thinking about, yes :)


			Thanks,
				Luca

> 
> 	if (A || B) {
> 
> 	} else if (A && !B) {
> 
> 	}
> 
> If A we'll take the first branch, if !A we will not satisfy the
> second. Therefore the second branch will never be taken.
> 
> Cc: Juri Lelli <juri.lelli@arm.com>
> Reported-by: luca abeni <luca.abeni@unitn.it>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/sched/deadline.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 57b939c81bce..c161c53d9424 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1768,8 +1768,7 @@ static void prio_changed_dl(struct rq *rq,
> struct task_struct *p, */
>  		resched_curr(rq);
>  #endif /* CONFIG_SMP */
> -	} else
> -		switched_to_dl(rq, p);
> +	}
>  }
>  
>  const struct sched_class dl_sched_class = {
> 
> 


* Re: Question about prio_changed_dl()
  2016-02-25 14:25   ` luca abeni
@ 2016-02-25 14:52     ` Peter Zijlstra
  2016-02-27 11:37       ` luca abeni
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2016-02-25 14:52 UTC (permalink / raw)
  To: luca abeni; +Cc: Juri Lelli, linux-kernel

On Thu, Feb 25, 2016 at 03:25:58PM +0100, luca abeni wrote:

> > > (BTW, it seems to me that switched_to_dl() is never invoked, for
> > > some reason...)
> > 
> > Hmm, it should be invoked if you do sched_setattr() to get
> > SCHED_DEADLINE.
> 
> Sorry, that was me being confused...
> It is prio_changed_dl() that is not invoked when the deadline
> parameters are changed (I am testing a change to fix this - it actually
> is included in the "Move the remaining __dl_{sub,add}_ac() calls from
> core.c to deadline.c" patch I posted on Monday; I am going to extract
> it in a separate patch).

Ah ok. So the idea was that the || dl_task() part would ensure
->prio_changed() would always be called.
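
For reference, the caller in question is check_class_changed() in
kernel/sched/core.c, which (quoting roughly from memory, so treat this
as a sketch rather than the exact code) looks like:

static inline void check_class_changed(struct rq *rq, struct task_struct *p,
				       const struct sched_class *prev_class,
				       int oldprio)
{
	if (prev_class != p->sched_class) {
		if (prev_class->switched_from)
			prev_class->switched_from(rq, p);

		p->sched_class->switched_to(rq, p);
	} else if (oldprio != p->prio || dl_task(p)) {
		/*
		 * Deadline tasks keep the same p->prio when only their
		 * runtime/deadline/period change; the "|| dl_task(p)"
		 * is what is meant to keep ->prio_changed() running
		 * for them anyway.
		 */
		p->sched_class->prio_changed(rq, p, oldprio);
	}
}

So the hook itself is wired up; the problem is that the early-return
path in __sched_setscheduler() never reaches check_class_changed() at
all.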

I'll await your patch.


* Re: Question about prio_changed_dl()
  2016-02-25 14:52     ` Peter Zijlstra
@ 2016-02-27 11:37       ` luca abeni
  2016-03-02 19:02         ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: luca abeni @ 2016-02-27 11:37 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Juri Lelli, linux-kernel

On Thu, 25 Feb 2016 15:52:02 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Feb 25, 2016 at 03:25:58PM +0100, luca abeni wrote:
> 
> > > > (BTW, it seems to me that switched_to_dl() is never invoked, for
> > > > some reason...)
> > > 
> > > Hmm, it should be invoked if you do sched_setattr() to get
> > > SCHED_DEADLINE.
> > 
> > Sorry, that was me being confused...
> > It is prio_changed_dl() that is not invoked when the deadline
> > parameters are changed (I am testing a change to fix this - it actually
> > is included in the "Move the remaining __dl_{sub,add}_ac() calls from
> > core.c to deadline.c" patch I posted on Monday; I am going to extract
> > it in a separate patch).
> 
> Ah ok. So the idea was that the || dl_task() part would ensure
> ->prio_changed() would always be called.
> 
> I'll await your patch.

From 5a0bde5332bc1cc5701b2d9799f6ec197c1572cc Mon Sep 17 00:00:00 2001
From: Luca Abeni <luca.abeni@unitn.it>
Date: Fri, 26 Feb 2016 09:56:24 +0100
Subject: sched/core: fix __sched_setscheduler() to properly invoke prio_changed_dl()

Currently, prio_changed_dl() is not called when __sched_setscheduler()
changes the parameters of a SCHED_DEADLINE task. This happens because
when changing parameters deadline tasks do not change their priority,
so new_effective_prio == oldprio.
The issue is fixed by explicitly checking if the task is a deadline task.

Signed-off-by: Luca Abeni <luca.abeni@unitn.it>
---
 kernel/sched/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9503d59..5646bde 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4079,6 +4079,8 @@ change:
 		new_effective_prio = rt_mutex_get_effective_prio(p, newprio);
 		if (new_effective_prio == oldprio) {
 			__setscheduler_params(p, attr);
+			if (p->sched_class == &dl_sched_class)
+				p->sched_class->prio_changed(rq, p, oldprio);
 			task_rq_unlock(rq, p, &flags);
 			return 0;
 		}
-- 
2.5.0


* [tip:sched/core] sched/deadline: Remove superfluous call to switched_to_dl()
  2016-02-25 14:01 ` Peter Zijlstra
  2016-02-25 14:16   ` Juri Lelli
  2016-02-25 14:25   ` luca abeni
@ 2016-02-29 11:19   ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 11+ messages in thread
From: tip-bot for Peter Zijlstra @ 2016-02-29 11:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: luca.abeni, torvalds, tglx, hpa, efault, linux-kernel,
	juri.lelli, mingo, peterz

Commit-ID:  801ccdbf018ca5dbd478756ece55cd6c7726ed5b
Gitweb:     http://git.kernel.org/tip/801ccdbf018ca5dbd478756ece55cd6c7726ed5b
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 25 Feb 2016 15:01:49 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 29 Feb 2016 09:53:11 +0100

sched/deadline: Remove superfluous call to switched_to_dl()

	if (A || B) {

	} else if (A && !B) {

	}

If A we'll take the first branch, if !A we will not satisfy the second.
Therefore the second branch will never be taken.

Reported-by: luca abeni <luca.abeni@unitn.it>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160225140149.GK6357@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/deadline.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 04a569c..15abf04 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1772,8 +1772,7 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
 		 */
 		resched_curr(rq);
 #endif /* CONFIG_SMP */
-	} else
-		switched_to_dl(rq, p);
+	}
 }
 
 const struct sched_class dl_sched_class = {


* Re: Question about prio_changed_dl()
  2016-02-27 11:37       ` luca abeni
@ 2016-03-02 19:02         ` Peter Zijlstra
  2016-03-02 20:16           ` luca abeni
  2016-03-04  8:50           ` luca abeni
  0 siblings, 2 replies; 11+ messages in thread
From: Peter Zijlstra @ 2016-03-02 19:02 UTC (permalink / raw)
  To: luca abeni; +Cc: Juri Lelli, linux-kernel

On Sat, Feb 27, 2016 at 12:37:57PM +0100, luca abeni wrote:
> Subject: sched/core: fix __sched_setscheduler() to properly invoke prio_changed_dl()
> 
> Currently, prio_changed_dl() is not called when __sched_setscheduler()
> changes the parameters of a SCHED_DEADLINE task. This happens because
> when changing parameters deadline tasks do not change their priority,
> so new_effective_prio == oldprio.
> The issue is fixed by explicitly checking if the task is a deadline task.
> 
> Signed-off-by: Luca Abeni <luca.abeni@unitn.it>
> ---
>  kernel/sched/core.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 9503d59..5646bde 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4079,6 +4079,8 @@ change:
>  		new_effective_prio = rt_mutex_get_effective_prio(p, newprio);
>  		if (new_effective_prio == oldprio) {
>  			__setscheduler_params(p, attr);
> +			if (p->sched_class == &dl_sched_class)
> +				p->sched_class->prio_changed(rq, p, oldprio);
>  			task_rq_unlock(rq, p, &flags);
>  			return 0;
>  		}

That code no longer exists like that.

Can you see if the below patch (which caused that) also fixes your
problem?

---
commit ff77e468535987b3d21b7bd4da15608ea3ce7d0b
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Mon Jan 18 15:27:07 2016 +0100

    sched/rt: Fix PI handling vs. sched_setscheduler()
    
    Andrea Parri reported:
    
    > I found that the following scenario (with CONFIG_RT_GROUP_SCHED=y) is not
    > handled correctly:
    >
    >     T1 (prio = 20)
    >        lock(rtmutex);
    >
    >     T2 (prio = 20)
    >        blocks on rtmutex  (rt_nr_boosted = 0 on T1's rq)
    >
    >     T1 (prio = 20)
    >        sys_set_scheduler(prio = 0)
    >           [new_effective_prio == oldprio]
    >           T1 prio = 20    (rt_nr_boosted = 0 on T1's rq)
    >
    > The last step is incorrect as T1 is now boosted (c.f., rt_se_boosted());
    > in particular, if we continue with
    >
    >    T1 (prio = 20)
    >       unlock(rtmutex)
    >          wakeup(T2)
    >          adjust_prio(T1)
    >             [prio != rt_mutex_getprio(T1)]
    >	    dequeue(T1)
    >	       rt_nr_boosted = (unsigned long)(-1)
    >	       ...
    >             T1 prio = 0
    >
    > then we end up leaving rt_nr_boosted in an "inconsistent" state.
    >
    > The simple program attached could reproduce the previous scenario; note
    > that, as a consequence of the presence of this state, the "assertion"
    >
    >     WARN_ON(!rt_nr_running && rt_nr_boosted)
    >
    > from dec_rt_group() may trigger.
    
    So normally we dequeue/enqueue tasks in sched_setscheduler(), which
    would ensure the accounting stays correct. However in the early PI path
    we fail to do so.
    
    So this was introduced at around v3.14, by:
    
      c365c292d059 ("sched: Consider pi boosting in setscheduler()")
    
    which fixed another problem exactly because that dequeue/enqueue, joy.
    
    Fix this by teaching rt about DEQUEUE_SAVE/ENQUEUE_RESTORE and have it
    preserve runqueue location with that option. This requires decoupling
    the on_rt_rq() state from being on the list.
    
    In order to allow for explicit movement during the SAVE/RESTORE,
    introduce {DE,EN}QUEUE_MOVE. We still must use SAVE/RESTORE in these
    cases to preserve other invariants.
    
    Respecting the SAVE/RESTORE flags also has the (nice) side-effect that
    things like sys_nice()/sys_sched_setaffinity() also do not reorder
    FIFO tasks (whereas they used to before this patch).
    
    Reported-by: Andrea Parri <parri.andrea@gmail.com>
    Tested-by: Andrea Parri <parri.andrea@gmail.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Juri Lelli <juri.lelli@arm.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

diff --git a/include/linux/sched.h b/include/linux/sched.h
index a292c4b7e94c..87e5f9886ac8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1293,6 +1293,8 @@ struct sched_rt_entity {
 	unsigned long timeout;
 	unsigned long watchdog_stamp;
 	unsigned int time_slice;
+	unsigned short on_rq;
+	unsigned short on_list;
 
 	struct sched_rt_entity *back;
 #ifdef CONFIG_RT_GROUP_SCHED
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 291552b4d8ee..bb565b4663c8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2221,6 +2221,10 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 	__dl_clear_params(p);
 
 	INIT_LIST_HEAD(&p->rt.run_list);
+	p->rt.timeout		= 0;
+	p->rt.time_slice	= sched_rr_timeslice;
+	p->rt.on_rq		= 0;
+	p->rt.on_list		= 0;
 
 #ifdef CONFIG_PREEMPT_NOTIFIERS
 	INIT_HLIST_HEAD(&p->preempt_notifiers);
@@ -3531,7 +3535,7 @@ EXPORT_SYMBOL(default_wake_function);
  */
 void rt_mutex_setprio(struct task_struct *p, int prio)
 {
-	int oldprio, queued, running, enqueue_flag = ENQUEUE_RESTORE;
+	int oldprio, queued, running, queue_flag = DEQUEUE_SAVE | DEQUEUE_MOVE;
 	struct rq *rq;
 	const struct sched_class *prev_class;
 
@@ -3559,11 +3563,15 @@ void rt_mutex_setprio(struct task_struct *p, int prio)
 
 	trace_sched_pi_setprio(p, prio);
 	oldprio = p->prio;
+
+	if (oldprio == prio)
+		queue_flag &= ~DEQUEUE_MOVE;
+
 	prev_class = p->sched_class;
 	queued = task_on_rq_queued(p);
 	running = task_current(rq, p);
 	if (queued)
-		dequeue_task(rq, p, DEQUEUE_SAVE);
+		dequeue_task(rq, p, queue_flag);
 	if (running)
 		put_prev_task(rq, p);
 
@@ -3581,7 +3589,7 @@ void rt_mutex_setprio(struct task_struct *p, int prio)
 		if (!dl_prio(p->normal_prio) ||
 		    (pi_task && dl_entity_preempt(&pi_task->dl, &p->dl))) {
 			p->dl.dl_boosted = 1;
-			enqueue_flag |= ENQUEUE_REPLENISH;
+			queue_flag |= ENQUEUE_REPLENISH;
 		} else
 			p->dl.dl_boosted = 0;
 		p->sched_class = &dl_sched_class;
@@ -3589,7 +3597,7 @@ void rt_mutex_setprio(struct task_struct *p, int prio)
 		if (dl_prio(oldprio))
 			p->dl.dl_boosted = 0;
 		if (oldprio < prio)
-			enqueue_flag |= ENQUEUE_HEAD;
+			queue_flag |= ENQUEUE_HEAD;
 		p->sched_class = &rt_sched_class;
 	} else {
 		if (dl_prio(oldprio))
@@ -3604,7 +3612,7 @@ void rt_mutex_setprio(struct task_struct *p, int prio)
 	if (running)
 		p->sched_class->set_curr_task(rq);
 	if (queued)
-		enqueue_task(rq, p, enqueue_flag);
+		enqueue_task(rq, p, queue_flag);
 
 	check_class_changed(rq, p, prev_class, oldprio);
 out_unlock:
@@ -3960,6 +3968,7 @@ static int __sched_setscheduler(struct task_struct *p,
 	const struct sched_class *prev_class;
 	struct rq *rq;
 	int reset_on_fork;
+	int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE;
 
 	/* may grab non-irq protected spin_locks */
 	BUG_ON(in_interrupt());
@@ -4142,17 +4151,14 @@ static int __sched_setscheduler(struct task_struct *p,
 		 * itself.
 		 */
 		new_effective_prio = rt_mutex_get_effective_prio(p, newprio);
-		if (new_effective_prio == oldprio) {
-			__setscheduler_params(p, attr);
-			task_rq_unlock(rq, p, &flags);
-			return 0;
-		}
+		if (new_effective_prio == oldprio)
+			queue_flags &= ~DEQUEUE_MOVE;
 	}
 
 	queued = task_on_rq_queued(p);
 	running = task_current(rq, p);
 	if (queued)
-		dequeue_task(rq, p, DEQUEUE_SAVE);
+		dequeue_task(rq, p, queue_flags);
 	if (running)
 		put_prev_task(rq, p);
 
@@ -4162,15 +4168,14 @@ static int __sched_setscheduler(struct task_struct *p,
 	if (running)
 		p->sched_class->set_curr_task(rq);
 	if (queued) {
-		int enqueue_flags = ENQUEUE_RESTORE;
 		/*
 		 * We enqueue to tail when the priority of a task is
 		 * increased (user space view).
 		 */
-		if (oldprio <= p->prio)
-			enqueue_flags |= ENQUEUE_HEAD;
+		if (oldprio < p->prio)
+			queue_flags |= ENQUEUE_HEAD;
 
-		enqueue_task(rq, p, enqueue_flags);
+		enqueue_task(rq, p, queue_flags);
 	}
 
 	check_class_changed(rq, p, prev_class, oldprio);
@@ -7958,7 +7963,7 @@ void sched_move_task(struct task_struct *tsk)
 	queued = task_on_rq_queued(tsk);
 
 	if (queued)
-		dequeue_task(rq, tsk, DEQUEUE_SAVE);
+		dequeue_task(rq, tsk, DEQUEUE_SAVE | DEQUEUE_MOVE);
 	if (unlikely(running))
 		put_prev_task(rq, tsk);
 
@@ -7982,7 +7987,7 @@ void sched_move_task(struct task_struct *tsk)
 	if (unlikely(running))
 		tsk->sched_class->set_curr_task(rq);
 	if (queued)
-		enqueue_task(rq, tsk, ENQUEUE_RESTORE);
+		enqueue_task(rq, tsk, ENQUEUE_RESTORE | ENQUEUE_MOVE);
 
 	task_rq_unlock(rq, tsk, &flags);
 }
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 8ec86abe0ea1..406a9c20b210 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -436,7 +436,7 @@ static void dequeue_top_rt_rq(struct rt_rq *rt_rq);
 
 static inline int on_rt_rq(struct sched_rt_entity *rt_se)
 {
-	return !list_empty(&rt_se->run_list);
+	return rt_se->on_rq;
 }
 
 #ifdef CONFIG_RT_GROUP_SCHED
@@ -482,8 +482,8 @@ static inline struct rt_rq *group_rt_rq(struct sched_rt_entity *rt_se)
 	return rt_se->my_q;
 }
 
-static void enqueue_rt_entity(struct sched_rt_entity *rt_se, bool head);
-static void dequeue_rt_entity(struct sched_rt_entity *rt_se);
+static void enqueue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags);
+static void dequeue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags);
 
 static void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
 {
@@ -499,7 +499,7 @@ static void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
 		if (!rt_se)
 			enqueue_top_rt_rq(rt_rq);
 		else if (!on_rt_rq(rt_se))
-			enqueue_rt_entity(rt_se, false);
+			enqueue_rt_entity(rt_se, 0);
 
 		if (rt_rq->highest_prio.curr < curr->prio)
 			resched_curr(rq);
@@ -516,7 +516,7 @@ static void sched_rt_rq_dequeue(struct rt_rq *rt_rq)
 	if (!rt_se)
 		dequeue_top_rt_rq(rt_rq);
 	else if (on_rt_rq(rt_se))
-		dequeue_rt_entity(rt_se);
+		dequeue_rt_entity(rt_se, 0);
 }
 
 static inline int rt_rq_throttled(struct rt_rq *rt_rq)
@@ -1166,7 +1166,30 @@ void dec_rt_tasks(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
 	dec_rt_group(rt_se, rt_rq);
 }
 
-static void __enqueue_rt_entity(struct sched_rt_entity *rt_se, bool head)
+/*
+ * Change rt_se->run_list location unless SAVE && !MOVE
+ *
+ * assumes ENQUEUE/DEQUEUE flags match
+ */
+static inline bool move_entity(unsigned int flags)
+{
+	if ((flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) == DEQUEUE_SAVE)
+		return false;
+
+	return true;
+}
+
+static void __delist_rt_entity(struct sched_rt_entity *rt_se, struct rt_prio_array *array)
+{
+	list_del_init(&rt_se->run_list);
+
+	if (list_empty(array->queue + rt_se_prio(rt_se)))
+		__clear_bit(rt_se_prio(rt_se), array->bitmap);
+
+	rt_se->on_list = 0;
+}
+
+static void __enqueue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
 {
 	struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
 	struct rt_prio_array *array = &rt_rq->active;
@@ -1179,26 +1202,37 @@ static void __enqueue_rt_entity(struct sched_rt_entity *rt_se, bool head)
 	 * get throttled and the current group doesn't have any other
 	 * active members.
 	 */
-	if (group_rq && (rt_rq_throttled(group_rq) || !group_rq->rt_nr_running))
+	if (group_rq && (rt_rq_throttled(group_rq) || !group_rq->rt_nr_running)) {
+		if (rt_se->on_list)
+			__delist_rt_entity(rt_se, array);
 		return;
+	}
 
-	if (head)
-		list_add(&rt_se->run_list, queue);
-	else
-		list_add_tail(&rt_se->run_list, queue);
-	__set_bit(rt_se_prio(rt_se), array->bitmap);
+	if (move_entity(flags)) {
+		WARN_ON_ONCE(rt_se->on_list);
+		if (flags & ENQUEUE_HEAD)
+			list_add(&rt_se->run_list, queue);
+		else
+			list_add_tail(&rt_se->run_list, queue);
+
+		__set_bit(rt_se_prio(rt_se), array->bitmap);
+		rt_se->on_list = 1;
+	}
+	rt_se->on_rq = 1;
 
 	inc_rt_tasks(rt_se, rt_rq);
 }
 
-static void __dequeue_rt_entity(struct sched_rt_entity *rt_se)
+static void __dequeue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
 {
 	struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
 	struct rt_prio_array *array = &rt_rq->active;
 
-	list_del_init(&rt_se->run_list);
-	if (list_empty(array->queue + rt_se_prio(rt_se)))
-		__clear_bit(rt_se_prio(rt_se), array->bitmap);
+	if (move_entity(flags)) {
+		WARN_ON_ONCE(!rt_se->on_list);
+		__delist_rt_entity(rt_se, array);
+	}
+	rt_se->on_rq = 0;
 
 	dec_rt_tasks(rt_se, rt_rq);
 }
@@ -1207,7 +1241,7 @@ static void __dequeue_rt_entity(struct sched_rt_entity *rt_se)
  * Because the prio of an upper entry depends on the lower
  * entries, we must remove entries top - down.
  */
-static void dequeue_rt_stack(struct sched_rt_entity *rt_se)
+static void dequeue_rt_stack(struct sched_rt_entity *rt_se, unsigned int flags)
 {
 	struct sched_rt_entity *back = NULL;
 
@@ -1220,31 +1254,31 @@ static void dequeue_rt_stack(struct sched_rt_entity *rt_se)
 
 	for (rt_se = back; rt_se; rt_se = rt_se->back) {
 		if (on_rt_rq(rt_se))
-			__dequeue_rt_entity(rt_se);
+			__dequeue_rt_entity(rt_se, flags);
 	}
 }
 
-static void enqueue_rt_entity(struct sched_rt_entity *rt_se, bool head)
+static void enqueue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
 {
 	struct rq *rq = rq_of_rt_se(rt_se);
 
-	dequeue_rt_stack(rt_se);
+	dequeue_rt_stack(rt_se, flags);
 	for_each_sched_rt_entity(rt_se)
-		__enqueue_rt_entity(rt_se, head);
+		__enqueue_rt_entity(rt_se, flags);
 	enqueue_top_rt_rq(&rq->rt);
 }
 
-static void dequeue_rt_entity(struct sched_rt_entity *rt_se)
+static void dequeue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
 {
 	struct rq *rq = rq_of_rt_se(rt_se);
 
-	dequeue_rt_stack(rt_se);
+	dequeue_rt_stack(rt_se, flags);
 
 	for_each_sched_rt_entity(rt_se) {
 		struct rt_rq *rt_rq = group_rt_rq(rt_se);
 
 		if (rt_rq && rt_rq->rt_nr_running)
-			__enqueue_rt_entity(rt_se, false);
+			__enqueue_rt_entity(rt_se, flags);
 	}
 	enqueue_top_rt_rq(&rq->rt);
 }
@@ -1260,7 +1294,7 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags)
 	if (flags & ENQUEUE_WAKEUP)
 		rt_se->timeout = 0;
 
-	enqueue_rt_entity(rt_se, flags & ENQUEUE_HEAD);
+	enqueue_rt_entity(rt_se, flags);
 
 	if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
 		enqueue_pushable_task(rq, p);
@@ -1271,7 +1305,7 @@ static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags)
 	struct sched_rt_entity *rt_se = &p->rt;
 
 	update_curr_rt(rq);
-	dequeue_rt_entity(rt_se);
+	dequeue_rt_entity(rt_se, flags);
 
 	dequeue_pushable_task(rq, p);
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3dc7b8b3f94c..d3606e40ea0d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1130,18 +1130,40 @@ static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
 extern const int sched_prio_to_weight[40];
 extern const u32 sched_prio_to_wmult[40];
 
+/*
+ * {de,en}queue flags:
+ *
+ * DEQUEUE_SLEEP  - task is no longer runnable
+ * ENQUEUE_WAKEUP - task just became runnable
+ *
+ * SAVE/RESTORE - an otherwise spurious dequeue/enqueue, done to ensure tasks
+ *                are in a known state which allows modification. Such pairs
+ *                should preserve as much state as possible.
+ *
+ * MOVE - paired with SAVE/RESTORE, explicitly does not preserve the location
+ *        in the runqueue.
+ *
+ * ENQUEUE_HEAD      - place at front of runqueue (tail if not specified)
+ * ENQUEUE_REPLENISH - CBS (replenish runtime and postpone deadline)
+ * ENQUEUE_WAKING    - sched_class::task_waking was called
+ *
+ */
+
+#define DEQUEUE_SLEEP		0x01
+#define DEQUEUE_SAVE		0x02 /* matches ENQUEUE_RESTORE */
+#define DEQUEUE_MOVE		0x04 /* matches ENQUEUE_MOVE */
+
 #define ENQUEUE_WAKEUP		0x01
-#define ENQUEUE_HEAD		0x02
+#define ENQUEUE_RESTORE		0x02
+#define ENQUEUE_MOVE		0x04
+
+#define ENQUEUE_HEAD		0x08
+#define ENQUEUE_REPLENISH	0x10
 #ifdef CONFIG_SMP
-#define ENQUEUE_WAKING		0x04	/* sched_class::task_waking was called */
+#define ENQUEUE_WAKING		0x20
 #else
 #define ENQUEUE_WAKING		0x00
 #endif
-#define ENQUEUE_REPLENISH	0x08
-#define ENQUEUE_RESTORE	0x10
-
-#define DEQUEUE_SLEEP		0x01
-#define DEQUEUE_SAVE		0x02
 
 #define RETRY_TASK		((void *)-1UL)
 


* Re: Question about prio_changed_dl()
  2016-03-02 19:02         ` Peter Zijlstra
@ 2016-03-02 20:16           ` luca abeni
  2016-03-02 20:58             ` Peter Zijlstra
  2016-03-04  8:50           ` luca abeni
  1 sibling, 1 reply; 11+ messages in thread
From: luca abeni @ 2016-03-02 20:16 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Juri Lelli, linux-kernel

On Wed, 2 Mar 2016 20:02:58 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Sat, Feb 27, 2016 at 12:37:57PM +0100, luca abeni wrote:
> > Subject: sched/core: fix __sched_setscheduler() to properly invoke prio_changed_dl()
> > 
> > Currently, prio_changed_dl() is not called when __sched_setscheduler()
> > changes the parameters of a SCHED_DEADLINE task. This happens because
> > when changing parameters deadline tasks do not change their priority,
> > so new_effective_prio == oldprio.
> > The issue is fixed by explicitly checking if the task is a deadline task.
> > 
> > Signed-off-by: Luca Abeni <luca.abeni@unitn.it>
> > ---
> >  kernel/sched/core.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 9503d59..5646bde 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4079,6 +4079,8 @@ change:
> >  		new_effective_prio = rt_mutex_get_effective_prio(p, newprio);
> >  		if (new_effective_prio == oldprio) {
> >  			__setscheduler_params(p, attr);
> > +			if (p->sched_class == &dl_sched_class)
> > +				p->sched_class->prio_changed(rq, p, oldprio);
> >  			task_rq_unlock(rq, p, &flags);
> >  			return 0;
> >  		}
> 
> That code no longer exists like that.
Uh... I must have done something wrong, then (I was convinced I pulled
from master and rebased my patch before testing and sending it, but
probably something went wrong).
Sorry about that.


> Can you see if the below patch (which caused that) also fixes your
> problem?
Ok; in the next days I'll make sure to have this patch applied and
I'll test it.


			Thanks,
				Luca
> [...]

* Re: Question about prio_changed_dl()
  2016-03-02 20:16           ` luca abeni
@ 2016-03-02 20:58             ` Peter Zijlstra
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2016-03-02 20:58 UTC (permalink / raw)
  To: luca abeni; +Cc: Juri Lelli, linux-kernel

On Wed, Mar 02, 2016 at 09:16:57PM +0100, luca abeni wrote:
> > That code no longer exists like that.
> Uh... I must have done something wrong, then (I was convinced I pulled
> from master and rebased my patch before testing and sending it, but
> probably something went wrong).
> Sorry about that.

No, my bad; that patch got stuck in my queue for a very long time because
I was chasing perf problems. Sorry about that.


* Re: Question about prio_changed_dl()
  2016-03-02 19:02         ` Peter Zijlstra
  2016-03-02 20:16           ` luca abeni
@ 2016-03-04  8:50           ` luca abeni
  1 sibling, 0 replies; 11+ messages in thread
From: luca abeni @ 2016-03-04  8:50 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Juri Lelli, linux-kernel

Hi Peter,

On Wed, 2 Mar 2016 20:02:58 +0100
Peter Zijlstra <peterz@infradead.org> wrote:
[...]
> > +++ b/kernel/sched/core.c
> > @@ -4079,6 +4079,8 @@ change:
> >  		new_effective_prio = rt_mutex_get_effective_prio(p, newprio);
> >  		if (new_effective_prio == oldprio) {
> >  			__setscheduler_params(p, attr);
> > +			if (p->sched_class == &dl_sched_class)
> > +				p->sched_class->prio_changed(rq, p, oldprio);
> >  			task_rq_unlock(rq, p, &flags);
> >  			return 0;
> >  		}
> 
> That code no longer exists like that.
> 
> Can you see if the below patch (which caused that) also fixes your
> problem?

Some quick tests show that your patch seems to solve this issue too...
So, I am using this one locally instead of my patch.


			Thanks,
				Luca

> ---
> commit ff77e468535987b3d21b7bd4da15608ea3ce7d0b
> Author: Peter Zijlstra <peterz@infradead.org>
> Date:   Mon Jan 18 15:27:07 2016 +0100
> 
>     sched/rt: Fix PI handling vs. sched_setscheduler()
>     
>     Andrea Parri reported:
>     
>     > I found that the following scenario (with
>     > CONFIG_RT_GROUP_SCHED=y) is not handled correctly:
>     >
>     >     T1 (prio = 20)
>     >        lock(rtmutex);
>     >
>     >     T2 (prio = 20)
>     >        blocks on rtmutex  (rt_nr_boosted = 0 on T1's rq)
>     >
>     >     T1 (prio = 20)
>     >        sys_set_scheduler(prio = 0)
>     >           [new_effective_prio == oldprio]
>     >           T1 prio = 20    (rt_nr_boosted = 0 on T1's rq)
>     >
>     > The last step is incorrect as T1 is now boosted (c.f.,
>     > rt_se_boosted()); in particular, if we continue with
>     >
>     >    T1 (prio = 20)
>     >       unlock(rtmutex)
>     >          wakeup(T2)
>     >          adjust_prio(T1)
>     >             [prio != rt_mutex_getprio(T1)]
>     >	    dequeue(T1)
>     >	       rt_nr_boosted = (unsigned long)(-1)
>     >	       ...
>     >             T1 prio = 0
>     >
>     > then we end up leaving rt_nr_boosted in an "inconsistent" state.
>     >
>     > The simple program attached could reproduce the previous
>     > scenario; note that, as a consequence of the presence of this
>     > state, the "assertion"
>     >
>     >     WARN_ON(!rt_nr_running && rt_nr_boosted)
>     >
>     > from dec_rt_group() may trigger.
>     
>     So normally we dequeue/enqueue tasks in sched_setscheduler(),
> which would ensure the accounting stays correct. However in the early
> PI path we fail to do so.
>     
>     So this was introduced at around v3.14, by:
>     
>       c365c292d059 ("sched: Consider pi boosting in setscheduler()")
>     
>     which fixed another problem exactly because that dequeue/enqueue,
> joy. 
>     Fix this by teaching rt about DEQUEUE_SAVE/ENQUEUE_RESTORE and
> have it preserve runqueue location with that option. This requires
> decoupling the on_rt_rq() state from being on the list.
>     
>     In order to allow for explicit movement during the SAVE/RESTORE,
>     introduce {DE,EN}QUEUE_MOVE. We still must use SAVE/RESTORE in
> these cases to preserve other invariants.
>     
>     Respecting the SAVE/RESTORE flags also has the (nice) side-effect
> that things like sys_nice()/sys_sched_setaffinity() also do not
> reorder FIFO tasks (whereas they used to before this patch).
>     
>     Reported-by: Andrea Parri <parri.andrea@gmail.com>
>     Tested-by: Andrea Parri <parri.andrea@gmail.com>
>     Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>     Cc: Juri Lelli <juri.lelli@arm.com>
>     Cc: Linus Torvalds <torvalds@linux-foundation.org>
>     Cc: Mike Galbraith <efault@gmx.de>
>     Cc: Peter Zijlstra <peterz@infradead.org>
>     Cc: Steven Rostedt <rostedt@goodmis.org>
>     Cc: Thomas Gleixner <tglx@linutronix.de>
>     Signed-off-by: Ingo Molnar <mingo@kernel.org>
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index a292c4b7e94c..87e5f9886ac8 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1293,6 +1293,8 @@ struct sched_rt_entity {
>  	unsigned long timeout;
>  	unsigned long watchdog_stamp;
>  	unsigned int time_slice;
> +	unsigned short on_rq;
> +	unsigned short on_list;
>  
>  	struct sched_rt_entity *back;
>  #ifdef CONFIG_RT_GROUP_SCHED
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 291552b4d8ee..bb565b4663c8 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2221,6 +2221,10 @@ static void __sched_fork(unsigned long
> clone_flags, struct task_struct *p) __dl_clear_params(p);
>  
>  	INIT_LIST_HEAD(&p->rt.run_list);
> +	p->rt.timeout		= 0;
> +	p->rt.time_slice	= sched_rr_timeslice;
> +	p->rt.on_rq		= 0;
> +	p->rt.on_list		= 0;
>  
>  #ifdef CONFIG_PREEMPT_NOTIFIERS
>  	INIT_HLIST_HEAD(&p->preempt_notifiers);
> @@ -3531,7 +3535,7 @@ EXPORT_SYMBOL(default_wake_function);
>   */
>  void rt_mutex_setprio(struct task_struct *p, int prio)
>  {
> -	int oldprio, queued, running, enqueue_flag = ENQUEUE_RESTORE;
> +	int oldprio, queued, running, queue_flag = DEQUEUE_SAVE |
> DEQUEUE_MOVE; struct rq *rq;
>  	const struct sched_class *prev_class;
>  
> @@ -3559,11 +3563,15 @@ void rt_mutex_setprio(struct task_struct *p,
> int prio) 
>  	trace_sched_pi_setprio(p, prio);
>  	oldprio = p->prio;
> +
> +	if (oldprio == prio)
> +		queue_flag &= ~DEQUEUE_MOVE;
> +
>  	prev_class = p->sched_class;
>  	queued = task_on_rq_queued(p);
>  	running = task_current(rq, p);
>  	if (queued)
> -		dequeue_task(rq, p, DEQUEUE_SAVE);
> +		dequeue_task(rq, p, queue_flag);
>  	if (running)
>  		put_prev_task(rq, p);
>  
> @@ -3581,7 +3589,7 @@ void rt_mutex_setprio(struct task_struct *p,
> int prio) if (!dl_prio(p->normal_prio) ||
>  		    (pi_task && dl_entity_preempt(&pi_task->dl,
> &p->dl))) { p->dl.dl_boosted = 1;
> -			enqueue_flag |= ENQUEUE_REPLENISH;
> +			queue_flag |= ENQUEUE_REPLENISH;
>  		} else
>  			p->dl.dl_boosted = 0;
>  		p->sched_class = &dl_sched_class;
> @@ -3589,7 +3597,7 @@ void rt_mutex_setprio(struct task_struct *p,
> int prio) if (dl_prio(oldprio))
>  			p->dl.dl_boosted = 0;
>  		if (oldprio < prio)
> -			enqueue_flag |= ENQUEUE_HEAD;
> +			queue_flag |= ENQUEUE_HEAD;
>  		p->sched_class = &rt_sched_class;
>  	} else {
>  		if (dl_prio(oldprio))
> @@ -3604,7 +3612,7 @@ void rt_mutex_setprio(struct task_struct *p,
> int prio) if (running)
>  		p->sched_class->set_curr_task(rq);
>  	if (queued)
> -		enqueue_task(rq, p, enqueue_flag);
> +		enqueue_task(rq, p, queue_flag);
>  
>  	check_class_changed(rq, p, prev_class, oldprio);
>  out_unlock:
> @@ -3960,6 +3968,7 @@ static int __sched_setscheduler(struct
> task_struct *p, const struct sched_class *prev_class;
>  	struct rq *rq;
>  	int reset_on_fork;
> +	int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE;
>  
>  	/* may grab non-irq protected spin_locks */
>  	BUG_ON(in_interrupt());
> @@ -4142,17 +4151,14 @@ static int __sched_setscheduler(struct
> task_struct *p,
>  		 * itself.
>  		 */
>  		new_effective_prio = rt_mutex_get_effective_prio(p,
> newprio);
> -		if (new_effective_prio == oldprio) {
> -			__setscheduler_params(p, attr);
> -			task_rq_unlock(rq, p, &flags);
> -			return 0;
> -		}
> +		if (new_effective_prio == oldprio)
> +			queue_flags &= ~DEQUEUE_MOVE;
>  	}
>  
>  	queued = task_on_rq_queued(p);
>  	running = task_current(rq, p);
>  	if (queued)
> -		dequeue_task(rq, p, DEQUEUE_SAVE);
> +		dequeue_task(rq, p, queue_flags);
>  	if (running)
>  		put_prev_task(rq, p);
>  
> @@ -4162,15 +4168,14 @@ static int __sched_setscheduler(struct
> task_struct *p, if (running)
>  		p->sched_class->set_curr_task(rq);
>  	if (queued) {
> -		int enqueue_flags = ENQUEUE_RESTORE;
>  		/*
>  		 * We enqueue to tail when the priority of a task is
>  		 * increased (user space view).
>  		 */
> -		if (oldprio <= p->prio)
> -			enqueue_flags |= ENQUEUE_HEAD;
> +		if (oldprio < p->prio)
> +			queue_flags |= ENQUEUE_HEAD;
>  
> -		enqueue_task(rq, p, enqueue_flags);
> +		enqueue_task(rq, p, queue_flags);
>  	}
>  
>  	check_class_changed(rq, p, prev_class, oldprio);
> @@ -7958,7 +7963,7 @@ void sched_move_task(struct task_struct *tsk)
>  	queued = task_on_rq_queued(tsk);
>  
>  	if (queued)
> -		dequeue_task(rq, tsk, DEQUEUE_SAVE);
> +		dequeue_task(rq, tsk, DEQUEUE_SAVE | DEQUEUE_MOVE);
>  	if (unlikely(running))
>  		put_prev_task(rq, tsk);
>  
> @@ -7982,7 +7987,7 @@ void sched_move_task(struct task_struct *tsk)
>  	if (unlikely(running))
>  		tsk->sched_class->set_curr_task(rq);
>  	if (queued)
> -		enqueue_task(rq, tsk, ENQUEUE_RESTORE);
> +		enqueue_task(rq, tsk, ENQUEUE_RESTORE | ENQUEUE_MOVE);
>  
>  	task_rq_unlock(rq, tsk, &flags);
>  }
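
(Aside, not part of the patch: all of the core.c hunks above follow the same
calling convention -- dequeue with SAVE, add MOVE only when the effective
priority actually changes, then re-enqueue with the very same flag word.
Below is a self-contained user-space sketch of that pattern; the flag names
mirror the patch, while the stub functions and priority values are made up
purely for illustration.)

#include <stdio.h>

#define DEQUEUE_SAVE	0x02	/* same bit as ENQUEUE_RESTORE */
#define DEQUEUE_MOVE	0x04	/* same bit as ENQUEUE_MOVE */

/* Stubs standing in for dequeue_task()/enqueue_task(); the real ones
 * take an rq and a task_struct and manipulate the runqueue. */
static void dequeue_task_stub(unsigned int flags)
{
	printf("dequeue: save=%d move=%d\n",
	       !!(flags & DEQUEUE_SAVE), !!(flags & DEQUEUE_MOVE));
}

static void enqueue_task_stub(unsigned int flags)
{
	/* same bits, now read as ENQUEUE_RESTORE / ENQUEUE_MOVE */
	printf("enqueue: restore=%d move=%d\n",
	       !!(flags & DEQUEUE_SAVE), !!(flags & DEQUEUE_MOVE));
}

static void change_prio(int oldprio, int newprio)
{
	unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE;

	/* priority unchanged: keep the task's position on its list */
	if (oldprio == newprio)
		queue_flags &= ~DEQUEUE_MOVE;

	dequeue_task_stub(queue_flags);
	/* ... change the scheduling parameters here ... */
	enqueue_task_stub(queue_flags);
}

int main(void)
{
	change_prio(120, 120);	/* SAVE only: position preserved */
	change_prio(120, 100);	/* SAVE|MOVE: may be requeued elsewhere */
	return 0;
}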
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 8ec86abe0ea1..406a9c20b210 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -436,7 +436,7 @@ static void dequeue_top_rt_rq(struct rt_rq *rt_rq);
>  
>  static inline int on_rt_rq(struct sched_rt_entity *rt_se)
>  {
> -	return !list_empty(&rt_se->run_list);
> +	return rt_se->on_rq;
>  }
>  
>  #ifdef CONFIG_RT_GROUP_SCHED
> @@ -482,8 +482,8 @@ static inline struct rt_rq *group_rt_rq(struct sched_rt_entity *rt_se)
>  	return rt_se->my_q;
>  }
>  
> -static void enqueue_rt_entity(struct sched_rt_entity *rt_se, bool head);
> -static void dequeue_rt_entity(struct sched_rt_entity *rt_se);
> +static void enqueue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags);
> +static void dequeue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags);
>  
>  static void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
>  {
> @@ -499,7 +499,7 @@ static void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
>  		if (!rt_se)
>  			enqueue_top_rt_rq(rt_rq);
>  		else if (!on_rt_rq(rt_se))
> -			enqueue_rt_entity(rt_se, false);
> +			enqueue_rt_entity(rt_se, 0);
>  
>  		if (rt_rq->highest_prio.curr < curr->prio)
>  			resched_curr(rq);
> @@ -516,7 +516,7 @@ static void sched_rt_rq_dequeue(struct rt_rq *rt_rq)
>  	if (!rt_se)
>  		dequeue_top_rt_rq(rt_rq);
>  	else if (on_rt_rq(rt_se))
> -		dequeue_rt_entity(rt_se);
> +		dequeue_rt_entity(rt_se, 0);
>  }
>  
>  static inline int rt_rq_throttled(struct rt_rq *rt_rq)
> @@ -1166,7 +1166,30 @@ void dec_rt_tasks(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
>  	dec_rt_group(rt_se, rt_rq);
>  }
>  
> -static void __enqueue_rt_entity(struct sched_rt_entity *rt_se, bool head)
> +/*
> + * Change rt_se->run_list location unless SAVE && !MOVE
> + *
> + * assumes ENQUEUE/DEQUEUE flags match
> + */
> +static inline bool move_entity(unsigned int flags)
> +{
> +	if ((flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) == DEQUEUE_SAVE)
> +		return false;
> +
> +	return true;
> +}
> +
> +static void __delist_rt_entity(struct sched_rt_entity *rt_se, struct rt_prio_array *array)
> +{
> +	list_del_init(&rt_se->run_list);
> +
> +	if (list_empty(array->queue + rt_se_prio(rt_se)))
> +		__clear_bit(rt_se_prio(rt_se), array->bitmap);
> +
> +	rt_se->on_list = 0;
> +}
> +
> +static void __enqueue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
>  {
>  	struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
>  	struct rt_prio_array *array = &rt_rq->active;
> @@ -1179,26 +1202,37 @@ static void __enqueue_rt_entity(struct sched_rt_entity *rt_se, bool head)
>  	 * get throttled and the current group doesn't have any other
>  	 * active members.
>  	 */
> -	if (group_rq && (rt_rq_throttled(group_rq) || !group_rq->rt_nr_running))
> +	if (group_rq && (rt_rq_throttled(group_rq) || !group_rq->rt_nr_running)) {
> +		if (rt_se->on_list)
> +			__delist_rt_entity(rt_se, array);
>  		return;
> +	}
>  
> -	if (head)
> -		list_add(&rt_se->run_list, queue);
> -	else
> -		list_add_tail(&rt_se->run_list, queue);
> -	__set_bit(rt_se_prio(rt_se), array->bitmap);
> +	if (move_entity(flags)) {
> +		WARN_ON_ONCE(rt_se->on_list);
> +		if (flags & ENQUEUE_HEAD)
> +			list_add(&rt_se->run_list, queue);
> +		else
> +			list_add_tail(&rt_se->run_list, queue);
> +
> +		__set_bit(rt_se_prio(rt_se), array->bitmap);
> +		rt_se->on_list = 1;
> +	}
> +	rt_se->on_rq = 1;
>  
>  	inc_rt_tasks(rt_se, rt_rq);
>  }
>  
> -static void __dequeue_rt_entity(struct sched_rt_entity *rt_se)
> +static void __dequeue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
>  {
>  	struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
>  	struct rt_prio_array *array = &rt_rq->active;
>  
> -	list_del_init(&rt_se->run_list);
> -	if (list_empty(array->queue + rt_se_prio(rt_se)))
> -		__clear_bit(rt_se_prio(rt_se), array->bitmap);
> +	if (move_entity(flags)) {
> +		WARN_ON_ONCE(!rt_se->on_list);
> +		__delist_rt_entity(rt_se, array);
> +	}
> +	rt_se->on_rq = 0;
>  
>  	dec_rt_tasks(rt_se, rt_rq);
>  }
> @@ -1207,7 +1241,7 @@ static void __dequeue_rt_entity(struct sched_rt_entity *rt_se)
>   * Because the prio of an upper entry depends on the lower
>   * entries, we must remove entries top - down.
>   */
> -static void dequeue_rt_stack(struct sched_rt_entity *rt_se)
> +static void dequeue_rt_stack(struct sched_rt_entity *rt_se, unsigned int flags)
>  {
>  	struct sched_rt_entity *back = NULL;
>  
> @@ -1220,31 +1254,31 @@ static void dequeue_rt_stack(struct sched_rt_entity *rt_se)
>  
>  	for (rt_se = back; rt_se; rt_se = rt_se->back) {
>  		if (on_rt_rq(rt_se))
> -			__dequeue_rt_entity(rt_se);
> +			__dequeue_rt_entity(rt_se, flags);
>  	}
>  }
>  
> -static void enqueue_rt_entity(struct sched_rt_entity *rt_se, bool head)
> +static void enqueue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
>  {
>  	struct rq *rq = rq_of_rt_se(rt_se);
>  
> -	dequeue_rt_stack(rt_se);
> +	dequeue_rt_stack(rt_se, flags);
>  	for_each_sched_rt_entity(rt_se)
> -		__enqueue_rt_entity(rt_se, head);
> +		__enqueue_rt_entity(rt_se, flags);
>  	enqueue_top_rt_rq(&rq->rt);
>  }
>  
> -static void dequeue_rt_entity(struct sched_rt_entity *rt_se)
> +static void dequeue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
>  {
>  	struct rq *rq = rq_of_rt_se(rt_se);
>  
> -	dequeue_rt_stack(rt_se);
> +	dequeue_rt_stack(rt_se, flags);
>  
>  	for_each_sched_rt_entity(rt_se) {
>  		struct rt_rq *rt_rq = group_rt_rq(rt_se);
>  
>  		if (rt_rq && rt_rq->rt_nr_running)
> -			__enqueue_rt_entity(rt_se, false);
> +			__enqueue_rt_entity(rt_se, flags);
>  	}
>  	enqueue_top_rt_rq(&rq->rt);
>  }
> @@ -1260,7 +1294,7 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags)
>  	if (flags & ENQUEUE_WAKEUP)
>  		rt_se->timeout = 0;
>  
> -	enqueue_rt_entity(rt_se, flags & ENQUEUE_HEAD);
> +	enqueue_rt_entity(rt_se, flags);
>  
>  	if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
>  		enqueue_pushable_task(rq, p);
> @@ -1271,7 +1305,7 @@ static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags)
>  	struct sched_rt_entity *rt_se = &p->rt;
>  
>  	update_curr_rt(rq);
> -	dequeue_rt_entity(rt_se);
> +	dequeue_rt_entity(rt_se, flags);
>  
>  	dequeue_pushable_task(rq, p);
>  }
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 3dc7b8b3f94c..d3606e40ea0d 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1130,18 +1130,40 @@ static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
>  extern const int sched_prio_to_weight[40];
>  extern const u32 sched_prio_to_wmult[40];
>  
> +/*
> + * {de,en}queue flags:
> + *
> + * DEQUEUE_SLEEP  - task is no longer runnable
> + * ENQUEUE_WAKEUP - task just became runnable
> + *
> + * SAVE/RESTORE - an otherwise spurious dequeue/enqueue, done to ensure tasks
> + *                are in a known state which allows modification. Such pairs
> + *                should preserve as much state as possible.
> + *
> + * MOVE - paired with SAVE/RESTORE, explicitly does not preserve the location
> + *        in the runqueue.
> + *
> + * ENQUEUE_HEAD      - place at front of runqueue (tail if not specified)
> + * ENQUEUE_REPLENISH - CBS (replenish runtime and postpone deadline)
> + * ENQUEUE_WAKING    - sched_class::task_waking was called
> + *
> + */
> +
> +#define DEQUEUE_SLEEP		0x01
> +#define DEQUEUE_SAVE		0x02 /* matches ENQUEUE_RESTORE */
> +#define DEQUEUE_MOVE		0x04 /* matches ENQUEUE_MOVE */
> +
>  #define ENQUEUE_WAKEUP		0x01
> -#define ENQUEUE_HEAD		0x02
> +#define ENQUEUE_RESTORE		0x02
> +#define ENQUEUE_MOVE		0x04
> +
> +#define ENQUEUE_HEAD		0x08
> +#define ENQUEUE_REPLENISH	0x10
>  #ifdef CONFIG_SMP
> -#define ENQUEUE_WAKING		0x04	/* sched_class::task_waking was called */
> +#define ENQUEUE_WAKING		0x20
>  #else
>  #define ENQUEUE_WAKING		0x00
>  #endif
> -#define ENQUEUE_REPLENISH	0x08
> -#define ENQUEUE_RESTORE	0x10
> -
> -#define DEQUEUE_SLEEP		0x01
> -#define DEQUEUE_SAVE		0x02
>  
>  #define RETRY_TASK		((void *)-1UL)
>  
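
(Again, just to spell it out for myself -- this is not part of the patch: the
sched.h comment above boils down to "only SAVE without MOVE keeps the
run_list position". A minimal user-space sketch of the move_entity() rule;
the flag values are copied from the patch, everything else is illustrative
only.)

#include <stdio.h>
#include <stdbool.h>

#define DEQUEUE_SLEEP	0x01
#define DEQUEUE_SAVE	0x02	/* matches ENQUEUE_RESTORE */
#define DEQUEUE_MOVE	0x04	/* matches ENQUEUE_MOVE */

/* Mirrors move_entity() from the patch: a SAVE without MOVE is the only
 * combination that leaves the entity where it is on its run_list. */
static bool move_entity(unsigned int flags)
{
	if ((flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) == DEQUEUE_SAVE)
		return false;
	return true;
}

int main(void)
{
	printf("plain dequeue        -> moves: %d\n", move_entity(0));
	printf("DEQUEUE_SAVE         -> moves: %d\n", move_entity(DEQUEUE_SAVE));
	printf("DEQUEUE_SAVE | MOVE  -> moves: %d\n",
	       move_entity(DEQUEUE_SAVE | DEQUEUE_MOVE));
	return 0;
}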

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-03-04  8:51 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-19 12:43 Question about prio_changed_dl() luca abeni
2016-02-25 14:01 ` Peter Zijlstra
2016-02-25 14:16   ` Juri Lelli
2016-02-25 14:25   ` luca abeni
2016-02-25 14:52     ` Peter Zijlstra
2016-02-27 11:37       ` luca abeni
2016-03-02 19:02         ` Peter Zijlstra
2016-03-02 20:16           ` luca abeni
2016-03-02 20:58             ` Peter Zijlstra
2016-03-04  8:50           ` luca abeni
2016-02-29 11:19   ` [tip:sched/core] sched/deadline: Remove superfluous call to switched_to_dl() tip-bot for Peter Zijlstra
