* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
@ 2012-04-23 12:31 cucinotta
  0 siblings, 0 replies; 42+ messages in thread
From: cucinotta @ 2012-04-23 12:31 UTC (permalink / raw)
  To: Peter Zijlstra, Juri Lelli
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael,
	Fabio Checconi, Tommaso Cucinotta, nicola.manica, luca.abeni,
	dhaval.giani, hgu1972, paulmck, Dario Faggioli, insop.song,
	liming.wang


How about adding an unlikely() to the while () condition?
I guess the point of subtractions vs. division and multiplication was that the former were supposed to be faster, especially given that the while () loop body is likely to be executed just once, if at all.
However, removing a while () loop from the scheduler path doesn't seem a bad idea either.
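A minimal sketch of the annotated loop (untested; unlikely() only hints
to the compiler that the body is rarely entered):

	while (unlikely(dl_se->runtime <= 0)) {
		dl_se->deadline += dl_se->dl_deadline;
		dl_se->runtime += dl_se->dl_runtime;
	}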

  T.
------Original Message------
From: Peter Zijlstra
To: Juri Lelli
Subject: Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
Sent: 23 Apr 2012 13:22

On Mon, 2012-04-23 at 14:13 +0200, Juri Lelli wrote:
> On 04/23/2012 01:32 PM, Peter Zijlstra wrote:
> > On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
> >> +       /*
> >> +        * We keep moving the deadline away until we get some
> >> +        * available runtime for the entity. This ensures correct
> >> +        * handling of situations where the runtime overrun is
> >> +        * arbitrarily large.
> >> +        */
> >> +       while (dl_se->runtime <= 0) {
> >> +               dl_se->deadline += dl_se->dl_deadline;
> >> +               dl_se->runtime += dl_se->dl_runtime;
> >> +       }
> >
> > Does gcc 'optimize' that into a division? If so, it might need special
> > glue to make it not do that.
> 
> I got two adds and a jle, no div here..

Gcc is known to change such loops to something like:

 if (runtime <= 0) {
   tmp = 1 - runtime / dl_runtime;
   deadline += tmp * dl_deadline;
   runtime += tmp * dl_runtime;
 }
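The usual kernel work-around (see __iter_div_u64_rem() in
include/linux/math64.h) is an empty asm that clobbers the loop variable,
so gcc cannot prove the trip count; a sketch of it applied to this loop:

 while (dl_se->runtime <= 0) {
   /* empty asm: forces a reload of ->runtime, defeating the divmod rewrite */
   asm("" : "+rm" (dl_se->runtime));
   dl_se->deadline += dl_se->dl_deadline;
   dl_se->runtime += dl_se->dl_runtime;
 }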



Sent from my BlackBerry® smartphone from eMobile

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-23 15:43       ` Peter Zijlstra
  2012-04-23 16:41         ` Juri Lelli
@ 2012-05-15 10:10         ` Juri Lelli
  1 sibling, 0 replies; 42+ messages in thread
From: Juri Lelli @ 2012-05-15 10:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On 04/23/2012 05:43 PM, Peter Zijlstra wrote:
> On Mon, 2012-04-23 at 17:39 +0200, Juri Lelli wrote:
>> On 04/23/2012 04:35 PM, Peter Zijlstra wrote:
>>> On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
>>>> +static void init_dl_task_timer(struct sched_dl_entity *dl_se)
>>>> +{
> >>>> +       struct hrtimer *timer = &dl_se->dl_timer;
>>>> +
>>>> +       if (hrtimer_active(timer)) {
>>>> +               hrtimer_try_to_cancel(timer);
>>>> +               return;
>>>> +       }
>>>
>>> Same question I guess, how can it be active here? Also, just letting it
>>> run doesn't seem like the best way out..
>>>
>>
>> Probably s/hrtimer_try_to_cancel/hrtimer_cancel is better.
>
> Yeah, not sure you can do hrtimer_cancel() there though, you're holding
> ->pi_lock and rq->lock and have IRQs disabled. That sounds like asking
> for trouble.
>
> Anyway, if it can't happen, we don't have to fix it.. so lets answer
> that first ;-)

Even if I dropped the bits allowing !root users, this critical point
still remains.
What if I leave this as it is and instead do the following?

@@ -488,9 +488,10 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
         /*
         * We need to take care of possible races here. In fact, the
          * task might have changed its scheduling policy to something
-        * different from SCHED_DEADLINE (through sched_setscheduler()).
+        * different from SCHED_DEADLINE or changed its reservation
+        * parameters (through sched_{setscheduler(),setscheduler2()}).
          */
-       if (!dl_task(p))
+       if (!dl_task(p) || dl_se->dl_new)
                 goto unlock;
  
         dl_se->dl_throttled = 0;

The idea is that hrtimer_try_to_cancel should fail only if the callback routine
is running. If, meanwhile, I set up new parameters, I can try to recognize this
situation through dl_new (set to 1 during __setparam_dl).

BTW, I have a new version ready (also rebased on the current tip/master). It
addresses all the comments except your gcc work-around, math128 and the
nr_cpus_allowed shift (patches are ready, but those changes are not yet mainline,
right?). Anyway, do you think it would be fine to post it?

Thanks and Regards,

- Juri

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-23 23:21         ` Tommaso Cucinotta
@ 2012-04-24  9:50           ` Peter Zijlstra
  0 siblings, 0 replies; 42+ messages in thread
From: Peter Zijlstra @ 2012-04-24  9:50 UTC (permalink / raw)
  To: Tommaso Cucinotta
  Cc: Juri Lelli, tglx, mingo, rostedt, cfriesen, oleg, fweisbec,
	darren, johan.eker, p.faure, linux-kernel, claudio, michael,
	fchecconi, nicola.manica, luca.abeni, dhaval.giani, hgu1972,
	paulmck, raistlin, insop.song, liming.wang

On Tue, 2012-04-24 at 00:21 +0100, Tommaso Cucinotta wrote:
> > Yes I can do it for x86_64, but people tend to get mighty upset if you
> > break the compile for all other arches...
> 
> rather than breaking the compile, I was thinking more of using the 
> optimization for a more accurate comparison on archs that have 64-bit 
> mul and 128-bit cmp, and leaving the overflow on the other archs. Though 
> that would imply a difference in behavior in those borderline cases 
> (very big periods, I guess).
> 
> However, I'm also puzzled about what would happen when compiling the 
> current code on mostly 16-bit micros, which have very limited 32-bit 
> operations... 

We don't support 16bit archs, 32bit is almost useless as it is :-)

Anyway, how about something like this, I guess archs can go wild and add
asm/math128.h if they want etc..

Completely untested, hasn't even seen a compiler..

---
Subject: math128: Add {add,mult,cmp}_u128
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Tue Apr 24 11:47:12 CEST 2012


Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 include/linux/math128.h |   75 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

--- /dev/null
+++ b/include/linux/math128.h
@@ -0,0 +1,75 @@
+#ifndef _LINUX_MATH128_H
+#define _LINUX_MATH128_H
+
+#include <linux/types.h>
+
+typedef struct {
+	u64 hi, lo;
+} u128;
+
+static inline u128 add_u128(u128 a, u128 b)
+{
+	u128 res;
+
+	res.hi = a.hi + b.hi;
+	res.lo = a.lo + b.lo;
+
+	if (res.lo < a.lo || res.lo < b.lo)
+		res.hi++;
+
+	return res;
+}
+
+/*
+ * a * b = (ah * 2^32 + al) * (bh * 2^32 + bl) =
+ *   ah*bh * 2^64 + (ah*bl + bh*al) * 2^32 + al*bl
+ */
+static inline u128 mult_u128(u64 a, u64 b)
+{
+	u128 res;
+	u64 ah, al;
+	u64 bh, bl;
+	u128 t1, t2, t3, t4;
+
+	ah = a >> 32;
+	al = a & ((1ULL << 32) - 1);
+
+	bh = b >> 32;
+	bl = b & ((1ULL << 32) - 1);
+
+	t1.lo = 0;
+	t1.hi = ah * bh;
+
+	t2.lo = ah * bl;
+	t2.hi = t2.lo >> 32;
+	t2.lo <<= 32;
+
+	t3.lo = al * bh;
+	t3.hi = t3.lo >> 32;
+	t3.lo <<= 32;
+
+	t4.lo = al * bl;
+	t4.hi = 0;
+
+	res = add_u128(t1, t2);
+	res = add_u128(res, t3);
+	res = add_u128(res, t4);
+
+	return res;
+}
+
+static inline int cmp_u128(u128 a, u128 b)
+{
+	if (a.hi > b.hi)
+		return 1;
+	if (a.hi < b.hi)
+		return -1;
+	if (a.lo > b.lo)
+		return 1;
+	if (a.lo < b.lo)
+		return -1;
+
+	return 0;
+}
+
+#endif /* _LINUX_MATH128_H */
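
With these helpers the overflow-prone comparison in dl_entity_overflow()
could then become something like this (a sketch, not part of the patch,
and ignoring the signedness of ->runtime):

	u128 left, right;

	left = mult_u128(dl_se->dl_deadline, dl_se->runtime);
	right = mult_u128(dl_se->deadline - t, dl_se->dl_runtime);

	/* true if right < left, i.e. the entity would exceed its bandwidth */
	return cmp_u128(right, left) < 0;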


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-24  7:21             ` Juri Lelli
@ 2012-04-24  9:00               ` Peter Zijlstra
  0 siblings, 0 replies; 42+ messages in thread
From: Peter Zijlstra @ 2012-04-24  9:00 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Tommaso Cucinotta, tglx, mingo, rostedt, cfriesen, oleg,
	fweisbec, darren, johan.eker, p.faure, linux-kernel, claudio,
	michael, fchecconi, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On Tue, 2012-04-24 at 09:21 +0200, Juri Lelli wrote:
> Well, it depends on how much effort this turns out to require. I personally
> would prefer to be able to come out with a new release ASAP, just to
> continue the discussion with most of the comments addressed and more
> updated code (I also have a mainline version of the patchset
> quite ready). 

Right, one thing we can initially do is require root for using
SCHED_DEADLINE, and then, when later work closes all the holes and we've
added user bandwidth controls, we can allow everybody in.
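
A sketch of such a gate in __sched_setscheduler() (exact placement
assumed):

	/* until the holes are closed, only privileged users get SCHED_DEADLINE */
	if (dl_policy(policy) && !capable(CAP_SYS_NICE))
		return -EPERM;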



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
       [not found]           ` <4F95D41F.5060700@sssup.it>
@ 2012-04-24  7:21             ` Juri Lelli
  2012-04-24  9:00               ` Peter Zijlstra
  0 siblings, 1 reply; 42+ messages in thread
From: Juri Lelli @ 2012-04-24  7:21 UTC (permalink / raw)
  To: Tommaso Cucinotta
  Cc: Peter Zijlstra, tglx, mingo, rostedt, cfriesen, oleg, fweisbec,
	darren, johan.eker, p.faure, linux-kernel, claudio, michael,
	fchecconi, nicola.manica, luca.abeni, dhaval.giani, hgu1972,
	paulmck, raistlin, insop.song, liming.wang

On 04/24/2012 12:13 AM, Tommaso Cucinotta wrote:
> Il 23/04/2012 17:41, Juri Lelli ha scritto:
>> The user could call __setparam_dl on a throttled task through
>> __sched_setscheduler.
>
> in case it can be related: a scenario that used to break isolation
> (in the old aquosa crap):
>  1) create a deadline task
>  2) (actively) wait till it's just about to be throttled
>  3) remove the reservation (i.e., return the task to the normal
>     system policy and destroy reservation info in the kernel)
>  4) reserve it again
>

Yes, this is very similar to what I thought just after I sent the
email (ouch! :-)).
  
> Assuming the borderline condition of a nearly fully saturated system,
> if 3)-4) manage to happen sufficiently close to each other and right
> after 2), the task's budget is now refilled with a deadline that is
> not where it should be according to the admission-control rules. In
> other words, a properly misbehaving task may break the guarantees of
> other tasks. Something relevant when considering misbehaviour and
> admission control from a security perspective [1].
>

Thanks for the ref., I'll read it!

> At that time, I was persuaded that the right way to avoid this would
> be to avoid freeing system cpu bw immediately when a reservation is
> destroyed, and rather wait till its current abs deadline, then "free"
> the bandwidth. A new task trying to re-create the reservation too
> early, i.e., at step 4) above, would be rejected by the system as it
> would still see a fully occupied cpu bw. Never implemented, of course
> :-)...
>

A kind of "two-step" approach. It would work; I just have to think about
how to implement it (and let the system survive ;-)). Then create some
benchmarks to test it.

> And also, from a security perspective, a misbehaving (sched_other)
> task might thrash the system with useless nanosleeps, forcing the OS
> to continuously schedule/deschedule it. Equivalently, with a deadline
> scheduler, you could try to set a very small period/deadline. That's
> why in [1], among the configurable variables, there was a minimum
> allowed reservation period.
>

Yes, this should be easily controlled at admission time.

> Nothing really urgent, just something you might want to keep in mind
> for the future, I thought.
>

Well, it depends on how much effort this turns out to require. I personally
would prefer to be able to come out with a new release ASAP, just to
continue the discussion with most of the comments addressed and more
updated code (I also have a mainline version of the patchset
quite ready).

Thanks a lot,

- Juri


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-24  6:29             ` Dario Faggioli
@ 2012-04-24  6:52               ` Juri Lelli
  0 siblings, 0 replies; 42+ messages in thread
From: Juri Lelli @ 2012-04-24  6:52 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Tommaso Cucinotta, Peter Zijlstra, tglx, mingo, rostedt,
	cfriesen, oleg, fweisbec, darren, johan.eker, p.faure,
	linux-kernel, claudio, michael, fchecconi, nicola.manica,
	luca.abeni, dhaval.giani, hgu1972, paulmck, insop.song,
	liming.wang

On 04/24/2012 08:29 AM, Dario Faggioli wrote:
> On Tue, 2012-04-24 at 00:25 +0100, Tommaso Cucinotta wrote:
>>> The idea is that ->clock_task gives the time as observed by schedulable
>>> tasks and excludes other muck.
>>
>> so clock_task might be better to compute the consumed budget at task
>> deschedule, but for setting deadlines one period ahead in the future
>> I guess the regular wall-time rq->clock is the one to be used?
>>
> Yep, that was the idea, unless my recollection has completely gone
> flaky! :-P
>
> Perhaps adding a comment saying right this thing above, as Peter
> suggested?
>

Sure! TODO for the next release :-).

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-23 23:25           ` Tommaso Cucinotta
@ 2012-04-24  6:29             ` Dario Faggioli
  2012-04-24  6:52               ` Juri Lelli
  0 siblings, 1 reply; 42+ messages in thread
From: Dario Faggioli @ 2012-04-24  6:29 UTC (permalink / raw)
  To: Tommaso Cucinotta
  Cc: Peter Zijlstra, Juri Lelli, tglx, mingo, rostedt, cfriesen, oleg,
	fweisbec, darren, johan.eker, p.faure, linux-kernel, claudio,
	michael, fchecconi, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, insop.song, liming.wang


On Tue, 2012-04-24 at 00:25 +0100, Tommaso Cucinotta wrote: 
> > The idea is that ->clock_task gives the time as observed by schedulable
> > tasks and excludes other muck.
> 
> so clock_task might be better to compute the consumed budget at task 
> deschedule, but for setting deadlines one period ahead in the future 
> I guess the regular wall-time rq->clock is the one to be used?
> 
Yep, that was the idea, unless my recollection has completely gone
flaky! :-P

Perhaps adding a comment saying right this thing above, as Peter
suggested?

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-23 21:58       ` Peter Zijlstra
  2012-04-23 23:21         ` Tommaso Cucinotta
@ 2012-04-24  1:03         ` Steven Rostedt
  1 sibling, 0 replies; 42+ messages in thread
From: Steven Rostedt @ 2012-04-24  1:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tommaso Cucinotta, Juri Lelli, tglx, mingo, cfriesen, oleg,
	fweisbec, darren, johan.eker, p.faure, linux-kernel, claudio,
	michael, fchecconi, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On Mon, 2012-04-23 at 23:58 +0200, Peter Zijlstra wrote:
> On Mon, 2012-04-23 at 22:55 +0100, Tommaso Cucinotta wrote:
> > why not write this straight in asm, i.e., multiply 64*64 then divide by 
> > 64 keeping the intermediate result on 128 bits? 
> 
> If you know of a way to do this for all 30 odd architectures supported
> by our beloved kernel, do let me know ;-)
> 
> Yes I can do it for x86_64, but people tend to get mighty upset if you
> break the compile for all other arches...

Use the draconian method. Make SCHED_DEADLINE dependent on
"ARCH_HAS_128_MULT" and any arch maintainer that wants SCHED_DEADLINE
for their arch will be responsible for implementing it ;-)
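
Roughly, in Kconfig terms (a sketch; ARCH_HAS_128_MULT is the proposed
symbol, not an existing one):

config ARCH_HAS_128_MULT
	bool

config SCHED_DEADLINE
	bool "SCHED_DEADLINE scheduling policy"
	depends on ARCH_HAS_128_MULT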

-- Steve



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-23 21:45         ` Peter Zijlstra
@ 2012-04-23 23:25           ` Tommaso Cucinotta
  2012-04-24  6:29             ` Dario Faggioli
  0 siblings, 1 reply; 42+ messages in thread
From: Tommaso Cucinotta @ 2012-04-23 23:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, tglx, mingo, rostedt, cfriesen, oleg, fweisbec,
	darren, johan.eker, p.faure, linux-kernel, claudio, michael,
	fchecconi, nicola.manica, luca.abeni, dhaval.giani, hgu1972,
	paulmck, raistlin, insop.song, liming.wang

Il 23/04/2012 22:45, Peter Zijlstra ha scritto:
> On Mon, 2012-04-23 at 22:25 +0100, Tommaso Cucinotta wrote:
>> I cannot get the real difference between rq->clock and rq->clock_task.
> One runs at wall-time (rq->clock); the other excludes time in irq-context
> and steal-time (rq->clock_task).
>
> The idea is that ->clock_task gives the time as observed by schedulable
> tasks and excludes other muck.

so clock_task might be better to compute the consumed budget at task 
deschedule, but for setting deadlines one period ahead in the future 
I guess the regular wall-time rq->clock is the one to be used?

Thx,

     T.

-- 
Tommaso Cucinotta, Computer Engineering PhD, Researcher
ReTiS Lab, Scuola Superiore Sant'Anna, Pisa, Italy
Tel +39 050 882 024, Fax +39 050 882 003
http://retis.sssup.it/people/tommaso


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-23 21:58       ` Peter Zijlstra
@ 2012-04-23 23:21         ` Tommaso Cucinotta
  2012-04-24  9:50           ` Peter Zijlstra
  2012-04-24  1:03         ` Steven Rostedt
  1 sibling, 1 reply; 42+ messages in thread
From: Tommaso Cucinotta @ 2012-04-23 23:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, tglx, mingo, rostedt, cfriesen, oleg, fweisbec,
	darren, johan.eker, p.faure, linux-kernel, claudio, michael,
	fchecconi, nicola.manica, luca.abeni, dhaval.giani, hgu1972,
	paulmck, raistlin, insop.song, liming.wang

Il 23/04/2012 22:58, Peter Zijlstra ha scritto:
> On Mon, 2012-04-23 at 22:55 +0100, Tommaso Cucinotta wrote:
>> why not write this straight in asm, i.e., multiply 64*64 then divide by
>> 64 keeping the intermediate result on 128 bits?
> If you know of a way to do this for all 30 odd architectures supported
> by our beloved kernel, do let me know ;-)

:-)
> Yes I can do it for x86_64, but people tend to get mighty upset if you
> break the compile for all other arches...

rather than breaking the compile, I was thinking more of using the 
optimization for a more accurate comparison on archs that have 64-bit 
mul and 128-bit cmp, and leaving the overflow on the other archs. Though 
that would imply a difference in behavior in those borderline cases 
(very big periods, I guess).

However, I'm also puzzled about what would happen when compiling the 
current code on mostly 16-bit micros, which have very limited 32-bit 
operations...

     T.

-- 
Tommaso Cucinotta, Computer Engineering PhD, Researcher
ReTiS Lab, Scuola Superiore Sant'Anna, Pisa, Italy
Tel +39 050 882 024, Fax +39 050 882 003
http://retis.sssup.it/people/tommaso


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-23 21:55     ` Tommaso Cucinotta
@ 2012-04-23 21:58       ` Peter Zijlstra
  2012-04-23 23:21         ` Tommaso Cucinotta
  2012-04-24  1:03         ` Steven Rostedt
  0 siblings, 2 replies; 42+ messages in thread
From: Peter Zijlstra @ 2012-04-23 21:58 UTC (permalink / raw)
  To: Tommaso Cucinotta
  Cc: Juri Lelli, tglx, mingo, rostedt, cfriesen, oleg, fweisbec,
	darren, johan.eker, p.faure, linux-kernel, claudio, michael,
	fchecconi, nicola.manica, luca.abeni, dhaval.giani, hgu1972,
	paulmck, raistlin, insop.song, liming.wang

On Mon, 2012-04-23 at 22:55 +0100, Tommaso Cucinotta wrote:
> why not write this straight in asm, i.e., multiply 64*64 then divide by 
> 64 keeping the intermediate result on 128 bits? 

If you know of a way to do this for all 30 odd architectures supported
by our beloved kernel, do let me know ;-)

Yes I can do it for x86_64, but people tend to get mighty upset if you
break the compile for all other arches...

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-23 11:55   ` Peter Zijlstra
  2012-04-23 14:43     ` Juri Lelli
@ 2012-04-23 21:55     ` Tommaso Cucinotta
  2012-04-23 21:58       ` Peter Zijlstra
  1 sibling, 1 reply; 42+ messages in thread
From: Tommaso Cucinotta @ 2012-04-23 21:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, tglx, mingo, rostedt, cfriesen, oleg, fweisbec,
	darren, johan.eker, p.faure, linux-kernel, claudio, michael,
	fchecconi, nicola.manica, luca.abeni, dhaval.giani, hgu1972,
	paulmck, raistlin, insop.song, liming.wang

Il 23/04/2012 12:55, Peter Zijlstra ha scritto:
> On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
>> +/*
>> + * Here we check if --at time t-- an entity (which is probably being
>> + * [re]activated or, in general, enqueued) can use its remaining runtime
>> + * and its current deadline _without_ exceeding the bandwidth it is
>> + * assigned (function returns true if it can).
>> + *
>> + * For this to hold, we must check if:
> + *   runtime / (deadline - t) < dl_runtime / dl_deadline .
> It might be good to put a few words in as to why that is.. I know I
> always forget (but know where to find it by now), also might be good to
> refer those papers Tommaso listed when Steven asked this a while back.
>
>> + */
>> +static bool dl_entity_overflow(struct sched_dl_entity *dl_se, u64 t)
>> +{
>> +       u64 left, right;
>> +
>> +       /*
>> +        * left and right are the two sides of the equation above,
>> +        * after a bit of shuffling to use multiplications instead
>> +        * of divisions.
>> +        *
>> +        * Note that none of the time values involved in the two
>> +        * multiplications are absolute: dl_deadline and dl_runtime
>> +        * are the relative deadline and the maximum runtime of each
>> +        * instance, runtime is the runtime left for the last instance
>> +        * and (deadline - t), since t is rq->clock, is the time left
>> +        * to the (absolute) deadline. Therefore, overflowing the u64
>> +        * type is very unlikely to occur in both cases.
>> +        */
>> +       left = dl_se->dl_deadline * dl_se->runtime;
>> +       right = (dl_se->deadline - t) * dl_se->dl_runtime;
>
>  From what I can see there are no constraints on the values in
> __setparam_dl() so the above left term can be constructed to be an
> overflow.
>
> Ideally we'd use u128 here, but I don't think people will let us :/

why not write this straight in asm, i.e., multiply 64*64 then divide by 
64, keeping the intermediate result on 128 bits?
Something straightforward to write in asm, but not that easy to let gcc 
understand that I don't want to multiply 128*128 :-)... a few years ago 
I had a similar issue; perhaps it was a 32/64 version of this problem, 
and gcc was not optimizing the C code properly with -O3, so I had used 
asm segments.
In this case, if avoiding the division is a major requirement, then we 
could multiply twice 64*64 in asm, then compare the two results on 128 
bits? Again, a few assembly lines on architectures supporting the 64*64 
multiply and the 128-bit comparison.
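
For instance, on 64-bit targets gcc can express exactly this without
hand-written asm, via the compiler-specific __int128 type (a sketch,
names assumed):

	/* 64x64->128 multiply plus 128-bit compare; gcc on 64-bit archs only */
	static inline int cmp_mul_u64(u64 a, u64 b, u64 c, u64 d)
	{
		unsigned __int128 left = (unsigned __int128)a * b;
		unsigned __int128 right = (unsigned __int128)c * d;

		if (left < right)
			return -1;
		return left > right;
	}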

     T.

-- 
Tommaso Cucinotta, Computer Engineering PhD, Researcher
ReTiS Lab, Scuola Superiore Sant'Anna, Pisa, Italy
Tel +39 050 882 024, Fax +39 050 882 003
http://retis.sssup.it/people/tommaso


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-23 21:25       ` Tommaso Cucinotta
@ 2012-04-23 21:45         ` Peter Zijlstra
  2012-04-23 23:25           ` Tommaso Cucinotta
  0 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2012-04-23 21:45 UTC (permalink / raw)
  To: Tommaso Cucinotta
  Cc: Juri Lelli, tglx, mingo, rostedt, cfriesen, oleg, fweisbec,
	darren, johan.eker, p.faure, linux-kernel, claudio, michael,
	fchecconi, nicola.manica, luca.abeni, dhaval.giani, hgu1972,
	paulmck, raistlin, insop.song, liming.wang

On Mon, 2012-04-23 at 22:25 +0100, Tommaso Cucinotta wrote:
> I cannot get the real difference between rq->clock and rq->clock_task. 

One runs at wall-time (rq->clock); the other excludes time in irq-context
and steal-time (rq->clock_task).

The idea is that ->clock_task gives the time as observed by schedulable
tasks and excludes other muck.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-23 10:37     ` Juri Lelli
@ 2012-04-23 21:25       ` Tommaso Cucinotta
  2012-04-23 21:45         ` Peter Zijlstra
  0 siblings, 1 reply; 42+ messages in thread
From: Tommaso Cucinotta @ 2012-04-23 21:25 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Peter Zijlstra, tglx, mingo, rostedt, cfriesen, oleg, fweisbec,
	darren, johan.eker, p.faure, linux-kernel, claudio, michael,
	fchecconi, nicola.manica, luca.abeni, dhaval.giani, hgu1972,
	paulmck, raistlin, insop.song, liming.wang

Il 23/04/2012 11:37, Juri Lelli ha scritto:
> On 04/23/2012 12:31 PM, Peter Zijlstra wrote:
>> On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
>>> +       dl_se->deadline = rq->clock + dl_se->dl_deadline;
>>
>> You might want to use rq->clock_task, this clock excludes times spend in
>> hardirq context and steal-time (when paravirt).
>>
>> Then again, it might not want to use that.. but its something you might
>> want to consider and make explicit by means of a comment.
>
> Yes, I planned a consistency check for the use of clock/clock_task
> throughout the code, but it seems I then forgot it.
> Planned for the next iteration :-).

unless I'm mistaken, there are 3 repetitions of this block in 05/16:

+		dl_se->deadline = rq->clock + dl_se->dl_deadline;
+		dl_se->runtime = dl_se->dl_runtime;


perhaps enclosing them in a function (e.g., reset_from_now() or 
similar) may help keep consistency...
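i.e., something like (just a sketch; the name is only a suggestion):

	static inline void dl_reset_from_now(struct rq *rq,
					     struct sched_dl_entity *dl_se)
	{
		dl_se->deadline = rq->clock + dl_se->dl_deadline;
		dl_se->runtime = dl_se->dl_runtime;
	}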

Another thing: I cannot get the real difference between rq->clock and 
rq->clock_task.
If clock_task is a kind of CLOCK_MONOTONIC thing that increases only 
when the task (or any task) is scheduled, then you don't want to use 
that here.
Here you need to set the new ->deadline to an absolute time, so I guess 
the regular rq->clock is what you need, isn't it?

Hope I didn't say too much nonsense.

     T.

-- 
Tommaso Cucinotta, Computer Engineering PhD, Researcher
ReTiS Lab, Scuola Superiore Sant'Anna, Pisa, Italy
Tel +39 050 882 024, Fax +39 050 882 003
http://retis.sssup.it/people/tommaso


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-23 15:43       ` Peter Zijlstra
@ 2012-04-23 16:41         ` Juri Lelli
       [not found]           ` <4F95D41F.5060700@sssup.it>
  2012-05-15 10:10         ` Juri Lelli
  1 sibling, 1 reply; 42+ messages in thread
From: Juri Lelli @ 2012-04-23 16:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On 04/23/2012 05:43 PM, Peter Zijlstra wrote:
> On Mon, 2012-04-23 at 17:39 +0200, Juri Lelli wrote:
>> On 04/23/2012 04:35 PM, Peter Zijlstra wrote:
>>> On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
>>>> +static void init_dl_task_timer(struct sched_dl_entity *dl_se)
>>>> +{
>>>> +       struct hrtimer *timer = &dl_se->dl_timer;
>>>> +
>>>> +       if (hrtimer_active(timer)) {
>>>> +               hrtimer_try_to_cancel(timer);
>>>> +               return;
>>>> +       }
>>>
>>> Same question I guess, how can it be active here? Also, just letting it
>>> run doesn't seem like the best way out..
>>>
>>
>> Probably s/hrtimer_try_to_cancel/hrtimer_cancel is better.
>
> Yeah, not sure you can do hrtimer_cancel() there though, you're holding
> ->pi_lock and rq->lock and have IRQs disabled. That sounds like asking
> for trouble.
>
> Anyway, if it can't happen, we don't have to fix it.. so lets answer
> that first ;-)

The user could call __setparam_dl on a throttled task through
__sched_setscheduler.

BTW, I noticed that we should change this (inside __sched_setscheduler):

         /*
          * If not changing anything there's no need to proceed further
          */
         if (unlikely(policy == p->policy && (!rt_policy(policy) ||
                         param->sched_priority == p->rt_priority))) {

                 __task_rq_unlock(rq);
                 raw_spin_unlock_irqrestore(&p->pi_lock, flags);
                 return 0;
         }

to something like this:

	if (unlikely(policy == p->policy && (!rt_policy(policy) ||
                         param->sched_priority == p->rt_priority) &&
			!dl_policy(policy)))

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-23 15:39     ` Juri Lelli
@ 2012-04-23 15:43       ` Peter Zijlstra
  2012-04-23 16:41         ` Juri Lelli
  2012-05-15 10:10         ` Juri Lelli
  0 siblings, 2 replies; 42+ messages in thread
From: Peter Zijlstra @ 2012-04-23 15:43 UTC (permalink / raw)
  To: Juri Lelli
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On Mon, 2012-04-23 at 17:39 +0200, Juri Lelli wrote:
> On 04/23/2012 04:35 PM, Peter Zijlstra wrote:
> > On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
> >> +static void init_dl_task_timer(struct sched_dl_entity *dl_se)
> >> +{
> >> +       struct hrtimer *timer = &dl_se->dl_timer;
> >> +
> >> +       if (hrtimer_active(timer)) {
> >> +               hrtimer_try_to_cancel(timer);
> >> +               return;
> >> +       }
> >
> > Same question I guess, how can it be active here? Also, just letting it
> > run doesn't seem like the best way out..
> >
> 
> Probably s/hrtimer_try_to_cancel/hrtimer_cancel is better.

Yeah, not sure you can do hrtimer_cancel() there though, you're holding
->pi_lock and rq->lock and have IRQs disabled. That sounds like asking
for trouble.

Anyway, if it can't happen, we don't have to fix it.. so lets answer
that first ;-)

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-23 14:35   ` Peter Zijlstra
@ 2012-04-23 15:39     ` Juri Lelli
  2012-04-23 15:43       ` Peter Zijlstra
  0 siblings, 1 reply; 42+ messages in thread
From: Juri Lelli @ 2012-04-23 15:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On 04/23/2012 04:35 PM, Peter Zijlstra wrote:
> On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
>> +static void init_dl_task_timer(struct sched_dl_entity *dl_se)
>> +{
>> +       struct hrtimer *timer = &dl_se->dl_timer;
>> +
>> +       if (hrtimer_active(timer)) {
>> +               hrtimer_try_to_cancel(timer);
>> +               return;
>> +       }
>
> Same question I guess, how can it be active here? Also, just letting it
> run doesn't seem like the best way out..
>

Probably s/hrtimer_try_to_cancel/hrtimer_cancel is better.
  
>> +
>> +       hrtimer_init(timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>> +       timer->function = dl_task_timer;
>> +       timer->irqsafe = 1;
>> +}

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-23 15:15   ` Peter Zijlstra
@ 2012-04-23 15:37     ` Juri Lelli
  0 siblings, 0 replies; 42+ messages in thread
From: Juri Lelli @ 2012-04-23 15:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On 04/23/2012 05:15 PM, Peter Zijlstra wrote:
> On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
>> +static
>> +int dl_runtime_exceeded(struct rq *rq, struct sched_dl_entity *dl_se)
>> +{
>> +       int dmiss = dl_time_before(dl_se->deadline, rq->clock);
>> +       int rorun = dl_se->runtime <= 0;
>> +
>> +       if (!rorun && !dmiss)
>> +               return 0;
>> +
>> +       /*
>> +        * If we are beyond our current deadline and we are still
>> +        * executing, then we have already used some of the runtime of
>> +        * the next instance. Thus, if we do not account that, we are
>> +        * stealing bandwidth from the system at each deadline miss!
>> +        */
>> +       if (dmiss) {
>> +               dl_se->runtime = rorun ? dl_se->runtime : 0;
>> +               dl_se->runtime -= rq->clock - dl_se->deadline;
>> +       }
>
> So ideally this can't happen, but since we already leak time from the
> system through means of hardirq / kstop / context-switch-overhead /
> clock-jitter etc.. we avoid the error accumulating?
>

Yep, seems fair :-).
  
>> +
>> +       return 1;
>> +}
>
>

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-23 14:25   ` Peter Zijlstra
@ 2012-04-23 15:34     ` Juri Lelli
  0 siblings, 0 replies; 42+ messages in thread
From: Juri Lelli @ 2012-04-23 15:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On 04/23/2012 04:25 PM, Peter Zijlstra wrote:
> On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
>> +/*
>> + * This is the bandwidth enforcement timer callback. If here, we know
>> + * a task is not on its dl_rq, since the fact that the timer was running
>> + * means the task is throttled and needs a runtime replenishment.
>> + *
>> + * However, what we actually do depends on the fact the task is active,
>> + * (it is on its rq) or has been removed from there by a call to
>> + * dequeue_task_dl(). In the former case we must issue the runtime
>> + * replenishment and add the task back to the dl_rq; in the latter, we just
>> + * do nothing but clearing dl_throttled, so that runtime and deadline
>> + * updating (and the queueing back to dl_rq) will be done by the
>> + * next call to enqueue_task_dl().
>
> OK, so that comment isn't entirely clear to me, how can that timer still
> be active when the task isn't? You start the timer when you throttle it,
> at that point it cannot in fact dequeue itself anymore.
>
> The only possibility I see is the one mentioned with the dl_task() check
> below, that someone else called sched_setscheduler() on it.
>

Ok, I was also stuck at this point when I first reviewed v3.
Then I convinced myself that, even if it is probably always true,
the p->on_rq check would prevent weird situations like, for
example: by the time I block on a mutex, go to sleep or whatever,
I am throttled; then the dl_timer fires and I'm still !on_rq.
But I never actually saw this happen...

>> + */
>> +static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
>> +{
>> +       unsigned long flags;
>> +       struct sched_dl_entity *dl_se = container_of(timer,
>> +                                                    struct sched_dl_entity,
>> +                                                    dl_timer);
>> +       struct task_struct *p = dl_task_of(dl_se);
>> +       struct rq *rq = task_rq_lock(p, &flags);
>> +
>> +       /*
>> +        * We need to take care of possible races here. In fact, the
>> +        * task might have changed its scheduling policy to something
>> +        * different from SCHED_DEADLINE (through sched_setscheduler()).
>> +        */
>> +       if (!dl_task(p))
>> +               goto unlock;
>> +
>> +       dl_se->dl_throttled = 0;
>> +       if (p->on_rq) {
>> +               enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
>> +               if (task_has_dl_policy(rq->curr))
>> +                       check_preempt_curr_dl(rq, p, 0);
>> +               else
>> +                       resched_task(rq->curr);
>> +       }
>
> So I can't see how that cannot be true.
>
>> +unlock:
>> +       task_rq_unlock(rq, p, &flags);
>> +
>> +       return HRTIMER_NORESTART;
>> +}

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-06  7:14 ` [PATCH 05/16] sched: SCHED_DEADLINE policy implementation Juri Lelli
                     ` (9 preceding siblings ...)
  2012-04-23 14:35   ` Peter Zijlstra
@ 2012-04-23 15:15   ` Peter Zijlstra
  2012-04-23 15:37     ` Juri Lelli
  10 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2012-04-23 15:15 UTC (permalink / raw)
  To: Juri Lelli
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
> +static
> +int dl_runtime_exceeded(struct rq *rq, struct sched_dl_entity *dl_se)
> +{
> +       int dmiss = dl_time_before(dl_se->deadline, rq->clock);
> +       int rorun = dl_se->runtime <= 0;
> +
> +       if (!rorun && !dmiss)
> +               return 0;
> +
> +       /*
> +        * If we are beyond our current deadline and we are still
> +        * executing, then we have already used some of the runtime of
> +        * the next instance. Thus, if we do not account that, we are
> +        * stealing bandwidth from the system at each deadline miss!
> +        */
> +       if (dmiss) {
> +               dl_se->runtime = rorun ? dl_se->runtime : 0;
> +               dl_se->runtime -= rq->clock - dl_se->deadline;
> +       }

So ideally this can't happen, but since we already leak time from the
system through means of hardirq / kstop / context-switch-overhead /
clock-jitter etc.. we avoid the error accumulating?

> +
> +       return 1;
> +} 



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-23 14:43     ` Juri Lelli
@ 2012-04-23 15:11       ` Peter Zijlstra
  0 siblings, 0 replies; 42+ messages in thread
From: Peter Zijlstra @ 2012-04-23 15:11 UTC (permalink / raw)
  To: Juri Lelli
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On Mon, 2012-04-23 at 16:43 +0200, Juri Lelli wrote:
> 
> >  From what I can see there are no constraints on the values in
> > __setparam_dl() so the above left term can be constructed to be an
> > overflow.
> >
> 
> Yes, could happen :-\.
> 
> > Ideally we'd use u128 here, but I don't think people will let us :/
> >
> 
> Do we need to do something about that? If we cannot go for a bigger type,
> probably limit dl_deadline (or warn the user).. 

Depends on what happens: if only this task gets screwy, no real problem;
they supplied funny input, they get funny output. If OTOH it affects
other tasks, we should do something.

Ideally we'd avoid the situation by some clever maths; second best would
be rejecting the parameters up front.
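
For the latter, a sketch of an up-front check in __setparam_dl() (the
bound is an assumption: with both parameters below 2^31 ns, each product
in dl_entity_overflow() stays below 2^62 and cannot overflow u64):

	/* bounds assumed; 2^31 ns is ~2.1 s */
	if (dl_deadline >= (1ULL << 31) || dl_runtime >= (1ULL << 31))
		return -EINVAL;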

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-23 11:55   ` Peter Zijlstra
@ 2012-04-23 14:43     ` Juri Lelli
  2012-04-23 15:11       ` Peter Zijlstra
  2012-04-23 21:55     ` Tommaso Cucinotta
  1 sibling, 1 reply; 42+ messages in thread
From: Juri Lelli @ 2012-04-23 14:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On 04/23/2012 01:55 PM, Peter Zijlstra wrote:
> On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
>> +/*
>> + * Here we check if --at time t-- an entity (which is probably being
>> + * [re]activated or, in general, enqueued) can use its remaining runtime
>> + * and its current deadline _without_ exceeding the bandwidth it is
>> + * assigned (function returns true if it can).
>> + *
>> + * For this to hold, we must check if:
>> + *   runtime / (deadline - t) < dl_runtime / dl_deadline .
>
> It might be good to put a few words in as to why that is.. I know I
> always forget (but know where to find it by now), also might be good to
> refer those papers Tommaso listed when Steven asked this a while back.
>

Ok, I'll fix the comment, extend it and add T.'s references in the
Documentation.

>> + */
>> +static bool dl_entity_overflow(struct sched_dl_entity *dl_se, u64 t)
>> +{
>> +       u64 left, right;
>> +
>> +       /*
>> +        * left and right are the two sides of the equation above,
>> +        * after a bit of shuffling to use multiplications instead
>> +        * of divisions.
>> +        *
>> +        * Note that none of the time values involved in the two
>> +        * multiplications are absolute: dl_deadline and dl_runtime
>> +        * are the relative deadline and the maximum runtime of each
>> +        * instance, runtime is the runtime left for the last instance
>> +        * and (deadline - t), since t is rq->clock, is the time left
>> +        * to the (absolute) deadline. Therefore, overflowing the u64
>> +        * type is very unlikely to occur in both cases.
>> +        */
>> +       left = dl_se->dl_deadline * dl_se->runtime;
>> +       right = (dl_se->deadline - t) * dl_se->dl_runtime;
>
>
>  From what I can see there are no constraints on the values in
> __setparam_dl() so the above left term can be constructed to be an
> overflow.
>

Yes, could happen :-\.

> Ideally we'd use u128 here, but I don't think people will let us :/
>

Do we need to do something about that? If we cannot go for a bigger type,
probably limit dl_deadline (or warn the user)..

>> +       return dl_time_before(right, left);
>> +}

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-06  7:14 ` [PATCH 05/16] sched: SCHED_DEADLINE policy implementation Juri Lelli
                     ` (8 preceding siblings ...)
  2012-04-23 14:25   ` Peter Zijlstra
@ 2012-04-23 14:35   ` Peter Zijlstra
  2012-04-23 15:39     ` Juri Lelli
  2012-04-23 15:15   ` Peter Zijlstra
  10 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2012-04-23 14:35 UTC (permalink / raw)
  To: Juri Lelli
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
> +static void init_dl_task_timer(struct sched_dl_entity *dl_se)
> +{
> +       struct hrtimer *timer = &dl_se->dl_timer;
> +
> +       if (hrtimer_active(timer)) {
> +               hrtimer_try_to_cancel(timer);
> +               return;
> +       }

Same question I guess, how can it be active here? Also, just letting it
run doesn't seem like the best way out.. 

> +
> +       hrtimer_init(timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +       timer->function = dl_task_timer;
> +       timer->irqsafe = 1;
> +} 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-06  7:14 ` [PATCH 05/16] sched: SCHED_DEADLINE policy implementation Juri Lelli
                     ` (7 preceding siblings ...)
  2012-04-23 14:11   ` Peter Zijlstra
@ 2012-04-23 14:25   ` Peter Zijlstra
  2012-04-23 15:34     ` Juri Lelli
  2012-04-23 14:35   ` Peter Zijlstra
  2012-04-23 15:15   ` Peter Zijlstra
  10 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2012-04-23 14:25 UTC (permalink / raw)
  To: Juri Lelli
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
> +/*
> + * This is the bandwidth enforcement timer callback. If here, we know
> + * a task is not on its dl_rq, since the fact that the timer was running
> + * means the task is throttled and needs a runtime replenishment.
> + *
> + * However, what we actually do depends on the fact the task is active,
> + * (it is on its rq) or has been removed from there by a call to
> + * dequeue_task_dl(). In the former case we must issue the runtime
> + * replenishment and add the task back to the dl_rq; in the latter, we just
> + * do nothing but clearing dl_throttled, so that runtime and deadline
> + * updating (and the queueing back to dl_rq) will be done by the
> + * next call to enqueue_task_dl().

OK, so that comment isn't entirely clear to me, how can that timer still
be active when the task isn't? You start the timer when you throttle it,
at that point it cannot in fact dequeue itself anymore.

The only possibility I see is the one mentioned with the dl_task() check
below, that someone else called sched_setscheduler() on it.

> + */
> +static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
> +{
> +       unsigned long flags;
> +       struct sched_dl_entity *dl_se = container_of(timer,
> +                                                    struct sched_dl_entity,
> +                                                    dl_timer);
> +       struct task_struct *p = dl_task_of(dl_se);
> +       struct rq *rq = task_rq_lock(p, &flags);
> +
> +       /*
> +        * We need to take care of possible races here. In fact, the
> +        * task might have changed its scheduling policy to something
> +        * different from SCHED_DEADLINE (through sched_setscheduler()).
> +        */
> +       if (!dl_task(p))
> +               goto unlock;
> +
> +       dl_se->dl_throttled = 0;
> +       if (p->on_rq) {
> +               enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
> +               if (task_has_dl_policy(rq->curr))
> +                       check_preempt_curr_dl(rq, p, 0);
> +               else
> +                       resched_task(rq->curr);
> +       }

So I can't see how that cannot be true.

> +unlock:
> +       task_rq_unlock(rq, p, &flags);
> +
> +       return HRTIMER_NORESTART;
> +} 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-06  7:14 ` [PATCH 05/16] sched: SCHED_DEADLINE policy implementation Juri Lelli
                     ` (6 preceding siblings ...)
  2012-04-23 11:55   ` Peter Zijlstra
@ 2012-04-23 14:11   ` Peter Zijlstra
  2012-04-23 14:25   ` Peter Zijlstra
                     ` (2 subsequent siblings)
  10 siblings, 0 replies; 42+ messages in thread
From: Peter Zijlstra @ 2012-04-23 14:11 UTC (permalink / raw)
  To: Juri Lelli
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
> +static int start_dl_timer(struct sched_dl_entity *dl_se)
> +{
> +       struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> +       struct rq *rq = rq_of_dl_rq(dl_rq);
> +       ktime_t now, act;
> +       ktime_t soft, hard;
> +       unsigned long range;
> +       s64 delta;
> +
> +       /*
> +        * We want the timer to fire at the deadline, but considering
> +        * that it is actually coming from rq->clock and not from
> +        * hrtimer's time base reading.
> +        */
> +       act = ns_to_ktime(dl_se->deadline);
> +       now = hrtimer_cb_get_time(&dl_se->dl_timer);
> +       delta = ktime_to_ns(now) - rq->clock;
> +       act = ktime_add_ns(act, delta);


Right, this all is very sad.. but I guess we'll have to live with
it. The only other option is adding another timer base that tries to
keep itself in sync with rq->clock, but that all sounds very painful
indeed.

Keeping up with rq->clock_task would be even more painful, since it slows
the clock down in random fashion, making the timer fire early.
Compensating for that is going to be both fun and expensive.

> +       /*
> +        * If the expiry time already passed, e.g., because the value
> +        * chosen as the deadline is too small, don't even try to
> +        * start the timer in the past!
> +        */
> +       if (ktime_us_delta(act, now) < 0)
> +               return 0;
> +
> +       hrtimer_set_expires(&dl_se->dl_timer, act);
> +
> +       soft = hrtimer_get_softexpires(&dl_se->dl_timer);
> +       hard = hrtimer_get_expires(&dl_se->dl_timer);
> +       range = ktime_to_ns(ktime_sub(hard, soft));
> +       __hrtimer_start_range_ns(&dl_se->dl_timer, soft,
> +                                range, HRTIMER_MODE_ABS, 0);
> +
> +       return hrtimer_active(&dl_se->dl_timer);
> +} 

/me reminds himself to make __hrtimer_start_range_ns() return -ETIME.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-23 13:37         ` Juri Lelli
@ 2012-04-23 14:01           ` Peter Zijlstra
  0 siblings, 0 replies; 42+ messages in thread
From: Peter Zijlstra @ 2012-04-23 14:01 UTC (permalink / raw)
  To: Juri Lelli
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang,
	Andrew Morton, Linus Torvalds

On Mon, 2012-04-23 at 15:37 +0200, Juri Lelli wrote:
> 
> This is what I got for that snippet:
> 
> ffffffff81062826 <enqueue_task_dl>:
> [...]
> ffffffff81062885:       49 03 44 24 20          add    0x20(%r12),%rax
> ffffffff8106288a:       49 8b 54 24 28          mov    0x28(%r12),%rdx
> ffffffff8106288f:       49 01 54 24 38          add    %rdx,0x38(%r12)
> ffffffff81062894:       49 89 44 24 30          mov    %rax,0x30(%r12)
> ffffffff81062899:       49 8b 44 24 30          mov    0x30(%r12),%rax
> ffffffff8106289e:       48 85 c0                test   %rax,%rax
> ffffffff810628a1:       7e e2                   jle    ffffffff81062885 <enqueue_task_dl+0x5f>
> 
> So it seems we are fine in this case, right?

Yep.

> Would it anyway be better to enforce this gcc behaviour, just to be
> on the safe side? 

Dunno, the 'fix' is somewhat hideous (although we could make it suck
less); we've only ever bothered with it when it caused problems, so I guess
we'll just wait and see until it breaks.


---
Subject: kernel,sched,time: Clean up gcc work-arounds
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Mon Apr 23 15:55:48 CEST 2012

We've grown various copies of a particular gcc work-around, consolidate
them into one and add a larger comment.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 include/linux/compiler.h |   12 ++++++++++++
 include/linux/math64.h   |    4 +---
 kernel/sched/core.c      |    8 ++------
 kernel/sched/fair.c      |    8 ++------
 kernel/time.c            |   11 ++++-------
 5 files changed, 21 insertions(+), 22 deletions(-)

--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -310,4 +310,16 @@ void ftrace_likely_update(struct ftrace_
  */
 #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
 
+/*
+ * Avoid gcc loop optimization by clobbering a variable, forcing a reload
+ * and invalidating the optimization.
+ *
+ * The optimization in question transforms various loops into divisions/modulo
+ * operations, this is a problem when either the resulting operation generates
+ * unimplemented libgcc functions (u64 divisions for example) or the loop is
+ * known not to contain a lot of iterations and the division is in fact more
+ * expensive.
+ */
+#define __gcc_dont_optimize_loop(var) asm("" : "+rm" (var))
+
 #endif /* __LINUX_COMPILER_H */
--- a/include/linux/math64.h
+++ b/include/linux/math64.h
@@ -105,9 +105,7 @@ __iter_div_u64_rem(u64 dividend, u32 div
 	u32 ret = 0;
 
 	while (dividend >= divisor) {
-		/* The following asm() prevents the compiler from
-		   optimising this loop into a modulo operation.  */
-		asm("" : "+rm"(dividend));
+		__gcc_dont_optimize_loop(dividend);
 
 		dividend -= divisor;
 		ret++;
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -628,12 +628,8 @@ void sched_avg_update(struct rq *rq)
 	s64 period = sched_avg_period();
 
 	while ((s64)(rq->clock - rq->age_stamp) > period) {
-		/*
-		 * Inline assembly required to prevent the compiler
-		 * optimising this loop into a divmod call.
-		 * See __iter_div_u64_rem() for another example of this.
-		 */
-		asm("" : "+rm" (rq->age_stamp));
+		__gcc_dont_optimize_loop(rq->age_stamp);
+
 		rq->age_stamp += period;
 		rq->rt_avg /= 2;
 	}
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -853,12 +853,8 @@ static void update_cfs_load(struct cfs_r
 		update_cfs_rq_load_contribution(cfs_rq, global_update);
 
 	while (cfs_rq->load_period > period) {
-		/*
-		 * Inline assembly required to prevent the compiler
-		 * optimising this loop into a divmod call.
-		 * See __iter_div_u64_rem() for another example of this.
-		 */
-		asm("" : "+rm" (cfs_rq->load_period));
+		__gcc_dont_optimize_loop(cfs_rq->load_period);
+
 		cfs_rq->load_period /= 2;
 		cfs_rq->load_avg /= 2;
 	}
--- a/kernel/time.c
+++ b/kernel/time.c
@@ -349,17 +349,14 @@ EXPORT_SYMBOL(mktime);
 void set_normalized_timespec(struct timespec *ts, time_t sec, s64 nsec)
 {
 	while (nsec >= NSEC_PER_SEC) {
-		/*
-		 * The following asm() prevents the compiler from
-		 * optimising this loop into a modulo operation. See
-		 * also __iter_div_u64_rem() in include/linux/time.h
-		 */
-		asm("" : "+rm"(nsec));
+		__gcc_dont_optimize_loop(nsec);
+
 		nsec -= NSEC_PER_SEC;
 		++sec;
 	}
 	while (nsec < 0) {
-		asm("" : "+rm"(nsec));
+		__gcc_dont_optimize_loop(nsec);
+
 		nsec += NSEC_PER_SEC;
 		--sec;
 	}


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-23 12:22       ` Peter Zijlstra
@ 2012-04-23 13:37         ` Juri Lelli
  2012-04-23 14:01           ` Peter Zijlstra
  0 siblings, 1 reply; 42+ messages in thread
From: Juri Lelli @ 2012-04-23 13:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On 04/23/2012 02:22 PM, Peter Zijlstra wrote:
> On Mon, 2012-04-23 at 14:13 +0200, Juri Lelli wrote:
>> On 04/23/2012 01:32 PM, Peter Zijlstra wrote:
>>> On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
>>>> +       /*
>>>> +        * We Keep moving the deadline away until we get some
>>>> +        * available runtime for the entity. This ensures correct
>>>> +        * handling of situations where the runtime overrun is
>>>> +        * arbitrarily large.
>>>> +        */
>>>> +       while (dl_se->runtime <= 0) {
>>>> +               dl_se->deadline += dl_se->dl_deadline;
>>>> +               dl_se->runtime += dl_se->dl_runtime;
>>>> +       }
>>>
>>> Does gcc 'optimize' that into a division? If so, it might need special
>>> glue to make it not do that.
>>
>> I got two adds and a jle, no div here..
>
> Gcc is known to change such loops to something like:
>
>   if (runtime <= 0) {
>     tmp = 1 - runtime / dl_runtime;
>     deadline += tmp * dl_deadline;
>     runtime += tmp * dl_runtime;
>   }
>
>

This is what I got for that snippet:

ffffffff81062826 <enqueue_task_dl>:
[...]
ffffffff81062885:       49 03 44 24 20          add    0x20(%r12),%rax
ffffffff8106288a:       49 8b 54 24 28          mov    0x28(%r12),%rdx
ffffffff8106288f:       49 01 54 24 38          add    %rdx,0x38(%r12)
ffffffff81062894:       49 89 44 24 30          mov    %rax,0x30(%r12)
ffffffff81062899:       49 8b 44 24 30          mov    0x30(%r12),%rax
ffffffff8106289e:       48 85 c0                test   %rax,%rax
ffffffff810628a1:       7e e2                   jle    ffffffff81062885 <enqueue_task_dl+0x5f>

So it seems we are fine in this case, right?
Would it anyway be better to enforce this GCC behaviour (i.e., make sure
the loop is never turned into a division), just to be on the safe side?
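
(For the record, one way to double-check a built kernel is objdump -d on
the relevant object file, looking for div instructions in the function
of interest. What GCC emits for this loop varies with compiler version
and optimization flags, which is an argument for making the barrier
explicit rather than trusting one observed disassembly.)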

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-23 12:13     ` Juri Lelli
@ 2012-04-23 12:22       ` Peter Zijlstra
  2012-04-23 13:37         ` Juri Lelli
  0 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2012-04-23 12:22 UTC (permalink / raw)
  To: Juri Lelli
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On Mon, 2012-04-23 at 14:13 +0200, Juri Lelli wrote:
> On 04/23/2012 01:32 PM, Peter Zijlstra wrote:
> > On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
> >> +       /*
> >> +        * We Keep moving the deadline away until we get some
> >> +        * available runtime for the entity. This ensures correct
> >> +        * handling of situations where the runtime overrun is
> >> +        * arbitrarily large.
> >> +        */
> >> +       while (dl_se->runtime <= 0) {
> >> +               dl_se->deadline += dl_se->dl_deadline;
> >> +               dl_se->runtime += dl_se->dl_runtime;
> >> +       }
> >
> > Does gcc 'optimize' that into a division? If so, it might need special
> > glue to make it not do that.
> 
> I got two adds and a jle, no div here..

Gcc is known to change such loops to something like:

 if (runtime <= 0) {
   tmp = 1 - runtime / dl_runtime;
   deadline += tmp * dl_deadline;
   runtime += tmp * dl_runtime;
 }
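
As a quick sanity check that the transformed form is equivalent
(illustrative numbers): take runtime = -25 and dl_runtime = 10. The loop
needs three iterations (-25 -> -15 -> -5 -> 5); the closed form gives
tmp = 1 - (-25 / 10) = 1 - (-2) = 3 with C's truncating division, so
runtime += 3 * 10 = 5 and the deadline advances by 3 * dl_deadline. Same
result, but bought with a division, which is exactly what hurts when u64
division needs libgcc or when the loop would have run only once.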



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-23 11:32   ` Peter Zijlstra
@ 2012-04-23 12:13     ` Juri Lelli
  2012-04-23 12:22       ` Peter Zijlstra
  0 siblings, 1 reply; 42+ messages in thread
From: Juri Lelli @ 2012-04-23 12:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On 04/23/2012 01:32 PM, Peter Zijlstra wrote:
> On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
>> +       /*
>> +        * We Keep moving the deadline away until we get some
>> +        * available runtime for the entity. This ensures correct
>> +        * handling of situations where the runtime overrun is
>> +        * arbitrarily large.
>> +        */
>> +       while (dl_se->runtime <= 0) {
>> +               dl_se->deadline += dl_se->dl_deadline;
>> +               dl_se->runtime += dl_se->dl_runtime;
>> +       }
>
> Does gcc 'optimize' that into a division? If so, it might need special
> glue to make it not do that.

I got two adds and a jle, no div here..

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-23 11:34   ` Peter Zijlstra
@ 2012-04-23 11:57     ` Juri Lelli
  0 siblings, 0 replies; 42+ messages in thread
From: Juri Lelli @ 2012-04-23 11:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On 04/23/2012 01:34 PM, Peter Zijlstra wrote:
> On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
>> +       /*
>> +        * At this point, the deadline really should be "in
>> +        * the future" with respect to rq->clock. If it's
>> +        * not, we are, for some reason, lagging too much!
>> +        * Anyway, after having warned userspace about that,
>> +        * we still try to keep things running by
>> +        * resetting the deadline and the budget of the
>> +        * entity.
>> +        */
>> +       if (dl_time_before(dl_se->deadline, rq->clock)) {
>> +               WARN_ON_ONCE(1);
>
> Doing printk() and friends from scheduler context isn't actually safe
> and can lock up your machine.. there's a printk_sched() that
> maybe-sorta-kinda can get your complaints out..
>

Thanks! I'll look at it.
  
>> +               dl_se->deadline = rq->clock + dl_se->dl_deadline;
>> +               dl_se->runtime = dl_se->dl_runtime;
>> +       }

- Juri

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-06  7:14 ` [PATCH 05/16] sched: SCHED_DEADLINE policy implementation Juri Lelli
                     ` (5 preceding siblings ...)
  2012-04-23 11:34   ` Peter Zijlstra
@ 2012-04-23 11:55   ` Peter Zijlstra
  2012-04-23 14:43     ` Juri Lelli
  2012-04-23 21:55     ` Tommaso Cucinotta
  2012-04-23 14:11   ` Peter Zijlstra
                     ` (3 subsequent siblings)
  10 siblings, 2 replies; 42+ messages in thread
From: Peter Zijlstra @ 2012-04-23 11:55 UTC (permalink / raw)
  To: Juri Lelli
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
> +/*
> + * Here we check if --at time t-- an entity (which is probably being
> + * [re]activated or, in general, enqueued) can use its remaining runtime
> + * and its current deadline _without_ exceeding the bandwidth it is
> + * assigned (function returns true if it can't).
> + *
> + * For this to hold, we must check if:
> + *   runtime / (deadline - t) < dl_runtime / dl_deadline .

It might be good to put a few words in as to why that is.. I know I
always forget (but know where to find it by now); it might also be good
to refer to those papers Tommaso listed when Steven asked about this a
while back.

> + */
> +static bool dl_entity_overflow(struct sched_dl_entity *dl_se, u64 t)
> +{
> +       u64 left, right;
> +
> +       /*
> +        * left and right are the two sides of the equation above,
> +        * after a bit of shuffling to use multiplications instead
> +        * of divisions.
> +        *
> +        * Note that none of the time values involved in the two
> +        * multiplications are absolute: dl_deadline and dl_runtime
> +        * are the relative deadline and the maximum runtime of each
> +        * instance, runtime is the runtime left for the last instance
> +        * and (deadline - t), since t is rq->clock, is the time left
> +        * to the (absolute) deadline. Therefore, overflowing the u64
> +        * type is very unlikely to occur in both cases.
> +        */
> +       left = dl_se->dl_deadline * dl_se->runtime;
> +       right = (dl_se->deadline - t) * dl_se->dl_runtime;


From what I can see there are no constraints on the values in
__setparam_dl() so the above left term can be constructed to be an
overflow.

Ideally we'd use u128 here, but I don't think people will let us :/

> +       return dl_time_before(right, left);
> +} 
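
The reason the test takes this shape (the "few words as to why" asked
for above) is the CBS wakeup rule: if the leftover (runtime, deadline)
pair represents more bandwidth than the reserved ratio
dl_runtime / dl_deadline, reusing it would let the task exceed its
reservation, so the parameters have to be reset instead. The check is
the cross-multiplied form of

    runtime / (deadline - t) < dl_runtime / dl_deadline

which is a safe rearrangement since both denominators are positive:

    dl_deadline * runtime < dl_runtime * (deadline - t)

To make the overflow concern concrete (illustrative numbers, with
nanosecond-resolution parameters): dl_deadline = 5 s = 5 * 10^9 ns and a
leftover runtime of the same order give a left term around 2.5 * 10^19,
already past U64_MAX (about 1.8 * 10^19), so unvalidated
userspace-supplied values can indeed wrap the multiplication.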

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-06  7:14 ` [PATCH 05/16] sched: SCHED_DEADLINE policy implementation Juri Lelli
                     ` (4 preceding siblings ...)
  2012-04-23 11:32   ` Peter Zijlstra
@ 2012-04-23 11:34   ` Peter Zijlstra
  2012-04-23 11:57     ` Juri Lelli
  2012-04-23 11:55   ` Peter Zijlstra
                     ` (4 subsequent siblings)
  10 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2012-04-23 11:34 UTC (permalink / raw)
  To: Juri Lelli
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
> +       /*
> +        * At this point, the deadline really should be "in
> +        * the future" with respect to rq->clock. If it's
> +        * not, we are, for some reason, lagging too much!
> +        * Anyway, after having warned userspace about that,
> +        * we still try to keep things running by
> +        * resetting the deadline and the budget of the
> +        * entity.
> +        */
> +       if (dl_time_before(dl_se->deadline, rq->clock)) {
> +               WARN_ON_ONCE(1);

Doing printk() and friends from scheduler context isn't actually safe
and can lock up your machine.. there's a printk_sched() that
maybe-sorta-kinda can get your complaints out..

> +               dl_se->deadline = rq->clock + dl_se->dl_deadline;
> +               dl_se->runtime = dl_se->dl_runtime;
> +       } 
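
A sketch of what that could look like (assuming printk_sched() takes a
printk-style format string and defers the actual printing to a safe
context; the message text is made up):

	if (dl_time_before(dl_se->deadline, rq->clock)) {
		printk_sched("sched: -deadline replenish lagging behind rq->clock\n");
		dl_se->deadline = rq->clock + dl_se->dl_deadline;
		dl_se->runtime = dl_se->dl_runtime;
	}

Note this drops the one-shot behaviour of WARN_ON_ONCE(); whether the
warning should stay one-shot is a separate decision.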

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-06  7:14 ` [PATCH 05/16] sched: SCHED_DEADLINE policy implementation Juri Lelli
                     ` (3 preceding siblings ...)
  2012-04-23 10:31   ` Peter Zijlstra
@ 2012-04-23 11:32   ` Peter Zijlstra
  2012-04-23 12:13     ` Juri Lelli
  2012-04-23 11:34   ` Peter Zijlstra
                     ` (5 subsequent siblings)
  10 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2012-04-23 11:32 UTC (permalink / raw)
  To: Juri Lelli
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
> +       /*
> +        * We Keep moving the deadline away until we get some
> +        * available runtime for the entity. This ensures correct
> +        * handling of situations where the runtime overrun is
> +        * arbitrarily large.
> +        */
> +       while (dl_se->runtime <= 0) {
> +               dl_se->deadline += dl_se->dl_deadline;
> +               dl_se->runtime += dl_se->dl_runtime;
> +       } 

Does gcc 'optimize' that into a division? If so, it might need special
glue to make it not do that.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-23 10:31   ` Peter Zijlstra
@ 2012-04-23 10:37     ` Juri Lelli
  2012-04-23 21:25       ` Tommaso Cucinotta
  0 siblings, 1 reply; 42+ messages in thread
From: Juri Lelli @ 2012-04-23 10:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On 04/23/2012 12:31 PM, Peter Zijlstra wrote:
> On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
>> +       dl_se->deadline = rq->clock + dl_se->dl_deadline;
>
> You might want to use rq->clock_task, this clock excludes times spend in
> hardirq context and steal-time (when paravirt).
>
> Then again, you might not want to use that.. but it's something you might
> want to consider and make explicit by means of a comment.

Yes, I planned a consistency check for the use of clock/clock_task
throughout the code, but it seems I then forgot it.
Planned for the next iteration :-).

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-06  7:14 ` [PATCH 05/16] sched: SCHED_DEADLINE policy implementation Juri Lelli
                     ` (2 preceding siblings ...)
  2012-04-23 10:15   ` Peter Zijlstra
@ 2012-04-23 10:31   ` Peter Zijlstra
  2012-04-23 10:37     ` Juri Lelli
  2012-04-23 11:32   ` Peter Zijlstra
                     ` (6 subsequent siblings)
  10 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2012-04-23 10:31 UTC (permalink / raw)
  To: Juri Lelli
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
> +       dl_se->deadline = rq->clock + dl_se->dl_deadline;

You might want to use rq->clock_task, this clock excludes times spend in
hardirq context and steal-time (when paravirt).

Then again, you might not want to use that.. but it's something you might
want to consider and make explicit by means of a comment.
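
A sketch of the suggested alternative, with the rationale made explicit
in a comment (whether -deadline actually wants this is exactly the open
question; note that update_curr_dl() in this series already computes
delta_exec from rq->clock_task):

	/*
	 * Use rq->clock_task, which excludes time spent in hardirq
	 * context and steal time (paravirt), matching the runtime
	 * accounting done in update_curr_dl().
	 */
	dl_se->deadline = rq->clock_task + dl_se->dl_deadline;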

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-23 10:15   ` Peter Zijlstra
@ 2012-04-23 10:18     ` Juri Lelli
  0 siblings, 0 replies; 42+ messages in thread
From: Juri Lelli @ 2012-04-23 10:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On 04/23/2012 12:15 PM, Peter Zijlstra wrote:
> On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
>> + * Copyright (C) 2010 Dario Faggioli<raistlin@linux.it>,
>> + *                    Michael Trimarchi<michael@amarulasolutions.com>,
>> + *                    Fabio Checconi<fabio@gandalf.sssup.it>
>
> Its 2012 at the time of writing, you might want to update this.. ;-)

Yep, time passes.. :-P

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-06  7:14 ` [PATCH 05/16] sched: SCHED_DEADLINE policy implementation Juri Lelli
  2012-04-11  3:06   ` Steven Rostedt
  2012-04-11 13:41   ` Steven Rostedt
@ 2012-04-23 10:15   ` Peter Zijlstra
  2012-04-23 10:18     ` Juri Lelli
  2012-04-23 10:31   ` Peter Zijlstra
                     ` (7 subsequent siblings)
  10 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2012-04-23 10:15 UTC (permalink / raw)
  To: Juri Lelli
  Cc: tglx, mingo, rostedt, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
> + * Copyright (C) 2010 Dario Faggioli <raistlin@linux.it>,
> + *                    Michael Trimarchi <michael@amarulasolutions.com>,
> + *                    Fabio Checconi <fabio@gandalf.sssup.it>

Its 2012 at the time of writing, you might want to update this.. ;-)

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-11 13:41   ` Steven Rostedt
@ 2012-04-11 13:55     ` Juri Lelli
  0 siblings, 0 replies; 42+ messages in thread
From: Juri Lelli @ 2012-04-11 13:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: peterz, tglx, mingo, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On 04/11/2012 03:41 PM, Steven Rostedt wrote:
> On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
>
>> +static void replenish_dl_entity(struct sched_dl_entity *dl_se)
>> +{
>> +	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
>> +	struct rq *rq = rq_of_dl_rq(dl_rq);
>> +
>> +	/*
>> +	 * We Keep moving the deadline away until we get some
>
> s/Keep/keep/
>
>> +	 * available runtime for the entity. This ensures correct
>> +	 * handling of situations where the runtime overrun is
>> +	 * arbitrarily large.
>> +	 */
>> +	while (dl_se->runtime <= 0) {
>> +		dl_se->deadline += dl_se->dl_deadline;
>> +		dl_se->runtime += dl_se->dl_runtime;
>> +	}
>> +
>> +	/*
>> +	 * At this point, the deadline really should be "in
>> +	 * the future" with respect to rq->clock. If it's
>> +	 * not, we are, for some reason, lagging too much!
>> +	 * Anyway, after having warned userspace about that,
>> +	 * we still try to keep things running by
>> +	 * resetting the deadline and the budget of the
>> +	 * entity.
>> +	 */
>> +	if (dl_time_before(dl_se->deadline, rq->clock)) {
>> +		WARN_ON_ONCE(1);
>> +		dl_se->deadline = rq->clock + dl_se->dl_deadline;
>> +		dl_se->runtime = dl_se->dl_runtime;
>> +	}
>> +}
>> +
>
> I just finished reviewing patches 1-5, and have yet to find anything
> wrong with them (except for these typos). I'll continue my review, and
> then I'll start testing them.
>
> Good work (so far ;-)
>
> -- Steve
>
>

Well, I tried my best not to spoil too much the work done by
Dario & Co. :-).

Anyway, thanks!

- Juri

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-06  7:14 ` [PATCH 05/16] sched: SCHED_DEADLINE policy implementation Juri Lelli
  2012-04-11  3:06   ` Steven Rostedt
@ 2012-04-11 13:41   ` Steven Rostedt
  2012-04-11 13:55     ` Juri Lelli
  2012-04-23 10:15   ` Peter Zijlstra
                     ` (8 subsequent siblings)
  10 siblings, 1 reply; 42+ messages in thread
From: Steven Rostedt @ 2012-04-11 13:41 UTC (permalink / raw)
  To: Juri Lelli
  Cc: peterz, tglx, mingo, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:

> +static void replenish_dl_entity(struct sched_dl_entity *dl_se)
> +{
> +	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> +	struct rq *rq = rq_of_dl_rq(dl_rq);
> +
> +	/*
> +	 * We Keep moving the deadline away until we get some

s/Keep/keep/

> +	 * available runtime for the entity. This ensures correct
> +	 * handling of situations where the runtime overrun is
> +	 * arbitrarily large.
> +	 */
> +	while (dl_se->runtime <= 0) {
> +		dl_se->deadline += dl_se->dl_deadline;
> +		dl_se->runtime += dl_se->dl_runtime;
> +	}
> +
> +	/*
> +	 * At this point, the deadline really should be "in
> +	 * the future" with respect to rq->clock. If it's
> +	 * not, we are, for some reason, lagging too much!
> +	 * Anyway, after having warned userspace about that,
> +	 * we still try to keep things running by
> +	 * resetting the deadline and the budget of the
> +	 * entity.
> +	 */
> +	if (dl_time_before(dl_se->deadline, rq->clock)) {
> +		WARN_ON_ONCE(1);
> +		dl_se->deadline = rq->clock + dl_se->dl_deadline;
> +		dl_se->runtime = dl_se->dl_runtime;
> +	}
> +}
> +

I just finished reviewing patches 1-5, and have yet to find anything
wrong with them (except for these typos). I'll continue my review, and
then I'll start testing them.

Good work (so far ;-)

-- Steve



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-11  3:06   ` Steven Rostedt
@ 2012-04-11  6:54     ` Juri Lelli
  0 siblings, 0 replies; 42+ messages in thread
From: Juri Lelli @ 2012-04-11  6:54 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: peterz, tglx, mingo, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On 04/11/2012 05:06 AM, Steven Rostedt wrote:
> On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:
>
>> +/*
>> + * Pure Earliest Deadline First (EDF) scheduling does not deal with the
>> + * possibility of an entity lasting more than what it declared, and thus
>> + * exhausting its runtime.
>> + *
>> + * Here we are interested in making runtime overrun possible, but we do
>> + * not want an entity which is misbehaving to affect the scheduling of all
>> + * other entities.
>> + * Therefore, a budgeting strategy called Constant Bandwidth Server (CBS)
>> + * is used, in order to confine each entity within its own bandwidth.
>> + *
>> + * This function deals exactly with that, and ensures that when the runtime
>> + * of an entity is replenished, its deadline is also postponed. That ensures
>> + * the overrunning entity can't interfere with other entities in the system and
>> + * can't make them miss their deadlines. Reasons why this kind of overrun
>> + * could happen are, typically, an entity voluntarily trying to overcume its
>
> s/overcume/overcome/
>
> -- Steve
>
>> + * runtime, or it just underestimated it during sched_setscheduler_ex().
>> + */
>> +static void replenish_dl_entity(struct sched_dl_entity *dl_se)
>> +{
>> +
>

Thanks!

- Juri

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-06  7:14 ` [PATCH 05/16] sched: SCHED_DEADLINE policy implementation Juri Lelli
@ 2012-04-11  3:06   ` Steven Rostedt
  2012-04-11  6:54     ` Juri Lelli
  2012-04-11 13:41   ` Steven Rostedt
                     ` (9 subsequent siblings)
  10 siblings, 1 reply; 42+ messages in thread
From: Steven Rostedt @ 2012-04-11  3:06 UTC (permalink / raw)
  To: Juri Lelli
  Cc: peterz, tglx, mingo, cfriesen, oleg, fweisbec, darren,
	johan.eker, p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, paulmck, raistlin, insop.song, liming.wang

On Fri, 2012-04-06 at 09:14 +0200, Juri Lelli wrote:

> +/*
> + * Pure Earliest Deadline First (EDF) scheduling does not deal with the
> + * possibility of an entity lasting more than what it declared, and thus
> + * exhausting its runtime.
> + *
> + * Here we are interested in making runtime overrun possible, but we do
> + * not want an entity which is misbehaving to affect the scheduling of all
> + * other entities.
> + * Therefore, a budgeting strategy called Constant Bandwidth Server (CBS)
> + * is used, in order to confine each entity within its own bandwidth.
> + *
> + * This function deals exactly with that, and ensures that when the runtime
> + * of an entity is replenished, its deadline is also postponed. That ensures
> + * the overrunning entity can't interfere with other entities in the system and
> + * can't make them miss their deadlines. Reasons why this kind of overrun
> + * could happen are, typically, an entity voluntarily trying to overcume its

s/overcume/overcome/

-- Steve

> + * runtime, or it just underestimated it during sched_setscheduler_ex().
> + */
> +static void replenish_dl_entity(struct sched_dl_entity *dl_se)
> +{
> +


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 05/16] sched: SCHED_DEADLINE policy implementation.
  2012-04-06  7:14 [RFC][PATCH 00/16] sched: SCHED_DEADLINE v4 Juri Lelli
@ 2012-04-06  7:14 ` Juri Lelli
  2012-04-11  3:06   ` Steven Rostedt
                     ` (10 more replies)
  0 siblings, 11 replies; 42+ messages in thread
From: Juri Lelli @ 2012-04-06  7:14 UTC (permalink / raw)
  To: peterz, tglx
  Cc: mingo, rostedt, cfriesen, oleg, fweisbec, darren, johan.eker,
	p.faure, linux-kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, juri.lelli, nicola.manica, luca.abeni,
	dhaval.giani, hgu1972, paulmck, raistlin, insop.song,
	liming.wang

From: Dario Faggioli <raistlin@linux.it>

Add a scheduling class, in sched_dl.c and a new policy called
SCHED_DEADLINE. It is an implementation of the Earliest Deadline
First (EDF) scheduling algorithm, augmented with a mechanism (called
Constant Bandwidth Server, CBS) that makes it possible to isolate
the behaviour of tasks between each other.

The typical -deadline task is made up of a computation phase (instance)
which is activated in a periodic or sporadic fashion. The expected
(maximum) duration of such a computation is called the task's runtime;
the time interval within which each instance needs to be completed is
called the task's relative deadline. The task's absolute deadline is
dynamically calculated as the time instant a task (or, better, an
instance) activates plus the relative deadline.

The EDF algorithm selects the task with the smallest absolute deadline
as the one to be executed first, while the CBS ensures that each task
runs for at most its runtime every (relative) deadline length time
interval, avoiding any interference between different tasks (bandwidth
isolation).
Thanks to this feature, tasks that do not strictly comply with the
computational model sketched above can also effectively use the new
policy.
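
As a concrete (hypothetical) illustration of the model: a video decoder
needing at most 10 ms of CPU for each frame, every 40 ms, could describe
itself with the sched_param2 fields this series introduces (values in
nanoseconds, as elsewhere in the series):

	struct sched_param2 p2 = {
		.sched_priority = 0,
		.sched_runtime  = 10 * 1000 * 1000,	/* 10 ms */
		.sched_deadline = 40 * 1000 * 1000,	/* 40 ms */
	};

If an instance activates at t = 100 ms, its absolute deadline is placed
at t = 140 ms; should the task try to run longer than 10 ms before
then, the CBS throttles it until its next replenishment rather than
letting it disturb other tasks.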

This patch:
 - implements the core logic of the scheduling algorithm in the new
   scheduling class file;
 - provides all the glue code between the new scheduling class and
   the core scheduler and refines the interactions between sched_dl
   and the other existing scheduling classes.

Signed-off-by: Dario Faggioli <raistlin@linux.it>
Signed-off-by: Michael Trimarchi <michael@amarulasolutions.com>
Signed-off-by: Fabio Checconi <fabio@gandalf.sssup.it>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
---
 include/linux/sched.h   |    2 +-
 kernel/fork.c           |    4 +-
 kernel/sched.c          |   67 +++++-
 kernel/sched_dl.c       |  655 +++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched_rt.c       |    1 +
 kernel/sched_stoptask.c |    2 +-
 6 files changed, 719 insertions(+), 12 deletions(-)
 create mode 100644 kernel/sched_dl.c

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6eb72b6..416ce99 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2323,7 +2323,7 @@ extern void wake_up_new_task(struct task_struct *tsk);
 #else
  static inline void kick_process(struct task_struct *tsk) { }
 #endif
-extern void sched_fork(struct task_struct *p);
+extern int sched_fork(struct task_struct *p);
 extern void sched_dead(struct task_struct *p);
 
 extern void proc_caches_init(void);
diff --git a/kernel/fork.c b/kernel/fork.c
index e3db0cb..b263c69 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1241,7 +1241,9 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 #endif
 
 	/* Perform scheduler related setup. Assign this task to a CPU. */
-	sched_fork(p);
+	retval = sched_fork(p);
+	if (retval)
+		goto bad_fork_cleanup_policy;
 
 	retval = perf_event_init_task(p);
 	if (retval)
diff --git a/kernel/sched.c b/kernel/sched.c
index fd23c67..1a38ad1 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1964,9 +1964,6 @@ static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
 #endif
 }
 
-static const struct sched_class rt_sched_class;
-static const struct sched_class dl_sched_class;
-
 #define sched_class_highest (&stop_sched_class)
 #define for_each_class(class) \
    for (class = sched_class_highest; class; class = class->next)
@@ -2257,6 +2254,7 @@ static int irqtime_account_si_update(void)
 #include "sched_idletask.c"
 #include "sched_fair.c"
 #include "sched_rt.c"
+#include "sched_dl.c"
 #include "sched_autogroup.c"
 #include "sched_stoptask.c"
 #ifdef CONFIG_SCHED_DEBUG
@@ -3038,7 +3036,7 @@ static void __sched_fork(struct task_struct *p)
 /*
  * fork()/clone()-time setup:
  */
-void sched_fork(struct task_struct *p)
+int sched_fork(struct task_struct *p)
 {
 	unsigned long flags;
 	int cpu = get_cpu();
@@ -3077,8 +3075,14 @@ void sched_fork(struct task_struct *p)
 		p->sched_reset_on_fork = 0;
 	}
 
-	if (!rt_prio(p->prio))
+	if (dl_prio(p->prio)) {
+		put_cpu();
+		return -EAGAIN;
+	} else if (rt_prio(p->prio)) {
+		p->sched_class = &rt_sched_class;
+	} else {
 		p->sched_class = &fair_sched_class;
+	}
 
 	if (p->sched_class->task_fork)
 		p->sched_class->task_fork(p);
@@ -3111,6 +3115,7 @@ void sched_fork(struct task_struct *p)
 #endif
 
 	put_cpu();
+	return 0;
 }
 
 /*
@@ -5234,7 +5239,7 @@ void rt_mutex_setprio(struct task_struct *p, int prio)
 	struct rq *rq;
 	const struct sched_class *prev_class;
 
-	BUG_ON(prio < 0 || prio > MAX_PRIO);
+	BUG_ON(prio > MAX_PRIO);
 
 	rq = __task_rq_lock(p);
 
@@ -5470,6 +5475,38 @@ __setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
 }
 
 /*
+ * This function initializes the sched_dl_entity of a newly becoming
+ * SCHED_DEADLINE task.
+ *
+ * Only the static values are considered here, the actual runtime and the
+ * absolute deadline will be properly calculated when the task is enqueued
+ * for the first time with its new policy.
+ */
+static void
+__setparam_dl(struct task_struct *p, const struct sched_param2 *param2)
+{
+	struct sched_dl_entity *dl_se = &p->dl;
+
+	init_dl_task_timer(dl_se);
+	dl_se->dl_runtime = param2->sched_runtime;
+	dl_se->dl_deadline = param2->sched_deadline;
+	dl_se->flags = param2->sched_flags;
+	dl_se->dl_throttled = 0;
+	dl_se->dl_new = 1;
+}
+
+static void
+__getparam_dl(struct task_struct *p, struct sched_param2 *param2)
+{
+	struct sched_dl_entity *dl_se = &p->dl;
+
+	param2->sched_priority = p->rt_priority;
+	param2->sched_runtime = dl_se->dl_runtime;
+	param2->sched_deadline = dl_se->dl_deadline;
+	param2->sched_flags = dl_se->flags;
+}
+
+/*
  * This function validates the new parameters of a -deadline task.
  * We ask for the deadline not being zero, and greater or equal
  * than the runtime.
@@ -5643,7 +5680,11 @@ recheck:
 
 	oldprio = p->prio;
 	prev_class = p->sched_class;
-	__setscheduler(rq, p, policy, param->sched_priority);
+	if (dl_policy(policy)) {
+		__setparam_dl(p, param);
+		__setscheduler(rq, p, policy, param->sched_priority);
+	} else
+		__setscheduler(rq, p, policy, param->sched_priority);
 
 	if (running)
 		p->sched_class->set_curr_task(rq);
@@ -5743,8 +5784,11 @@ do_sched_setscheduler2(pid_t pid, int policy,
 	rcu_read_lock();
 	retval = -ESRCH;
 	p = find_process_by_pid(pid);
-	if (p != NULL)
+	if (p != NULL) {
+		if (dl_policy(policy))
+			lparam2.sched_priority = 0;
 		retval = sched_setscheduler2(p, policy, &lparam2);
+	}
 	rcu_read_unlock();
 
 	return retval;
@@ -5891,7 +5935,10 @@ SYSCALL_DEFINE2(sched_getparam2, pid_t, pid,
 	if (retval)
 		goto out_unlock;
 
-	lp.sched_priority = p->rt_priority;
+	if (task_has_dl_policy(p))
+		__getparam_dl(p, &lp);
+	else
+		lp.sched_priority = p->rt_priority;
 	rcu_read_unlock();
 
 	retval = copy_to_user(param2, &lp,
@@ -6290,6 +6337,7 @@ SYSCALL_DEFINE1(sched_get_priority_max, int, policy)
 	case SCHED_RR:
 		ret = MAX_USER_RT_PRIO-1;
 		break;
+	case SCHED_DEADLINE:
 	case SCHED_NORMAL:
 	case SCHED_BATCH:
 	case SCHED_IDLE:
@@ -6315,6 +6363,7 @@ SYSCALL_DEFINE1(sched_get_priority_min, int, policy)
 	case SCHED_RR:
 		ret = 1;
 		break;
+	case SCHED_DEADLINE:
 	case SCHED_NORMAL:
 	case SCHED_BATCH:
 	case SCHED_IDLE:
diff --git a/kernel/sched_dl.c b/kernel/sched_dl.c
new file mode 100644
index 0000000..604e2bc
--- /dev/null
+++ b/kernel/sched_dl.c
@@ -0,0 +1,655 @@
+/*
+ * Deadline Scheduling Class (SCHED_DEADLINE)
+ *
+ * Earliest Deadline First (EDF) + Constant Bandwidth Server (CBS).
+ *
+ * Tasks that periodically execute their instances for less than their
+ * runtime won't miss any of their deadlines.
+ * Tasks that are not periodic or sporadic, or that try to execute more
+ * than their reserved bandwidth will be slowed down (and may potentially
+ * miss some of their deadlines), and won't affect any other task.
+ *
+ * Copyright (C) 2010 Dario Faggioli <raistlin@linux.it>,
+ *                    Michael Trimarchi <michael@amarulasolutions.com>,
+ *                    Fabio Checconi <fabio@gandalf.sssup.it>
+ */
+static const struct sched_class dl_sched_class;
+
+static inline int dl_time_before(u64 a, u64 b)
+{
+	return (s64)(a - b) < 0;
+}
+
+static inline struct task_struct *dl_task_of(struct sched_dl_entity *dl_se)
+{
+	return container_of(dl_se, struct task_struct, dl);
+}
+
+static inline struct rq *rq_of_dl_rq(struct dl_rq *dl_rq)
+{
+	return container_of(dl_rq, struct rq, dl);
+}
+
+static inline struct dl_rq *dl_rq_of_se(struct sched_dl_entity *dl_se)
+{
+	struct task_struct *p = dl_task_of(dl_se);
+	struct rq *rq = task_rq(p);
+
+	return &rq->dl;
+}
+
+static inline int on_dl_rq(struct sched_dl_entity *dl_se)
+{
+	return !RB_EMPTY_NODE(&dl_se->rb_node);
+}
+
+static inline int is_leftmost(struct task_struct *p, struct dl_rq *dl_rq)
+{
+	struct sched_dl_entity *dl_se = &p->dl;
+
+	return dl_rq->rb_leftmost == &dl_se->rb_node;
+}
+
+static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags);
+static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags);
+static void check_preempt_curr_dl(struct rq *rq, struct task_struct *p,
+				  int flags);
+
+/*
+ * We are being explicitly informed that a new instance is starting,
+ * and this means that:
+ *  - the absolute deadline of the entity has to be placed at
+ *    current time + relative deadline;
+ *  - the runtime of the entity has to be set to the maximum value.
+ *
+ * The capability of specifying such an event is useful whenever a -deadline
+ * entity wants to (try to!) synchronize its behaviour with the scheduler's
+ * one, and to (try to!) reconcile itself with its own scheduling
+ * parameters.
+ */
+static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se)
+{
+	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
+	struct rq *rq = rq_of_dl_rq(dl_rq);
+
+	WARN_ON(!dl_se->dl_new || dl_se->dl_throttled);
+
+	dl_se->deadline = rq->clock + dl_se->dl_deadline;
+	dl_se->runtime = dl_se->dl_runtime;
+	dl_se->dl_new = 0;
+}
+
+/*
+ * Pure Earliest Deadline First (EDF) scheduling does not deal with the
+ * possibility of an entity lasting more than what it declared, and thus
+ * exhausting its runtime.
+ *
+ * Here we are interested in making runtime overrun possible, but we do
+ * not want an entity which is misbehaving to affect the scheduling of all
+ * other entities.
+ * Therefore, a budgeting strategy called Constant Bandwidth Server (CBS)
+ * is used, in order to confine each entity within its own bandwidth.
+ *
+ * This function deals exactly with that, and ensures that when the runtime
+ * of an entity is replenished, its deadline is also postponed. That ensures
+ * the overrunning entity can't interfere with other entities in the system and
+ * can't make them miss their deadlines. Reasons why this kind of overrun
+ * could happen are, typically, an entity voluntarily trying to overcume its
+ * runtime, or it just underestimated it during sched_setscheduler_ex().
+ */
+static void replenish_dl_entity(struct sched_dl_entity *dl_se)
+{
+	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
+	struct rq *rq = rq_of_dl_rq(dl_rq);
+
+	/*
+	 * We Keep moving the deadline away until we get some
+	 * available runtime for the entity. This ensures correct
+	 * handling of situations where the runtime overrun is
+	 * arbitrarily large.
+	 */
+	while (dl_se->runtime <= 0) {
+		dl_se->deadline += dl_se->dl_deadline;
+		dl_se->runtime += dl_se->dl_runtime;
+	}
+
+	/*
+	 * At this point, the deadline really should be "in
+	 * the future" with respect to rq->clock. If it's
+	 * not, we are, for some reason, lagging too much!
+	 * Anyway, after having warned userspace about that,
+	 * we still try to keep things running by
+	 * resetting the deadline and the budget of the
+	 * entity.
+	 */
+	if (dl_time_before(dl_se->deadline, rq->clock)) {
+		WARN_ON_ONCE(1);
+		dl_se->deadline = rq->clock + dl_se->dl_deadline;
+		dl_se->runtime = dl_se->dl_runtime;
+	}
+}
+
+/*
+ * Here we check if --at time t-- an entity (which is probably being
+ * [re]activated or, in general, enqueued) can use its remaining runtime
+ * and its current deadline _without_ exceeding the bandwidth it is
+ * assigned (function returns true if it can't).
+ *
+ * For this to hold, we must check if:
+ *   runtime / (deadline - t) < dl_runtime / dl_deadline .
+ */
+static bool dl_entity_overflow(struct sched_dl_entity *dl_se, u64 t)
+{
+	u64 left, right;
+
+	/*
+	 * left and right are the two sides of the equation above,
+	 * after a bit of shuffling to use multiplications instead
+	 * of divisions.
+	 *
+	 * Note that none of the time values involved in the two
+	 * multiplications are absolute: dl_deadline and dl_runtime
+	 * are the relative deadline and the maximum runtime of each
+	 * instance, runtime is the runtime left for the last instance
+	 * and (deadline - t), since t is rq->clock, is the time left
+	 * to the (absolute) deadline. Therefore, overflowing the u64
+	 * type is very unlikely to occur in both cases.
+	 */
+	left = dl_se->dl_deadline * dl_se->runtime;
+	right = (dl_se->deadline - t) * dl_se->dl_runtime;
+
+	return dl_time_before(right, left);
+}
+
+/*
+ * When a -deadline entity is queued back on the runqueue, its runtime and
+ * deadline might need updating.
+ *
+ * The policy here is that we update the deadline of the entity only if:
+ *  - the current deadline is in the past,
+ *  - using the remaining runtime with the current deadline would make
+ *    the entity exceed its bandwidth.
+ */
+static void update_dl_entity(struct sched_dl_entity *dl_se)
+{
+	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
+	struct rq *rq = rq_of_dl_rq(dl_rq);
+
+	/*
+	 * The arrival of a new instance needs special treatment, i.e.,
+	 * the actual scheduling parameters have to be "renewed".
+	 */
+	if (dl_se->dl_new) {
+		setup_new_dl_entity(dl_se);
+		return;
+	}
+
+	if (dl_time_before(dl_se->deadline, rq->clock) ||
+	    dl_entity_overflow(dl_se, rq->clock)) {
+		dl_se->deadline = rq->clock + dl_se->dl_deadline;
+		dl_se->runtime = dl_se->dl_runtime;
+	}
+}
+
+/*
+ * If the entity depleted all its runtime, and if we want it to sleep
+ * while waiting for some new execution time to become available, we
+ * set the bandwidth enforcement timer to the replenishment instant
+ * and try to activate it.
+ *
+ * Notice that it is important for the caller to know if the timer
+ * actually started or not (i.e., the replenishment instant is in
+ * the future or in the past).
+ */
+static int start_dl_timer(struct sched_dl_entity *dl_se)
+{
+	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
+	struct rq *rq = rq_of_dl_rq(dl_rq);
+	ktime_t now, act;
+	ktime_t soft, hard;
+	unsigned long range;
+	s64 delta;
+
+	/*
+	 * We want the timer to fire at the deadline, but considering
+	 * that it is actually coming from rq->clock and not from
+	 * hrtimer's time base reading.
+	 */
+	act = ns_to_ktime(dl_se->deadline);
+	now = hrtimer_cb_get_time(&dl_se->dl_timer);
+	delta = ktime_to_ns(now) - rq->clock;
+	act = ktime_add_ns(act, delta);
+
+	/*
+	 * If the expiry time already passed, e.g., because the value
+	 * chosen as the deadline is too small, don't even try to
+	 * start the timer in the past!
+	 */
+	if (ktime_us_delta(act, now) < 0)
+		return 0;
+
+	hrtimer_set_expires(&dl_se->dl_timer, act);
+
+	soft = hrtimer_get_softexpires(&dl_se->dl_timer);
+	hard = hrtimer_get_expires(&dl_se->dl_timer);
+	range = ktime_to_ns(ktime_sub(hard, soft));
+	__hrtimer_start_range_ns(&dl_se->dl_timer, soft,
+				 range, HRTIMER_MODE_ABS, 0);
+
+	return hrtimer_active(&dl_se->dl_timer);
+}
+
+/*
+ * This is the bandwidth enforcement timer callback. If here, we know
+ * a task is not on its dl_rq, since the fact that the timer was running
+ * means the task is throttled and needs a runtime replenishment.
+ *
+ * However, what we actually do depends on whether the task is active
+ * (i.e., it is on its rq) or has been removed from there by a call to
+ * dequeue_task_dl(). In the former case we must issue the runtime
+ * replenishment and add the task back to the dl_rq; in the latter, we just
+ * do nothing but clearing dl_throttled, so that runtime and deadline
+ * updating (and the queueing back to dl_rq) will be done by the
+ * next call to enqueue_task_dl().
+ */
+static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
+{
+	unsigned long flags;
+	struct sched_dl_entity *dl_se = container_of(timer,
+						     struct sched_dl_entity,
+						     dl_timer);
+	struct task_struct *p = dl_task_of(dl_se);
+	struct rq *rq = task_rq_lock(p, &flags);
+
+	/*
+	 * We need to take care of possible races here. In fact, the
+	 * task might have changed its scheduling policy to something
+	 * different from SCHED_DEADLINE (through sched_setscheduler()).
+	 */
+	if (!dl_task(p))
+		goto unlock;
+
+	dl_se->dl_throttled = 0;
+	if (p->on_rq) {
+		enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
+		if (task_has_dl_policy(rq->curr))
+			check_preempt_curr_dl(rq, p, 0);
+		else
+			resched_task(rq->curr);
+	}
+unlock:
+	task_rq_unlock(rq, p, &flags);
+
+	return HRTIMER_NORESTART;
+}
+
+static void init_dl_task_timer(struct sched_dl_entity *dl_se)
+{
+	struct hrtimer *timer = &dl_se->dl_timer;
+
+	if (hrtimer_active(timer)) {
+		hrtimer_try_to_cancel(timer);
+		return;
+	}
+
+	hrtimer_init(timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	timer->function = dl_task_timer;
+	timer->irqsafe = 1;
+}
+
+static
+int dl_runtime_exceeded(struct rq *rq, struct sched_dl_entity *dl_se)
+{
+	int dmiss = dl_time_before(dl_se->deadline, rq->clock);
+	int rorun = dl_se->runtime <= 0;
+
+	if (!rorun && !dmiss)
+		return 0;
+
+	/*
+	 * If we are beyond our current deadline and we are still
+	 * executing, then we have already used some of the runtime of
+	 * the next instance. Thus, if we do not account that, we are
+	 * stealing bandwidth from the system at each deadline miss!
+	 */
+	if (dmiss) {
+		dl_se->runtime = rorun ? dl_se->runtime : 0;
+		dl_se->runtime -= rq->clock - dl_se->deadline;
+	}
+
+	return 1;
+}
+
+/*
+ * Update the current task's runtime statistics (provided it is still
+ * a -deadline task and has not been removed from the dl_rq).
+ */
+static void update_curr_dl(struct rq *rq)
+{
+	struct task_struct *curr = rq->curr;
+	struct sched_dl_entity *dl_se = &curr->dl;
+	u64 delta_exec;
+
+	if (!dl_task(curr) || !on_dl_rq(dl_se))
+		return;
+
+	delta_exec = rq->clock_task - curr->se.exec_start;
+	if (unlikely((s64)delta_exec < 0))
+		delta_exec = 0;
+
+	schedstat_set(curr->se.statistics.exec_max,
+		      max(curr->se.statistics.exec_max, delta_exec));
+
+	curr->se.sum_exec_runtime += delta_exec;
+	account_group_exec_runtime(curr, delta_exec);
+
+	curr->se.exec_start = rq->clock;
+	cpuacct_charge(curr, delta_exec);
+
+	dl_se->runtime -= delta_exec;
+	if (dl_runtime_exceeded(rq, dl_se)) {
+		__dequeue_task_dl(rq, curr, 0);
+		if (likely(start_dl_timer(dl_se)))
+			dl_se->dl_throttled = 1;
+		else
+			enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);
+
+		if (!is_leftmost(curr, &rq->dl))
+			resched_task(curr);
+	}
+}
+
+static void __enqueue_dl_entity(struct sched_dl_entity *dl_se)
+{
+	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
+	struct rb_node **link = &dl_rq->rb_root.rb_node;
+	struct rb_node *parent = NULL;
+	struct sched_dl_entity *entry;
+	int leftmost = 1;
+
+	BUG_ON(!RB_EMPTY_NODE(&dl_se->rb_node));
+
+	while (*link) {
+		parent = *link;
+		entry = rb_entry(parent, struct sched_dl_entity, rb_node);
+		if (dl_time_before(dl_se->deadline, entry->deadline))
+			link = &parent->rb_left;
+		else {
+			link = &parent->rb_right;
+			leftmost = 0;
+		}
+	}
+
+	if (leftmost)
+		dl_rq->rb_leftmost = &dl_se->rb_node;
+
+	rb_link_node(&dl_se->rb_node, parent, link);
+	rb_insert_color(&dl_se->rb_node, &dl_rq->rb_root);
+
+	dl_rq->dl_nr_running++;
+}
+
+static void __dequeue_dl_entity(struct sched_dl_entity *dl_se)
+{
+	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
+
+	if (RB_EMPTY_NODE(&dl_se->rb_node))
+		return;
+
+	if (dl_rq->rb_leftmost == &dl_se->rb_node) {
+		struct rb_node *next_node;
+
+		next_node = rb_next(&dl_se->rb_node);
+		dl_rq->rb_leftmost = next_node;
+	}
+
+	rb_erase(&dl_se->rb_node, &dl_rq->rb_root);
+	RB_CLEAR_NODE(&dl_se->rb_node);
+
+	dl_rq->dl_nr_running--;
+}
+
+static void
+enqueue_dl_entity(struct sched_dl_entity *dl_se, int flags)
+{
+	BUG_ON(on_dl_rq(dl_se));
+
+	/*
+	 * If this is a wakeup or a new instance, the scheduling
+	 * parameters of the task might need updating. Otherwise,
+	 * we want a replenishment of its runtime.
+	 */
+	if (!dl_se->dl_new && flags & ENQUEUE_REPLENISH)
+		replenish_dl_entity(dl_se);
+	else
+		update_dl_entity(dl_se);
+
+	__enqueue_dl_entity(dl_se);
+}
+
+static void dequeue_dl_entity(struct sched_dl_entity *dl_se)
+{
+	__dequeue_dl_entity(dl_se);
+}
+
+static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
+{
+	/*
+	 * If p is throttled, we do nothing. In fact, if it exhausted
+	 * its budget it needs a replenishment and, since it now is on
+	 * its rq, the bandwidth timer callback (which clearly has not
+	 * run yet) will take care of this.
+	 */
+	if (p->dl.dl_throttled)
+		return;
+
+	enqueue_dl_entity(&p->dl, flags);
+}
+
+static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
+{
+	dequeue_dl_entity(&p->dl);
+}
+
+static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
+{
+	update_curr_dl(rq);
+	__dequeue_task_dl(rq, p, flags);
+}
+
+/*
+ * Yield semantics for -deadline tasks:
+ *
+ *   get off the CPU until our next instance, with
+ *   a new runtime.
+ */
+static void yield_task_dl(struct rq *rq)
+{
+	struct task_struct *p = rq->curr;
+
+	/*
+	 * We make the task go to sleep until its current deadline by
+	 * forcing its runtime to zero. This way, update_curr_dl() stops
+	 * it and the bandwidth timer will wake it up and will give it
+	 * new scheduling parameters (thanks to dl_new=1).
+	 */
+	if (p->dl.runtime > 0) {
+		rq->curr->dl.dl_new = 1;
+		p->dl.runtime = 0;
+	}
+	update_curr_dl(rq);
+}
+
+/*
+ * Only called when both the current and waking task are -deadline
+ * tasks.
+ */
+static void check_preempt_curr_dl(struct rq *rq, struct task_struct *p,
+				  int flags)
+{
+	if (dl_time_before(p->dl.deadline, rq->curr->dl.deadline))
+		resched_task(rq->curr);
+}
+
+#ifdef CONFIG_SCHED_HRTICK
+static void start_hrtick_dl(struct rq *rq, struct task_struct *p)
+{
+	s64 delta = p->dl.dl_runtime - p->dl.runtime;
+
+	if (delta > 10000)
+		hrtick_start(rq, delta);
+}
+#else
+static void start_hrtick_dl(struct rq *rq, struct task_struct *p)
+{
+}
+#endif
+
+static struct sched_dl_entity *pick_next_dl_entity(struct rq *rq,
+						   struct dl_rq *dl_rq)
+{
+	struct rb_node *left = dl_rq->rb_leftmost;
+
+	if (!left)
+		return NULL;
+
+	return rb_entry(left, struct sched_dl_entity, rb_node);
+}
+
+struct task_struct *pick_next_task_dl(struct rq *rq)
+{
+	struct sched_dl_entity *dl_se;
+	struct task_struct *p;
+	struct dl_rq *dl_rq;
+
+	dl_rq = &rq->dl;
+
+	if (unlikely(!dl_rq->dl_nr_running))
+		return NULL;
+
+	dl_se = pick_next_dl_entity(rq, dl_rq);
+	BUG_ON(!dl_se);
+
+	p = dl_task_of(dl_se);
+	p->se.exec_start = rq->clock;
+#ifdef CONFIG_SCHED_HRTICK
+	if (hrtick_enabled(rq))
+		start_hrtick_dl(rq, p);
+#endif
+	return p;
+}
+
+static void put_prev_task_dl(struct rq *rq, struct task_struct *p)
+{
+	update_curr_dl(rq);
+	p->se.exec_start = 0;
+}
+
+static void task_tick_dl(struct rq *rq, struct task_struct *p, int queued)
+{
+	update_curr_dl(rq);
+
+#ifdef CONFIG_SCHED_HRTICK
+	if (hrtick_enabled(rq) && queued && p->dl.runtime > 0)
+		start_hrtick_dl(rq, p);
+#endif
+}
+
+static void task_fork_dl(struct task_struct *p)
+{
+	/*
+	 * SCHED_DEADLINE tasks cannot fork and this is achieved through
+	 * sched_fork()
+	 */
+}
+
+static void task_dead_dl(struct task_struct *p)
+{
+	struct hrtimer *timer = &p->dl.dl_timer;
+
+	if (hrtimer_active(timer))
+		hrtimer_try_to_cancel(timer);
+}
+
+static void set_curr_task_dl(struct rq *rq)
+{
+	struct task_struct *p = rq->curr;
+
+	p->se.exec_start = rq->clock;
+}
+
+static void switched_from_dl(struct rq *rq, struct task_struct *p)
+{
+	if (hrtimer_active(&p->dl.dl_timer))
+		hrtimer_try_to_cancel(&p->dl.dl_timer);
+}
+
+static void switched_to_dl(struct rq *rq, struct task_struct *p)
+{
+	/*
+	 * If p is throttled, don't consider the possibility
+	 * of preempting rq->curr; the check will be done right
+	 * after its runtime gets replenished.
+	 */
+	if (unlikely(p->dl.dl_throttled))
+		return;
+
+	if (!p->on_rq || rq->curr != p) {
+		if (task_has_dl_policy(rq->curr))
+			check_preempt_curr_dl(rq, p, 0);
+		else
+			resched_task(rq->curr);
+	}
+}
+
+static void prio_changed_dl(struct rq *rq, struct task_struct *p,
+			    int oldprio)
+{
+	switched_to_dl(rq, p);
+}
+
+#ifdef CONFIG_SMP
+static int
+select_task_rq_dl(struct task_struct *p, int sd_flag, int flags)
+{
+	return task_cpu(p);
+}
+
+static void set_cpus_allowed_dl(struct task_struct *p,
+				const struct cpumask *new_mask)
+{
+	int weight = cpumask_weight(new_mask);
+
+	BUG_ON(!dl_task(p));
+
+	cpumask_copy(&p->cpus_allowed, new_mask);
+	p->dl.nr_cpus_allowed = weight;
+}
+#endif
+
+static const struct sched_class dl_sched_class = {
+	.next			= &rt_sched_class,
+	.enqueue_task		= enqueue_task_dl,
+	.dequeue_task		= dequeue_task_dl,
+	.yield_task		= yield_task_dl,
+
+	.check_preempt_curr	= check_preempt_curr_dl,
+
+	.pick_next_task		= pick_next_task_dl,
+	.put_prev_task		= put_prev_task_dl,
+
+#ifdef CONFIG_SMP
+	.select_task_rq		= select_task_rq_dl,
+
+	.set_cpus_allowed       = set_cpus_allowed_dl,
+#endif
+
+	.set_curr_task		= set_curr_task_dl,
+	.task_tick		= task_tick_dl,
+	.task_fork              = task_fork_dl,
+	.task_dead		= task_dead_dl,
+
+	.prio_changed           = prio_changed_dl,
+	.switched_from		= switched_from_dl,
+	.switched_to		= switched_to_dl,
+};
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index c108b9c..4b09704 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -2,6 +2,7 @@
  * Real-Time Scheduling Class (mapped to the SCHED_FIFO and SCHED_RR
  * policies)
  */
+static const struct sched_class rt_sched_class;
 
 #ifdef CONFIG_RT_GROUP_SCHED
 
diff --git a/kernel/sched_stoptask.c b/kernel/sched_stoptask.c
index 8b44e7f..4270a36 100644
--- a/kernel/sched_stoptask.c
+++ b/kernel/sched_stoptask.c
@@ -81,7 +81,7 @@ get_rr_interval_stop(struct rq *rq, struct task_struct *task)
  * Simple, special scheduling class for the per-CPU stop tasks:
  */
 static const struct sched_class stop_sched_class = {
-	.next			= &rt_sched_class,
+	.next			= &dl_sched_class,
 
 	.enqueue_task		= enqueue_task_stop,
 	.dequeue_task		= dequeue_task_stop,
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2012-05-15 10:10 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-23 12:31 [PATCH 05/16] sched: SCHED_DEADLINE policy implementation cucinotta
  -- strict thread matches above, loose matches on Subject: below --
2012-04-06  7:14 [RFC][PATCH 00/16] sched: SCHED_DEADLINE v4 Juri Lelli
2012-04-06  7:14 ` [PATCH 05/16] sched: SCHED_DEADLINE policy implementation Juri Lelli
2012-04-11  3:06   ` Steven Rostedt
2012-04-11  6:54     ` Juri Lelli
2012-04-11 13:41   ` Steven Rostedt
2012-04-11 13:55     ` Juri Lelli
2012-04-23 10:15   ` Peter Zijlstra
2012-04-23 10:18     ` Juri Lelli
2012-04-23 10:31   ` Peter Zijlstra
2012-04-23 10:37     ` Juri Lelli
2012-04-23 21:25       ` Tommaso Cucinotta
2012-04-23 21:45         ` Peter Zijlstra
2012-04-23 23:25           ` Tommaso Cucinotta
2012-04-24  6:29             ` Dario Faggioli
2012-04-24  6:52               ` Juri Lelli
2012-04-23 11:32   ` Peter Zijlstra
2012-04-23 12:13     ` Juri Lelli
2012-04-23 12:22       ` Peter Zijlstra
2012-04-23 13:37         ` Juri Lelli
2012-04-23 14:01           ` Peter Zijlstra
2012-04-23 11:34   ` Peter Zijlstra
2012-04-23 11:57     ` Juri Lelli
2012-04-23 11:55   ` Peter Zijlstra
2012-04-23 14:43     ` Juri Lelli
2012-04-23 15:11       ` Peter Zijlstra
2012-04-23 21:55     ` Tommaso Cucinotta
2012-04-23 21:58       ` Peter Zijlstra
2012-04-23 23:21         ` Tommaso Cucinotta
2012-04-24  9:50           ` Peter Zijlstra
2012-04-24  1:03         ` Steven Rostedt
2012-04-23 14:11   ` Peter Zijlstra
2012-04-23 14:25   ` Peter Zijlstra
2012-04-23 15:34     ` Juri Lelli
2012-04-23 14:35   ` Peter Zijlstra
2012-04-23 15:39     ` Juri Lelli
2012-04-23 15:43       ` Peter Zijlstra
2012-04-23 16:41         ` Juri Lelli
     [not found]           ` <4F95D41F.5060700@sssup.it>
2012-04-24  7:21             ` Juri Lelli
2012-04-24  9:00               ` Peter Zijlstra
2012-05-15 10:10         ` Juri Lelli
2012-04-23 15:15   ` Peter Zijlstra
2012-04-23 15:37     ` Juri Lelli
