* Re: [RFD] sched/deadline: EDF dynamic quota design
       [not found] ` <20140514113245.GZ11096@twins.programming.kicks-ass.net>
@ 2014-05-14 12:55   ` Peter Zijlstra
       [not found]   ` <53736CD9.90805@unitn.it>
  1 sibling, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2014-05-14 12:55 UTC (permalink / raw)
  To: xiaofeng.yan
  Cc: Ingo Molnar, duzhiping.du, xiaofeng.yan2012, juri.lelli,
	luca.abeni, raistlin, henrik, tkhai, harald.gustafsson,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3975 bytes --]

On Wed, May 14, 2014 at 01:32:45PM +0200, Peter Zijlstra wrote:
> On Wed, May 14, 2014 at 06:43:38PM +0800, xiaofeng.yan wrote:
> > Hi all,
> > 
> > We have designed a dynamic quota function on top of the current EDF scheduler.
> > We have implemented this dynamic quota design and obtained good performance in a real deployment.
> > The implementation of this function is strongly tied to our production-line requirements.
> > Parts of the current in-product design might not be acceptable to the community
> > (for example, it can exceed the total bandwidth limit), so we have redesigned those parts.
> > If you want to review our current in-product implementation, I will send our patches and test results.
> > We hope to push it to mainline. Could you tell us whether it could be accepted or not?
> > We would be very grateful for your suggestions.
> 
> At the very least you could've Cc'ed the people who actually wrote the
> EDF stuff :-/

Also, Cc lkml, left the rest of the msg intact as clearly people haven't
had a copy yet.

> > The design principle is as follows:
> > 
> > 
> > ------------------EDF Dynamic Quota Design------------------
> > 
> > * Current EDF defect
> >   EDF tasks' bandwidth is currently fixed; they cannot adjust
> >   their quota when they become busy or idle.
> > 
> > * The scene for dynamic quota
> >   We have scenarios that need tasks' quota to be adjusted
> >   dynamically. For example, the network workload fluctuates while
> >   forwarding packets, which results in imbalance: busy tasks should
> >   get more quota, and idle tasks should give some back.
> >   This increases the processing capacity of the workload.
> > 
> > * Dynamic quota idea
> >   We add the dynamic quota function on top of the current EDF scheduler.
> >   The principle is as follows:
> >   the total bandwidth of the EDF tasks stays fixed while they run;
> >   idle tasks release their unused bandwidth to a global bandwidth pool, and
> >   busy tasks take extra quota from that pool (see the sketch after the
> >   example below).
> > 
> > * Example.
> >   Three tasks: T1,T2,T3. Their initial status is as follows,
> >   T1(200us,500us,500us)
> >   T2(200us,500us,500us)
> >   T3(200us,500us,500us)
> > 
> >   At time t1, the tasks' running status is as follows,
> >   T1(200us,500us,500us)
> >   T2(100us,500us,500us)
> >   T3(200us,500us,500us)
> >   Busy tasks: T1,T3
> >   Idle task:  T2
> >   Bandwidth pool: 20%
> > 
> >   Now there is 20% quota in the bandwidth pool; T1 and T3 each get 5% quota
> >   (adjustable) from the bandwidth pool when they enter the run-queue.
> >   Then, the status is as follows:
> >   T1(225us,500us,500us)
> >   T2(100us,500us,500us)
> >   T3(225us,500us,500us)
> >   Bandwidth pool: 10%
> > 
> >   Busy tasks could get the quota when the bandwidth pool is not empty.
> > 
> > ----------------------------------------------------------------------------------------
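A minimal sketch of the pool arithmetic described in the example above
(purely illustrative: the structures and helpers below are made up for
this mail and are not taken from the posted patches):

    /* Bandwidth is expressed as a percentage of the (common) period. */
    struct bw_pool {
        int free_bw;        /* unused bandwidth returned by idle tasks */
        int grant_step;     /* quota handed out per request, e.g. 5 */
    };

    struct edf_task {
        int runtime_us;     /* current budget per period */
        int period_us;
    };

    /* An idle task donates whatever it did not consume this period. */
    static void donate_unused(struct bw_pool *pool, struct edf_task *t,
                              int used_us)
    {
        int unused_pct = (t->runtime_us - used_us) * 100 / t->period_us;

        if (unused_pct > 0)
            pool->free_bw += unused_pct;
    }

    /* A busy task takes a small, fixed slice when it enters the run-queue. */
    static void grab_quota(struct bw_pool *pool, struct edf_task *t)
    {
        if (pool->free_bw >= pool->grant_step) {
            pool->free_bw -= pool->grant_step;
            t->runtime_us += pool->grant_step * t->period_us / 100;
        }
    }

With the numbers above (grant_step = 5, a 500us period), T2's unused 100us
puts 20% into the pool, and T1 and T3 each take 5% (25us), leaving 10%.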
> 
> Yeah, not sure that's sound. But I'll leave that to others.
> 
> So there have been papers on how, if you transform the runtime into an
> avg, the tardiness also turns into an avg and remains bounded.
> 
> Now, you still have to guarantee your individual tasks respect the avg,
> otherwise the tardiness guarantees are out the window along with it.
> 
> I've not thought through your proposal to see if it maintains this
> guarantee -- but seeing how you don't talk about guarantees at all I
> fear the worst.
> 
> Another point is that 'avg' or 'dynamic' tasks should co-exist with the
> normal fixed runtime tasks (ideally) without affecting the performance
> of the normal tasks (too much).
> 
> Your proposal also doesn't cover this.
> 
> There have also been proposals to turn the CBS into a 'soft' CBS and
> instead of hard throttling allow the task to continue executing at a
> lower class (throttle in effect turns into a switch to SCHED_NORMAL, and
> unthrottle restores it to SCHED_DEADLINE).
> 
> 
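A small self-contained sketch of that 'soft' CBS idea (hypothetical names,
written only to illustrate the paragraph above; this is not existing kernel
code):

    enum policy { POLICY_NORMAL, POLICY_DEADLINE };

    struct soft_cbs {
        enum policy policy;
        long long runtime_ns;       /* remaining budget in this period */
        long long dl_runtime_ns;    /* reserved budget per period */
    };

    /* Charge the task as it executes for 'delta' nanoseconds. */
    static void soft_cbs_tick(struct soft_cbs *t, long long delta)
    {
        t->runtime_ns -= delta;
        if (t->runtime_ns <= 0 && t->policy == POLICY_DEADLINE)
            t->policy = POLICY_NORMAL;  /* throttle: demote, keep it runnable */
    }

    /* At the start of each new period. */
    static void soft_cbs_replenish(struct soft_cbs *t)
    {
        t->runtime_ns = t->dl_runtime_ns;
        t->policy = POLICY_DEADLINE;    /* unthrottle: promote it again */
    }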



[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFD] sched/deadline: EDF dynamic quota design
       [not found]     ` <5374A335.90705@huawei.com>
@ 2014-05-15 12:31       ` Juri Lelli
  2014-05-16  7:11         ` Henrik Austad
  0 siblings, 1 reply; 9+ messages in thread
From: Juri Lelli @ 2014-05-15 12:31 UTC (permalink / raw)
  To: xiaofeng.yan
  Cc: Luca Abeni, Peter Zijlstra, Ingo Molnar, duzhiping.du,
	xiaofeng.yan2012, raistlin, henrik, tkhai, harald.gustafsson,
	linux-kernel

Hi,

[Cc-ed lkml again to include Luca's reply]

and thanks to Luca for his reply!

On Thu, 15 May 2014 19:21:25 +0800
"xiaofeng.yan" <xiaofeng.yan@huawei.com> wrote:

> On 2014/5/14 21:17, Luca Abeni wrote:
> > Hi Peter,
> >
> > thanks for adding me in cc :)
> > This dynamic quota design looks similar to some resource reclaiming 
> > algorithm
> > known in literature, and might be related to some feedback scheduling 
> > ideas.
> > More details below...
> >
> > On 05/14/2014 01:32 PM, Peter Zijlstra wrote:
> > [...]
> >>> * Example.
> >>>    Three tasks: T1,T2,T3. Their initial status is as follows,
> >>>    T1(200us,500us,500us)
> >>>    T2(200us,500us,500us)
> >>>    T3(200us,500us,500us)
> >>>
> >>>    At time t1, the tasks' running status is as follows,
> >>>    T1(200us,500us,500us)
> >>>    T2(100us,500us,500us)
> >>>    T3(200us,500us,500us)
> >>>    Busy tasks: T1,T3
> >>>    Idle task:  T2
> >>>    Bandwidth pool: 20%
> >>>
> >>>    Now, there are 20% quota in the bandwidth pool, T1 or T3 get 5% 
> >>> quota
> >>>    (adjustable) from the bandwidth pool when they enter run-queue.
> >>>    Then, the status is as follows:
> >>>    T1(225us,500us,500us)
> >>>    T2(100us,500us,500us)
> >>>    T3(225us,500us,500us)
> >>>    Bandwidth pool: 10%
> >>>
> >>>    Busy tasks could get the quota when the bandwidth pool is not empty.
> >>>
> >>> ---------------------------------------------------------------------------------------- 
> >>>
> >>
> >> Yeah, not sure that's sound. But I'll leave that to others.
> > The idea (donating to other tasks some of the runtime reserved to a task
> > but actually unused because the task is busy) is sound, but it must be 
> > done
> > carefully, otherwise the guarantees about reserved runtimes risk to be 
> > broken.
> >
> > This (CPU reclaiming) has been investigated in the past, and a lot of 
> > possible
> > algorithms have been proposed. For example, see:
> > "Greedy reclamation of unused bandwidth in constant-bandwidth servers" by
> > Lipari and others:
> > http://scholar.google.com/scholar?cluster=3702635449685717385&hl=it&as_sdt=0,5 
> >
> > or
> > "Capacity sharing for overrun control" by Caccamo and others:
> > http://scholar.google.com/scholar?cluster=8595685180937180485&hl=it&as_sdt=0,5 
> >
> > and of course all of the related papers found by Scholar (I cited 
> > these two
> > papers because I can easily remember them, but there are many other).
> > There is also a nice paper by Scott Brandt that compares the 
> > implementations
> > of multiple reclaiming algorithms, but I do not remember the title.
> >
> > All of these papers provide a more sound theoretical foundation for 
> > runtime/budget
> > donation/sharing.
> >
> > As a form of shameless self-advertisement, I have also to mention that 
> > adaptive
> > reservations (feedback scheduling based on CBS/SCHED_DEADLINE) can be 
> > related,
> > allowing to dynamically adapt the runtime to the actual tasks demands.
> > For example, see "Analysis of a reservation-based feedback scheduler":
> > http://scholar.google.com/scholar?cluster=2878648944333333913&hl=it&as_sdt=0,5 
> >
> >
> > This can also be implemented in user-space (without modifying the 
> > scheduler)
> > by having a daemon that monitors the SCHED_DEADLINE tasks and changes 
> > their
> > runtimes based on some kind of feedback (for example, difference 
> > between the
> > scheduling deadline of a task and its actual deadline - if this 
> > information
> > is made available by the scheduler). I developed a similar 
> > implementation in
> > the past (not based on SCHED_DEADLINE, but on some previous 
> > implementation
> > of the CBS algorithm).
> >
> >
> > Hope these references can help someone... :)
> >
> Thanks very much for your prompt reply.  I will read the papers
> you referred to.
> We just implemented a simple dynamic quota according to our product
> requirements, without much theory behind it.

And this is quite a big "without", I fear. The timeliness guarantees
you can give to users are based on the theory behind the algorithms we
implement. If you change the implementation without any proof of
theoretical soundness, you risk breaking them all.

But, this doesn't mean that what you did is wrong. We'd just have to
see if it can be proved to be right (in theory :)).

> In our scenario some tasks stay busy or idle continuously, rather than
> alternating between busy and idle from period to period.
> So our method can get good performance during data packet forwarding.

Not sure I understand this. But see below..

> Thanks to the authors for implementing EDF in Linux. This scheduling
> class has already been applied in a product.

And this is great news! :)

Any way you can share more information on the use-case? That would be
helpful to also understand your solution better.

> Will EDF have dynamic quota in the future?

Well, as Luca said, you can already use SCHED_DEADLINE as the backend
for feedback scheduling (that pertains mostly to user-space).

And yes, there are already thoughts to modify it a bit to go towards
Lipari et al.'s GRUB algorithm. That would probably be helpful in
situations like yours. But I can't give you any timing for it.

> It is already having real-world impacts.

Great news again ;).

Thanks,

- Juri

> Thanks
> Yan
> >
> >
> >                 Luca
> >
> > .
> >
> 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFD] sched/deadline: EDF dynamic quota design
  2014-05-15 12:31       ` Juri Lelli
@ 2014-05-16  7:11         ` Henrik Austad
  2014-05-21 12:45           ` Luca Abeni
  0 siblings, 1 reply; 9+ messages in thread
From: Henrik Austad @ 2014-05-16  7:11 UTC (permalink / raw)
  To: Juri Lelli
  Cc: xiaofeng.yan, Luca Abeni, Peter Zijlstra, Ingo Molnar,
	duzhiping.du, xiaofeng.yan2012, raistlin, tkhai,
	harald.gustafsson, linux-kernel

On Thu, May 15, 2014 at 02:31:32PM +0200, Juri Lelli wrote:
> Hi,

Hi all,

> [Cc-ed lkml again to include Luca's reply]
> 
> and thanks to Luca for his reply!

Indeed! and several great references. My backlog just got 3 more items 
pushed :)

> On Thu, 15 May 2014 19:21:25 +0800
> "xiaofeng.yan" <xiaofeng.yan@huawei.com> wrote:
> 
> > On 2014/5/14 21:17, Luca Abeni wrote:
> > > Hi Peter,
> > >
> > > thanks for adding me in cc :)
> > > This dynamic quota design looks similar to some resource reclaiming 
> > > algorithm
> > > known in literature, and might be related to some feedback scheduling 
> > > ideas.
> > > More details below...
> > >
> > > On 05/14/2014 01:32 PM, Peter Zijlstra wrote:
> > > [...]
> > >>> * Example.
> > >>>    Three tasks: T1,T2,T3. Their initial status is as follows,
> > >>>    T1(200us,500us,500us)
> > >>>    T2(200us,500us,500us)
> > >>>    T3(200us,500us,500us)
> > >>>
> > >>>    At time t1, the tasks' running status is as follows,
> > >>>    T1(200us,500us,500us)
> > >>>    T2(100us,500us,500us)
> > >>>    T3(200us,500us,500us)
> > >>>    Busy tasks: T1,T3
> > >>>    Idle task:  T2
> > >>>    Bandwidth pool: 20%
> > >>>
> > >>>    Now, there are 20% quota in the bandwidth pool, T1 or T3 get 5% 
> > >>> quota
> > >>>    (adjustable) from the bandwidth pool when they enter run-queue.
> > >>>    Then, the status is as follows:
> > >>>    T1(225us,500us,500us)
> > >>>    T2(100us,500us,500us)
> > >>>    T3(225us,500us,500us)
> > >>>    Bandwidth pool: 10%
> > >>>
> > >>>    Busy tasks could get the quota when the bandwidth pool is not empty.

First off, the %-usage is a bit confusing. I get that in -this- example you
use a standard 500us pool size, so that 20% means "20% of 500us = 100us".

Secondly, you allocate 600us over a 500us period - I assume that you have
more than 1 CPU? (A statement "On a system with X CPUs we run..." would be
appreciated.)

Once you start looking at a better way to express available bandwidth, you
realize that this is a road fraught with peril. You could say that you have
'100us in the pool', but then you need the period as well. Then you realize
that not all threads have the same period, or the same release-time. And so
on, further down the road:

- Do all the threads start at the same time, or is the initial
  release-time independent of one another?
- When does extra BW expire?
- What if the period differs between the tasks?
- On which CPU can you use this extra BW?

For instance, consider a setup somewhat similar to Yan's, but with slightly
modified parameters to serve my point.

T: {C,D,R}, running on a 2 core system
C: WCET (worst-case execution time), required time on the CPU each period
D: Deadline (after release)
R: Release period (Task should have budget refill/wakeup every R us)

T1: {100us,  500us,  500us}
T2: {300us,  500us,  500us}
T3: {400us, 1000us, 1000us}
T4: {400us, 1000us, 1000us}

T1 and T2 run on core 0, T3 and T4 on core 1; both cores are 80% utilized.
If T4 finishes after 100us, who should be given the extra 300us? You probably
need to 'tag' the extra BW in the pool with both period _and_ CPU.

This is going to get complicated to get right. Or rather, you run the risk 
of running into strange bugs.
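For reference, the per-core numbers above can be checked with a few lines
(purely illustrative; only the parameters listed above go in, nothing here
comes from a patch):

    #include <stdio.h>

    struct dl_params { int wcet_us, deadline_us, period_us, cpu; };

    int main(void)
    {
        struct dl_params t[] = {
            { 100,  500,  500, 0 },   /* T1 */
            { 300,  500,  500, 0 },   /* T2 */
            { 400, 1000, 1000, 1 },   /* T3 */
            { 400, 1000, 1000, 1 },   /* T4 */
        };
        double u[2] = { 0.0, 0.0 };

        for (int i = 0; i < 4; i++)
            u[t[i].cpu] += (double)t[i].wcet_us / t[i].period_us;

        /* Prints U(core0)=0.80 U(core1)=0.80: 20% spare per core, but any
         * leftover budget (e.g. T4's 300us) is only meaningful together
         * with the period and the CPU it was reserved on. */
        printf("U(core0)=%.2f U(core1)=%.2f\n", u[0], u[1]);
        return 0;
    }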

> > >> Yeah, not sure that's sound. But I'll leave that to others.
> > > The idea (donating to other tasks some of the runtime reserved to a task
> > > but actually unused because the task is busy) is sound, but it must be 
> > > done
> > > carefully, otherwise the guarantees about reserved runtimes risk to be 
> > > broken.
> > >
> > > This (CPU reclaiming) has been investigated in the past, and a lot of 
> > > possible
> > > algorithms have been proposed. For example, see:
> > > "Greedy reclamation of unused bandwidth in constant-bandwidth servers" by
> > > Lipari and others:
> > > http://scholar.google.com/scholar?cluster=3702635449685717385&hl=it&as_sdt=0,5 
> > >
> > > or
> > > "Capacity sharing for overrun control" by Caccamo and others:
> > > http://scholar.google.com/scholar?cluster=8595685180937180485&hl=it&as_sdt=0,5 
> > >
> > > and of course all of the related papers found by Scholar (I cited 
> > > these two
> > > papers because I can easily remember them, but there are many other).
> > > There is also a nice paper by Scott Brandt that compares the 
> > > implementations
> > > of multiple reclaiming algorithms, but I do not remember the title.
> > >
> > > All of these papers provide a more sound theoretical foundation for 
> > > runtime/budget
> > > donation/sharing.
> > >
> > > As a form of shameless self-advertisement, I have also to mention that 
> > > adaptive
> > > reservations (feedback scheduling based on CBS/SCHED_DEADLINE) can be 
> > > related,
> > > allowing to dynamically adapt the runtime to the actual tasks demands.
> > > For example, see "Analysis of a reservation-based feedback scheduler":
> > > http://scholar.google.com/scholar?cluster=2878648944333333913&hl=it&as_sdt=0,5 

Thanks! Great references :)

> > > This can also be implemented in user-space (without modifying the 
> > > scheduler)
> > > by having a daemon that monitors the SCHED_DEADLINE tasks and changes 
> > > their
> > > runtimes based on some kind of feedback (for example, difference 
> > > between the
> > > scheduling deadline of a task and its actual deadline - if this 
> > > information
> > > is made available by the scheduler). I developed a similar 
> > > implementation in
> > > the past (not based on SCHED_DEADLINE, but on some previous 
> > > implementation
> > > of the CBS algorithm).

This sounds like a very slow approach. What if the extra BW given by T2 was 
for one period only? There's no way you could create a userspace daemon to 
handle that kind of budget-tweaking.

Also, it sounds like a *really* dangerous idea to let some random (tm) 
userspace daemon adjust the deadline-budget for other tasks in the system 
based on an observation of spent budget for the last X seconds. It's not 
military funding we're concerned with here.

When you state your WCET, it is not because you need -exactly- that budget,
it is because you should *never* exceed that required amount of
computational time.

Blindly reducing allocated runtime is defeating that whole purpose. 
Granted, if you use EDF for BW-control only, it could be done - but then 
the thread itself should do that. Real-time is not about being fair. Heck, 
it's not even about being optimal, it's about being predictable and 
"dynamically adjusting" is not!

> > > Hope these references can help someone... :)
> > >
> > Thanks very much for your reply in time.  I will read the above paper 
> > you referred to.
> > We just implemented simple dynamic quota according to our product 
> > requirement without many theories.
> 
> And this is quite a big "without", I fear. The timeliness guarantees
> you can give to users are based on the theory behind the algorithms we
> implement. If you change the implementation, without any proof of
> theoretical soundness, you risk to break them all.

To me, it sounds like the approach Yan is taking is using EDF to grant an
explicit BW with guaranteed progress, so basically a small
subset of what EDF can offer.

But if this is the case, why not let T1 and T3 grab BW from !SCHED_DEADLINE 
tasks? You already allocate 600us every 500us, so _at_minimum_ there should 
be 400us available. (yes, yes, starving out other parts of the system is 
bad, I know, especially rcuc is going to go nuts ;)
 
> But, this doesn't mean that what you did is wrong. We'd just have to
> see if can be proved to be right (in theory :)).
> 
> > In our scene some tasks are continuous for busy or idle state more than 
> > alternating busy tasks and idle task between periods.
> > So our method can get good performance during date package forwarding.
> 
> Not sure I understand this. But see below..
> 
> > Thanks for the authors to implement EDF based on linux. This schedule 
> > class has been applied in product.
> 
> And this is great news! :)
> 
> Any way you can share more information on the use-case? That would be
> helpful to also understand your solution better.
> 
> > Will EDF has dynamic quota in future?
> 
> Well, as Luca said, you can already use SCHED_DEADLINE as the backend
> for feedback scheduling (that pertain mostly to user-space).
> 
> And yes, there are already thoughts to modify it a bit to go towards
> Lipari's et al. GRUB algorithm. That would be probably helpful in
> situations like yours. But I can't give you any timing for it.

Need to read up on GRUB before involving myself in this part of the 
discussion, but I'm not sure how much I enjoy the idea of some part of 
userspace (more or less) blindly adjusting deadline-params for other tasks.


I think we need to be very careful about whether or not we're talking about 
EDF as a "CPU BW"-enforcer or a "meet a deadline for a periodic 
task"-enforcer. Those 2 will react quite differently from having their 
runtime adjusted on the fly.

Just my NOK 0.12

> > It is already having real-world impacts.
> 
> Great news again ;).

\o/

-- 
Henrik Austad

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFD] sched/deadline: EDF dynamic quota design
  2014-05-16  7:11         ` Henrik Austad
@ 2014-05-21 12:45           ` Luca Abeni
  2014-06-17  2:43             ` xiaofeng.yan
  0 siblings, 1 reply; 9+ messages in thread
From: Luca Abeni @ 2014-05-21 12:45 UTC (permalink / raw)
  To: Henrik Austad, Juri Lelli
  Cc: xiaofeng.yan, Peter Zijlstra, Ingo Molnar, duzhiping.du,
	xiaofeng.yan2012, raistlin, tkhai, harald.gustafsson,
	linux-kernel

Hi,

first of all, sorry for the ultra-delayed reply: I've been busy,
and I did not notice this email... Anyway, some comments are below

On 05/16/2014 09:11 AM, Henrik Austad wrote:
[...]
>>>> This can also be implemented in user-space (without modifying the
>>>> scheduler)
>>>> by having a daemon that monitors the SCHED_DEADLINE tasks and changes
>>>> their
>>>> runtimes based on some kind of feedback (for example, difference
>>>> between the
>>>> scheduling deadline of a task and its actual deadline - if this
>>>> information
>>>> is made available by the scheduler). I developed a similar
>>>> implementation in
>>>> the past (not based on SCHED_DEADLINE, but on some previous
>>>> implementation
>>>> of the CBS algorithm).
>
> This sounds like a very slow approach. What if the extra BW given by T2 was
> for one period only? There's no way you could create a userspace daemon to
> handle that kind of budget-tweaking.
Right. With "This can also be implemented in user-space..." I was referring
to the feedback scheduling (adaptive reservation) approach, which is designed
to "auto-tune" the reservation budget following slower variations (it is like
a low-pass filter, which can set the budget to something between the average
used budget and the largest one). Basically, it works on a larger time scale.
If you want a "per-period" runtime donation, you need a reclaiming mechanism
like GRUB, CASH, or similar, which needs to be implemented in the kernel.


> Also, it sounds like a *really* dangerous idea to let some random (tm)
> userspace daemon adjust the deadline-budget for other tasks in the system
> based on an observation of spent budget for the last X seconds. It's not
> military funding we're concerned with here.
>
> When you state your WCET, it is not because you need -exactly- that budget,
> it is because you should *never* exceed that kind of rquired
> computational time.
Exactly. But the idea of feedback scheduling was that sometimes you do not know
the WCET... You can guess it, or measure it over a large number of runs (but
Murphy's law ensures that you will miss the worst case anyway ;-).
And there are situations in which you do not need to respect all of the
deadlines... The daemon I was talking about just monitors the difference between
the scheduling deadline and the "real job deadline" for some tasks, in order to
understand if the runtime they have been assigned is enough or not... If some
task is not receiving enough runtime (that is, if the difference between its
scheduling deadline and the "real deadline" becomes too large), the daemon
tries to increase its runtime. When the system is overloaded (that is, every
one of the monitored tasks wants too much runtime, and admission control fails),
the daemon decreases the runtimes according to the weight of the tasks...
Of course, the daemon does not have to monitor all of the SCHED_DEADLINE tasks
in the system, but only the ones for which adaptive reservations are useful
(tasks for which the WCET is not known for sure, and that can tolerate some
missed deadlines). The other SCHED_DEADLINE tasks can stay with their fixed
runtimes unchanged.
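As a rough illustration only, the adjustment rule such a daemon could apply
might look like the following (the helper name, the gain and the clamping are
made up here; this is not the daemon I released):

    /*
     * lag_us: scheduling deadline minus the task's "real" job deadline.
     * A positive, growing lag means the reserved runtime is too small.
     */
    static long long adjust_runtime(long long runtime_us, long long lag_us,
                                    long long min_runtime_us,
                                    long long max_runtime_us)
    {
        /* Nudge the budget proportionally to the observed lag. */
        long long next = runtime_us + lag_us / 8;

        if (next < min_runtime_us)  /* never drop below a guaranteed floor */
            next = min_runtime_us;
        if (next > max_runtime_us)  /* never ask for more than admission allows */
            next = max_runtime_us;
        return next;
    }

The daemon would periodically read each monitored task's lag, compute a new
runtime with something like the above, and apply it (with SCHED_DEADLINE that
would go through sched_setattr()); when admission control fails it would scale
the monitored runtimes down according to the tasks' weights, as described above.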


> Blindly reducing allocated runtime is defeating that whole purpose.
Of course, there could be a minimum guaranteed runtime per task.

> Granted, if you use EDF for BW-control only, it could be done - but then
> the thread itself should do that. Real-time is not about being fair. Heck,
> it's not even about being optimal, it's about being predictable and
> "dynamically adjusting" is not!
Well, this could lead to a long discussion, in which everyone is right and
everyone is wrong... Let's say that it depends on the application's requirements
and constraints.

[...]
>>> Will EDF has dynamic quota in future?
>>
>> Well, as Luca said, you can already use SCHED_DEADLINE as the backend
>> for feedback scheduling (that pertain mostly to user-space).
>>
>> And yes, there are already thoughts to modify it a bit to go towards
>> Lipari's et al. GRUB algorithm. That would be probably helpful in
>> situations like yours. But I can't give you any timing for it.
>
> Need to read up on GRUB before involving myself in this part of the
> discussion, but I'm not sure how much I enjoy the idea of some part of
> userspace (more or less) blindly adjusting deadline-params for other tasks.
No, GRUB does not involve the user-space adjusting any scheduling parameter.
GRUB is a reclaiming algorithm, which works in a different way with respect to
the feedback scheduling approach I described, and requires modifications in
the scheduler.
The basic ideas are (warning! This is an over-simplification of the algorithm! :)
- You assign runtime and period to each SCHED_DEADLINE task as usual
- Each task is guaranteed to receive its runtime every period
- You can also define a maximum fraction Umax of the CPU time that the
   SCHED_DEADLINE tasks can use. Note that Umax _must_ be larger than or
   equal to sum_i runtime_i / period_i
   (note: in the original GRUB paper, only one CPU is considered, and Umax is
   set equal to 1)
- If the tasks are consuming less than Umax, then the scheduling algorithm
   allows them to use more runtime (but not less than the guaranteed runtime_i)
   in order to use up to Umax.
   This is achieved by modifying the rule used to decrease the runtime: in
   SCHED_DEADLINE, if a task executes for a time delta, its runtime is decreased
   by delta; using GRUB, it would be decreased by a smaller amount of time
   (computed based on Umax, on the active SCHED_DEADLINE tasks, etc...).
   This requires to implement some kind of state machine (the details are in
   the GRUB paper)
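As a rough illustration of the last bullet (deliberately simplified, following
the description above rather than the paper's exact rules; the names are made
up):

    /*
     * u_act: total utilization of the currently active SCHED_DEADLINE tasks.
     * u_max: configured cap on the CPU fraction they may use.  Both in [0, 1].
     */
    static long long cbs_charge(long long runtime_ns, long long delta_ns)
    {
        return runtime_ns - delta_ns;       /* plain CBS: charge all of it */
    }

    static long long grub_charge(long long runtime_ns, long long delta_ns,
                                 double u_act, double u_max)
    {
        double factor = u_act / u_max;      /* < 1 when there is spare bandwidth */

        if (factor > 1.0)
            factor = 1.0;
        /* Charging less than delta lets the budget last longer, i.e. the
         * task reclaims part of the unused bandwidth, up to Umax. */
        return runtime_ns - (long long)(delta_ns * factor);
    }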

I also had an implementation of the GRUB algorithm (based on a modification
of my old CBS scheduler for Linux), but the computational complexity of the
algorithm was too high. That's why I never proposed to merge it in SCHED_DEADLINE.
But maybe there can be some trade-off between the "exact compliance with the
GRUB algorithm" and implementation efficiency that can make it acceptable...


> I think we need to be very careful about whether or not we're talking about
> EDF as a "CPU BW"-enforcer or a "meet a deadline for a periodic
> task"-enforcer. Those 2 will react quite differently from having their
> runtime adjusted on the fly.
Well, IMHO the nice thing about SCHED_DEADLINE is that it can be both :)
It depends on how you configure the scheduling parameters.


			Luca

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFD] sched/deadline: EDF dynamic quota design
  2014-05-21 12:45           ` Luca Abeni
@ 2014-06-17  2:43             ` xiaofeng.yan
  2014-06-17  8:01               ` Luca Abeni
  0 siblings, 1 reply; 9+ messages in thread
From: xiaofeng.yan @ 2014-06-17  2:43 UTC (permalink / raw)
  To: Luca Abeni
  Cc: Henrik Austad, Juri Lelli, Peter Zijlstra, Ingo Molnar,
	duzhiping.du, xiaofeng.yan2012, raistlin, tkhai,
	harald.gustafsson, linux-kernel

On 2014/5/21 20:45, Luca Abeni wrote:
> Hi,
>
> first of all, sorry for the ultra-delayed reply: I've been busy,
> and I did not notice this email... Anyway, some comments are below
>
> On 05/16/2014 09:11 AM, Henrik Austad wrote:
> [...]
>>>>> This can also be implemented in user-space (without modifying the
>>>>> scheduler)
>>>>> by having a daemon that monitors the SCHED_DEADLINE tasks and changes
>>>>> their
>>>>> runtimes based on some kind of feedback (for example, difference
>>>>> between the
>>>>> scheduling deadline of a task and its actual deadline - if this
>>>>> information
>>>>> is made available by the scheduler). I developed a similar
>>>>> implementation in
>>>>> the past (not based on SCHED_DEADLINE, but on some previous
>>>>> implementation
>>>>> of the CBS algorithm).
>>
>> This sounds like a very slow approach. What if the extra BW given by 
>> T2 was
>> for one period only? There's no way you could create a userspace 
>> daemon to
>> handle that kind of budget-tweaking.
> Right. With "This can also be implemented in user-space..." I was 
> referring
> to the feedback scheduling (adaptive reservation) approach, which is 
> designed
> to "auto-tune" the reservation budget following slower variations (it 
> is like
> a low-pass filter, which can set the budget to something between the 
> average
> used budget and the largest one). Basically, it works on a larger time 
> scale.
> If you want a "per-period" runtime donation, you need a reclaiming 
> mechanism
> like GRUB, CASH, or similar, which needs to be implemented in the kernel.
>
>
>> Also, it sounds like a *really* dangerous idea to let some random (tm)
>> userspace daemon adjust the deadline-budget for other tasks in the 
>> system
>> based on an observation of spent budget for the last X seconds. It's not
>> military funding we're concerned with here.
>>
>> When you state your WCET, it is not because you need -exactly- that 
>> budget,
>> it is because you should *never* exceed that kind of rquired
>> computational time.
> Exact. But the idea of feedback scheduling was that sometimes you do 
> not know
> the WCET... You can guess it, or measure it over a large number of 
> runs (but
> the Murphy law ensures that you will miss the worst case anyway ;-).
> And there are situations in which you do not need to respect all of the
> deadlines... The daemon I was talking about just monitors the 
> difference between
> the scheduling deadline and the "real job deadline" for some tasks, in 
> order to
> understand if the runtime they have been assigned is enough or not... 
> If some
> task is not receiving enough runtime (that is, if the difference 
> between its
> scheduling deadline and the "real deadline" becomes too large), the 
> daemon
> tries to increase its runtime. When the system is overloaded (that is, 
> everyone
> of the monitored tasks wants too much runtime, and the admission 
> control fails),
> the daemon decreases the runtimes according to the weight of the tasks...
> Of course, the daemon does not have to monitor all of the 
> SCHED_DEADLINE tasks
> in the system, but only the ones for which adaptive reservations are 
> useful
> (tasks for which the WCET is not known for sure, and that can 
> tollerate some
> missed deadlines). The other SCHED_DEADLINE tasks can stay with their 
> fixed
> runtimes unchanged.
>
>
>> Blindly reducing allocated runtime is defeating that whole purpose.
> Of course, there could be a minimum guaranteed runtime per task.
>
>> Granted, if you use EDF for BW-control only, it could be done - but then
>> the thread itself should do that. Real-time is not about being fair. 
>> Heck,
>> it's not even about being optimal, it's about being predictable and
>> "dynamically adjusting" is not!
> Well, this could lead to a long discussion, in which everyone is right 
> and
> everyone is wrong... Let's say that it depends on the applications 
> requirements
> and constraints.
>
> [...]
>>>> Will EDF has dynamic quota in future?
>>>
>>> Well, as Luca said, you can already use SCHED_DEADLINE as the backend
>>> for feedback scheduling (that pertain mostly to user-space).
>>>
>>> And yes, there are already thoughts to modify it a bit to go towards
>>> Lipari's et al. GRUB algorithm. That would be probably helpful in
>>> situations like yours. But I can't give you any timing for it.
>>
>> Need to read up on GRUB before involving myself in this part of the
>> discussion, but I'm not sure how much I enjoy the idea of some part of
>> userspace (more or less) blindly adjusting deadline-params for other 
>> tasks.
> No, GRUB does not involve the user-space adjusting any scheduling 
> parameter.
> GRUB is a reclaiming algorithm, which works in a different way respect to
> the feedback scheduling approach I described, and requires 
> modifications in
> the scheduler.
> The basic ideas are (warning! This is an over-simplification of the 
> algorithm! :)
> - You assign runtime and period to each SCHED_DEADLINE task as usual
> - Each task is guaranteed to receive its runtime every period
> - You can also define a maximum fraction Umax of the CPU time that the
>   SCHED_DEADLINE tasks can use. Note that Umax _must_ be larger or equal
>   than sum_i runtime_i / period_i
>   (note: in the original GRUB paper, only one CPU is considered, and 
> Umax is
>   set equal to 1)
> - If the tasks are consuming less than Umax, then the scheduling 
> algorithm
>   allows them to use more runtime (but not less than the guaranteed 
> runtime_i)
>   in order to use up to Umax.
>   This is achieved by modifying the rule used to decrease the runtime: in
>   SCHED_DEADLINE, if a task executes for a time delta, its runtime is 
> decreased
>   by delta; using GRUB, it would be decreased by a smaller amount of time
>   (computed based on Umax, on the active SCHED_DEADLINE tasks, etc...).
>   This requires to implement some kind of state machine (the details 
> are in
>   the GRUB paper)
>
> I also had an implementation of the GRUB algorithm (based on a 
> modification
> of my old CBS scheduler for Linux), but the computational complexity 
> of the
> algorithm was too high. That's why I never proposed to merge it in 
> SCHED_DEADLINE.
> But maybe there can be some trade-off between the "exact compliance 
> with the
> GRUB algorithm" and implementation efficiency that can make it 
> acceptable...
>
>
Has this implementation been released to some community
or not?
Could you tell me where I can get the code from?
I would like to test your solution in our scenario.

Thanks
Yan
>> I think we need to be very careful about whether or not we're talking 
>> about
>> EDF as a "CPU BW"-enforcer or a "meet a deadline for a periodic
>> task"-enforcer. Those 2 will react quite differently from having their
>> runtime adjusted on the fly.
> Well, IMHO the nice thing about SCHED_DEADLINE is that it can be both :)
> It depends on how you configure the scheduling parameters.
>
>
>             Luca
>
> .
>



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFD] sched/deadline: EDF dynamic quota design
  2014-06-17  2:43             ` xiaofeng.yan
@ 2014-06-17  8:01               ` Luca Abeni
  2014-06-18  7:01                 ` xiaofeng.yan
  0 siblings, 1 reply; 9+ messages in thread
From: Luca Abeni @ 2014-06-17  8:01 UTC (permalink / raw)
  To: xiaofeng.yan
  Cc: Henrik Austad, Juri Lelli, Peter Zijlstra, Ingo Molnar,
	duzhiping.du, xiaofeng.yan2012, raistlin, tkhai,
	harald.gustafsson, linux-kernel

Hi,

On 06/17/2014 04:43 AM, xiaofeng.yan wrote:
[...]
>> The basic ideas are (warning! This is an over-simplification of the algorithm! :)
>> - You assign runtime and period to each SCHED_DEADLINE task as usual
>> - Each task is guaranteed to receive its runtime every period
>> - You can also define a maximum fraction Umax of the CPU time that the
>>   SCHED_DEADLINE tasks can use. Note that Umax _must_ be larger or equal
>>   than sum_i runtime_i / period_i
>>   (note: in the original GRUB paper, only one CPU is considered, and Umax is
>>   set equal to 1)
>> - If the tasks are consuming less than Umax, then the scheduling algorithm
>>   allows them to use more runtime (but not less than the guaranteed runtime_i)
>>   in order to use up to Umax.
>>   This is achieved by modifying the rule used to decrease the runtime: in
>>   SCHED_DEADLINE, if a task executes for a time delta, its runtime is decreased
>>   by delta; using GRUB, it would be decreased by a smaller amount of time
>>   (computed based on Umax, on the active SCHED_DEADLINE tasks, etc...).
>>   This requires to implement some kind of state machine (the details are in
>>   the GRUB paper)
>>
>> I also had an implementation of the GRUB algorithm (based on a modification
>> of my old CBS scheduler for Linux), but the computational complexity of the
>> algorithm was too high. That's why I never proposed to merge it in SCHED_DEADLINE.
>> But maybe there can be some trade-off between the "exact compliance with the
>> GRUB algorithm" and implementation efficiency that can make it acceptable...
>>
>>
> Has these  codes been opened about the implementation in some community or not ?
The old GRUB scheduler for Linux was used for some experiments published in a paper
at RTLWS 2007, and of course the code was open-source (released under GPL).
It required a patch for the Linux kernel (I used a 2.6.something kernel) which allowed
the scheduler to be loaded as a kernel module (yes, I know this is the wrong way to go...
But implementing it like this was simpler :).
That is very old code... I probably still have it somewhere, but I have to search
for it. If someone is interested, I can try to search (the story of the user-space
daemon for adaptive reservations is similar: I released it as open-source years ago...
If anyone is interested I can search for this code too)


				Luca

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFD] sched/deadline: EDF dynamic quota design
  2014-06-17  8:01               ` Luca Abeni
@ 2014-06-18  7:01                 ` xiaofeng.yan
  2014-06-19  9:13                   ` Luca Abeni
  0 siblings, 1 reply; 9+ messages in thread
From: xiaofeng.yan @ 2014-06-18  7:01 UTC (permalink / raw)
  To: Luca Abeni
  Cc: Henrik Austad, Juri Lelli, Peter Zijlstra, Ingo Molnar,
	duzhiping.du, xiaofeng.yan2012, raistlin, tkhai,
	harald.gustafsson, linux-kernel

On 2014/6/17 16:01, Luca Abeni wrote:
> Hi,
>
> On 06/17/2014 04:43 AM, xiaofeng.yan wrote:
> [...]
>>> The basic ideas are (warning! This is an over-simplification of the 
>>> algorithm! :)
>>> - You assign runtime and period to each SCHED_DEADLINE task as usual
>>> - Each task is guaranteed to receive its runtime every period
>>> - You can also define a maximum fraction Umax of the CPU time that the
>>>   SCHED_DEADLINE tasks can use. Note that Umax _must_ be larger or 
>>> equal
>>>   than sum_i runtime_i / period_i
>>>   (note: in the original GRUB paper, only one CPU is considered, and 
>>> Umax is
>>>   set equal to 1)
>>> - If the tasks are consuming less than Umax, then the scheduling 
>>> algorithm
>>>   allows them to use more runtime (but not less than the guaranteed 
>>> runtime_i)
>>>   in order to use up to Umax.
>>>   This is achieved by modifying the rule used to decrease the 
>>> runtime: in
>>>   SCHED_DEADLINE, if a task executes for a time delta, its runtime 
>>> is decreased
>>>   by delta; using GRUB, it would be decreased by a smaller amount of 
>>> time
>>>   (computed based on Umax, on the active SCHED_DEADLINE tasks, etc...).
>>>   This requires to implement some kind of state machine (the details 
>>> are in
>>>   the GRUB paper)
>>>
>>> I also had an implementation of the GRUB algorithm (based on a 
>>> modification
>>> of my old CBS scheduler for Linux), but the computational complexity 
>>> of the
>>> algorithm was too high. That's why I never proposed to merge it in 
>>> SCHED_DEADLINE.
>>> But maybe there can be some trade-off between the "exact compliance 
>>> with the
>>> GRUB algorithm" and implementation efficiency that can make it 
>>> acceptable...
>>>
>>>
>> Has these  codes been opened about the implementation in some 
>> community or not ?
> The old GRUB scheduler for Linux was used for some experiments 
> published in a paper
> at RTLWS 2007, and of course the code was open-source (released under 
> GPL).
> It required a patch for the Linux kernel (I used a 2.6.something 
> kernel) which allowed
> to load the scheduler as a kernel module (yes, I know this is the 
> wrong way to go...
> But implementing it like this was simpler :).
> That is very old code... I probably still have it somewhere, but I 
> have to search
> for it. If someone is interested, I can try to search (the story of 
> the user-space
> daemon for adaptive reservations is similar: I released it as 
> open-source years ago...
> If anyone is interested I can search for this code too)
>
>
>                 Luca
>
I'm glad that you replied to this email.  Yes, I'm very interested in your
solution.  In fact, there are matching scenarios
in our product.  Could you send me a link if you have one?  I can test your
solution in our scenario if you like.

Thanks
Yan
>



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFD] sched/deadline: EDF dynamic quota design
  2014-06-18  7:01                 ` xiaofeng.yan
@ 2014-06-19  9:13                   ` Luca Abeni
  2014-06-20  2:29                     ` xiaofeng.yan
  0 siblings, 1 reply; 9+ messages in thread
From: Luca Abeni @ 2014-06-19  9:13 UTC (permalink / raw)
  To: xiaofeng.yan
  Cc: Henrik Austad, Juri Lelli, Peter Zijlstra, Ingo Molnar,
	duzhiping.du, xiaofeng.yan2012, raistlin, tkhai,
	harald.gustafsson, linux-kernel

On 06/18/2014 09:01 AM, xiaofeng.yan wrote:
[...]
>>>> I also had an implementation of the GRUB algorithm (based on a modification
>>>> of my old CBS scheduler for Linux), but the computational complexity of the
>>>> algorithm was too high. That's why I never proposed to merge it in SCHED_DEADLINE.
>>>> But maybe there can be some trade-off between the "exact compliance with the
>>>> GRUB algorithm" and implementation efficiency that can make it acceptable...
>>>>
>>>>
>>> Has these  codes been opened about the implementation in some community or not ?
>> The old GRUB scheduler for Linux was used for some experiments published in a paper
>> at RTLWS 2007, and of course the code was open-source (released under GPL).
>> It required a patch for the Linux kernel (I used a 2.6.something kernel) which allowed
>> to load the scheduler as a kernel module (yes, I know this is the wrong way to go...
>> But implementing it like this was simpler :).
>> That is very old code... I probably still have it somewhere, but I have to search
>> for it. If someone is interested, I can try to search (the story of the user-space
>> daemon for adaptive reservations is similar: I released it as open-source years ago...
>> If anyone is interested I can search for this code too)
>>
>>
>>                 Luca
>>
> I'm glad that you reply this email.  yes, I'm so interesting about your solution.  In fact , there are scenarios
> in our product.  Could you send me a link if you have?  I can test your solution in our scene if you like.
Ok, so I found my old code for the CBS scheduler with GRUB modifications.
You can get it from here: http://disi.unitn.it/~abeni/old-cbs-scheduler.tgz

Please note that:
1) This is old code (for 2.6.x kernels), written before SCHED_DEADLINE development
    was started
2) The scheduler architecture is completely different with respect to the current one,
    but the basic scheduling algorithm implemented by my old scheduler is the same
    one implemented by SCHED_DEADLINE (but I did not implement multi-processor support :)
3) You can have a look at the modifications needed to implement GRUB by simply grepping
    for "GRUB" in the source code. Basically, the algorithm is implemented by:
    1) Implementing a state machine to keep track of the current state of a task (is it
       using its reserved fraction of CPU time, did it already use such a fraction of CPU
       time, or is it not using any CPU time?). This is done by adding a "state" field in
       "cbs_struct", and properly updating it in cbs.c
    2) Keeping track of the total fraction of CPU time used by the active tasks. See the "U"
       variable in cbs.c (in a modern scheduler, it should probably become a field in the
       runqueue structure)
    3) Modifying the rule used to update the runtime. For a "standard" CBS without CPU
       reclaiming (the one implemented by SCHED_DEADLINE), if a task executes for an amount
       of time "delta" its runtime must be decreased by delta. For GRUB, it must be decreased
       by "delta" mutliplied by U. See "account()" in cbs.c.
       The "trick" is in properly updating U (and this is done using the state machine
       mentioned above)

Summing up, this code is not directly usable, but it shows you what needs to be done in
order to implement the GRUB mechanism for CPU reclaiming in a CBS scheduler...
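A condensed sketch of steps 1) and 2) (this collapses the state-transition
rules - the GRUB paper defines exactly when a task may leave the active set -
and all names here are made up for illustration):

    enum grub_state {
        GRUB_INACTIVE,      /* not using any CPU time */
        GRUB_ACTIVE,        /* using its reserved fraction */
        GRUB_RECLAIMING,    /* already used its fraction, running on reclaimed time */
    };

    struct grub_task {
        enum grub_state state;
        double bw;          /* runtime_i / period_i */
    };

    static double U;        /* total bandwidth of the active tasks */

    static void grub_task_activates(struct grub_task *t)
    {
        if (t->state == GRUB_INACTIVE)
            U += t->bw;     /* starts counting against U */
        t->state = GRUB_ACTIVE;
    }

    static void grub_task_goes_inactive(struct grub_task *t)
    {
        if (t->state != GRUB_INACTIVE)
            U -= t->bw;     /* its share becomes reclaimable by the others */
        t->state = GRUB_INACTIVE;
    }

    /* Step 3): while a task runs for 'delta', charge delta * U, not delta. */
    static void grub_account(double *runtime, double delta)
    {
        *runtime -= delta * U;
    }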



				Luca

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFD] sched/deadline: EDF dynamic quota design
  2014-06-19  9:13                   ` Luca Abeni
@ 2014-06-20  2:29                     ` xiaofeng.yan
  0 siblings, 0 replies; 9+ messages in thread
From: xiaofeng.yan @ 2014-06-20  2:29 UTC (permalink / raw)
  To: Luca Abeni
  Cc: Henrik Austad, Juri Lelli, Peter Zijlstra, Ingo Molnar,
	duzhiping.du, xiaofeng.yan2012, raistlin, tkhai,
	harald.gustafsson, linux-kernel

On 2014/6/19 17:13, Luca Abeni wrote:
> On 06/18/2014 09:01 AM, xiaofeng.yan wrote:
> [...]
>>>>> I also had an implementation of the GRUB algorithm (based on a 
>>>>> modification
>>>>> of my old CBS scheduler for Linux), but the computational 
>>>>> complexity of the
>>>>> algorithm was too high. That's why I never proposed to merge it in 
>>>>> SCHED_DEADLINE.
>>>>> But maybe there can be some trade-off between the "exact 
>>>>> compliance with the
>>>>> GRUB algorithm" and implementation efficiency that can make it 
>>>>> acceptable...
>>>>>
>>>>>
>>>> Has these  codes been opened about the implementation in some 
>>>> community or not ?
>>> The old GRUB scheduler for Linux was used for some experiments 
>>> published in a paper
>>> at RTLWS 2007, and of course the code was open-source (released 
>>> under GPL).
>>> It required a patch for the Linux kernel (I used a 2.6.something 
>>> kernel) which allowed
>>> to load the scheduler as a kernel module (yes, I know this is the 
>>> wrong way to go...
>>> But implementing it like this was simpler :).
>>> That is very old code... I probably still have it somewhere, but I 
>>> have to search
>>> for it. If someone is interested, I can try to search (the story of 
>>> the user-space
>>> daemon for adaptive reservations is similar: I released it as 
>>> open-source years ago...
>>> If anyone is interested I can search for this code too)
>>>
>>>
>>>                 Luca
>>>
>> I'm glad that you reply this email.  yes, I'm so interesting about 
>> your solution.  In fact , there are scenarios
>> in our product.  Could you send me a link if you have?  I can test 
>> your solution in our scene if you like.
> Ok, so I found my old code for the CBS scheduler with GRUB modifications.
> You can get it from here: 
> http://disi.unitn.it/~abeni/old-cbs-scheduler.tgz
>
> Please note that:
> 1) This is old code (for 2.6.x kernels), written before SCHED_DEADLINE 
> development
>    was started
> 2) The scheduler architecture is completely different respect to the 
> current one,
>    but the basic scheduling algorithm implemented by my old scheduler 
> is the same
>    one implemented by SCHED_DEADLINE (but I did not implement 
> multi-processor support :)
> 3) You can have a look at the modifications needed to implement GRUB 
> by simply grepping
>    for "GRUB" in the source code. Basically, the algorithm is 
> implemented by:
>    1) Implementing a state machine to keep track of the current state 
> of a task (is it
>       using its reserved fraction of CPU time, did it already use such 
> a fraction of CPU
>       time, or is it not using any CPU time?). This is done by adding 
> a "state" field in
>       "cbs_struct", and properly updating it in cbs.c
>    2) Keeping track of the total fraction of CPU time used by the 
> active tasks. See the "U"
>       variable in cbs.c (in a modern scheduler, it should probably 
> become a field in the
>       runqueue structure)
>    3) Modifying the rule used to update the runtime. For a "standard" 
> CBS without CPU
>       reclaiming (the one implemented by SCHED_DEADLINE), if a task 
> executes for an amount
>       of time "delta" its runtime must be decreased by delta. For 
> GRUB, it must be decreased
>       by "delta" mutliplied by U. See "account()" in cbs.c.
>       The "trick" is in properly updating U (and this is done using 
> the state machine
>       mentioned above)
>
> Summing up, this code is not directly usable, but it shows you what 
> needs to be done in
> order to implement the GRUB mechanism for CPU reclaiming in a CBS 
> scheduler...
>
>
Thanks for giving me your solution. I will take a look at it and adapt
it to our scenario later.

Thanks
Yan
>
>                 Luca
>
> .
>



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2014-06-20  2:30 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <537348DA.7080001@huawei.com>
     [not found] ` <20140514113245.GZ11096@twins.programming.kicks-ass.net>
2014-05-14 12:55   ` [RFD] sched/deadline: EDF dynamic quota design Peter Zijlstra
     [not found]   ` <53736CD9.90805@unitn.it>
     [not found]     ` <5374A335.90705@huawei.com>
2014-05-15 12:31       ` Juri Lelli
2014-05-16  7:11         ` Henrik Austad
2014-05-21 12:45           ` Luca Abeni
2014-06-17  2:43             ` xiaofeng.yan
2014-06-17  8:01               ` Luca Abeni
2014-06-18  7:01                 ` xiaofeng.yan
2014-06-19  9:13                   ` Luca Abeni
2014-06-20  2:29                     ` xiaofeng.yan
