On mer, 2014-07-09 at 16:55 -0400, Josh Whitehead wrote:
> This patch adds the pieces needed for the CBS functionality. 
>
So, first of all, this is absolutely my favorite resource-reservation
based real-time scheduling algorithm... so glad you chose it! :-P

> This includes a 
> "soft" flag to indicate whether a domain should be handled in a pure EDF manner
> or handled by the CBS algorithm, the CBS algorithm itself to the wake function,
> and few other bits and comments to track scheduler statistics, set parameters, 
> and document the functionality of the code itself.
> 
Indeed, it's something pretty easy to implement, on top of a decent
EDF-looking scheduler.

> --- a/xen/common/sched_sedf.c
> +++ b/xen/common/sched_sedf.c

>  /******************************************************************************
> - * Simple EDF Scheduler for xen
> + * Simple EDF Scheduler for Xen
>   *
This line changed already in patch 2.

>   * Permission is hereby granted, free of charge, to any person obtaining a copy
>   * of this software and associated documentation files (the "Software"), to
> @@ -46,6 +46,7 @@
>  #define CHECK(_p) ((void)0)
>  #endif
>  
> +#define SEDF_SOFT_TASK (1)
>  #define SEDF_ASLEEP (16)
>  
And this is why I was saying it was ugly and not very practical to leave
SEDF_ASLEEP alone and #define to 16. Now you have two status flag, and
instead of bits 1 and 2, they're represented by bits 1 and 5... Weird,
isn't it?

>  #define DEFAULT_PERIOD (MILLISECS(20))
> @@ -73,7 +74,8 @@ struct sedf_vcpu_info {
>   
>      /* Parameters for EDF */
>      s_time_t  period;  /* = Server scheduling period */
> -    s_time_t  budget;   /* = Guarenteed minimum CPU time per period */
> +    s_time_t  budget;  /* = Guarenteed minimum CPU time per period */
>
And this one too.

Just decide how do you want them to look, and make that happen once and
for ever! :-P :-P

> +    /* Note: Server bandwidth = (budget / period) */
>
Again, I'd avoid the word server. 'VCPU bandwidth' sounds just fine.
 
> @@ -208,6 +215,7 @@ static void *sedf_alloc_vdata(const struct scheduler *ops, struct vcpu *v, void
>      inf->vcpu = v;
>  
>      inf->deadl_abs  = 0;
> +    inf->cputime    = 0;
>      inf->status     = SEDF_ASLEEP;
>
inf is xzalloc-ed above, so I don't think we need to zero the fields
explicitly. I see this is probably an attempt to achieve more simmetry
and clarity, given the already existing explicit zeroing of deadl_abs.

I don't have a strong opinion. It's certainly inconsistent and
odd-looking to have only one, so I'd either =0 both (as above) or none.
However, whatever path you choose to go, I don't think either this or
the removal of deadl_abs=0 belongs to this patch.

It's probably best to put this hunk in one of the cleanup two cleanup
patches before this.

> @@ -306,29 +314,64 @@ static void desched_edf_vcpu(s_time_t now, struct vcpu *v)
>      /* Update the vcpu's cputime */
>      inf->cputime += now - inf->sched_start_abs;
>  
> -    /* Scheduling decisions which don't remove the running vcpu from
> -     * the runq */
> +    /* If running vcpu has budget remaining, return */
>      if ( (inf->cputime < inf->budget) && sedf_runnable(v) )
>          return;
>
>
I happen to like the original comment better. Well, actually, I'd change
it a bit myself, but please, stick to something that provides some
insights about "the semantic" of what's happening.

I mean, the fact that you return if the condition is true is already
evident from the code, no much point in saying it in a comment. :-)

OTOH, it may help the reader to tell him what's the consequence of such
return, i.e., "if the vcpu still has budget left, leave it alone on the
RunQ" (or something like that).

> +    /* Current vcpu has consumed its budget for this period */
>      __del_from_queue(v);
>  
> -    /*
> -     * Manage bookkeeping (i.e. calculate next deadline, memorise
> -     * overrun-time of budget) of finished vcpus.
> -     */
> +#ifdef SEDF_STATS
> +    /* Manage deadline misses */
> +    if ( unlikely(inf->deadl_abs < now) )
> +    {
> +        inf->miss_tot++;
> +        inf->miss_time += inf->cputime;
>
Two questions:
 1) you're checking whether a deadline miss happened only in case there
    has been a budget overrun, right? So, why? Can't a vcpu overcome its
    deadline even without overrunning (e.g., in case the system is
    overloaded to more than 100%, i UP case, utilization)?
 2) By miss_time you mean by how much the deadline has been missed (in
    literature, that's called tardiness), right? If yes, what makes you
    sure that the miss time is inf->cputime?

> +    }
> +#endif
> +
>
These #ifdef are really really ugly. I think using some helpers, and
their empty stubs in case SEDF_STATS is not defined, would make it all
look a lot better.

E.g., have a look at what's done in sched_credit.c

However, this can well happen in another unrelated series.

> +    /* Manage overruns */
>      if ( inf->cputime >= inf->budget )
>      {
>          inf->cputime -= inf->budget;
>  
>          /* Set next deadline */
>          inf->deadl_abs += inf->period;
> +
> +        /* Ensure that the cputime is always less than budget */
> +        if ( unlikely(inf->cputime > inf->budget) )
> +        {
> +#ifdef SEDF_STATS
> +            inf->over_tot++;
>
Why this is incremented only here? Or, I guess, what's the intended
meaning and usage of over_tot?

I'm asking because it looks like you want to count the number of
overruns. However, it seems to me that putting this here, you are
counting the occurrences of the vcpu having executed more than twice its
budget (you do the ++ only if cputime-budget>budget ==>
cputime>2*budget)... Is that what you want?

> +            inf->over_time += inf->cputime;
> +#endif
> +
> +            /* Make up for the overage by pushing the deadline
> +               into the future; ensure one domain doesn't consistently
> +               consume extra time by overrunning its budget */
>
Style of the comment.

> +            inf->deadl_abs += ((inf->cputime / inf->budget)
> +                               * inf->period);
> +            inf->cputime -= (inf->cputime / inf->budget) * inf->budget;
> +        }
> +
Honestly, the whole "if ( inf->cputime >= inf->budget ) {}" block looks
a bit clumsy to me. How about just something like this:

    while ( inf->cputime >= inf->budget )
    {
        inf->over_tot++; // if you want to count *all* overruns,
                         // otherwise, skip the ++ for 1st iteration

        inf->cputime -= inf->budget;
        inf->deadl_abs += inf->period;
    }

> +        /* Ensure that the start of the next period is in the future */
> +        if ( unlikely(PERIOD_BEGIN(inf) < now) )
> +            inf->deadl_abs += 
> +                (DIV_UP(now - PERIOD_BEGIN(inf),
> +                        inf->period)) * inf->period;
>
Well, I agree that checking that the new period begins in future is
required. However, if at this point that is not true, we really are
lagging quite a bit!

On what to do, I'm not sure whether I'd just push the deadline ahead, as
you're doing, or go for a full reset (e.g., deadl_abs=NOW()+period).

I'm fine with the pushing ahead, if you like it better, but I'd consider
printing a warning somewhere, as this really should not happen.

>      }
>   
>      /* Add a runnable vcpu to the appropriate queue */
>      if ( sedf_runnable(v) )
>      {
> -        __add_to_waitqueue_sort(v);
> +        if( sedf_soft(v) )
> +        {
> +            __add_to_runqueue_sort(v);
> +        }
> +        else 
> +        {
> +            __add_to_waitqueue_sort(v);
> +        }
>      }
>  
>      ASSERT(EQ(sedf_runnable(v), __task_on_queue(v)));
>
As I said in another mail, let's kill EQ() macro too.

ASSERT(sedf_runnable(v) == __task_on_queue(v));

looks perfectly fine to me.

Of course, do it in the appropriate patch (not here).

> @@ -418,9 +461,9 @@ static void sedf_deinit(const struct scheduler *ops)
>  static struct task_slice sedf_do_schedule(
>      const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
>  {
> -    int                   cpu      = smp_processor_id();
> -    struct list_head     *runq     = RUNQ(cpu);
> -    struct list_head     *waitq    = WAITQ(cpu);
> +    int                    cpu     = smp_processor_id();
> +    struct list_head      *runq    = RUNQ(cpu);
> +    struct list_head      *waitq   = WAITQ(cpu);
>      struct sedf_vcpu_info *inf     = SEDF_VCPU(current);
>      struct sedf_vcpu_info *runinf, *waitinf;
>      struct task_slice      ret;
>
I may be wrong, but I feel like this hunk is something that would be
better fixed in patch 2 than in this one, isn't it?

> @@ -469,7 +512,7 @@ static struct task_slice sedf_do_schedule(
>              waitinf  = list_entry(waitq->next,
>                                    struct sedf_vcpu_info, list);
>              /*
> -             * Rerun scheduler, when scheduled vcpu consumes
> +             * Rerun scheduler when scheduled vcpu consumes
>
And this one too.

>               * its budget or the first vcpu from the waitqueue
>               * gets ready.
>               */

> @@ -490,7 +533,7 @@ static struct task_slice sedf_do_schedule(
>      }
>  
>      /*
> -     * TODO: Do something USEFUL when this happens and find out, why it
> +     * TODO: Do something USEFUL when this happens and find out why it
>
And also this one.

>       * still can happen!!!
>       */
>      if ( ret.time < 0)

> @@ -554,15 +597,21 @@ static inline int should_switch(struct vcpu *cur,
>  /*
>   * This function wakes up a vcpu, i.e. moves them into the appropriate queue
>   *
> - * When a blocked vcpu unblocks, it is allowed to start execution at
> - * the beginning of the next complete period
> - * (D..deadline, R..running, B..blocking/sleeping, U..unblocking/waking up
> + *  For Hard Real-Time vcpus (soft = 0):
> + *     -When a blocked vcpu unblocks, it is allowed to start execution at
> + *      the beginning of the next complete period
> + *      (D..deadline, R..running, B..blocking/sleeping, U..unblocking/waking up
> + *
> + *      DRRB_____D__U_____DRRRRR___D________ ... 
> + *
> + *     -This causes the vcpu to miss a period (and a deadlline)
> + *     -Doesn't disturb the schedule at all
> + *     -Deadlines keep occuring isochronous
>
So, now I'm curious, why did you pick this variant, instead of the
'let's reset everything at wake-up time' one? You think keeping deadline
isochronous is important, to the point that is ok to postpone execution
by one period?

(Note that I'm not saying that is not the case... I'm genuinely
asking :-) )

> -    if ( now < inf->deadl_abs )
> -    {
> -        /* Short blocking */
> -        inf->short_block_tot++;
> -    }
> -    else
> +    if ( sedf_soft(v) )
>      {
> -        /* Long unblocking, someone is going to miss their deadline. */
> -        inf->long_block_tot++;
> +        /* 
> +         * Apply CBS rule
> +         * Where:
> +         *      c == Remaining server budget == (inf->budget - cpu_time) 
> +         *      d == Server (vcpu) deadline  == inf->deadl_abs
> +         *      r == Wake-up time of vcpu    == now
> +         *      U == Server (vcpu) bandwidth == (inf->budget / inf->period)
> +         *
> +         * if c>=(d-r)*U  --->  
> +         *      (inf->budget - cputime) >= 
> +         *              (inf->deadl_abs - now) * (inf->period / inf->period)
> +         *
> +         * If true, push deadline back by one period and refresh budget, else
> +         * use current budget and deadline.
> +         * 
> +         * Note: The 'if' statement below is equivalent to the above comments;
> +         *       it has been simplified to avoid the division operator
> +         */
> +        if ((inf->budget - inf->cputime) * inf->period >=
> +            (inf->deadl_abs - now) * inf->budget)
> +        {
> +            /* Push back deadline by one period */
> +            inf->deadl_abs += inf->period;
> +            inf->cputime = 0;
> +        }
> +        
This looks ok.

> +        /* 
> +         * In CBS we don't care if the period has begun,
> +         * the task doesn't have to wait for its period
> +         * because it'll never request more than its budget
> +         * for any given period.
> +         */
> +        __add_to_runqueue_sort(v);
>
Well, I'm not sure I understand the comment or, if I understand it, that
I agree with it. It's not that the vcpu won't request more than its
budget. A vcpu may well try to execute for 100% of the time... What
happens in that case is that the budgetting mechanism of the algorithm
kicks in and slow it down, by pushing its deadline away (and hence
lowering its priority).

But perhaps you were trying to say something different?

>      }
> +    else {
> +        /* Task is a hard task, treat accordingly */
> +#ifdef SEDF_STATS
> +        if ( now < inf->deadl_abs )
> +        {
> +            /* Short blocking */
> +            inf->short_block_tot++;
> +        }
> +        else
> +        {
> +            /* Long unblocking, someone is going to miss their deadline. */
> +            inf->long_block_tot++;
>
Again, I'm not sure I understand who 'someone' should be. It looks to me
that it is right the vcpu that is unblocking that risks to miss its
deadline, since it will have to skip a period.

It should not be any of the other vcpus that miss its own deadline,
because of this vcpu unblocking, or we'd be breaking the CPU isolation
feature that this kind of real-time scheduling is supposed to provide...
Is that what you meant?

> +        }
> +#endif
>
I think this could have been inside #ifdef SEDF_STATS from the
beginning, which would mean do part of this in patch 2.
 
> -    if ( PERIOD_BEGIN(inf) > now )
> -        __add_to_waitqueue_sort(v);
> -    else
> -        __add_to_runqueue_sort(v);
> +        if ( PERIOD_BEGIN(inf) > now )
> +            __add_to_waitqueue_sort(v);
> +        else
> +            __add_to_runqueue_sort(v);
> +    }
>   
>  #ifdef SEDF_STATS
>      /* Do some statistics here... */
> @@ -626,7 +716,7 @@ static void sedf_wake(const struct scheduler *ops, struct vcpu *v)
>          cpu_raise_softirq(v->processor, SCHEDULE_SOFTIRQ);
>  }
>  
> -/* Print a lot of useful information about a vcpus in the system */
> +/* Print a lot of useful information about a vcpu in the system */
>
ditto (do this in patch 2).

> @@ -760,8 +868,9 @@ static int sedf_adjust(const struct scheduler *ops, struct domain *d, struct xen
>              goto out;
>          }
>  
> -        op->u.sedf.period    = SEDF_VCPU(d->vcpu[0])->period;
> -        op->u.sedf.slice     = SEDF_VCPU(d->vcpu[0])->budget;
> +        op->u.sedf.period = SEDF_VCPU(d->vcpu[0])->period;
> +        op->u.sedf.budget = SEDF_VCPU(d->vcpu[0])->budget;
> +        op->u.sedf.soft   = sedf_soft(d->vcpu[0]);
>
And again (i.e., do this in patch 2?).
>      }
>  

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)