Re: [PATCH v9]xen: sched: convert RTDS from time to event driven model

From: Meng Xu <mengxu@cis.upenn.edu>
To: Tianyang Chen <tiche@seas.upenn.edu>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Dario Faggioli <dario.faggioli@citrix.com>,
	George Dunlap <george.dunlap@citrix.com>,
	Dagaen Golomb <dgolomb@seas.upenn.edu>
Subject: Re: [PATCH v9]xen: sched: convert RTDS from time to event driven model
Date: Tue, 15 Mar 2016 23:11:10 -0400	[thread overview]
Message-ID: <CAENZ-+kxbhucXbGBrOozQ2h-AGa_w-WVc6hHxCXuhPKaSK22_Q@mail.gmail.com> (raw)
In-Reply-To: <1458000275-4237-1-git-send-email-tiche@seas.upenn.edu>

On Mon, Mar 14, 2016 at 8:04 PM, Tianyang Chen <tiche@seas.upenn.edu> wrote:
> Budget replenishment and enforcement are separated by adding
> a replenishment timer, which fires at the next most imminent
> release time of all runnable vcpus.
>
> A replenishment queue has been added to keep track of all vcpus that
> are runnable.
>
> The following functions have major changes to manage the replenishment
> events and timer.
>
> repl_handler(): It is a timer handler which is re-programmed
> to fire at the nearest vcpu deadline to replenish vcpus.
>
> rt_schedule(): picks the highest runnable vcpu based on cpu
> affinity and ret.time will be passed to schedule(). If an idle
> vcpu is picked, -1 is returned to avoid busy-waiting. repl_update()
> has been removed.
>
> rt_vcpu_wake(): when a vcpu wakes up, it tickles instead of
> picking one from the run queue.
>
> rt_context_saved(): when context switching is finished, the
> preempted vcpu will be put back into the runq.
>
> Simplified funtional graph:
>
> schedule.c SCHEDULE_SOFTIRQ:
>     rt_schedule():
>         [spin_lock]
>         burn_budget(scurr)
>         snext = runq_pick()
>         [spin_unlock]
>
> sched_rt.c TIMER_SOFTIRQ
>     replenishment_timer_handler()
>         [spin_lock]
>         <for_each_vcpu_on_q(i)> {
>             replenish(i)
>             runq_tickle(i)
>         }>
>         program_timer()
>         [spin_lock]
>
> Signed-off-by: Tianyang Chen <tiche@seas.upenn.edu>
> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> Signed-off-by: Dagaen Golomb <dgolomb@seas.upenn.edu>
> ---
> changes since v8:
>     Replaced spin_lock_irqsave with spin_lock_irq.
>     Bug fix in burn_budget() where budget == 0.
>     Removed unnecessary tickling in the timer handler when
>     vcpus have the same deadline.
>
> changes since v7:
>     Coding sytle.
>     Added a flag to indicate that a vcpu was depleted.
>     Added a local list including updated vcpus in the
>     timer handler. It is used to decide which vcpu should
>     tickle.
>
> changes since v6:
>     Removed unnecessary vcpu_on_q() checks for runq_insert()
>     Renamed replenishment queue related functions without
>     underscores
>     New replq_reinsert() function for rt_vcpu_wake()
>
> changes since v5:
>     Removed unnecessary vcpu_on_replq() checks
>     deadline_queue_insert() returns a flag to indicate if it's
>     needed to re-program the timer
>     Removed unnecessary timer checks
>     Added helper functions to remove vcpus from queues
>     coding style
>
> Changes since v4:
>     removed unnecessary replenishment queue checks in vcpu_wake()
>     extended replq_remove() to all cases in vcpu_sleep()
>     used _deadline_queue_insert() helper function for both queues
>     _replq_insert() and _replq_remove() program timer internally
>
> Changes since v3:
>     removed running queue.
>     added repl queue to keep track of repl events.
>     timer is now per scheduler.
>     timer is init on a valid cpu in a cpupool.
> ---
>  xen/common/sched_rt.c |  437 ++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 341 insertions(+), 96 deletions(-)
>
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index bfed2e2..b491915 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -3,7 +3,9 @@
>   * EDF scheduling is a real-time scheduling algorithm used in embedded field.
>   *
>   * by Sisu Xi, 2013, Washington University in Saint Louis
> - * and Meng Xu, 2014, University of Pennsylvania
> + * Meng Xu, 2014-2016, University of Pennsylvania
> + * Tianyang Chen, 2016, University of Pennsylvania
> + * Dagaen Golomb, 2016, University of Pennsylvania
>   *
>   * based on the code of credit Scheduler
>   */
> @@ -16,6 +18,7 @@
>  #include <xen/delay.h>
>  #include <xen/event.h>
>  #include <xen/time.h>
> +#include <xen/timer.h>
>  #include <xen/perfc.h>
>  #include <xen/sched-if.h>
>  #include <xen/softirq.h>
> @@ -87,7 +90,7 @@
>  #define RTDS_DEFAULT_BUDGET     (MICROSECS(4000))
>
>  #define UPDATE_LIMIT_SHIFT      10
> -#define MAX_SCHEDULE            (MILLISECS(1))
> +
>  /*
>   * Flags
>   */
> @@ -115,6 +118,18 @@
>  #define RTDS_delayed_runq_add (1<<__RTDS_delayed_runq_add)
>
>  /*
> + * The replenishment timer handler needs to check this bit
> + * to know where a replenished vcpu was, when deciding which
> + * vcpu should tickle.
> + * A replenished vcpu should tickle if it was moved from the
> + * depleted queue to the run queue.
> + * + Set in burn_budget() if a vcpu has zero budget left.
> + * + Cleared and checked in the repenishment handler.

It seems you have an extra + here...
Need to be removed.

My bad, I didn't spot it out in last patch... :-(

> + */
> +#define __RTDS_was_depleted     3
> +#define RTDS_was_depleted (1<<__RTDS_was_depleted)
> +
> +/*
>   * rt tracing events ("only" 512 available!). Check
>   * include/public/trace.h for more details.
>   */
> @@ -142,6 +157,9 @@ static cpumask_var_t *_cpumask_scratch;
>   */
>  static unsigned int nr_rt_ops;
>
> +/* handler for the replenishment timer */
> +static void repl_handler(void *data);
> +
>  /*
>   * Systme-wide private data, include global RunQueue/DepletedQ
>   * Global lock is referenced by schedule_data.schedule_lock from all
> @@ -152,7 +170,9 @@ struct rt_private {
>      struct list_head sdom;      /* list of availalbe domains, used for dump */
>      struct list_head runq;      /* ordered list of runnable vcpus */
>      struct list_head depletedq; /* unordered list of depleted vcpus */
> +    struct list_head replq;     /* ordered list of vcpus that need replenishment */
>      cpumask_t tickled;          /* cpus been tickled */
> +    struct timer *repl_timer;   /* replenishment timer */
>  };
>
>  /*
> @@ -160,6 +180,7 @@ struct rt_private {
>   */
>  struct rt_vcpu {
>      struct list_head q_elem;    /* on the runq/depletedq list */
> +    struct list_head replq_elem; /* on the repl event list */
>
>      /* Up-pointers */
>      struct rt_dom *sdom;
> @@ -213,8 +234,14 @@ static inline struct list_head *rt_depletedq(const struct scheduler *ops)
>      return &rt_priv(ops)->depletedq;
>  }
>
> +static inline struct list_head *rt_replq(const struct scheduler *ops)
> +{
> +    return &rt_priv(ops)->replq;
> +}
> +
>  /*
> - * Queue helper functions for runq and depletedq
> + * Helper functions for manipulating the runqueue, the depleted queue,
> + * and the replenishment events queue.
>   */
>  static int
>  __vcpu_on_q(const struct rt_vcpu *svc)
> @@ -228,6 +255,18 @@ __q_elem(struct list_head *elem)
>      return list_entry(elem, struct rt_vcpu, q_elem);
>  }
>
> +static struct rt_vcpu *
> +replq_elem(struct list_head *elem)
> +{
> +    return list_entry(elem, struct rt_vcpu, replq_elem);
> +}
> +
> +static int
> +vcpu_on_replq(const struct rt_vcpu *svc)
> +{
> +    return !list_empty(&svc->replq_elem);
> +}
> +
>  /*
>   * Debug related code, dump vcpu/cpu information
>   */
> @@ -288,7 +327,7 @@ rt_dump_pcpu(const struct scheduler *ops, int cpu)
>  static void
>  rt_dump(const struct scheduler *ops)
>  {
> -    struct list_head *runq, *depletedq, *iter;
> +    struct list_head *runq, *depletedq, *replq, *iter;
>      struct rt_private *prv = rt_priv(ops);
>      struct rt_vcpu *svc;
>      struct rt_dom *sdom;
> @@ -301,6 +340,7 @@ rt_dump(const struct scheduler *ops)
>
>      runq = rt_runq(ops);
>      depletedq = rt_depletedq(ops);
> +    replq = rt_replq(ops);
>
>      printk("Global RunQueue info:\n");
>      list_for_each( iter, runq )
> @@ -316,6 +356,13 @@ rt_dump(const struct scheduler *ops)
>          rt_dump_vcpu(ops, svc);
>      }
>
> +    printk("Global Replenishment Event info:\n");
> +    list_for_each ( iter, replq )
> +    {
> +        svc = replq_elem(iter);
> +        rt_dump_vcpu(ops, svc);
> +    }
> +
>      printk("Domain info:\n");
>      list_for_each( iter, &prv->sdom )
>      {
> @@ -380,11 +427,78 @@ rt_update_deadline(s_time_t now, struct rt_vcpu *svc)
>      return;
>  }
>
> +/*
> + * A helper function that only removes a vcpu from a queue
> + * and it returns 1 if the vcpu was in front of the list.
> + */
> +static inline int
> +deadline_queue_remove(struct list_head *queue, struct list_head *elem)
> +{
> +    int pos = 0;
> +
> +    if ( queue->next != elem )
> +        pos = 1;
> +
> +    list_del_init(elem);
> +    return !pos;
> +}
> +
>  static inline void
>  __q_remove(struct rt_vcpu *svc)
>  {
> -    if ( __vcpu_on_q(svc) )
> -        list_del_init(&svc->q_elem);
> +    ASSERT( __vcpu_on_q(svc) );
> +    list_del_init(&svc->q_elem);
> +}
> +
> +/*
> + * Removing a vcpu from the replenishment queue could
> + * re-program the timer for the next replenishment event
> + * if it was at the front of the list.
> + */
> +static inline void
> +__replq_remove(const struct scheduler *ops, struct rt_vcpu *svc)
> +{
> +    struct rt_private *prv = rt_priv(ops);
> +    struct list_head *replq = rt_replq(ops);
> +    struct timer* repl_timer = prv->repl_timer;
> +
> +    ASSERT( vcpu_on_replq(svc) );
> +
> +    if ( deadline_queue_remove(replq, &svc->replq_elem) )
> +    {
> +        /* re-arm the timer for the next replenishment event */
> +        if ( !list_empty(replq) )
> +        {
> +            struct rt_vcpu *svc_next = replq_elem(replq->next);
> +            set_timer(repl_timer, svc_next->cur_deadline);
> +        }
> +        else
> +            stop_timer(repl_timer);
> +    }
> +}
> +
> +/*
> + * An utility function that inserts a vcpu to a
> + * queue based on certain order (EDF). The returned
> + * value is 1 if a vcpu has been inserted to the
> + * front of a list.
> + */
> +static int
> +deadline_queue_insert(struct rt_vcpu * (*_get_q_elem)(struct list_head *elem),
> +    struct rt_vcpu *svc, struct list_head *elem, struct list_head *queue)
> +{
> +    struct list_head *iter;
> +    int pos = 0;
> +
> +    list_for_each ( iter, queue )
> +    {
> +        struct rt_vcpu * iter_svc = (*_get_q_elem)(iter);
> +        if ( svc->cur_deadline <= iter_svc->cur_deadline )
> +            break;
> +        pos++;
> +    }
> +    list_add_tail(elem, iter);
> +    return !pos;
>  }
>
>  /*
> @@ -397,27 +511,62 @@ __runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
>  {
>      struct rt_private *prv = rt_priv(ops);
>      struct list_head *runq = rt_runq(ops);
> -    struct list_head *iter;
>
>      ASSERT( spin_is_locked(&prv->lock) );
> -
>      ASSERT( !__vcpu_on_q(svc) );
> +    ASSERT( vcpu_on_replq(svc) );
>
>      /* add svc to runq if svc still has budget */
>      if ( svc->cur_budget > 0 )
> -    {
> -        list_for_each(iter, runq)
> -        {
> -            struct rt_vcpu * iter_svc = __q_elem(iter);
> -            if ( svc->cur_deadline <= iter_svc->cur_deadline )
> -                    break;
> -         }
> -        list_add_tail(&svc->q_elem, iter);
> -    }
> +        deadline_queue_insert(&__q_elem, svc, &svc->q_elem, runq);
>      else
> -    {
>          list_add(&svc->q_elem, &prv->depletedq);
> +}
> +
> +/*
> + * Insert svc into the replenishment event list
> + * in replenishment time order.
> + * vcpus that need to be replished earlier go first.
> + * The timer may be re-programmed if svc is inserted
> + * at the front of the event list.
> + */
> +static void
> +__replq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
> +{
> +    struct list_head *replq = rt_replq(ops);
> +    struct rt_private *prv = rt_priv(ops);
> +    struct timer *repl_timer = prv->repl_timer;
> +
> +    ASSERT( !vcpu_on_replq(svc) );
> +
> +    if ( deadline_queue_insert(&replq_elem, svc, &svc->replq_elem, replq) )
> +        set_timer(repl_timer, svc->cur_deadline);
> +}
> +
> +/*
> + * Removes and re-inserts an event to the replenishment queue.
> + */
> +static void
> +replq_reinsert(const struct scheduler *ops, struct rt_vcpu *svc)
> +{
> +    struct list_head *replq = rt_replq(ops);
> +    struct rt_private *prv = rt_priv(ops);
> +    struct timer *repl_timer = prv->repl_timer;
> +
> +    ASSERT( vcpu_on_replq(svc) );
> +
> +    if ( deadline_queue_remove(replq, &svc->replq_elem) )
> +    {
> +        if ( deadline_queue_insert(&replq_elem, svc, &svc->replq_elem, replq) )
> +            set_timer(repl_timer, svc->cur_deadline);
> +        else
> +        {
> +            struct rt_vcpu *svc_next = replq_elem(replq->next);
> +            set_timer(repl_timer, svc_next->cur_deadline);
> +        }
>      }
> +    else if ( deadline_queue_insert(&replq_elem, svc, &svc->replq_elem, replq) )
> +        set_timer(repl_timer, svc->cur_deadline);
>  }
>
>  /*
> @@ -449,11 +598,18 @@ rt_init(struct scheduler *ops)
>      INIT_LIST_HEAD(&prv->sdom);
>      INIT_LIST_HEAD(&prv->runq);
>      INIT_LIST_HEAD(&prv->depletedq);
> +    INIT_LIST_HEAD(&prv->replq);
>
>      cpumask_clear(&prv->tickled);
>
>      ops->sched_data = prv;
>
> +    /*
> +     * The timer initialization will happen later when
> +     * the first pcpu is added to this pool in alloc_pdata.
> +     */
> +    prv->repl_timer = NULL;
> +
>      return 0;
>
>   no_mem:
> @@ -473,6 +629,10 @@ rt_deinit(struct scheduler *ops)
>          xfree(_cpumask_scratch);
>          _cpumask_scratch = NULL;
>      }
> +
> +    kill_timer(prv->repl_timer);
> +    xfree(prv->repl_timer);
> +
>      ops->sched_data = NULL;
>      xfree(prv);
>  }
> @@ -494,6 +654,17 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
>      if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) )
>          return NULL;
>
> +    if ( prv->repl_timer == NULL )
> +    {
> +        /* Allocate the timer on the first cpu of this pool. */
> +        prv->repl_timer = xzalloc(struct timer);
> +
> +        if ( prv->repl_timer == NULL )
> +            return NULL;
> +
> +        init_timer(prv->repl_timer, repl_handler, (void *)ops, cpu);
> +    }
> +
>      /* 1 indicates alloc. succeed in schedule.c */
>      return (void *)1;
>  }
> @@ -587,6 +758,7 @@ rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
>          return NULL;
>
>      INIT_LIST_HEAD(&svc->q_elem);
> +    INIT_LIST_HEAD(&svc->replq_elem);
>      svc->flags = 0U;
>      svc->sdom = dd;
>      svc->vcpu = vc;
> @@ -610,7 +782,8 @@ rt_free_vdata(const struct scheduler *ops, void *priv)
>  }
>
>  /*
> - * This function is called in sched_move_domain() in schedule.c
> + * It is called in sched_move_domain() and sched_init_vcpu
> + * in schedule.c.
>   * When move a domain to a new cpupool.
>   * It inserts vcpus of moving domain to the scheduler's RunQ in
>   * dest. cpupool.
> @@ -628,8 +801,13 @@ rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
>      if ( now >= svc->cur_deadline )
>          rt_update_deadline(now, svc);
>
> -    if ( !__vcpu_on_q(svc) && vcpu_runnable(vc) && !vc->is_running )
> -        __runq_insert(ops, svc);
> +    if ( !__vcpu_on_q(svc) && vcpu_runnable(vc) )
> +    {
> +        __replq_insert(ops, svc);
> +
> +        if ( !vc->is_running )
> +            __runq_insert(ops, svc);
> +    }
>      vcpu_schedule_unlock_irq(lock, vc);
>
>      SCHED_STAT_CRANK(vcpu_insert);
> @@ -652,6 +830,10 @@ rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
>      lock = vcpu_schedule_lock_irq(vc);
>      if ( __vcpu_on_q(svc) )
>          __q_remove(svc);
> +
> +    if ( vcpu_on_replq(svc) )
> +        __replq_remove(ops,svc);
> +
>      vcpu_schedule_unlock_irq(lock, vc);
>  }
>
> @@ -706,8 +888,15 @@ burn_budget(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now)
>
>      svc->cur_budget -= delta;
>
> -    if ( svc->cur_budget < 0 )
> +    if ( svc->cur_budget <= 0 )
> +    {
>          svc->cur_budget = 0;
> +        /*
> +         * Set __RTDS_was_depleted so the replenishment
> +         * handler can let this vcpu tickle if it was refilled.
> +         */
> +        set_bit(__RTDS_was_depleted, &svc->flags);
> +    }
>
>      /* TRACE */
>      {
> @@ -784,44 +973,6 @@ __runq_pick(const struct scheduler *ops, const cpumask_t *mask)
>  }
>
>  /*
> - * Update vcpu's budget and
> - * sort runq by insert the modifed vcpu back to runq
> - * lock is grabbed before calling this function
> - */
> -static void
> -__repl_update(const struct scheduler *ops, s_time_t now)
> -{
> -    struct list_head *runq = rt_runq(ops);
> -    struct list_head *depletedq = rt_depletedq(ops);
> -    struct list_head *iter;
> -    struct list_head *tmp;
> -    struct rt_vcpu *svc = NULL;
> -
> -    list_for_each_safe(iter, tmp, runq)
> -    {
> -        svc = __q_elem(iter);
> -        if ( now < svc->cur_deadline )
> -            break;
> -
> -        rt_update_deadline(now, svc);
> -        /* reinsert the vcpu if its deadline is updated */
> -        __q_remove(svc);
> -        __runq_insert(ops, svc);
> -    }
> -
> -    list_for_each_safe(iter, tmp, depletedq)
> -    {
> -        svc = __q_elem(iter);
> -        if ( now >= svc->cur_deadline )
> -        {
> -            rt_update_deadline(now, svc);
> -            __q_remove(svc); /* remove from depleted queue */
> -            __runq_insert(ops, svc); /* add to runq */
> -        }
> -    }
> -}
> -
> -/*
>   * schedule function for rt scheduler.
>   * The lock is already grabbed in schedule.c, no need to lock here
>   */
> @@ -840,8 +991,6 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
>      /* burn_budget would return for IDLE VCPU */
>      burn_budget(ops, scurr, now);
>
> -    __repl_update(ops, now);
> -
>      if ( tasklet_work_scheduled )
>      {
>          trace_var(TRC_RTDS_SCHED_TASKLET, 1, 0,  NULL);
> @@ -868,6 +1017,7 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
>          set_bit(__RTDS_delayed_runq_add, &scurr->flags);
>
>      snext->last_start = now;
> +    ret.time =  -1; /* if an idle vcpu is picked */
>      if ( !is_idle_vcpu(snext->vcpu) )
>      {
>          if ( snext != scurr )
> @@ -880,9 +1030,8 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
>              snext->vcpu->processor = cpu;
>              ret.migrated = 1;
>          }
> +        ret.time = snext->budget; /* invoke the scheduler next time */

Ah, this is incorrect, although this is easy to fix.

The ret.time is the relative time when the *budget enforcement* timer
should be invoked.
Since snext is not always full budget, it may have already consumed some budget.

It should be
ret.time = snext->cur_budget

Isn't it? :-)

The other parts of the scheduler looks fine to me, as long as the
above comments are solved.

Hi Tianyang,
Great and expedite work! :-D

Hi Dario,
Could you have another look at this patch and see if it's ok after the
above comments are solved?

Thank you very much!

Meng

-- 
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel