From: Meng Xu <mengxu@cis.upenn.edu>
To: Tianyang Chen <tiche@seas.upenn.edu>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
Dario Faggioli <dario.faggioli@citrix.com>,
George Dunlap <george.dunlap@citrix.com>,
Dagaen Golomb <dgolomb@seas.upenn.edu>
Subject: Re: [PATCH v9]xen: sched: convert RTDS from time to event driven model
Date: Tue, 15 Mar 2016 23:11:10 -0400 [thread overview]
Message-ID: <CAENZ-+kxbhucXbGBrOozQ2h-AGa_w-WVc6hHxCXuhPKaSK22_Q@mail.gmail.com> (raw)
In-Reply-To: <1458000275-4237-1-git-send-email-tiche@seas.upenn.edu>
On Mon, Mar 14, 2016 at 8:04 PM, Tianyang Chen <tiche@seas.upenn.edu> wrote:
> Budget replenishment and enforcement are separated by adding
> a replenishment timer, which fires at the next most imminent
> release time of all runnable vcpus.
>
> A replenishment queue has been added to keep track of all vcpus that
> are runnable.
>
> The following functions have major changes to manage the replenishment
> events and timer.
>
> repl_handler(): It is a timer handler which is re-programmed
> to fire at the nearest vcpu deadline to replenish vcpus.
>
> rt_schedule(): picks the highest runnable vcpu based on cpu
> affinity and ret.time will be passed to schedule(). If an idle
> vcpu is picked, -1 is returned to avoid busy-waiting. repl_update()
> has been removed.
>
> rt_vcpu_wake(): when a vcpu wakes up, it tickles instead of
> picking one from the run queue.
>
> rt_context_saved(): when context switching is finished, the
> preempted vcpu will be put back into the runq.
>
> Simplified funtional graph:
>
> schedule.c SCHEDULE_SOFTIRQ:
> rt_schedule():
> [spin_lock]
> burn_budget(scurr)
> snext = runq_pick()
> [spin_unlock]
>
> sched_rt.c TIMER_SOFTIRQ
> replenishment_timer_handler()
> [spin_lock]
> <for_each_vcpu_on_q(i)> {
> replenish(i)
> runq_tickle(i)
> }>
> program_timer()
> [spin_lock]
>
> Signed-off-by: Tianyang Chen <tiche@seas.upenn.edu>
> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> Signed-off-by: Dagaen Golomb <dgolomb@seas.upenn.edu>
> ---
> changes since v8:
> Replaced spin_lock_irqsave with spin_lock_irq.
> Bug fix in burn_budget() where budget == 0.
> Removed unnecessary tickling in the timer handler when
> vcpus have the same deadline.
>
> changes since v7:
> Coding sytle.
> Added a flag to indicate that a vcpu was depleted.
> Added a local list including updated vcpus in the
> timer handler. It is used to decide which vcpu should
> tickle.
>
> changes since v6:
> Removed unnecessary vcpu_on_q() checks for runq_insert()
> Renamed replenishment queue related functions without
> underscores
> New replq_reinsert() function for rt_vcpu_wake()
>
> changes since v5:
> Removed unnecessary vcpu_on_replq() checks
> deadline_queue_insert() returns a flag to indicate if it's
> needed to re-program the timer
> Removed unnecessary timer checks
> Added helper functions to remove vcpus from queues
> coding style
>
> Changes since v4:
> removed unnecessary replenishment queue checks in vcpu_wake()
> extended replq_remove() to all cases in vcpu_sleep()
> used _deadline_queue_insert() helper function for both queues
> _replq_insert() and _replq_remove() program timer internally
>
> Changes since v3:
> removed running queue.
> added repl queue to keep track of repl events.
> timer is now per scheduler.
> timer is init on a valid cpu in a cpupool.
> ---
> xen/common/sched_rt.c | 437 ++++++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 341 insertions(+), 96 deletions(-)
>
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index bfed2e2..b491915 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -3,7 +3,9 @@
> * EDF scheduling is a real-time scheduling algorithm used in embedded field.
> *
> * by Sisu Xi, 2013, Washington University in Saint Louis
> - * and Meng Xu, 2014, University of Pennsylvania
> + * Meng Xu, 2014-2016, University of Pennsylvania
> + * Tianyang Chen, 2016, University of Pennsylvania
> + * Dagaen Golomb, 2016, University of Pennsylvania
> *
> * based on the code of credit Scheduler
> */
> @@ -16,6 +18,7 @@
> #include <xen/delay.h>
> #include <xen/event.h>
> #include <xen/time.h>
> +#include <xen/timer.h>
> #include <xen/perfc.h>
> #include <xen/sched-if.h>
> #include <xen/softirq.h>
> @@ -87,7 +90,7 @@
> #define RTDS_DEFAULT_BUDGET (MICROSECS(4000))
>
> #define UPDATE_LIMIT_SHIFT 10
> -#define MAX_SCHEDULE (MILLISECS(1))
> +
> /*
> * Flags
> */
> @@ -115,6 +118,18 @@
> #define RTDS_delayed_runq_add (1<<__RTDS_delayed_runq_add)
>
> /*
> + * The replenishment timer handler needs to check this bit
> + * to know where a replenished vcpu was, when deciding which
> + * vcpu should tickle.
> + * A replenished vcpu should tickle if it was moved from the
> + * depleted queue to the run queue.
> + * + Set in burn_budget() if a vcpu has zero budget left.
> + * + Cleared and checked in the repenishment handler.
It seems you have an extra + here...
Need to be removed.
My bad, I didn't spot it out in last patch... :-(
> + */
> +#define __RTDS_was_depleted 3
> +#define RTDS_was_depleted (1<<__RTDS_was_depleted)
> +
> +/*
> * rt tracing events ("only" 512 available!). Check
> * include/public/trace.h for more details.
> */
> @@ -142,6 +157,9 @@ static cpumask_var_t *_cpumask_scratch;
> */
> static unsigned int nr_rt_ops;
>
> +/* handler for the replenishment timer */
> +static void repl_handler(void *data);
> +
> /*
> * Systme-wide private data, include global RunQueue/DepletedQ
> * Global lock is referenced by schedule_data.schedule_lock from all
> @@ -152,7 +170,9 @@ struct rt_private {
> struct list_head sdom; /* list of availalbe domains, used for dump */
> struct list_head runq; /* ordered list of runnable vcpus */
> struct list_head depletedq; /* unordered list of depleted vcpus */
> + struct list_head replq; /* ordered list of vcpus that need replenishment */
> cpumask_t tickled; /* cpus been tickled */
> + struct timer *repl_timer; /* replenishment timer */
> };
>
> /*
> @@ -160,6 +180,7 @@ struct rt_private {
> */
> struct rt_vcpu {
> struct list_head q_elem; /* on the runq/depletedq list */
> + struct list_head replq_elem; /* on the repl event list */
>
> /* Up-pointers */
> struct rt_dom *sdom;
> @@ -213,8 +234,14 @@ static inline struct list_head *rt_depletedq(const struct scheduler *ops)
> return &rt_priv(ops)->depletedq;
> }
>
> +static inline struct list_head *rt_replq(const struct scheduler *ops)
> +{
> + return &rt_priv(ops)->replq;
> +}
> +
> /*
> - * Queue helper functions for runq and depletedq
> + * Helper functions for manipulating the runqueue, the depleted queue,
> + * and the replenishment events queue.
> */
> static int
> __vcpu_on_q(const struct rt_vcpu *svc)
> @@ -228,6 +255,18 @@ __q_elem(struct list_head *elem)
> return list_entry(elem, struct rt_vcpu, q_elem);
> }
>
> +static struct rt_vcpu *
> +replq_elem(struct list_head *elem)
> +{
> + return list_entry(elem, struct rt_vcpu, replq_elem);
> +}
> +
> +static int
> +vcpu_on_replq(const struct rt_vcpu *svc)
> +{
> + return !list_empty(&svc->replq_elem);
> +}
> +
> /*
> * Debug related code, dump vcpu/cpu information
> */
> @@ -288,7 +327,7 @@ rt_dump_pcpu(const struct scheduler *ops, int cpu)
> static void
> rt_dump(const struct scheduler *ops)
> {
> - struct list_head *runq, *depletedq, *iter;
> + struct list_head *runq, *depletedq, *replq, *iter;
> struct rt_private *prv = rt_priv(ops);
> struct rt_vcpu *svc;
> struct rt_dom *sdom;
> @@ -301,6 +340,7 @@ rt_dump(const struct scheduler *ops)
>
> runq = rt_runq(ops);
> depletedq = rt_depletedq(ops);
> + replq = rt_replq(ops);
>
> printk("Global RunQueue info:\n");
> list_for_each( iter, runq )
> @@ -316,6 +356,13 @@ rt_dump(const struct scheduler *ops)
> rt_dump_vcpu(ops, svc);
> }
>
> + printk("Global Replenishment Event info:\n");
> + list_for_each ( iter, replq )
> + {
> + svc = replq_elem(iter);
> + rt_dump_vcpu(ops, svc);
> + }
> +
> printk("Domain info:\n");
> list_for_each( iter, &prv->sdom )
> {
> @@ -380,11 +427,78 @@ rt_update_deadline(s_time_t now, struct rt_vcpu *svc)
> return;
> }
>
> +/*
> + * A helper function that only removes a vcpu from a queue
> + * and it returns 1 if the vcpu was in front of the list.
> + */
> +static inline int
> +deadline_queue_remove(struct list_head *queue, struct list_head *elem)
> +{
> + int pos = 0;
> +
> + if ( queue->next != elem )
> + pos = 1;
> +
> + list_del_init(elem);
> + return !pos;
> +}
> +
> static inline void
> __q_remove(struct rt_vcpu *svc)
> {
> - if ( __vcpu_on_q(svc) )
> - list_del_init(&svc->q_elem);
> + ASSERT( __vcpu_on_q(svc) );
> + list_del_init(&svc->q_elem);
> +}
> +
> +/*
> + * Removing a vcpu from the replenishment queue could
> + * re-program the timer for the next replenishment event
> + * if it was at the front of the list.
> + */
> +static inline void
> +__replq_remove(const struct scheduler *ops, struct rt_vcpu *svc)
> +{
> + struct rt_private *prv = rt_priv(ops);
> + struct list_head *replq = rt_replq(ops);
> + struct timer* repl_timer = prv->repl_timer;
> +
> + ASSERT( vcpu_on_replq(svc) );
> +
> + if ( deadline_queue_remove(replq, &svc->replq_elem) )
> + {
> + /* re-arm the timer for the next replenishment event */
> + if ( !list_empty(replq) )
> + {
> + struct rt_vcpu *svc_next = replq_elem(replq->next);
> + set_timer(repl_timer, svc_next->cur_deadline);
> + }
> + else
> + stop_timer(repl_timer);
> + }
> +}
> +
> +/*
> + * An utility function that inserts a vcpu to a
> + * queue based on certain order (EDF). The returned
> + * value is 1 if a vcpu has been inserted to the
> + * front of a list.
> + */
> +static int
> +deadline_queue_insert(struct rt_vcpu * (*_get_q_elem)(struct list_head *elem),
> + struct rt_vcpu *svc, struct list_head *elem, struct list_head *queue)
> +{
> + struct list_head *iter;
> + int pos = 0;
> +
> + list_for_each ( iter, queue )
> + {
> + struct rt_vcpu * iter_svc = (*_get_q_elem)(iter);
> + if ( svc->cur_deadline <= iter_svc->cur_deadline )
> + break;
> + pos++;
> + }
> + list_add_tail(elem, iter);
> + return !pos;
> }
>
> /*
> @@ -397,27 +511,62 @@ __runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
> {
> struct rt_private *prv = rt_priv(ops);
> struct list_head *runq = rt_runq(ops);
> - struct list_head *iter;
>
> ASSERT( spin_is_locked(&prv->lock) );
> -
> ASSERT( !__vcpu_on_q(svc) );
> + ASSERT( vcpu_on_replq(svc) );
>
> /* add svc to runq if svc still has budget */
> if ( svc->cur_budget > 0 )
> - {
> - list_for_each(iter, runq)
> - {
> - struct rt_vcpu * iter_svc = __q_elem(iter);
> - if ( svc->cur_deadline <= iter_svc->cur_deadline )
> - break;
> - }
> - list_add_tail(&svc->q_elem, iter);
> - }
> + deadline_queue_insert(&__q_elem, svc, &svc->q_elem, runq);
> else
> - {
> list_add(&svc->q_elem, &prv->depletedq);
> +}
> +
> +/*
> + * Insert svc into the replenishment event list
> + * in replenishment time order.
> + * vcpus that need to be replished earlier go first.
> + * The timer may be re-programmed if svc is inserted
> + * at the front of the event list.
> + */
> +static void
> +__replq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
> +{
> + struct list_head *replq = rt_replq(ops);
> + struct rt_private *prv = rt_priv(ops);
> + struct timer *repl_timer = prv->repl_timer;
> +
> + ASSERT( !vcpu_on_replq(svc) );
> +
> + if ( deadline_queue_insert(&replq_elem, svc, &svc->replq_elem, replq) )
> + set_timer(repl_timer, svc->cur_deadline);
> +}
> +
> +/*
> + * Removes and re-inserts an event to the replenishment queue.
> + */
> +static void
> +replq_reinsert(const struct scheduler *ops, struct rt_vcpu *svc)
> +{
> + struct list_head *replq = rt_replq(ops);
> + struct rt_private *prv = rt_priv(ops);
> + struct timer *repl_timer = prv->repl_timer;
> +
> + ASSERT( vcpu_on_replq(svc) );
> +
> + if ( deadline_queue_remove(replq, &svc->replq_elem) )
> + {
> + if ( deadline_queue_insert(&replq_elem, svc, &svc->replq_elem, replq) )
> + set_timer(repl_timer, svc->cur_deadline);
> + else
> + {
> + struct rt_vcpu *svc_next = replq_elem(replq->next);
> + set_timer(repl_timer, svc_next->cur_deadline);
> + }
> }
> + else if ( deadline_queue_insert(&replq_elem, svc, &svc->replq_elem, replq) )
> + set_timer(repl_timer, svc->cur_deadline);
> }
>
> /*
> @@ -449,11 +598,18 @@ rt_init(struct scheduler *ops)
> INIT_LIST_HEAD(&prv->sdom);
> INIT_LIST_HEAD(&prv->runq);
> INIT_LIST_HEAD(&prv->depletedq);
> + INIT_LIST_HEAD(&prv->replq);
>
> cpumask_clear(&prv->tickled);
>
> ops->sched_data = prv;
>
> + /*
> + * The timer initialization will happen later when
> + * the first pcpu is added to this pool in alloc_pdata.
> + */
> + prv->repl_timer = NULL;
> +
> return 0;
>
> no_mem:
> @@ -473,6 +629,10 @@ rt_deinit(struct scheduler *ops)
> xfree(_cpumask_scratch);
> _cpumask_scratch = NULL;
> }
> +
> + kill_timer(prv->repl_timer);
> + xfree(prv->repl_timer);
> +
> ops->sched_data = NULL;
> xfree(prv);
> }
> @@ -494,6 +654,17 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
> if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) )
> return NULL;
>
> + if ( prv->repl_timer == NULL )
> + {
> + /* Allocate the timer on the first cpu of this pool. */
> + prv->repl_timer = xzalloc(struct timer);
> +
> + if ( prv->repl_timer == NULL )
> + return NULL;
> +
> + init_timer(prv->repl_timer, repl_handler, (void *)ops, cpu);
> + }
> +
> /* 1 indicates alloc. succeed in schedule.c */
> return (void *)1;
> }
> @@ -587,6 +758,7 @@ rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
> return NULL;
>
> INIT_LIST_HEAD(&svc->q_elem);
> + INIT_LIST_HEAD(&svc->replq_elem);
> svc->flags = 0U;
> svc->sdom = dd;
> svc->vcpu = vc;
> @@ -610,7 +782,8 @@ rt_free_vdata(const struct scheduler *ops, void *priv)
> }
>
> /*
> - * This function is called in sched_move_domain() in schedule.c
> + * It is called in sched_move_domain() and sched_init_vcpu
> + * in schedule.c.
> * When move a domain to a new cpupool.
> * It inserts vcpus of moving domain to the scheduler's RunQ in
> * dest. cpupool.
> @@ -628,8 +801,13 @@ rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
> if ( now >= svc->cur_deadline )
> rt_update_deadline(now, svc);
>
> - if ( !__vcpu_on_q(svc) && vcpu_runnable(vc) && !vc->is_running )
> - __runq_insert(ops, svc);
> + if ( !__vcpu_on_q(svc) && vcpu_runnable(vc) )
> + {
> + __replq_insert(ops, svc);
> +
> + if ( !vc->is_running )
> + __runq_insert(ops, svc);
> + }
> vcpu_schedule_unlock_irq(lock, vc);
>
> SCHED_STAT_CRANK(vcpu_insert);
> @@ -652,6 +830,10 @@ rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
> lock = vcpu_schedule_lock_irq(vc);
> if ( __vcpu_on_q(svc) )
> __q_remove(svc);
> +
> + if ( vcpu_on_replq(svc) )
> + __replq_remove(ops,svc);
> +
> vcpu_schedule_unlock_irq(lock, vc);
> }
>
> @@ -706,8 +888,15 @@ burn_budget(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now)
>
> svc->cur_budget -= delta;
>
> - if ( svc->cur_budget < 0 )
> + if ( svc->cur_budget <= 0 )
> + {
> svc->cur_budget = 0;
> + /*
> + * Set __RTDS_was_depleted so the replenishment
> + * handler can let this vcpu tickle if it was refilled.
> + */
> + set_bit(__RTDS_was_depleted, &svc->flags);
> + }
>
> /* TRACE */
> {
> @@ -784,44 +973,6 @@ __runq_pick(const struct scheduler *ops, const cpumask_t *mask)
> }
>
> /*
> - * Update vcpu's budget and
> - * sort runq by insert the modifed vcpu back to runq
> - * lock is grabbed before calling this function
> - */
> -static void
> -__repl_update(const struct scheduler *ops, s_time_t now)
> -{
> - struct list_head *runq = rt_runq(ops);
> - struct list_head *depletedq = rt_depletedq(ops);
> - struct list_head *iter;
> - struct list_head *tmp;
> - struct rt_vcpu *svc = NULL;
> -
> - list_for_each_safe(iter, tmp, runq)
> - {
> - svc = __q_elem(iter);
> - if ( now < svc->cur_deadline )
> - break;
> -
> - rt_update_deadline(now, svc);
> - /* reinsert the vcpu if its deadline is updated */
> - __q_remove(svc);
> - __runq_insert(ops, svc);
> - }
> -
> - list_for_each_safe(iter, tmp, depletedq)
> - {
> - svc = __q_elem(iter);
> - if ( now >= svc->cur_deadline )
> - {
> - rt_update_deadline(now, svc);
> - __q_remove(svc); /* remove from depleted queue */
> - __runq_insert(ops, svc); /* add to runq */
> - }
> - }
> -}
> -
> -/*
> * schedule function for rt scheduler.
> * The lock is already grabbed in schedule.c, no need to lock here
> */
> @@ -840,8 +991,6 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
> /* burn_budget would return for IDLE VCPU */
> burn_budget(ops, scurr, now);
>
> - __repl_update(ops, now);
> -
> if ( tasklet_work_scheduled )
> {
> trace_var(TRC_RTDS_SCHED_TASKLET, 1, 0, NULL);
> @@ -868,6 +1017,7 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
> set_bit(__RTDS_delayed_runq_add, &scurr->flags);
>
> snext->last_start = now;
> + ret.time = -1; /* if an idle vcpu is picked */
> if ( !is_idle_vcpu(snext->vcpu) )
> {
> if ( snext != scurr )
> @@ -880,9 +1030,8 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
> snext->vcpu->processor = cpu;
> ret.migrated = 1;
> }
> + ret.time = snext->budget; /* invoke the scheduler next time */
Ah, this is incorrect, although this is easy to fix.
The ret.time is the relative time when the *budget enforcement* timer
should be invoked.
Since snext is not always full budget, it may have already consumed some budget.
It should be
ret.time = snext->cur_budget
Isn't it? :-)
The other parts of the scheduler looks fine to me, as long as the
above comments are solved.
Hi Tianyang,
Great and expedite work! :-D
Hi Dario,
Could you have another look at this patch and see if it's ok after the
above comments are solved?
Thank you very much!
Meng
--
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
next prev parent reply other threads:[~2016-03-16 3:11 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-03-15 0:04 [PATCH v9]xen: sched: convert RTDS from time to event driven model Tianyang Chen
2016-03-16 3:11 ` Meng Xu [this message]
2016-03-16 3:32 ` Chen, Tianyang
2016-03-16 3:40 ` Meng Xu
2016-03-16 10:23 ` Dario Faggioli
2016-03-16 14:20 ` Meng Xu
2016-03-16 14:44 ` Dario Faggioli
2016-03-16 20:51 ` Meng Xu
2016-03-16 14:25 ` Dario Faggioli
2016-03-16 15:43 ` Chen, Tianyang
2016-03-16 16:11 ` Dario Faggioli
2016-03-16 20:54 ` Meng Xu
2016-03-16 20:45 ` Meng Xu
2016-03-17 2:24 ` Chen, Tianyang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAENZ-+kxbhucXbGBrOozQ2h-AGa_w-WVc6hHxCXuhPKaSK22_Q@mail.gmail.com \
--to=mengxu@cis.upenn.edu \
--cc=dario.faggioli@citrix.com \
--cc=dgolomb@seas.upenn.edu \
--cc=george.dunlap@citrix.com \
--cc=tiche@seas.upenn.edu \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).