No HTML, please.
On Wed, 2015-06-10 at 00:18 -0400, Dagaen Golomb wrote:
> And note that, when I say "timer", I mean an actual Xen timer, i.e.,
> those things that are started, stopped, and with a timer handling
> routine being called when they expire. For an example, you can have a
> look, in sched_credit.c, at 'ticker' or 'master_ticker'.
>
>
> I will look at this. However, the current solution is likely
> functionally equivalent and, with only one timer and a single list
> used to arm it, I'm not sure if using a Xen timer is necessary.
>
It is functionally equivalent, by chance (see the issue about calling
runq_tickle() on current vcpus in my reply to Meng). The reason why a
different approach is required is making the code look better, reducing
(instead of increasing) the complexity of sched_rt.c, lowering the
overhead caused by long running operations performed while holding the
global scheduler spin lock, and improving scalability.
> Do they incur any extra overhead or coarsen granularity?
>
"extra overhead or coarsen granularity" as compared to what? s_timer in
schedule.c (which is what you're using) is one of them already!
What I meant with "an actual timer" is that you should add a new one,
and move some of the stuff currently being done in rt_schedule() into
its handling routine, rather than just adding a new queue of events to
be serviced by abusing s_timer.
> Deciding between a) or b) isn't easy, without actually trying to
> implement them. I personally prefer b), and I think it would be a
> superior long term solution (see right below), but given the current
> architecture of the RTDS scheduler (e.g., the fact that it has a
> global runq, etc), I won't nack a), which would most likely be
> simpler.
>
>
> I agree b) may be better in the long run, but with the current
> architecture a) is simpler. b) can be future work as the scheduler
> evolves.
>
Sure. And in fact, I'm fine with a), if done properly.
>
> With this patch, you're still using rt_schedule() to do both
> scheduling and replenishments, although you're now doing it at actual
> replenishment times, instead of checking for them every millisecond.
>
> I think once you understand that the timerq is not only
> replenishments, but any event that could cause a branch in the
> scheduler's behavior, it becomes more palatable to have them
> intertwined.
>
I got that, and I'm asking you to do it differently.
> Really, the timerq and scheduling aren't as intertwined as they
> appear, they are practically isolated functionally. It's just that
> the timerq is most naturally serviced during scheduling events, when
> data that affects scheduling decisions is changing. It's
> straightforward and efficient.
>
Yeah, replenishments are 'naturally serviced' in a 'straightforward and
efficient' way by looping on all runnable vcpus, in more than one place,
from within super-hot paths, with the global scheduler spin lock held.
Neat! :-/
Oh, and that's before your introduction of another list to be taken
care of, still under the same conditions. :-O
> Also, the bits and pieces that you need to add in order to deal with
> this new queue are, really, making things even more complex than
> they are now, which is undesirable.
>
> While it does add some complexity, I don't feel a single queue and
> managing it is that much extra complexity.
>
But in fact the point of making the scheduler 'event driven' is to take
advantage of the chances that this offers for getting stuff *out* of
rt_schedule(), and the purpose is not "not adding that much extra
complexity", it is making it _simpler_.
> > diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> > index 7c39a9e..25f0458 100644
> > --- a/xen/common/sched_rt.c
> > +++ b/xen/common/sched_rt.c
>
> > @@ -156,6 +160,7 @@ struct rt_vcpu {
> >      s_time_t cur_budget;        /* current budget */
> >      s_time_t last_start;        /* last start time */
> >      s_time_t cur_deadline;      /* current deadline for EDF */
> > +    s_time_t next_sched_needed; /* next time to make scheduling decision */
> >
> As an example of why I said that things should become simpler, and
> are instead becoming more complex: with my solution, you don't need
> anything like this. In fact, budget enforcement is already being done
> ok in rt_schedule(), so no need to do anything more/different about
> it. Replenishments should be programmed to fire at cur_deadline, so,
> again, no need to add this new field, and multiplex its actual value
> between budget and deadline (which, yes, counts as being something
> rather complex for me! :-D).
>
>
> As mentioned, the timerq is handling all events that could change the
> scheduling decision, not just replenishments.
>
Yes, exactly, it's handling *too many* events. :-)
For example, have a look at 'struct vcpu', in
xen/include/xen/sched.h. It's got 3 timers in it:
struct timer periodic_timer;
struct timer singleshot_timer;
struct timer poll_timer; /* timeout for SCHEDOP_poll */
It probably would have been possible to just use one, with a single and
mixed event queue, as you're doing, a 1k-line handling routine, etc.
Do you think it would have been easier or harder to implement,
understand, debug and maintain? I bet on harder.
> This tracks the earliest time anything could cause this vcpu to be
> scheduled differently, and could be extended if any more insights are
> found. Budget enforcement is being done in rt_schedule(), but it's
> being done by this very list: a budget expiration is one possible
> event that would require a vcpu reschedule.
>
OMG, 'extended'... You're thinking of actually putting more stuff in
there?!? :-O
It's a rather key and already too long and complex critical section, so
please, let's aim at shortening and making it simpler, i.e., at
improving things, not making them worse.
>
> > +/*
> > + * Return how far the lowest time on the timerq (that is after NOW)
> > + * is in the future.
> > + * Lock must be grabbed before calling this.
> > + */
> > +static s_time_t __timerq_next(const struct scheduler *ops, s_time_t now)
> > +{
> > +    struct list_head *timerq = rt_timerq(ops);
> > +    struct list_head *iter, *tmp;
> > +
> > +    list_for_each_safe(iter, tmp, timerq)
> > +    {
> > +        struct rt_vcpu * iter_svc = __t_elem(iter);
> > +
> > +        if ( iter_svc->next_sched_needed > now )
> > +            return (iter_svc->next_sched_needed - now);
> > +        else
> > +            __t_remove(iter_svc);
> > +    }
> > +
> > +    return MAX_SCHEDULE;
> > +}
> >
> If using a timer, you can just get rid of items while, in the timer
> routine, you handle the event associated to them.
>
> Also, why is MAX_SCHEDULE still there?
>
>
> Honestly, events do not have to be removed here, but it's done to
> prevent walking a list of stale values to get to the newest one on
> the next call. This is done more for performance. Their non-removal
> would be functionally equivalent.
>
Of course you should remove the stale entries, I certainly was not
arguing that the list should just grow indefinitely!
Point is, again, that this is another list walk occurring in a hot
path, with a big spin lock held. We want to get rid of such things,
not add more of them.
> With the timer alternative you mention, where would the timer events
> and their data be held? I think that could be extra complexity, unless
> I fail to understand the idea completely.
>
In an event queue like yours, of course. But you won't go through it
and/or bookkeep it while in hot paths, with the scheduler lock held.
See my email to Meng for more details on what I have in mind, or
feel free to ask for more details.
> MAX_SCHEDULE may not be required, but I have it there as an "in
> case." For example, it will cause the scheduler to be called after
> MAX_SCHEDULE even when no events are occurring (such as no vcpus).
> It's there to ensure progress in case of any bugs or unexpected
> behavior.
>
Wait, so, all this work, and then you still want to call rt_schedule()
every millisecond, when there's nothing to do?!?! :-O
> That being said, in this specific case, you're moving up
> runq_tickle(), the purpose of which is to trigger a call to
> rt_schedule() (after going through the softirq machinery)... in
> order to call it in rt_schedule(). That's pretty tricky, and is not
> how tickling should work.
>
> It's set up to only tickle if needed. I'm not sure if this is an
> issue,
>
It's wrong, AFAICT. See my reply to Meng and, please, comment by
replying to it, if you've got anything to say about this, to make the
discussion easier to follow.
Regards,
Dario
--
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)