No HTML, please.

On Wed, 2015-06-10 at 00:18 -0400, Dagaen Golomb wrote:
> And note that, when I say "timer", I mean an actual Xen timer, i.e.,
> those things that are started, stopped, and with a timer handling
> routine being called when they expire. For an example, you can have a
> look, in sched_credit.c, at 'ticker' or 'master_ticker'.
>
> I will look at this. However, the current solution is likely
> functionally equivalent and, with only one timer and a single list
> used to arm it, I'm not sure if using a Xen timer is necessary.
>
It is functionally equivalent, by chance (see the issue about calling
runq_tickle() on current vcpus in my reply to Meng).

The reason why a different approach is required is making the code look
better, reducing (instead of increasing) the complexity of sched_rt.c,
lowering the overhead caused by long running operations performed while
holding the global scheduler spin lock, and improving scalability.

> Do they incur any extra overhead or coarsen granularity?
>
"Extra overhead or coarsen granularity" as compared to what? s_timer in
schedule.c (which is what you're using) is one of them already!

What I meant with "an actual timer" is that you should add a new one,
and move some of the stuff currently being done in rt_schedule() into
its handling routine, rather than just adding a new queue of events to
be serviced by abusing s_timer.
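
Just to make this more concrete, here is the kind of shape I have in
mind. It is only a rough sketch, not working code: the names
(repl_timer, replq, repl_handler) are made up, and the details
(locking, which pcpu the timer runs on, etc.) would need working out:

    /* in struct rt_private, next to the runq and friends: */
    struct timer repl_timer;    /* one timer, for all replenishments   */
    struct list_head replq;     /* vcpus, sorted by next replenishment */

    /* in the scheduler initialization path: */
    init_timer(&prv->repl_timer, repl_handler, (void *)ops, 0 /* e.g. */);

    /* and, whenever the earliest pending replenishment changes
     * (e.g., on vcpu wakeup), just re-program it: */
    set_timer(&prv->repl_timer, next_repl_time /* earliest cur_deadline */);

I.e., pretty much what sched_credit.c does with 'master_ticker', except
that this one would fire at replenishment instants rather than
periodically.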

> Deciding between a) or b) isn't easy, without actually trying to
> implement them. I personally prefer b), and I think it would be a
> superior long term solution (see right below), but given the current
> architecture of the RTDS scheduler (e.g., the fact that it has a
> global runq, etc), I won't nack a), which would most likely be
> simpler.
>
> I agree b) may well be better in the long run, but with the current
> architecture a) is simpler. b) can be future work as the scheduler
> evolves.
>
Sure. And in fact, I'm fine with a), if done properly.

> With this patch, you're still using rt_schedule() to do both
> scheduling and replenishments, although you're now doing it at actual
> replenishment times, instead of checking for them every millisecond.
>
> I think once you understand that the timerq is not only
> replenishments, but any event that could cause a branch in the
> scheduler behavior, it becomes more palatable to have them
> intertwined.
>
I got that, and I'm asking you to do it differently.

> Really, the timerq and scheduling aren't as intertwined as they
> appear; they are practically isolated functionally. It's just that
> the timerq is most naturally serviced during scheduling events, when
> data that affects scheduling decisions is changing. It's
> straightforward and efficient.
>
Yeah, replenishments are 'naturally serviced' in a 'straightforward and
efficient' way by looping on all runnable vcpus, in more than one
place, from within super-hot paths, with the global scheduler spin lock
held. Neat! :-/

Oh, and that's before your introduction of another list to be taken
care of, still under the same conditions. :-O

> Also, the bits and pieces that you need to add in order to deal with
> this new queue are, really, making things even more complex than they
> are now, which is undesirable.
>
> While it does add some complexity, I don't feel a single queue and
> managing it is that much extra complexity.
>
But in fact the point of making the scheduler 'event driven' is to take
advantage of the chances that this offers for getting stuff *out* of
rt_schedule(), and the purpose is not "not adding that much extra
complexity", it is making it _simpler_.

> > diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> > index 7c39a9e..25f0458 100644
> > --- a/xen/common/sched_rt.c
> > +++ b/xen/common/sched_rt.c
> > @@ -156,6 +160,7 @@ struct rt_vcpu {
> >      s_time_t cur_budget;        /* current budget */
> >      s_time_t last_start;        /* last start time */
> >      s_time_t cur_deadline;      /* current deadline for EDF */
> > +    s_time_t next_sched_needed; /* next time to make scheduling decision */
>
> As an example of why I said that things should become simpler, and
> are instead becoming more complex: with my solution, you don't need
> anything like this. In fact, budget enforcement is already being done
> ok in rt_schedule(), so no need to do anything more/different about
> it. Replenishments should be programmed to fire at cur_deadline, so
> again, no need to add this new field, and multiplex its actual value
> between budget and deadline (which, yes, counts as being something
> rather complex for me! :-D).
>
> As mentioned, the timerq is handling all events that could change the
> scheduling decision, not just replenishments.
>
Yes, exactly, it's handling *too many* events. :-)

For example, have a look at 'struct vcpu', in xen/include/xen/sched.h.
It's got 3 timers in it:

    struct timer     periodic_timer;
    struct timer     singleshot_timer;
    struct timer     poll_timer;    /* timeout for SCHEDOP_poll */

It probably would have been possible to just use one, with a single and
mixed event queue, as you're doing, a 1k-line handling routine, etc. Do
you think it would have been easier or harder to implement, understand,
debug and maintain? I bet on harder.
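
And, to show where I think the replenishment work should actually live,
here is a sketch of the timer's handling routine. Again, only a sketch,
with made up names (replq, replq_elem(), replq_remove(), repl_timer),
and assuming we can reuse the existing rt_update_deadline() and
runq_tickle() helpers more or less as they are:

    static void repl_handler(void *data)
    {
        const struct scheduler *ops = data;
        struct rt_private *prv = rt_priv(ops);
        struct list_head *iter, *tmp;
        s_time_t now = NOW();
        unsigned long flags;

        spin_lock_irqsave(&prv->lock, flags);

        /* replq is kept sorted by replenishment time (== cur_deadline) */
        list_for_each_safe ( iter, tmp, &prv->replq )
        {
            struct rt_vcpu *svc = replq_elem(iter);

            if ( now < svc->cur_deadline )
                break;

            rt_update_deadline(now, svc);  /* the actual replenishment */
            runq_tickle(ops, svc);         /* preempt someone, if needed */
            replq_remove(svc);             /* or re-insert at the new time */
        }

        /* re-arm the timer for the next replenishment, if there is one */
        if ( !list_empty(&prv->replq) )
            set_timer(&prv->repl_timer,
                      replq_elem(prv->replq.next)->cur_deadline);

        spin_unlock_irqrestore(&prv->lock, flags);
    }

With something like this, rt_schedule() is left with budget enforcement
only, there is no next_sched_needed multiplexing, and no looping over
runnable vcpus from within the hot paths.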

> This tracks the earliest time anything could cause this to be
> scheduled differently, and could be extended if any more insights are
> found. Budget enforcement is being done in rt_schedule(), but it's
> being done by this very list: a budget expiration is one possible
> event that would require a vcpu reschedule.
>
OMG, 'extended'... You're thinking of actually putting more stuff in
there?!? :-O

It's a rather key and already too long and complex critical section, so
please, let's aim at shortening it and making it simpler, i.e., at
improving things, not making them worse.

> > +/*
> > + * Return how far the lowest time on the timerq (that is after NOW) is in the
> > + * future.
> > + * Lock must be grabbed before calling this.
> > + */
> > +static s_time_t __timerq_next(const struct scheduler *ops, s_time_t now)
> > +{
> > +    struct list_head *timerq = rt_timerq(ops);
> > +    struct list_head *iter, *tmp;
> > +
> > +    list_for_each_safe(iter, tmp, timerq)
> > +    {
> > +        struct rt_vcpu * iter_svc = __t_elem(iter);
> > +
> > +        if ( iter_svc->next_sched_needed > now )
> > +            return (iter_svc->next_sched_needed - now);
> > +        else
> > +            __t_remove(iter_svc);
> > +    }
> > +
> > +    return MAX_SCHEDULE;
> > +}
>
> If using a timer, you can just get rid of items while, in the timer
> routine, you handle the event associated to them.
>
> Also, why is MAX_SCHEDULE still there?
>
> Honestly, events do not have to be removed here, but it's done to
> prevent walking a list of stale values to get to the newest one on
> the next call. This is done more for performance. Their non-removal
> would be functionally equivalent.
>
Of course you should remove the stale entries; I certainly was not
arguing that the list should just grow indefinitely! Point is, again,
that this is another list walk occurring in a hot path, with a big spin
lock held. We want to get rid of such things, not add more of them.

> With the timer alternative you mention, where would the timer events
> and their data be held? I think that could be extra complexity,
> unless I fail to understand the idea completely.
>
In an event queue like yours, of course. But you won't go through it
and/or bookkeep it while in hot paths, with the scheduler lock held.
See my email to Meng for more details on what I have in mind, or feel
free to ask for more details.

> MAX_SCHEDULE may not be required, but I have it there as an "in
> case." For example, it will cause the scheduler to be called after
> MAX_SCHEDULE even when no events are occurring (such as no vcpus).
> It's there to ensure progress in case of any bugs or unexpected
> behavior.
>
Wait, so, all this work, and then you still want to call rt_schedule()
every millisecond, when there's nothing to do?!?! :-O

> That being said, in this specific case, you're moving up
> runq_tickle(), the purpose of which is to trigger a call to
> rt_schedule() (after going through the softirq machinery)... in order
> to call it in rt_schedule(). That's pretty tricky, and is not how
> tickling should work.
>
> It's set up to only tickle if needed. I'm not sure if this is an
> issue,
>
It's wrong, AFAICT. See my reply to Meng and, please, comment by
replying to it, if you've got anything to say about this, to make the
discussion easier to follow.

Regards,
Dario

--
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)