On Mon, 2016-09-12 at 16:14 +0100, George Dunlap wrote: > On 17/08/16 18:17, Dario Faggioli wrote: > > > > If vcpu x has run for 200us, and sched_ratelimit_us is > > 1000us, continue running x _but_ return 1000us-200us as > > the next time slice. This way, next scheduling point will > > happen in 800us, i.e., exactly at the point when x crosses > > the threshold, and can be descheduled (if appropriate). > > > > Right now (without this patch), we're always returning > > sched_ratelimit_us (1000us, in the example above), which > > means we're (potentially) allowing x to run more than > > it should have been able to (even when considering rate > > limiting into account). > Part of the reason I went with this in the first place was because I > wanted to avoid really short timers.  Part of the reason for the > ratelimit was actually to limit the amount of time spent in the > scheduler.  Since we expect ratelimit to normally be pretty short, > waiting for the whole ratelimit time seemed like a good idea. > I see what you mean. I personally am not a fan of ratelimit, because of how it acts behind the algorithm's back and mess with and partly invalidates the algorithms own assumptions and properties (not that these assumptions and properties are very clear in Credit1, even before ratelimiting, but, anyway :-)), and this is an example of that. Nevertheless, I see that this patch may, up to some extent, re- introduce some of the "small timers" that it was ratelemiting's own purpose to mitigate. So, I guess, either we make this a three way condition, introducing an absolute minimum runtime value, under which we don't ever want to go, e.g.:  tsclice = MICROSECS(prv->runtime_us) - runtime > CSCHED_MIN_TIMER :            MICROSECS(prv->runtime_us) - runtime : CSCHED_MIN_TIMER; or we leave things as they are now. The MIN_TIMER option would let at least the cases where the vcpu has run for a few, but not too much, time to be more precisely scheduled. E.g., if ratelimit_us is 1000us, MIN_TIMER is 500us, and the vcpu run for 400us, we let it be preempted after 600us more (i.e., 1000us in total == ratelimit_us), instead of after 1000us more (i.e., 1400us in total). I also agree on the fact that most of the times ratelimit_us and MIN_TIMER will be close enough (like in the example above) that it won't probably matter much... but if someone set ratelimit_us to something higher (say, 10ms --we accept values as big as the timeslice, which is 30ms b default) it may matter a bit. What do you think? If we decide not to care, and leave things as they are, I'd add a comment saying that code is like that on purpose, so we won't trip over this again in 1 or 2 years. :-) Regards, Dario --  <> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)