On Mon, 2016-07-18 at 13:22 +0100, Anshul Makkar wrote:
>
Hey, Anshul.

Thanks, and sorry for the delay in reviewing.

This version is an improvement, but it looks to me that you've missed a
few of the review comments to v1.

> It introduces a minimum amount of latency
>
"introduces context-switch rate-limiting"

> to enable a VM to batch its work and
> it also ensures that system is not spending most of its time in
> VMEXIT/VMENTRY because of VM that is waking/sleeping at high rate.
>
> ratelimit can be disabled by setting it to 0.
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 8b95a47..68bcdb8 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
>
> @@ -1601,6 +1602,34 @@ csched2_dom_cntl(
>      return rc;
>  }
>  
> +static int csched2_sys_cntl(const struct scheduler *ops,
> +                            struct xen_sysctl_scheduler_op *sc)
> +{
> +    int rc = -EINVAL;
> +    xen_sysctl_credit_schedule_t *params = &sc->u.sched_credit;
> +    struct csched2_private *prv = CSCHED2_PRIV(ops);
> +    unsigned long flags;
> +
> +    switch (sc->cmd )
> +    {
> +        case XEN_SYSCTL_SCHEDOP_putinfo:
> +            if ( params->ratelimit_us &&
> +                ( params->ratelimit_us < CSCHED2_MIN_TIMER ||
> +                  params->ratelimit_us > MICROSECS(CSCHED2_MAX_TIMER) ))
>
CSCHED2_MIN_TIMER and CSCHED2_MAX_TIMER are defined as follows:

#define CSCHED2_MIN_TIMER            MICROSECS(500)
#define CSCHED2_MAX_TIMER            MILLISECS(2)

Which basically means their values are 500*1000=500000 and
2*1000000=2000000 (nanoseconds).

ratelimit_us is specified assuming it will be in microseconds. So,
basically, you're forcing the value stored in ratelimit_us to be at
least 500000, which means 500000 microseconds, which means 500
milliseconds, which is not what we want!
I remember saying already (although it may have been in private, not on
this list) that I think we should just use XEN_SYSCTL_SCHED_RATELIMIT_MAX
and XEN_SYSCTL_SCHED_RATELIMIT_MIN here.

CSCHED2_MIN_TIMER and CSCHED2_MAX_TIMER are internal implementation
details, and I don't like them being exposed (although indirectly) to
the user.

> +                return rc;
> +            spin_lock_irqsave(&prv->lock, flags);
> +            prv->ratelimit_us = params->ratelimit_us;
> +            spin_unlock_irqrestore(&prv->lock, flags);
> +            break;
> +
This is ok. However, the code base changed in the meanwhile (sorry! :-P),
and this spin_lock_irqsave() needs to become a write_lock_irqsave().

> @@ -1688,9 +1719,20 @@ csched2_runtime(const struct scheduler *ops, int cpu, struct csched2_vcpu *snext
>       * 1) Run until snext's credit will be 0
>       * 2) But if someone is waiting, run until snext's credit is equal
>       * to his
> -     * 3) But never run longer than MAX_TIMER or shorter than MIN_TIMER.
> +     * 3) But never run longer than MAX_TIMER or shorter than MIN_TIMER or
> +     * run for ratelimit time.
>       */
>
I prefer George's version of this comment change:

"3) But never run longer than MAX_TIMER or shorter than MIN_TIMER or
the ratelimit time."

>  
> +    /* Calculate mintime */
> +    min_time = CSCHED2_MIN_TIMER;
> +    if ( prv->ratelimit_us ) {
>
Coding style. (The opening brace goes on the next line.)

> +        s_time_t ratelimit_min = prv->ratelimit_us;
> +        ratelimit_min = snext->vcpu->runstate.state_entry_time +
> +            MICROSECS(prv->ratelimit_us) - now;
>
Mmm...
if you wanted to implement my suggestion from
<1468400021.13039.33.camel@citrix.com>, you're definitely missing
something:

    s_time_t ratelimit_min = MICROSECS(prv->ratelimit_us);
    if ( snext->vcpu->is_running )
        ratelimit_min = snext->vcpu->runstate.state_entry_time +
                        MICROSECS(prv->ratelimit_us) - now;

In fact, you're initializing ratelimit_min and then immediately
overriding that... I'm surprised the compiler didn't complain.

> +    if ( ratelimit_min > min_time )
> +        min_time = ratelimit_min;
> +    }
> +
> @@ -1707,32 +1749,33 @@ csched2_runtime(const struct scheduler *ops, int cpu, struct csched2_vcpu *snext
>          }
>      }
>  
> -    /* The next guy may actually have a higher credit, if we've tried to
> -     * avoid migrating him from a different cpu.  DTRT.  */
> -    if ( rt_credit <= 0 )
> +    /*
> +     * The next guy ont the runqueue may actually have a higher credit,
> +     * if we've tried to avoid migrating him from a different cpu.
> +     * Setting time=0 will ensure the minimum timeslice is chosen.
>
George's draft patch had an empty line within this comment in here,
separating the two paragraphs. Can we keep it? (I know, this is a very
minor thing, but since we're here :-D)

> +     * FIXME: See if we can eliminate this conversion if we know time
> +     * will be outside (MIN,MAX).  Probably requires pre-calculating
> +     * credit values of MIN,MAX per vcpu, since each vcpu burns credit
> +     * at a different rate.
> +     */
> +    if (rt_credit > 0)
> +        time = c2t(rqd, rt_credit, snext);
> +    else
> +        time = 0;
> +
> +    /*
> +     * Never run longer than MAX_TIMER or less than MIN_TIMER or for
> +     * rate_limit time.
> +     */
>
    /*
     * 3) But never run longer than MAX_TIMER or shorter than MIN_TIMER
     * or the ratelimit time.
     */

i.e., fix the style of the comment (wrt George's patch) by putting the
"wings" there, but please keep the "3)", and the fact that it basically
mirrors the big comment at the beginning of the function.

> @@ -1746,7 +1789,7 @@ void __dump_execstate(void *unused);
>  static struct csched2_vcpu *
>  runq_candidate(struct csched2_runqueue_data *rqd,
>                 struct csched2_vcpu *scurr,
> -               int cpu, s_time_t now)
> +               int cpu, s_time_t now, struct csched2_private *prv)
>
Reviewing v1, George said this:

  Since we have the cpu, I think we can get ops this way, without
  cluttering things up with the extra argument:

      struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));

> @@ -1775,9 +1829,13 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>          }
>  
>          /* If the next one on the list has more credit than current
> -         * (or idle, if current is not runnable), choose it. */
> +         * (or idle, if current is not runnable) and current one has already
> +         * executed for more than ratelimit. choose it.
> +         * Control has reached here means that current vcpu has executed >
> +         * ratelimit_us or ratelimit is off, so chose the next one.
> +         */
>          if ( svc->credit > snext->credit )
> -            snext = svc;
> +                snext = svc;
>  
Both George and I agreed that changing the comment like this is not
helping much and should not be done.

You also (as already pointed out) don't need to touch the 'snext = svc'
line.

Regards,
Dario
-- 
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)