Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

From: "Jan Beulich" <JBeulich@suse.com>
To: Hui Lv <hui.lv@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>,
	"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	"keir@xen.org" <keir@xen.org>,
	George Dunlap <George.Dunlap@eu.citrix.com>,
	Eddie Dong <eddie.dong@intel.com>,
	Jiangang Duan <jiangang.duan@intel.com>,
	Zhidong Yu <zhidong.yu@intel.com>
Subject: Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
Date: Mon, 19 Dec 2011 08:24:50 +0000	[thread overview]
Message-ID: <4EEF02E20200007800068C60@nat28.tlf.novell.com> (raw)
In-Reply-To: <C10D3FB0CD45994C8A51FEC1227CE22F37CC284376@shsmsx502.ccr.corp.intel.com>

>>> On 17.12.11 at 04:24, "Lv, Hui" <hui.lv@intel.com> wrote:
> The delay method for credit scheduler can do as well as SRC patch (previous 
> one) to gain significant performance boost without obvious drawbacks.
> 
> 1. Basically, the "delay method" can achieve nearly the same benefits as my 
> previous SRC patch, 11% overall performance boost for SPECvirt than original 
> credit scheduler.
> 2. We have tried 1ms delay and 10ms delay, there is no big difference 
> between these two configurations. (1ms is enough to achieve a good 
> performance)
> 3. We have compared different load level response time/latency (low, high, 
> peak), "delay method" didn't bring very much response time increase.
> 4. 1ms delay can reduce 30% context switch at peak performance, where 
> produces the benefits. ("int sched_ratelimit_us = 1000" is the recommended 
> setting)
> 
> 
> Signed-off-by: Hui Lv <hui.lv@intel.com<mailto:hui.lv@intel.com>>
> Signed-off-by: George Dunlap 
> <George.Dunlap@eu.citrix.com<mailto:George.Dunlap@eu.citrix.com>>
> 
> diff -r 1c58bb664d8d xen/common/sched_credit.c
> --- a/xen/common/sched_credit.c Thu Dec 08 17:15:16 2011 +0000
> +++ b/xen/common/sched_credit.c Fri Dec 16 15:08:09 2011 -0500
> @@ -110,6 +110,9 @@ boolean_param("sched_credit_default_yiel
>  static int __read_mostly sched_credit_tslice_ms = CSCHED_DEFAULT_TSLICE_MS;
>  integer_param("sched_credit_tslice_ms", sched_credit_tslice_ms);
> 
> +/* Scheduler generic parameters
> +*/
> +extern int sched_ratelimit_us;
>  /*
>   * Physical CPU
>   */
> @@ -1297,10 +1300,15 @@ csched_schedule(
>      struct csched_private *prv = CSCHED_PRIV(ops);
>      struct csched_vcpu *snext;
>      struct task_slice ret;
> +    s_time_t runtime, tslice;
> 
>      CSCHED_STAT_CRANK(schedule);
>      CSCHED_VCPU_CHECK(current);
> 
> +    runtime = now - current->runstate.state_entry_time;
> +    if ( runtime < 0 ) /* Does this ever happen? */
> +        runtime = 0;
> +
>      if ( !is_idle_vcpu(scurr->vcpu) )
>      {
>          /* Update credits of a non-idle VCPU. */
> @@ -1313,6 +1321,41 @@ csched_schedule(
>          scurr->pri = CSCHED_PRI_IDLE;
>      }
> 
> +    /* Choices, choices:
> +     * - If we have a tasklet, we need to run the idle vcpu no matter what.
> +     * - If sched rate limiting is in effect, and the current vcpu has
> +     *   run for less than that amount of time, continue the current one,
> +     *   but with a shorter timeslice and return it immediately
> +     * - Otherwise, chose the one with the highest priority (which may
> +     *   be the one currently running)
> +     * - If the currently running one is TS_OVER, see if there
> +     *   is a higher priority one waiting on the runqueue of another
> +     *   cpu and steal it.
> +     */
> +
> +    /* If we have schedule rate limiting enabled, check to see
> +     * how long we've run for. */
> +    if ( sched_ratelimit_us
> +         && !tasklet_work_scheduled

How about making the checking order match the description above
(which might also be slightly faster given that tasklet_work_scheduled
is a function parameter [in a register or written recently], and [given
the default value you're picking] you expect sched_ratelimit_us to be
non-zero generally)?

> +         && vcpu_runnable(current)
> +         && !is_idle_vcpu(current)
> +         && runtime < MICROSECS(sched_ratelimit_us) )
> +    {
> +        snext = scurr;
> +        snext->start_time += now;
> +        perfc_incr(delay_ms);
> +        tslice = MICROSECS(sched_ratelimit_us);

So if there happens to be a VM with 

MILLISECS(prv->tslice_ms) < MICROSECS(sched_ratelimit_us)

it'd get *more* time than allowed/intended through this mechanism.

Jan

> +        ret.migrated = 0;
> +        goto out;
> +    }
> +    else
> +    {
> +        /*
> +         * Select next runnable local VCPU (ie top of local runq)
> +        */
> +        tslice = MILLISECS(prv->tslice_ms);
> +    }
> +
>      /*
>       * Select next runnable local VCPU (ie top of local runq)
>       */
> @@ -1367,11 +1410,12 @@ csched_schedule(
>      if ( !is_idle_vcpu(snext->vcpu) )
>          snext->start_time += now;
> 
> +out:
>      /*
>       * Return task to run next...
>       */
>      ret.time = (is_idle_vcpu(snext->vcpu) ?
> -                -1 : MILLISECS(prv->tslice_ms));
> +                -1 : tslice);
>      ret.task = snext->vcpu;
> 
>      CSCHED_VCPU_CHECK(ret.task);
> diff -r 1c58bb664d8d xen/common/schedule.c
> --- a/xen/common/schedule.c     Thu Dec 08 17:15:16 2011 +0000
> +++ b/xen/common/schedule.c     Fri Dec 16 15:08:09 2011 -0500
> @@ -47,6 +47,10 @@ string_param("sched", opt_sched);
>  bool_t sched_smt_power_savings = 0;
>  boolean_param("sched_smt_power_savings", sched_smt_power_savings);
> 
> +/* Default scheduling rate limit: 1ms */
> +int sched_ratelimit_us = 1000;
> +integer_param("sched_ratelimit_us", sched_ratelimit_us);
> +
>  /* Various timer handlers. */
>  static void s_timer_fn(void *unused);
>  static void vcpu_periodic_timer_fn(void *data);
> diff -r 1c58bb664d8d xen/include/xen/perfc_defn.h
> --- a/xen/include/xen/perfc_defn.h      Thu Dec 08 17:15:16 2011 +0000
> +++ b/xen/include/xen/perfc_defn.h      Fri Dec 16 15:08:09 2011 -0500
> @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq,              "sch
>  PERFCOUNTER(sched_run,              "sched: runs through scheduler")
>  PERFCOUNTER(sched_ctx,              "sched: context switches")
> 
> +PERFCOUNTER(delay_ms,              "csched: delay")
>  PERFCOUNTER(vcpu_check,             "csched: vcpu_check")
>  PERFCOUNTER(schedule,               "csched: schedule")
>  PERFCOUNTER(acct_run,               "csched: acct_run")