All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Jan Beulich" <JBeulich@suse.com>
To: Hui Lv <hui.lv@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>,
	"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	"keir@xen.org" <keir@xen.org>,
	George Dunlap <George.Dunlap@eu.citrix.com>,
	Eddie Dong <eddie.dong@intel.com>,
	Jiangang Duan <jiangang.duan@intel.com>,
	Zhidong Yu <zhidong.yu@intel.com>
Subject: Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
Date: Mon, 19 Dec 2011 08:24:50 +0000	[thread overview]
Message-ID: <4EEF02E20200007800068C60@nat28.tlf.novell.com> (raw)
In-Reply-To: <C10D3FB0CD45994C8A51FEC1227CE22F37CC284376@shsmsx502.ccr.corp.intel.com>

>>> On 17.12.11 at 04:24, "Lv, Hui" <hui.lv@intel.com> wrote:
> The delay method for credit scheduler can do as well as SRC patch (previous 
> one) to gain significant performance boost without obvious drawbacks.
> 
> 1. Basically, the "delay method" can achieve nearly the same benefits as my 
> previous SRC patch, 11% overall performance boost for SPECvirt than original 
> credit scheduler.
> 2. We have tried 1ms delay and 10ms delay, there is no big difference 
> between these two configurations. (1ms is enough to achieve a good 
> performance)
> 3. We have compared different load level response time/latency (low, high, 
> peak), "delay method" didn't bring very much response time increase.
> 4. 1ms delay can reduce 30% context switch at peak performance, where 
> produces the benefits. ("int sched_ratelimit_us = 1000" is the recommended 
> setting)
> 
> 
> Signed-off-by: Hui Lv <hui.lv@intel.com<mailto:hui.lv@intel.com>>
> Signed-off-by: George Dunlap 
> <George.Dunlap@eu.citrix.com<mailto:George.Dunlap@eu.citrix.com>>
> 
> diff -r 1c58bb664d8d xen/common/sched_credit.c
> --- a/xen/common/sched_credit.c Thu Dec 08 17:15:16 2011 +0000
> +++ b/xen/common/sched_credit.c Fri Dec 16 15:08:09 2011 -0500
> @@ -110,6 +110,9 @@ boolean_param("sched_credit_default_yiel
>  static int __read_mostly sched_credit_tslice_ms = CSCHED_DEFAULT_TSLICE_MS;
>  integer_param("sched_credit_tslice_ms", sched_credit_tslice_ms);
> 
> +/* Scheduler generic parameters
> +*/
> +extern int sched_ratelimit_us;
>  /*
>   * Physical CPU
>   */
> @@ -1297,10 +1300,15 @@ csched_schedule(
>      struct csched_private *prv = CSCHED_PRIV(ops);
>      struct csched_vcpu *snext;
>      struct task_slice ret;
> +    s_time_t runtime, tslice;
> 
>      CSCHED_STAT_CRANK(schedule);
>      CSCHED_VCPU_CHECK(current);
> 
> +    runtime = now - current->runstate.state_entry_time;
> +    if ( runtime < 0 ) /* Does this ever happen? */
> +        runtime = 0;
> +
>      if ( !is_idle_vcpu(scurr->vcpu) )
>      {
>          /* Update credits of a non-idle VCPU. */
> @@ -1313,6 +1321,41 @@ csched_schedule(
>          scurr->pri = CSCHED_PRI_IDLE;
>      }
> 
> +    /* Choices, choices:
> +     * - If we have a tasklet, we need to run the idle vcpu no matter what.
> +     * - If sched rate limiting is in effect, and the current vcpu has
> +     *   run for less than that amount of time, continue the current one,
> +     *   but with a shorter timeslice and return it immediately
> +     * - Otherwise, chose the one with the highest priority (which may
> +     *   be the one currently running)
> +     * - If the currently running one is TS_OVER, see if there
> +     *   is a higher priority one waiting on the runqueue of another
> +     *   cpu and steal it.
> +     */
> +
> +    /* If we have schedule rate limiting enabled, check to see
> +     * how long we've run for. */
> +    if ( sched_ratelimit_us
> +         && !tasklet_work_scheduled

How about making the checking order match the description above
(which might also be slightly faster given that tasklet_work_scheduled
is a function parameter [in a register or written recently], and [given
the default value you're picking] you expect sched_ratelimit_us to be
non-zero generally)?

> +         && vcpu_runnable(current)
> +         && !is_idle_vcpu(current)
> +         && runtime < MICROSECS(sched_ratelimit_us) )
> +    {
> +        snext = scurr;
> +        snext->start_time += now;
> +        perfc_incr(delay_ms);
> +        tslice = MICROSECS(sched_ratelimit_us);

So if there happens to be a VM with 

MILLISECS(prv->tslice_ms) < MICROSECS(sched_ratelimit_us)

it'd get *more* time than allowed/intended through this mechanism.

Jan

> +        ret.migrated = 0;
> +        goto out;
> +    }
> +    else
> +    {
> +        /*
> +         * Select next runnable local VCPU (ie top of local runq)
> +        */
> +        tslice = MILLISECS(prv->tslice_ms);
> +    }
> +
>      /*
>       * Select next runnable local VCPU (ie top of local runq)
>       */
> @@ -1367,11 +1410,12 @@ csched_schedule(
>      if ( !is_idle_vcpu(snext->vcpu) )
>          snext->start_time += now;
> 
> +out:
>      /*
>       * Return task to run next...
>       */
>      ret.time = (is_idle_vcpu(snext->vcpu) ?
> -                -1 : MILLISECS(prv->tslice_ms));
> +                -1 : tslice);
>      ret.task = snext->vcpu;
> 
>      CSCHED_VCPU_CHECK(ret.task);
> diff -r 1c58bb664d8d xen/common/schedule.c
> --- a/xen/common/schedule.c     Thu Dec 08 17:15:16 2011 +0000
> +++ b/xen/common/schedule.c     Fri Dec 16 15:08:09 2011 -0500
> @@ -47,6 +47,10 @@ string_param("sched", opt_sched);
>  bool_t sched_smt_power_savings = 0;
>  boolean_param("sched_smt_power_savings", sched_smt_power_savings);
> 
> +/* Default scheduling rate limit: 1ms */
> +int sched_ratelimit_us = 1000;
> +integer_param("sched_ratelimit_us", sched_ratelimit_us);
> +
>  /* Various timer handlers. */
>  static void s_timer_fn(void *unused);
>  static void vcpu_periodic_timer_fn(void *data);
> diff -r 1c58bb664d8d xen/include/xen/perfc_defn.h
> --- a/xen/include/xen/perfc_defn.h      Thu Dec 08 17:15:16 2011 +0000
> +++ b/xen/include/xen/perfc_defn.h      Fri Dec 16 15:08:09 2011 -0500
> @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq,              "sch
>  PERFCOUNTER(sched_run,              "sched: runs through scheduler")
>  PERFCOUNTER(sched_ctx,              "sched: context switches")
> 
> +PERFCOUNTER(delay_ms,              "csched: delay")
>  PERFCOUNTER(vcpu_check,             "csched: vcpu_check")
>  PERFCOUNTER(schedule,               "csched: schedule")
>  PERFCOUNTER(acct_run,               "csched: acct_run")

  reply	other threads:[~2011-12-19  8:24 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <Acy8a1J98NIQa/LRTpOgi509iCZVqg==>
2011-12-17  3:24 ` [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency Lv, Hui
2011-12-19  8:24   ` Jan Beulich [this message]
2011-12-19 11:32     ` George Dunlap
2011-12-19 12:05       ` Jan Beulich
2011-12-19 13:59         ` George Dunlap
2011-12-19 14:59           ` Jan Beulich
2011-12-19 15:25             ` Lv, Hui
2011-12-19 15:38               ` Jan Beulich
2011-12-19 16:48                 ` George Dunlap
2011-12-19 17:04                   ` Jan Beulich
2011-12-19 13:56     ` Lv, Hui
2011-12-19 10:54   ` George Dunlap

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4EEF02E20200007800068C60@nat28.tlf.novell.com \
    --to=jbeulich@suse.com \
    --cc=George.Dunlap@eu.citrix.com \
    --cc=eddie.dong@intel.com \
    --cc=hui.lv@intel.com \
    --cc=jiangang.duan@intel.com \
    --cc=keir@xen.org \
    --cc=kevin.tian@intel.com \
    --cc=xen-devel@lists.xensource.com \
    --cc=zhidong.yu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.