All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
@ 2011-12-17  3:24 ` Lv, Hui
  2011-12-19  8:24   ` Jan Beulich
  2011-12-19 10:54   ` George Dunlap
  0 siblings, 2 replies; 12+ messages in thread
From: Lv, Hui @ 2011-12-17  3:24 UTC (permalink / raw)
  To: xen-devel, keir, George Dunlap
  Cc: Duan, Jiangang, Yu, Zhidong, Dong, Eddie, Tian, Kevin, Lv, Hui


[-- Attachment #1.1: Type: text/plain, Size: 4986 bytes --]

The delay method for credit scheduler can do as well as SRC patch (previous one) to gain significant performance boost without obvious drawbacks.

1. Basically, the "delay method" can achieve nearly the same benefits as my previous SRC patch, 11% overall performance boost for SPECvirt than original credit scheduler.
2. We have tried 1ms delay and 10ms delay, there is no big difference between these two configurations. (1ms is enough to achieve a good performance)
3. We have compared different load level response time/latency (low, high, peak), "delay method" didn't bring very much response time increase.
4. 1ms delay can reduce 30% context switch at peak performance, where produces the benefits. ("int sched_ratelimit_us = 1000" is the recommended setting)


Signed-off-by: Hui Lv <hui.lv@intel.com<mailto:hui.lv@intel.com>>
Signed-off-by: George Dunlap <George.Dunlap@eu.citrix.com<mailto:George.Dunlap@eu.citrix.com>>

diff -r 1c58bb664d8d xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Thu Dec 08 17:15:16 2011 +0000
+++ b/xen/common/sched_credit.c Fri Dec 16 15:08:09 2011 -0500
@@ -110,6 +110,9 @@ boolean_param("sched_credit_default_yiel
 static int __read_mostly sched_credit_tslice_ms = CSCHED_DEFAULT_TSLICE_MS;
 integer_param("sched_credit_tslice_ms", sched_credit_tslice_ms);

+/* Scheduler generic parameters
+*/
+extern int sched_ratelimit_us;
 /*
  * Physical CPU
  */
@@ -1297,10 +1300,15 @@ csched_schedule(
     struct csched_private *prv = CSCHED_PRIV(ops);
     struct csched_vcpu *snext;
     struct task_slice ret;
+    s_time_t runtime, tslice;

     CSCHED_STAT_CRANK(schedule);
     CSCHED_VCPU_CHECK(current);

+    runtime = now - current->runstate.state_entry_time;
+    if ( runtime < 0 ) /* Does this ever happen? */
+        runtime = 0;
+
     if ( !is_idle_vcpu(scurr->vcpu) )
     {
         /* Update credits of a non-idle VCPU. */
@@ -1313,6 +1321,41 @@ csched_schedule(
         scurr->pri = CSCHED_PRI_IDLE;
     }

+    /* Choices, choices:
+     * - If we have a tasklet, we need to run the idle vcpu no matter what.
+     * - If sched rate limiting is in effect, and the current vcpu has
+     *   run for less than that amount of time, continue the current one,
+     *   but with a shorter timeslice and return it immediately
+     * - Otherwise, chose the one with the highest priority (which may
+     *   be the one currently running)
+     * - If the currently running one is TS_OVER, see if there
+     *   is a higher priority one waiting on the runqueue of another
+     *   cpu and steal it.
+     */
+
+    /* If we have schedule rate limiting enabled, check to see
+     * how long we've run for. */
+    if ( sched_ratelimit_us
+         && !tasklet_work_scheduled
+         && vcpu_runnable(current)
+         && !is_idle_vcpu(current)
+         && runtime < MICROSECS(sched_ratelimit_us) )
+    {
+        snext = scurr;
+        snext->start_time += now;
+        perfc_incr(delay_ms);
+        tslice = MICROSECS(sched_ratelimit_us);
+        ret.migrated = 0;
+        goto out;
+    }
+    else
+    {
+        /*
+         * Select next runnable local VCPU (ie top of local runq)
+        */
+        tslice = MILLISECS(prv->tslice_ms);
+    }
+
     /*
      * Select next runnable local VCPU (ie top of local runq)
      */
@@ -1367,11 +1410,12 @@ csched_schedule(
     if ( !is_idle_vcpu(snext->vcpu) )
         snext->start_time += now;

+out:
     /*
      * Return task to run next...
      */
     ret.time = (is_idle_vcpu(snext->vcpu) ?
-                -1 : MILLISECS(prv->tslice_ms));
+                -1 : tslice);
     ret.task = snext->vcpu;

     CSCHED_VCPU_CHECK(ret.task);
diff -r 1c58bb664d8d xen/common/schedule.c
--- a/xen/common/schedule.c     Thu Dec 08 17:15:16 2011 +0000
+++ b/xen/common/schedule.c     Fri Dec 16 15:08:09 2011 -0500
@@ -47,6 +47,10 @@ string_param("sched", opt_sched);
 bool_t sched_smt_power_savings = 0;
 boolean_param("sched_smt_power_savings", sched_smt_power_savings);

+/* Default scheduling rate limit: 1ms */
+int sched_ratelimit_us = 1000;
+integer_param("sched_ratelimit_us", sched_ratelimit_us);
+
 /* Various timer handlers. */
 static void s_timer_fn(void *unused);
 static void vcpu_periodic_timer_fn(void *data);
diff -r 1c58bb664d8d xen/include/xen/perfc_defn.h
--- a/xen/include/xen/perfc_defn.h      Thu Dec 08 17:15:16 2011 +0000
+++ b/xen/include/xen/perfc_defn.h      Fri Dec 16 15:08:09 2011 -0500
@@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq,              "sch
 PERFCOUNTER(sched_run,              "sched: runs through scheduler")
 PERFCOUNTER(sched_ctx,              "sched: context switches")

+PERFCOUNTER(delay_ms,              "csched: delay")
 PERFCOUNTER(vcpu_check,             "csched: vcpu_check")
 PERFCOUNTER(schedule,               "csched: schedule")
 PERFCOUNTER(acct_run,               "csched: acct_run")


[-- Attachment #1.2: Type: text/html, Size: 13298 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
  2011-12-17  3:24 ` [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency Lv, Hui
@ 2011-12-19  8:24   ` Jan Beulich
  2011-12-19 11:32     ` George Dunlap
  2011-12-19 13:56     ` Lv, Hui
  2011-12-19 10:54   ` George Dunlap
  1 sibling, 2 replies; 12+ messages in thread
From: Jan Beulich @ 2011-12-19  8:24 UTC (permalink / raw)
  To: Hui Lv
  Cc: Kevin Tian, xen-devel, keir, George Dunlap, Eddie Dong,
	Jiangang Duan, Zhidong Yu

>>> On 17.12.11 at 04:24, "Lv, Hui" <hui.lv@intel.com> wrote:
> The delay method for credit scheduler can do as well as SRC patch (previous 
> one) to gain significant performance boost without obvious drawbacks.
> 
> 1. Basically, the "delay method" can achieve nearly the same benefits as my 
> previous SRC patch, 11% overall performance boost for SPECvirt than original 
> credit scheduler.
> 2. We have tried 1ms delay and 10ms delay, there is no big difference 
> between these two configurations. (1ms is enough to achieve a good 
> performance)
> 3. We have compared different load level response time/latency (low, high, 
> peak), "delay method" didn't bring very much response time increase.
> 4. 1ms delay can reduce 30% context switch at peak performance, where 
> produces the benefits. ("int sched_ratelimit_us = 1000" is the recommended 
> setting)
> 
> 
> Signed-off-by: Hui Lv <hui.lv@intel.com<mailto:hui.lv@intel.com>>
> Signed-off-by: George Dunlap 
> <George.Dunlap@eu.citrix.com<mailto:George.Dunlap@eu.citrix.com>>
> 
> diff -r 1c58bb664d8d xen/common/sched_credit.c
> --- a/xen/common/sched_credit.c Thu Dec 08 17:15:16 2011 +0000
> +++ b/xen/common/sched_credit.c Fri Dec 16 15:08:09 2011 -0500
> @@ -110,6 +110,9 @@ boolean_param("sched_credit_default_yiel
>  static int __read_mostly sched_credit_tslice_ms = CSCHED_DEFAULT_TSLICE_MS;
>  integer_param("sched_credit_tslice_ms", sched_credit_tslice_ms);
> 
> +/* Scheduler generic parameters
> +*/
> +extern int sched_ratelimit_us;
>  /*
>   * Physical CPU
>   */
> @@ -1297,10 +1300,15 @@ csched_schedule(
>      struct csched_private *prv = CSCHED_PRIV(ops);
>      struct csched_vcpu *snext;
>      struct task_slice ret;
> +    s_time_t runtime, tslice;
> 
>      CSCHED_STAT_CRANK(schedule);
>      CSCHED_VCPU_CHECK(current);
> 
> +    runtime = now - current->runstate.state_entry_time;
> +    if ( runtime < 0 ) /* Does this ever happen? */
> +        runtime = 0;
> +
>      if ( !is_idle_vcpu(scurr->vcpu) )
>      {
>          /* Update credits of a non-idle VCPU. */
> @@ -1313,6 +1321,41 @@ csched_schedule(
>          scurr->pri = CSCHED_PRI_IDLE;
>      }
> 
> +    /* Choices, choices:
> +     * - If we have a tasklet, we need to run the idle vcpu no matter what.
> +     * - If sched rate limiting is in effect, and the current vcpu has
> +     *   run for less than that amount of time, continue the current one,
> +     *   but with a shorter timeslice and return it immediately
> +     * - Otherwise, chose the one with the highest priority (which may
> +     *   be the one currently running)
> +     * - If the currently running one is TS_OVER, see if there
> +     *   is a higher priority one waiting on the runqueue of another
> +     *   cpu and steal it.
> +     */
> +
> +    /* If we have schedule rate limiting enabled, check to see
> +     * how long we've run for. */
> +    if ( sched_ratelimit_us
> +         && !tasklet_work_scheduled

How about making the checking order match the description above
(which might also be slightly faster given that tasklet_work_scheduled
is a function parameter [in a register or written recently], and [given
the default value you're picking] you expect sched_ratelimit_us to be
non-zero generally)?

> +         && vcpu_runnable(current)
> +         && !is_idle_vcpu(current)
> +         && runtime < MICROSECS(sched_ratelimit_us) )
> +    {
> +        snext = scurr;
> +        snext->start_time += now;
> +        perfc_incr(delay_ms);
> +        tslice = MICROSECS(sched_ratelimit_us);

So if there happens to be a VM with 

MILLISECS(prv->tslice_ms) < MICROSECS(sched_ratelimit_us)

it'd get *more* time than allowed/intended through this mechanism.

Jan

> +        ret.migrated = 0;
> +        goto out;
> +    }
> +    else
> +    {
> +        /*
> +         * Select next runnable local VCPU (ie top of local runq)
> +        */
> +        tslice = MILLISECS(prv->tslice_ms);
> +    }
> +
>      /*
>       * Select next runnable local VCPU (ie top of local runq)
>       */
> @@ -1367,11 +1410,12 @@ csched_schedule(
>      if ( !is_idle_vcpu(snext->vcpu) )
>          snext->start_time += now;
> 
> +out:
>      /*
>       * Return task to run next...
>       */
>      ret.time = (is_idle_vcpu(snext->vcpu) ?
> -                -1 : MILLISECS(prv->tslice_ms));
> +                -1 : tslice);
>      ret.task = snext->vcpu;
> 
>      CSCHED_VCPU_CHECK(ret.task);
> diff -r 1c58bb664d8d xen/common/schedule.c
> --- a/xen/common/schedule.c     Thu Dec 08 17:15:16 2011 +0000
> +++ b/xen/common/schedule.c     Fri Dec 16 15:08:09 2011 -0500
> @@ -47,6 +47,10 @@ string_param("sched", opt_sched);
>  bool_t sched_smt_power_savings = 0;
>  boolean_param("sched_smt_power_savings", sched_smt_power_savings);
> 
> +/* Default scheduling rate limit: 1ms */
> +int sched_ratelimit_us = 1000;
> +integer_param("sched_ratelimit_us", sched_ratelimit_us);
> +
>  /* Various timer handlers. */
>  static void s_timer_fn(void *unused);
>  static void vcpu_periodic_timer_fn(void *data);
> diff -r 1c58bb664d8d xen/include/xen/perfc_defn.h
> --- a/xen/include/xen/perfc_defn.h      Thu Dec 08 17:15:16 2011 +0000
> +++ b/xen/include/xen/perfc_defn.h      Fri Dec 16 15:08:09 2011 -0500
> @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq,              "sch
>  PERFCOUNTER(sched_run,              "sched: runs through scheduler")
>  PERFCOUNTER(sched_ctx,              "sched: context switches")
> 
> +PERFCOUNTER(delay_ms,              "csched: delay")
>  PERFCOUNTER(vcpu_check,             "csched: vcpu_check")
>  PERFCOUNTER(schedule,               "csched: schedule")
>  PERFCOUNTER(acct_run,               "csched: acct_run")

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
  2011-12-17  3:24 ` [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency Lv, Hui
  2011-12-19  8:24   ` Jan Beulich
@ 2011-12-19 10:54   ` George Dunlap
  1 sibling, 0 replies; 12+ messages in thread
From: George Dunlap @ 2011-12-19 10:54 UTC (permalink / raw)
  To: Lv, Hui
  Cc: Tian, Kevin, xen-devel, keir, Dong, Eddie, Duan, Jiangang, Yu, Zhidong

Hui,

Unfortunately your mailer is mangling the patch:

 static int __read_mostly sched_credit_tslice_ms =3D CSCHED_DEFAULT_TSLICE_=
MS;

Try using "hg email", or sending it as an attachment.

 -George

On Sat, Dec 17, 2011 at 3:24 AM, Lv, Hui <hui.lv@intel.com> wrote:
> The delay method for credit scheduler can do as well as SRC patch (previous
> one) to gain significant performance boost without obvious drawbacks.
>
> 1. Basically, the "delay method" can achieve nearly the same benefits as my
> previous SRC patch, 11% overall performance boost for SPECvirt than original
> credit scheduler.
> 2. We have tried 1ms delay and 10ms delay, there is no big difference
> between these two configurations. (1ms is enough to achieve a good
> performance)
> 3. We have compared different load level response time/latency (low, high,
> peak), "delay method" didn't bring very much response time increase.
> 4. 1ms delay can reduce 30% context switch at peak performance, where
> produces the benefits. (“int sched_ratelimit_us = 1000” is the recommended
> setting)
>
>
> Signed-off-by: Hui Lv <hui.lv@intel.com>
> Signed-off-by: George Dunlap <George.Dunlap@eu.citrix.com>
>
> diff -r 1c58bb664d8d xen/common/sched_credit.c
> --- a/xen/common/sched_credit.c Thu Dec 08 17:15:16 2011 +0000
> +++ b/xen/common/sched_credit.c Fri Dec 16 15:08:09 2011 -0500
> @@ -110,6 +110,9 @@ boolean_param("sched_credit_default_yiel
> static int __read_mostly sched_credit_tslice_ms = CSCHED_DEFAULT_TSLICE_MS;
> integer_param("sched_credit_tslice_ms", sched_credit_tslice_ms);
>
> +/* Scheduler generic parameters
> +*/
> +extern int sched_ratelimit_us;
> /*
>   * Physical CPU
>   */
> @@ -1297,10 +1300,15 @@ csched_schedule(
>      struct csched_private *prv = CSCHED_PRIV(ops);
>      struct csched_vcpu *snext;
>      struct task_slice ret;
> +    s_time_t runtime, tslice;
>
>      CSCHED_STAT_CRANK(schedule);
>      CSCHED_VCPU_CHECK(current);
>
> +    runtime = now - current->runstate.state_entry_time;
> +    if ( runtime < 0 ) /* Does this ever happen? */
> +        runtime = 0;
> +
>      if ( !is_idle_vcpu(scurr->vcpu) )
>      {
>          /* Update credits of a non-idle VCPU. */
> @@ -1313,6 +1321,41 @@ csched_schedule(
>          scurr->pri = CSCHED_PRI_IDLE;
>      }
>
> +    /* Choices, choices:
> +     * - If we have a tasklet, we need to run the idle vcpu no matter what.
> +     * - If sched rate limiting is in effect, and the current vcpu has
> +     *   run for less than that amount of time, continue the current one,
> +     *   but with a shorter timeslice and return it immediately
> +     * - Otherwise, chose the one with the highest priority (which may
> +     *   be the one currently running)
> +     * - If the currently running one is TS_OVER, see if there
> +     *   is a higher priority one waiting on the runqueue of another
> +     *   cpu and steal it.
> +     */
> +
> +    /* If we have schedule rate limiting enabled, check to see
> +     * how long we've run for. */
> +    if ( sched_ratelimit_us
> +         && !tasklet_work_scheduled
> +         && vcpu_runnable(current)
> +         && !is_idle_vcpu(current)
> +         && runtime < MICROSECS(sched_ratelimit_us) )
> +    {
> +        snext = scurr;
> +        snext->start_time += now;
> +        perfc_incr(delay_ms);
> +        tslice = MICROSECS(sched_ratelimit_us);
> +        ret.migrated = 0;
> +        goto out;
> +    }
> +    else
> +    {
> +        /*
> +         * Select next runnable local VCPU (ie top of local runq)
> +        */
> +        tslice = MILLISECS(prv->tslice_ms);
> +    }
> +
>      /*
>       * Select next runnable local VCPU (ie top of local runq)
>       */
> @@ -1367,11 +1410,12 @@ csched_schedule(
>      if ( !is_idle_vcpu(snext->vcpu) )
>          snext->start_time += now;
>
> +out:
>      /*
>       * Return task to run next...
>       */
>      ret.time = (is_idle_vcpu(snext->vcpu) ?
> -                -1 : MILLISECS(prv->tslice_ms));
> +                -1 : tslice);
>      ret.task = snext->vcpu;
>
>      CSCHED_VCPU_CHECK(ret.task);
> diff -r 1c58bb664d8d xen/common/schedule.c
> --- a/xen/common/schedule.c     Thu Dec 08 17:15:16 2011 +0000
> +++ b/xen/common/schedule.c     Fri Dec 16 15:08:09 2011 -0500
> @@ -47,6 +47,10 @@ string_param("sched", opt_sched);
> bool_t sched_smt_power_savings = 0;
> boolean_param("sched_smt_power_savings", sched_smt_power_savings);
>
> +/* Default scheduling rate limit: 1ms */
> +int sched_ratelimit_us = 1000;
> +integer_param("sched_ratelimit_us", sched_ratelimit_us);
> +
> /* Various timer handlers. */
> static void s_timer_fn(void *unused);
> static void vcpu_periodic_timer_fn(void *data);
> diff -r 1c58bb664d8d xen/include/xen/perfc_defn.h
> --- a/xen/include/xen/perfc_defn.h      Thu Dec 08 17:15:16 2011 +0000
> +++ b/xen/include/xen/perfc_defn.h      Fri Dec 16 15:08:09 2011 -0500
> @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq,              "sch
> PERFCOUNTER(sched_run,              "sched: runs through scheduler")
> PERFCOUNTER(sched_ctx,              "sched: context switches")
>
> +PERFCOUNTER(delay_ms,              "csched: delay")
> PERFCOUNTER(vcpu_check,             "csched: vcpu_check")
> PERFCOUNTER(schedule,               "csched: schedule")
> PERFCOUNTER(acct_run,               "csched: acct_run")
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
  2011-12-19  8:24   ` Jan Beulich
@ 2011-12-19 11:32     ` George Dunlap
  2011-12-19 12:05       ` Jan Beulich
  2011-12-19 13:56     ` Lv, Hui
  1 sibling, 1 reply; 12+ messages in thread
From: George Dunlap @ 2011-12-19 11:32 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, xen-devel, keir, Eddie Dong, Hui Lv, Jiangang Duan,
	Zhidong Yu

On Mon, Dec 19, 2011 at 8:24 AM, Jan Beulich <JBeulich@suse.com> wrote:
>> +         && vcpu_runnable(current)
>> +         && !is_idle_vcpu(current)
>> +         && runtime < MICROSECS(sched_ratelimit_us) )
>> +    {
>> +        snext = scurr;
>> +        snext->start_time += now;
>> +        perfc_incr(delay_ms);
>> +        tslice = MICROSECS(sched_ratelimit_us);
>
> So if there happens to be a VM with
>
> MILLISECS(prv->tslice_ms) < MICROSECS(sched_ratelimit_us)
>
> it'd get *more* time than allowed/intended through this mechanism.

Yeah, if you set your default timeslice to 1ms, and then set your
minimum scheduling rate to 5ms, you're going to get weird results. :-)

The way it stands now, the ratelimit value will override the timeslice
value.  It had to be one way or the other; do you think the timeslice
value should be the priority?

 -George

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
  2011-12-19 11:32     ` George Dunlap
@ 2011-12-19 12:05       ` Jan Beulich
  2011-12-19 13:59         ` George Dunlap
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2011-12-19 12:05 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, xen-devel, keir, Eddie Dong, Hui Lv, Jiangang Duan,
	Zhidong Yu

>>> On 19.12.11 at 12:32, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> On Mon, Dec 19, 2011 at 8:24 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>> +         && vcpu_runnable(current)
>>> +         && !is_idle_vcpu(current)
>>> +         && runtime < MICROSECS(sched_ratelimit_us) )
>>> +    {
>>> +        snext = scurr;
>>> +        snext->start_time += now;
>>> +        perfc_incr(delay_ms);
>>> +        tslice = MICROSECS(sched_ratelimit_us);
>>
>> So if there happens to be a VM with
>>
>> MILLISECS(prv->tslice_ms) < MICROSECS(sched_ratelimit_us)
>>
>> it'd get *more* time than allowed/intended through this mechanism.
> 
> Yeah, if you set your default timeslice to 1ms, and then set your
> minimum scheduling rate to 5ms, you're going to get weird results. :-)
> 
> The way it stands now, the ratelimit value will override the timeslice
> value.  It had to be one way or the other; do you think the timeslice
> value should be the priority?

The minimum of both should be used, I would think.

Jan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
  2011-12-19  8:24   ` Jan Beulich
  2011-12-19 11:32     ` George Dunlap
@ 2011-12-19 13:56     ` Lv, Hui
  1 sibling, 0 replies; 12+ messages in thread
From: Lv, Hui @ 2011-12-19 13:56 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, 'xen-devel@lists.xensource.com'

> > +    /* If we have schedule rate limiting enabled, check to see
> > +     * how long we've run for. */
> > +    if ( sched_ratelimit_us
> > +         && !tasklet_work_scheduled
> 
> How about making the checking order match the description above (which
> might also be slightly faster given that tasklet_work_scheduled is a function
> parameter [in a register or written recently], and [given the default value
> you're picking] you expect sched_ratelimit_us to be non-zero generally)?
> 
Thanks, agree this.

> > +         && vcpu_runnable(current)
> > +         && !is_idle_vcpu(current)
> > +         && runtime < MICROSECS(sched_ratelimit_us) )
> > +    {
> > +        snext = scurr;
> > +        snext->start_time += now;
> > +        perfc_incr(delay_ms);
> > +        tslice = MICROSECS(sched_ratelimit_us);
> 
> So if there happens to be a VM with
> 
> MILLISECS(prv->tslice_ms) < MICROSECS(sched_ratelimit_us)
> 
> it'd get *more* time than allowed/intended through this mechanism.

First, it does not make sense to set prv->tslice_ms < sched_ratelimit_us. If people really need an extreme smaller time slice(1ms, for example), remember to set sched_ratelimit_us smaller than prv->tslice_ms or zero.
If people forgot to do so, I think sched_ratelimit_us is higher priority to be considered (instead of minimum of both), although this is not the "right"xe configuration.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
  2011-12-19 12:05       ` Jan Beulich
@ 2011-12-19 13:59         ` George Dunlap
  2011-12-19 14:59           ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: George Dunlap @ 2011-12-19 13:59 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, xen-devel, keir, Eddie Dong, Hui Lv, Jiangang Duan,
	Zhidong Yu

On Mon, Dec 19, 2011 at 12:05 PM, Jan Beulich <JBeulich@suse.com> wrote:
>> The way it stands now, the ratelimit value will override the timeslice
>> value.  It had to be one way or the other; do you think the timeslice
>> value should be the priority?
>
> The minimum of both should be used, I would think.

What do you mean?  You mean in the assignment above?

That won't have any effect other than increasing interrupts and trips
through the scheduler.  Suppose the following set of events:
* timeslice is set to 1ms, ratelimit_us to 5000.
* a vcpu V is chosen; it's set to run with 1ms timeout.
* 1ms later, we go through the scheduler; the ratelimit code
determines that it hasn't run for long enough (only 1ms), so choses it
to run again (with a 1ms timeslice)
* Repeat until V has run for 5ms.

So although the timeslice is set to 1ms, and interrupts are happening
after 1ms, the ratelimit is overriding the 1ms of the timeslice and
making it 5ms.  Fundamentally, one of the two parameters must override
the other.  With this patch the way it is now, ratelimit will override
timeslice.  if you want the timeslice to override ratelimit, then
there will have to be more code to make that happen.

 -George

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
  2011-12-19 13:59         ` George Dunlap
@ 2011-12-19 14:59           ` Jan Beulich
  2011-12-19 15:25             ` Lv, Hui
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2011-12-19 14:59 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, xen-devel, keir, Eddie Dong, Hui Lv, Jiangang Duan,
	Zhidong Yu

>>> On 19.12.11 at 14:59, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> On Mon, Dec 19, 2011 at 12:05 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>> The way it stands now, the ratelimit value will override the timeslice
>>> value.  It had to be one way or the other; do you think the timeslice
>>> value should be the priority?
>>
>> The minimum of both should be used, I would think.
> 
> What do you mean?  You mean in the assignment above?

Yes, I had thought that this would suffice. But ...

> That won't have any effect other than increasing interrupts and trips
> through the scheduler.  Suppose the following set of events:
> * timeslice is set to 1ms, ratelimit_us to 5000.
> * a vcpu V is chosen; it's set to run with 1ms timeout.
> * 1ms later, we go through the scheduler; the ratelimit code
> determines that it hasn't run for long enough (only 1ms), so choses it
> to run again (with a 1ms timeslice)
> * Repeat until V has run for 5ms.

... if that's what's happening, the whole thing looks bogus to me.
Clearly the rate limit code should not force the time slice to be
extended.

> So although the timeslice is set to 1ms, and interrupts are happening
> after 1ms, the ratelimit is overriding the 1ms of the timeslice and
> making it 5ms.  Fundamentally, one of the two parameters must override
> the other.  With this patch the way it is now, ratelimit will override
> timeslice.  if you want the timeslice to override ratelimit, then
> there will have to be more code to make that happen.

Overriding the rate limit by the time slice isn't the right thing either,
as that (the way I "read" it) means there's no rate limiting at all.
What "rate limit" to me means is preventing quickly switching away
from a vCPU recently scheduled without extending its (remaining)
time slice, i.e. in any place a respective evaluation is done the
shorter of the two should be used.

Jan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
  2011-12-19 14:59           ` Jan Beulich
@ 2011-12-19 15:25             ` Lv, Hui
  2011-12-19 15:38               ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Lv, Hui @ 2011-12-19 15:25 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap
  Cc: Tian, Kevin, xen-devel, keir, Dong, Eddie, Duan, Jiangang, Yu, Zhidong


> Overriding the rate limit by the time slice isn't the right thing either, as that
> (the way I "read" it) means there's no rate limiting at all.
> What "rate limit" to me means is preventing quickly switching away from a
> vCPU recently scheduled without extending its (remaining) time slice, i.e. in any
> place a respective evaluation is done the shorter of the two should be used.
> 
> Jan

So the basic thing is to avoid "time slice" < "rate limit", happen.
I really don't understand why people want a 1ms time slice, but set the rate_limit to 5000(us), that is insubstantial.

If, this unfortunately happens, I prefer to put "rate_limit" at higher priority, which means extending the running time slice.
Some warnings should be put before the parameter of sched_ratelimit_us to avoid this.
Is this reasonable?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
  2011-12-19 15:25             ` Lv, Hui
@ 2011-12-19 15:38               ` Jan Beulich
  2011-12-19 16:48                 ` George Dunlap
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2011-12-19 15:38 UTC (permalink / raw)
  To: George Dunlap, Hui Lv
  Cc: Kevin Tian, xen-devel, keir, Eddie Dong, Jiangang Duan, Zhidong Yu

>>> On 19.12.11 at 16:25, "Lv, Hui" <hui.lv@intel.com> wrote:
>> Overriding the rate limit by the time slice isn't the right thing either, as 
> that
>> (the way I "read" it) means there's no rate limiting at all.
>> What "rate limit" to me means is preventing quickly switching away from a
>> vCPU recently scheduled without extending its (remaining) time slice, i.e. 
> in any
>> place a respective evaluation is done the shorter of the two should be used.
>> 
>> Jan
> 
> So the basic thing is to avoid "time slice" < "rate limit", happen.
> I really don't understand why people want a 1ms time slice, but set the 
> rate_limit to 5000(us), that is insubstantial.

So if someone set the (global) rate limit value to 5000us and then,
days or weeks later, migrates a VM with a 3ms time slice to that
host, why should this be an admin mistake? To me, the rate limit is a
performance improvement, while the time slice may be (an indirect
result of) a to be enforced policy.

Jan

> If, this unfortunately happens, I prefer to put "rate_limit" at higher 
> priority, which means extending the running time slice.
> Some warnings should be put before the parameter of sched_ratelimit_us to 
> avoid this.
> Is this reasonable?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
  2011-12-19 15:38               ` Jan Beulich
@ 2011-12-19 16:48                 ` George Dunlap
  2011-12-19 17:04                   ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: George Dunlap @ 2011-12-19 16:48 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, xen-devel, Keir (Xen.org),
	George Dunlap, Eddie Dong, Hui Lv, Jiangang Duan, Zhidong Yu

On Mon, 2011-12-19 at 15:38 +0000, Jan Beulich wrote:
> >>> On 19.12.11 at 16:25, "Lv, Hui" <hui.lv@intel.com> wrote:
> >> Overriding the rate limit by the time slice isn't the right thing either, as 
> > that
> >> (the way I "read" it) means there's no rate limiting at all.
> >> What "rate limit" to me means is preventing quickly switching away from a
> >> vCPU recently scheduled without extending its (remaining) time slice, i.e. 
> > in any
> >> place a respective evaluation is done the shorter of the two should be used.
> >> 
> >> Jan
> > 
> > So the basic thing is to avoid "time slice" < "rate limit", happen.
> > I really don't understand why people want a 1ms time slice, but set the 
> > rate_limit to 5000(us), that is insubstantial.
> 
> So if someone set the (global) rate limit value to 5000us and then,
> days or weeks later, migrates a VM with a 3ms time slice to that
> host, why should this be an admin mistake? To me, the rate limit is a
> performance improvement, while the time slice may be (an indirect
> result of) a to be enforced policy.

Right now the timeslice is effectively global.  There's a per-scheduler
variable, but it's only set from the boot-time option.  Before 4.2 I'm
going to add some code that will allow that to be changed on a scheduler
granularity; but there was never a plan to make it per-VM.

It would make sense, in theory, to let the VM run for the rest of its
timeslice; but there's not an easy way at the moment to get that
information from an already-running vcpu.  The timescales we're talking
about here are so small that it doesn't seem worth the extra
complication to me.

We're really bike-shedding at this point.  I don't think
functionality-wise it matters either way, and the code is simpler the
way it is.  I think we should just leave it.

 -George

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
  2011-12-19 16:48                 ` George Dunlap
@ 2011-12-19 17:04                   ` Jan Beulich
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2011-12-19 17:04 UTC (permalink / raw)
  To: George Dunlap, George.Dunlap
  Cc: Kevin Tian, xen-devel, Keir(Xen.org),
	Eddie Dong, Hui Lv, Jiangang Duan, Zhidong Yu

>>> On 19.12.11 at 17:48, George Dunlap <george.dunlap@citrix.com> wrote:
> On Mon, 2011-12-19 at 15:38 +0000, Jan Beulich wrote:
>> >>> On 19.12.11 at 16:25, "Lv, Hui" <hui.lv@intel.com> wrote:
>> >> Overriding the rate limit by the time slice isn't the right thing either, 
> as 
>> > that
>> >> (the way I "read" it) means there's no rate limiting at all.
>> >> What "rate limit" to me means is preventing quickly switching away from a
>> >> vCPU recently scheduled without extending its (remaining) time slice, i.e. 
>> > in any
>> >> place a respective evaluation is done the shorter of the two should be 
> used.
>> >> 
>> >> Jan
>> > 
>> > So the basic thing is to avoid "time slice" < "rate limit", happen.
>> > I really don't understand why people want a 1ms time slice, but set the 
>> > rate_limit to 5000(us), that is insubstantial.
>> 
>> So if someone set the (global) rate limit value to 5000us and then,
>> days or weeks later, migrates a VM with a 3ms time slice to that
>> host, why should this be an admin mistake? To me, the rate limit is a
>> performance improvement, while the time slice may be (an indirect
>> result of) a to be enforced policy.
> 
> Right now the timeslice is effectively global.  There's a per-scheduler
> variable, but it's only set from the boot-time option.  Before 4.2 I'm
> going to add some code that will allow that to be changed on a scheduler
> granularity; but there was never a plan to make it per-VM.

Oh, okay, I missed that point. In that case just warning about the
two values having a bad relation would seem fine.

Sorry for the noise then.

Jan

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2011-12-19 17:04 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <Acy8a1J98NIQa/LRTpOgi509iCZVqg==>
2011-12-17  3:24 ` [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency Lv, Hui
2011-12-19  8:24   ` Jan Beulich
2011-12-19 11:32     ` George Dunlap
2011-12-19 12:05       ` Jan Beulich
2011-12-19 13:59         ` George Dunlap
2011-12-19 14:59           ` Jan Beulich
2011-12-19 15:25             ` Lv, Hui
2011-12-19 15:38               ` Jan Beulich
2011-12-19 16:48                 ` George Dunlap
2011-12-19 17:04                   ` Jan Beulich
2011-12-19 13:56     ` Lv, Hui
2011-12-19 10:54   ` George Dunlap

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.