xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/16] Fixes and improvement (including hard affinity!) for Credit2
@ 2016-03-18 19:03 Dario Faggioli
  2016-03-18 19:04 ` [PATCH 01/16] xen: sched: fix locking when allocating an RTDS pCPU Dario Faggioli
                   ` (15 more replies)
  0 siblings, 16 replies; 61+ messages in thread
From: Dario Faggioli @ 2016-03-18 19:03 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Justin Weaver, George Dunlap, Tianyang Chen,
	Robert VanVossen, Uma Sharma, Josh Whitehead, Meng Xu,
	Jan Beulich

Hi everyone,

This series implements a bunch of both bugfixes and improvements for scheduling
in general, but mostly for Credit2.

Almost all the patches have actually been posted already (so that's why some of
them have a mention of what changed since v1), but I felt like the best thing
was just to put together a new series containing them all (and hence I'm
considering this here to be version 1 of it).

So, quickly, what we do is:
 a) fix a potential race and an actual ASSERT(), when moving pCPUs around
    cpupools;
 b) fix how Credit2 constructs its runqueue;
 c) implement hard affinity in Credit2;
 d) a couple more (minor) fixes and improvements.

In particular, to make a) possible, quite a few restructuring of the way in
which we deal with pCPUs changing scheduler when they move between pools
(basically, in schedule_cpu_switch()) has been necessary.

That also helped making b) possible, so it was a win-win. :-)

And finally, yes, I know it's late in this release cycle, but still would like
to give this series a chance to get in Xen 4.7. (BTW, technically, it's really
mostly bugfixing, the only exception being hard affinity for Credit2).

Thanks and Regards,
Dario

---
Dario Faggioli (14):
      xen: sched: fix locking when allocating an RTDS pCPU
      xen: sched: add .init_pdata hook to the scheduler interface
      xen: sched: make implementing .alloc_pdata optional
      xen: sched: implement .init_pdata in all schedulers
      xen: sched: move pCPU initialization in an helper
      xen: sched: prepare a .switch_sched hook for Credit1
      xen: sched: prepare a .switch_sched hook for Credit2
      xen: sched: prepare a .switch_sched hook for Credit2
      xen: sched: close potential races when switching scheduler to CPUs
      xen: sched: on Credit2, don't reprogram the timer if idle
      xen: sched: fix per-socket runqueue creation in credit2
      xen: sched: allow for choosing credit2 runqueues configuration at boot
      xen: sched: per-core runqueues as default in credit2
      xen: sched: scratch space for cpumasks on Credit2

Justin Weaver (1):
      xen: sched: implement vcpu hard affinity in Credit2

Uma Sharma (1):
      xen: sched: improve credit2 bootparams' scope, placement and signedness

 docs/misc/xen-command-line.markdown |   19 +
 xen/common/sched_arinc653.c         |   31 --
 xen/common/sched_credit.c           |   75 +++++-
 xen/common/sched_credit2.c          |  467 +++++++++++++++++++++++++----------
 xen/common/sched_rt.c               |   69 ++++-
 xen/common/schedule.c               |   53 +++-
 xen/include/xen/sched-if.h          |    5 
 7 files changed, 505 insertions(+), 214 deletions(-)
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 01/16] xen: sched: fix locking when allocating an RTDS pCPU
  2016-03-18 19:03 [PATCH 00/16] Fixes and improvement (including hard affinity!) for Credit2 Dario Faggioli
@ 2016-03-18 19:04 ` Dario Faggioli
  2016-03-19  2:22   ` Meng Xu
  2016-03-23 15:37   ` George Dunlap
  2016-03-18 19:04 ` [PATCH 02/16] xen: sched: add .init_pdata hook to the scheduler interface Dario Faggioli
                   ` (14 subsequent siblings)
  15 siblings, 2 replies; 61+ messages in thread
From: Dario Faggioli @ 2016-03-18 19:04 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Tianyang Chen, Meng Xu

as doing that include changing the scheduler lock
mapping for the pCPU itself, and the correct way
of doing that is:
 - take the lock that the pCPU is using right now
   (which may be the lock of another scheduler);
 - change the mapping of the lock to the RTDS one;
 - release the lock (the one that has actually been
   taken!)

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: Meng Xu <mengxu@cis.upenn.edu>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Tianyang Chen <tiche@seas.upenn.edu>
---
 xen/common/sched_rt.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index c896a6f..d98bfb6 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -653,11 +653,16 @@ static void *
 rt_alloc_pdata(const struct scheduler *ops, int cpu)
 {
     struct rt_private *prv = rt_priv(ops);
+    spinlock_t *old_lock;
     unsigned long flags;
 
-    spin_lock_irqsave(&prv->lock, flags);
+    /* Move the scheduler lock to our global runqueue lock.  */
+    old_lock = pcpu_schedule_lock_irqsave(cpu, &flags);
+
     per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
-    spin_unlock_irqrestore(&prv->lock, flags);
+
+    /* _Not_ pcpu_schedule_unlock(): per_cpu().schedule_lock changed! */
+    spin_unlock_irqrestore(old_lock, flags);
 
     if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) )
         return NULL;


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 02/16] xen: sched: add .init_pdata hook to the scheduler interface
  2016-03-18 19:03 [PATCH 00/16] Fixes and improvement (including hard affinity!) for Credit2 Dario Faggioli
  2016-03-18 19:04 ` [PATCH 01/16] xen: sched: fix locking when allocating an RTDS pCPU Dario Faggioli
@ 2016-03-18 19:04 ` Dario Faggioli
  2016-03-22  8:08   ` Juergen Gross
  2016-03-23 17:32   ` George Dunlap
  2016-03-18 19:04 ` [PATCH 03/16] xen: sched: make implementing .alloc_pdata optional Dario Faggioli
                   ` (13 subsequent siblings)
  15 siblings, 2 replies; 61+ messages in thread
From: Dario Faggioli @ 2016-03-18 19:04 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Juergen Gross

with the purpose of decoupling the allocation phase and
the initialization one, for per-pCPU data of the schedulers.

This makes it possible to perform the initialization later
in the pCPU bringup/assignement process, when more information
(for instance, the host CPU topology) are available. This,
for now, is important only for Credit2, but it can well be
useful to other schedulers.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Juergen Gross <jgross@suse.com>
---
Changes from v1:
 * in schedule_cpu_switch(), call to init_pdata() moved up,
   close to the call to alloc_pdata() (for consistency with
   other call sites) and prototype slightly changed.
---
During v1 review, it was agreed to add ASSERTS() and comments
to clarify the use of schedule_cpu_switch(). This can't be
found here, but only because it has happened in another patch.
---
 xen/common/schedule.c      |    7 +++++++
 xen/include/xen/sched-if.h |    1 +
 2 files changed, 8 insertions(+)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index e57b659..0627eb5 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1517,10 +1517,15 @@ static int cpu_schedule_callback(
     struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
     unsigned int cpu = (unsigned long)hcpu;
+    struct scheduler *sched = per_cpu(scheduler, cpu);
+    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     int rc = 0;
 
     switch ( action )
     {
+    case CPU_STARTING:
+        SCHED_OP(sched, init_pdata, sd->sched_priv, cpu);
+        break;
     case CPU_UP_PREPARE:
         rc = cpu_schedule_up(cpu);
         break;
@@ -1597,6 +1602,7 @@ void __init scheduler_init(void)
     if ( ops.alloc_pdata &&
          !(this_cpu(schedule_data).sched_priv = ops.alloc_pdata(&ops, 0)) )
         BUG();
+    SCHED_OP(&ops, init_pdata, this_cpu(schedule_data).sched_priv, 0);
 }
 
 /*
@@ -1640,6 +1646,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     ppriv = SCHED_OP(new_ops, alloc_pdata, cpu);
     if ( ppriv == NULL )
         return -ENOMEM;
+    SCHED_OP(new_ops, init_pdata, ppriv, cpu);
     vpriv = SCHED_OP(new_ops, alloc_vdata, idle, idle->domain->sched_priv);
     if ( vpriv == NULL )
     {
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 825f1ad..70c08c6 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -133,6 +133,7 @@ struct scheduler {
                                     void *);
     void         (*free_pdata)     (const struct scheduler *, void *, int);
     void *       (*alloc_pdata)    (const struct scheduler *, int);
+    void         (*init_pdata)     (const struct scheduler *, void *, int);
     void         (*free_domdata)   (const struct scheduler *, void *);
     void *       (*alloc_domdata)  (const struct scheduler *, struct domain *);
 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 03/16] xen: sched: make implementing .alloc_pdata optional
  2016-03-18 19:03 [PATCH 00/16] Fixes and improvement (including hard affinity!) for Credit2 Dario Faggioli
  2016-03-18 19:04 ` [PATCH 01/16] xen: sched: fix locking when allocating an RTDS pCPU Dario Faggioli
  2016-03-18 19:04 ` [PATCH 02/16] xen: sched: add .init_pdata hook to the scheduler interface Dario Faggioli
@ 2016-03-18 19:04 ` Dario Faggioli
  2016-03-19  2:23   ` Meng Xu
                     ` (3 more replies)
  2016-03-18 19:04 ` [PATCH 04/16] xen: sched: implement .init_pdata in all schedulers Dario Faggioli
                   ` (12 subsequent siblings)
  15 siblings, 4 replies; 61+ messages in thread
From: Dario Faggioli @ 2016-03-18 19:04 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, George Dunlap, Robert VanVossen, Josh Whitehead,
	Meng Xu, Jan Beulich

The .alloc_pdata scheduler hook must, before this change,
be implemented by all schedulers --even those ones that
don't need to allocate anything.

Make it possible to just use the SCHED_OP(), like for
the other hooks, by using ERR_PTR() and IS_ERR() for
error reporting. This:
 - makes NULL a variant of success;
 - allows for errors other than ENOMEM to be properly
   communicated (if ever necessary).

This, in turn, means that schedulers not needing to
allocate any per-pCPU data, can avoid implementing the
hook. In fact, the artificial implementation of
.alloc_pdata in the ARINC653 is removed (and, while there,
nuke .free_pdata too, as it is equally useless).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Robert VanVossen <robert.vanvossen@dornerworks.com>
Cc: Josh Whitehead <josh.whitehead@dornerworks.com>
Cc: Meng Xu <mengxu@cis.upenn.edu>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Juergen Gross <jgross@suse.com>
---
Changes from v1:
 * use IS_ERR() and friends to deal with the return value
   of alloc_pdata, as suggested during review.
---
 xen/common/sched_arinc653.c |   31 -------------------------------
 xen/common/sched_credit.c   |    4 ++--
 xen/common/sched_credit2.c  |    2 +-
 xen/common/sched_rt.c       |    7 +++----
 xen/common/schedule.c       |   18 ++++++++----------
 xen/include/xen/sched-if.h  |    1 +
 6 files changed, 15 insertions(+), 48 deletions(-)

diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index 8a11a2f..b79fcdf 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -456,34 +456,6 @@ a653sched_free_vdata(const struct scheduler *ops, void *priv)
 }
 
 /**
- * This function allocates scheduler-specific data for a physical CPU
- *
- * We do not actually make use of any per-CPU data but the hypervisor expects
- * a non-NULL return value
- *
- * @param ops       Pointer to this instance of the scheduler structure
- *
- * @return          Pointer to the allocated data
- */
-static void *
-a653sched_alloc_pdata(const struct scheduler *ops, int cpu)
-{
-    /* return a non-NULL value to keep schedule.c happy */
-    return SCHED_PRIV(ops);
-}
-
-/**
- * This function frees scheduler-specific data for a physical CPU
- *
- * @param ops       Pointer to this instance of the scheduler structure
- */
-static void
-a653sched_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
-{
-    /* nop */
-}
-
-/**
  * This function allocates scheduler-specific data for a domain
  *
  * We do not actually make use of any per-domain data but the hypervisor
@@ -737,9 +709,6 @@ static const struct scheduler sched_arinc653_def = {
     .free_vdata     = a653sched_free_vdata,
     .alloc_vdata    = a653sched_alloc_vdata,
 
-    .free_pdata     = a653sched_free_pdata,
-    .alloc_pdata    = a653sched_alloc_pdata,
-
     .free_domdata   = a653sched_free_domdata,
     .alloc_domdata  = a653sched_alloc_domdata,
 
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 305889a..288749f 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -532,12 +532,12 @@ csched_alloc_pdata(const struct scheduler *ops, int cpu)
     /* Allocate per-PCPU info */
     spc = xzalloc(struct csched_pcpu);
     if ( spc == NULL )
-        return NULL;
+        return ERR_PTR(-ENOMEM);
 
     if ( !alloc_cpumask_var(&spc->balance_mask) )
     {
         xfree(spc);
-        return NULL;
+        return ERR_PTR(-ENOMEM);
     }
 
     spin_lock_irqsave(&prv->lock, flags);
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 7ddad38..36dc9ee 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -2044,7 +2044,7 @@ csched2_alloc_pdata(const struct scheduler *ops, int cpu)
         printk("%s: cpu %d not online yet, deferring initializatgion\n",
                __func__, cpu);
 
-    return (void *)1;
+    return NULL;
 }
 
 static void
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index d98bfb6..ac8019f 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -665,7 +665,7 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
     spin_unlock_irqrestore(old_lock, flags);
 
     if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) )
-        return NULL;
+        return ERR_PTR(-ENOMEM);
 
     if ( prv->repl_timer == NULL )
     {
@@ -673,13 +673,12 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
         prv->repl_timer = xzalloc(struct timer);
 
         if ( prv->repl_timer == NULL )
-            return NULL;
+            return ERR_PTR(-ENOMEM);
 
         init_timer(prv->repl_timer, repl_timer_handler, (void *)ops, cpu);
     }
 
-    /* 1 indicates alloc. succeed in schedule.c */
-    return (void *)1;
+    return NULL;
 }
 
 static void
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 0627eb5..1adc0e2 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1491,9 +1491,9 @@ static int cpu_schedule_up(unsigned int cpu)
     if ( idle_vcpu[cpu] == NULL )
         return -ENOMEM;
 
-    if ( (ops.alloc_pdata != NULL) &&
-         ((sd->sched_priv = ops.alloc_pdata(&ops, cpu)) == NULL) )
-        return -ENOMEM;
+    sd->sched_priv = SCHED_OP(&ops, alloc_pdata, cpu);
+    if ( IS_ERR(sd->sched_priv) )
+        return PTR_ERR(sd->sched_priv);
 
     return 0;
 }
@@ -1503,8 +1503,7 @@ static void cpu_schedule_down(unsigned int cpu)
     struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     struct scheduler *sched = per_cpu(scheduler, cpu);
 
-    if ( sd->sched_priv != NULL )
-        SCHED_OP(sched, free_pdata, sd->sched_priv, cpu);
+    SCHED_OP(sched, free_pdata, sd->sched_priv, cpu);
     SCHED_OP(sched, free_vdata, idle_vcpu[cpu]->sched_priv);
 
     idle_vcpu[cpu]->sched_priv = NULL;
@@ -1599,9 +1598,8 @@ void __init scheduler_init(void)
     idle_domain->max_vcpus = nr_cpu_ids;
     if ( alloc_vcpu(idle_domain, 0, 0) == NULL )
         BUG();
-    if ( ops.alloc_pdata &&
-         !(this_cpu(schedule_data).sched_priv = ops.alloc_pdata(&ops, 0)) )
-        BUG();
+    this_cpu(schedule_data).sched_priv = SCHED_OP(&ops, alloc_pdata, 0);
+    BUG_ON(IS_ERR(this_cpu(schedule_data).sched_priv));
     SCHED_OP(&ops, init_pdata, this_cpu(schedule_data).sched_priv, 0);
 }
 
@@ -1644,8 +1642,8 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
 
     idle = idle_vcpu[cpu];
     ppriv = SCHED_OP(new_ops, alloc_pdata, cpu);
-    if ( ppriv == NULL )
-        return -ENOMEM;
+    if ( IS_ERR(ppriv) )
+        return PTR_ERR(ppriv);
     SCHED_OP(new_ops, init_pdata, ppriv, cpu);
     vpriv = SCHED_OP(new_ops, alloc_vdata, idle, idle->domain->sched_priv);
     if ( vpriv == NULL )
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 70c08c6..560cba5 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -9,6 +9,7 @@
 #define __XEN_SCHED_IF_H__
 
 #include <xen/percpu.h>
+#include <xen/err.h>
 
 /* A global pointer to the initial cpupool (POOL0). */
 extern struct cpupool *cpupool0;


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 04/16] xen: sched: implement .init_pdata in all schedulers
  2016-03-18 19:03 [PATCH 00/16] Fixes and improvement (including hard affinity!) for Credit2 Dario Faggioli
                   ` (2 preceding siblings ...)
  2016-03-18 19:04 ` [PATCH 03/16] xen: sched: make implementing .alloc_pdata optional Dario Faggioli
@ 2016-03-18 19:04 ` Dario Faggioli
  2016-03-19  2:24   ` Meng Xu
  2016-03-22  8:03   ` Juergen Gross
  2016-03-18 19:04 ` [PATCH 05/16] xen: sched: move pCPU initialization in an helper Dario Faggioli
                   ` (11 subsequent siblings)
  15 siblings, 2 replies; 61+ messages in thread
From: Dario Faggioli @ 2016-03-18 19:04 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Juergen Gross, Meng Xu

by borrowing some of the code of .alloc_pdata, i.e.,
the bits that perform initializations, leaving only
actual allocations in there, when any, which is the
case for Credit1 and RTDS.

On the other hand, in Credit2, since we don't really
need any per-pCPU data allocation, everything that was
being done in .alloc_pdata, is now done in .init_pdata.
And the fact that now .alloc_pdata can be left undefined,
allows us to just get rid of it.

Still for Credit2, the fact that .init_pdata is called
during CPU_STARTING (rather than CPU_UP_PREPARE) kills
the need for the scheduler to setup a similar callback
itself, simplifying the code.

And thanks to such simplification, it is now also ok to
move some of the logic meant at double checking that a
cpu was (or was not) initialized, into ASSERTS (rather
than an if() and a BUG_ON).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Meng Xu <mengxu@cis.upenn.edu>
Cc: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_credit.c  |   20 +++++++++---
 xen/common/sched_credit2.c |   72 +++-----------------------------------------
 xen/common/sched_rt.c      |    9 ++++--
 3 files changed, 26 insertions(+), 75 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 288749f..d4a0f5e 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -526,8 +526,6 @@ static void *
 csched_alloc_pdata(const struct scheduler *ops, int cpu)
 {
     struct csched_pcpu *spc;
-    struct csched_private *prv = CSCHED_PRIV(ops);
-    unsigned long flags;
 
     /* Allocate per-PCPU info */
     spc = xzalloc(struct csched_pcpu);
@@ -540,6 +538,19 @@ csched_alloc_pdata(const struct scheduler *ops, int cpu)
         return ERR_PTR(-ENOMEM);
     }
 
+    return spc;
+}
+
+static void
+csched_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
+{
+    struct csched_private *prv = CSCHED_PRIV(ops);
+    struct csched_pcpu * const spc = pdata;
+    unsigned long flags;
+
+    /* cpu data needs to be allocated, but STILL uninitialized */
+    ASSERT(spc && spc->runq.next == spc->runq.prev && spc->runq.next == NULL);
+
     spin_lock_irqsave(&prv->lock, flags);
 
     /* Initialize/update system-wide config */
@@ -560,16 +571,12 @@ csched_alloc_pdata(const struct scheduler *ops, int cpu)
     INIT_LIST_HEAD(&spc->runq);
     spc->runq_sort_last = prv->runq_sort;
     spc->idle_bias = nr_cpu_ids - 1;
-    if ( per_cpu(schedule_data, cpu).sched_priv == NULL )
-        per_cpu(schedule_data, cpu).sched_priv = spc;
 
     /* Start off idling... */
     BUG_ON(!is_idle_vcpu(curr_on_cpu(cpu)));
     cpumask_set_cpu(cpu, prv->idlers);
 
     spin_unlock_irqrestore(&prv->lock, flags);
-
-    return spc;
 }
 
 #ifndef NDEBUG
@@ -2051,6 +2058,7 @@ static const struct scheduler sched_credit_def = {
     .alloc_vdata    = csched_alloc_vdata,
     .free_vdata     = csched_free_vdata,
     .alloc_pdata    = csched_alloc_pdata,
+    .init_pdata     = csched_init_pdata,
     .free_pdata     = csched_free_pdata,
     .alloc_domdata  = csched_alloc_domdata,
     .free_domdata   = csched_free_domdata,
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 36dc9ee..b1642a8 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1968,7 +1968,8 @@ static void deactivate_runqueue(struct csched2_private *prv, int rqi)
     cpumask_clear_cpu(rqi, &prv->active_queues);
 }
 
-static void init_pcpu(const struct scheduler *ops, int cpu)
+static void
+csched2_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
 {
     unsigned rqi;
     unsigned long flags;
@@ -1978,12 +1979,7 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
 
     spin_lock_irqsave(&prv->lock, flags);
 
-    if ( cpumask_test_cpu(cpu, &prv->initialized) )
-    {
-        printk("%s: Strange, cpu %d already initialized!\n", __func__, cpu);
-        spin_unlock_irqrestore(&prv->lock, flags);
-        return;
-    }
+    ASSERT(!cpumask_test_cpu(cpu, &prv->initialized));
 
     /* Figure out which runqueue to put it in */
     rqi = 0;
@@ -2033,20 +2029,6 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
     return;
 }
 
-static void *
-csched2_alloc_pdata(const struct scheduler *ops, int cpu)
-{
-    /* Check to see if the cpu is online yet */
-    /* Note: cpu 0 doesn't get a STARTING callback */
-    if ( cpu == 0 || cpu_to_socket(cpu) != XEN_INVALID_SOCKET_ID )
-        init_pcpu(ops, cpu);
-    else
-        printk("%s: cpu %d not online yet, deferring initializatgion\n",
-               __func__, cpu);
-
-    return NULL;
-}
-
 static void
 csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
 {
@@ -2058,7 +2040,7 @@ csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
 
     spin_lock_irqsave(&prv->lock, flags);
 
-    BUG_ON(!cpumask_test_cpu(cpu, &prv->initialized));
+    ASSERT(cpumask_test_cpu(cpu, &prv->initialized));
     
     /* Find the old runqueue and remove this cpu from it */
     rqi = prv->runq_map[cpu];
@@ -2096,49 +2078,6 @@ csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
 }
 
 static int
-csched2_cpu_starting(int cpu)
-{
-    struct scheduler *ops;
-
-    /* Hope this is safe from cpupools switching things around. :-) */
-    ops = per_cpu(scheduler, cpu);
-
-    if ( ops->alloc_pdata == csched2_alloc_pdata )
-        init_pcpu(ops, cpu);
-
-    return NOTIFY_DONE;
-}
-
-static int cpu_credit2_callback(
-    struct notifier_block *nfb, unsigned long action, void *hcpu)
-{
-    unsigned int cpu = (unsigned long)hcpu;
-    int rc = 0;
-
-    switch ( action )
-    {
-    case CPU_STARTING:
-        csched2_cpu_starting(cpu);
-        break;
-    default:
-        break;
-    }
-
-    return !rc ? NOTIFY_DONE : notifier_from_errno(rc);
-}
-
-static struct notifier_block cpu_credit2_nfb = {
-    .notifier_call = cpu_credit2_callback
-};
-
-static int
-csched2_global_init(void)
-{
-    register_cpu_notifier(&cpu_credit2_nfb);
-    return 0;
-}
-
-static int
 csched2_init(struct scheduler *ops)
 {
     int i;
@@ -2216,12 +2155,11 @@ static const struct scheduler sched_credit2_def = {
 
     .dump_cpu_state = csched2_dump_pcpu,
     .dump_settings  = csched2_dump,
-    .global_init    = csched2_global_init,
     .init           = csched2_init,
     .deinit         = csched2_deinit,
     .alloc_vdata    = csched2_alloc_vdata,
     .free_vdata     = csched2_free_vdata,
-    .alloc_pdata    = csched2_alloc_pdata,
+    .init_pdata     = csched2_init_pdata,
     .free_pdata     = csched2_free_pdata,
     .alloc_domdata  = csched2_alloc_domdata,
     .free_domdata   = csched2_free_domdata,
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index ac8019f..b6ac3ad 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -649,8 +649,8 @@ rt_deinit(struct scheduler *ops)
  * Point per_cpu spinlock to the global system lock;
  * All cpu have same global system lock
  */
-static void *
-rt_alloc_pdata(const struct scheduler *ops, int cpu)
+static void
+rt_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
 {
     struct rt_private *prv = rt_priv(ops);
     spinlock_t *old_lock;
@@ -663,7 +663,11 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
 
     /* _Not_ pcpu_schedule_unlock(): per_cpu().schedule_lock changed! */
     spin_unlock_irqrestore(old_lock, flags);
+}
 
+static void *
+rt_alloc_pdata(const struct scheduler *ops, int cpu)
+{
     if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) )
         return ERR_PTR(-ENOMEM);
 
@@ -1395,6 +1399,7 @@ static const struct scheduler sched_rtds_def = {
     .deinit         = rt_deinit,
     .alloc_pdata    = rt_alloc_pdata,
     .free_pdata     = rt_free_pdata,
+    .init_pdata     = rt_init_pdata,
     .alloc_domdata  = rt_alloc_domdata,
     .free_domdata   = rt_free_domdata,
     .init_domain    = rt_dom_init,


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 05/16] xen: sched: move pCPU initialization in an helper
  2016-03-18 19:03 [PATCH 00/16] Fixes and improvement (including hard affinity!) for Credit2 Dario Faggioli
                   ` (3 preceding siblings ...)
  2016-03-18 19:04 ` [PATCH 04/16] xen: sched: implement .init_pdata in all schedulers Dario Faggioli
@ 2016-03-18 19:04 ` Dario Faggioli
  2016-03-23 17:51   ` George Dunlap
  2016-03-18 19:04 ` [PATCH 06/16] xen: sched: prepare a .switch_sched hook for Credit1 Dario Faggioli
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 61+ messages in thread
From: Dario Faggioli @ 2016-03-18 19:04 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap

That will turn out useful in following patches, where such
code will need to be called more than just once. Create an
helper now, and move the code there, to avoid mixing code
motion and functional changes later.

In Credit2, some style cleanup is also done.

No functional change intended.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
---
 xen/common/sched_credit.c  |   22 +++++++++++++---------
 xen/common/sched_credit2.c |   26 ++++++++++++++++----------
 2 files changed, 29 insertions(+), 19 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index d4a0f5e..4488d7c 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -542,16 +542,11 @@ csched_alloc_pdata(const struct scheduler *ops, int cpu)
 }
 
 static void
-csched_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
+init_pdata(struct csched_private *prv, struct csched_pcpu *spc, int cpu)
 {
-    struct csched_private *prv = CSCHED_PRIV(ops);
-    struct csched_pcpu * const spc = pdata;
-    unsigned long flags;
-
-    /* cpu data needs to be allocated, but STILL uninitialized */
-    ASSERT(spc && spc->runq.next == spc->runq.prev && spc->runq.next == NULL);
-
-    spin_lock_irqsave(&prv->lock, flags);
+    ASSERT(spin_is_locked(&prv->lock));
+    /* cpu data needs to be allocated, but STILL uninitialized. */
+    ASSERT(spc && spc->runq.next == NULL && spc->runq.prev == NULL);
 
     /* Initialize/update system-wide config */
     prv->credit += prv->credits_per_tslice;
@@ -575,7 +570,16 @@ csched_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
     /* Start off idling... */
     BUG_ON(!is_idle_vcpu(curr_on_cpu(cpu)));
     cpumask_set_cpu(cpu, prv->idlers);
+}
 
+static void
+csched_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
+{
+    unsigned long flags;
+    struct csched_private *prv = CSCHED_PRIV(ops);
+
+    spin_lock_irqsave(&prv->lock, flags);
+    init_pdata(prv, pdata, cpu);
     spin_unlock_irqrestore(&prv->lock, flags);
 }
 
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index b1642a8..919ca13 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1969,16 +1969,13 @@ static void deactivate_runqueue(struct csched2_private *prv, int rqi)
 }
 
 static void
-csched2_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
+init_pdata(struct csched2_private *prv, unsigned int cpu)
 {
     unsigned rqi;
-    unsigned long flags;
-    struct csched2_private *prv = CSCHED2_PRIV(ops);
     struct csched2_runqueue_data *rqd;
     spinlock_t *old_lock;
 
-    spin_lock_irqsave(&prv->lock, flags);
-
+    ASSERT(spin_is_locked(&prv->lock));
     ASSERT(!cpumask_test_cpu(cpu, &prv->initialized));
 
     /* Figure out which runqueue to put it in */
@@ -1998,7 +1995,7 @@ csched2_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
         BUG();
     }
 
-    rqd=prv->rqd + rqi;
+    rqd = prv->rqd + rqi;
 
     printk("Adding cpu %d to runqueue %d\n", cpu, rqi);
     if ( ! cpumask_test_cpu(rqi, &prv->active_queues) )
@@ -2008,13 +2005,13 @@ csched2_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
     }
     
     /* IRQs already disabled */
-    old_lock=pcpu_schedule_lock(cpu);
+    old_lock = pcpu_schedule_lock(cpu);
 
     /* Move spinlock to new runq lock.  */
     per_cpu(schedule_data, cpu).schedule_lock = &rqd->lock;
 
     /* Set the runqueue map */
-    prv->runq_map[cpu]=rqi;
+    prv->runq_map[cpu] = rqi;
     
     cpumask_set_cpu(cpu, &rqd->idle);
     cpumask_set_cpu(cpu, &rqd->active);
@@ -2024,12 +2021,21 @@ csched2_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
 
     cpumask_set_cpu(cpu, &prv->initialized);
 
-    spin_unlock_irqrestore(&prv->lock, flags);
-
     return;
 }
 
 static void
+csched2_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
+{
+    struct csched2_private *prv = CSCHED2_PRIV(ops);
+    unsigned long flags;
+
+    spin_lock_irqsave(&prv->lock, flags);
+    init_pdata(prv, cpu);
+    spin_unlock_irqrestore(&prv->lock, flags);
+}
+
+static void
 csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
 {
     unsigned long flags;


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 06/16] xen: sched: prepare a .switch_sched hook for Credit1
  2016-03-18 19:03 [PATCH 00/16] Fixes and improvement (including hard affinity!) for Credit2 Dario Faggioli
                   ` (4 preceding siblings ...)
  2016-03-18 19:04 ` [PATCH 05/16] xen: sched: move pCPU initialization in an helper Dario Faggioli
@ 2016-03-18 19:04 ` Dario Faggioli
  2016-03-18 19:04 ` [PATCH 07/16] xen: sched: prepare a .switch_sched hook for Credit2 Dario Faggioli
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 61+ messages in thread
From: Dario Faggioli @ 2016-03-18 19:04 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap

In fact, right now, if we switch cpu X from, say,
Credit2 to Credit, we do:

 schedule_cpu_switch(x, csched2 --> csched):
   //scheduler[x] is csched2
   //schedule_lock[x] is csched2_lock
   csched_alloc_pdata(x)
   csched_init_pdata(x)
   pcpu_schedule_lock(x) ----> takes csched2_lock
   scheduler[X] = csched
   pcpu_schedule_unlock(x) --> unlocks csched2_lock
   [*]
   csched2_free_pdata(x)
     pcpu_schedule_lock(x) --> takes csched2_lock
     schedule_lock[x] = csched_lock
     spin_unlock(csched2_lock)

So, if anything scheduling related and involving CPU X
happens, at time [*], we will:
 - take csched2_lock,
 - operate on Credit1 functions and data structures,
which is no good!

The problem arises because there is a window where the
scheduler is already the new one, but the lock is still the
old one. And, in this specific transition this is due to the
fact that it is only csched2_free_pdata() that (re)sets the
lock in the way that Credit1 needs it, instead of Credit1
itself doing it on its own.

This patch, therefore, introduce a new hook in the scheduler
interface, called switch_sched, meant at being used when
switching scheduler on a CPU, and implements it for Credit1.

This allows to do the various operations in the correct order
and under the protection of the best suited (set of) lock(s).

It is necessary to add the hook (as compare to keep doing
things in generic code), because different schedulers may
have different locking scheme. See, for instance, the different
lock nesting rules in Credit1 and Credit2.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
---
 xen/common/sched_credit.c  |   38 ++++++++++++++++++++++++++++++++++++++
 xen/include/xen/sched-if.h |    3 +++
 2 files changed, 41 insertions(+)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 4488d7c..929ba9c 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -583,6 +583,43 @@ csched_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
     spin_unlock_irqrestore(&prv->lock, flags);
 }
 
+/* Change the scheduler of cpu to us (Credit). */
+static void
+csched_switch_sched(struct scheduler *ops, unsigned int cpu,
+                    void *pdata, void *vdata)
+{
+    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct csched_private *prv = CSCHED_PRIV(ops);
+    struct csched_vcpu *svc = vdata;
+    spinlock_t * old_lock;
+
+    ASSERT(svc && is_idle_vcpu(svc->vcpu));
+
+    /*
+     * We may be acquiring the lock of another scheduler here (the one cpu
+     * still belongs to when calling this function). That is ok as, anyone
+     * trying to schedule on this cpu will block until when we release that
+     * lock (bottom of this function). When unblocked --because of the loop
+     * implemented by schedule_lock() functions-- he will notice the lock
+     * changed, and acquire ours before being able to proceed.
+     */
+    old_lock = pcpu_schedule_lock_irq(cpu);
+
+    idle_vcpu[cpu]->sched_priv = vdata;
+
+    spin_lock(&prv->lock);
+    init_pdata(prv, pdata, cpu);
+    spin_unlock(&prv->lock);
+
+    per_cpu(scheduler, cpu) = ops;
+    per_cpu(schedule_data, cpu).sched_priv = pdata;
+    /* (Re?)route the lock to the per pCPU lock. */
+    sd->schedule_lock = &sd->_lock;
+
+    /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
+    spin_unlock_irq(old_lock);
+}
+
 #ifndef NDEBUG
 static inline void
 __csched_vcpu_check(struct vcpu *vc)
@@ -2064,6 +2101,7 @@ static const struct scheduler sched_credit_def = {
     .alloc_pdata    = csched_alloc_pdata,
     .init_pdata     = csched_init_pdata,
     .free_pdata     = csched_free_pdata,
+    .switch_sched   = csched_switch_sched,
     .alloc_domdata  = csched_alloc_domdata,
     .free_domdata   = csched_free_domdata,
 
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 560cba5..0aa00de 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -138,6 +138,9 @@ struct scheduler {
     void         (*free_domdata)   (const struct scheduler *, void *);
     void *       (*alloc_domdata)  (const struct scheduler *, struct domain *);
 
+    void         (*switch_sched)   (struct scheduler *, unsigned int,
+                                    void *, void *);
+
     int          (*init_domain)    (const struct scheduler *, struct domain *);
     void         (*destroy_domain) (const struct scheduler *, struct domain *);
 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 07/16] xen: sched: prepare a .switch_sched hook for Credit2
  2016-03-18 19:03 [PATCH 00/16] Fixes and improvement (including hard affinity!) for Credit2 Dario Faggioli
                   ` (5 preceding siblings ...)
  2016-03-18 19:04 ` [PATCH 06/16] xen: sched: prepare a .switch_sched hook for Credit1 Dario Faggioli
@ 2016-03-18 19:04 ` Dario Faggioli
  2016-03-18 19:04 ` [PATCH 08/16] " Dario Faggioli
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 61+ messages in thread
From: Dario Faggioli @ 2016-03-18 19:04 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap

In fact, right now, if we switch cpu X from, say,
Credit to Credit2, we do:

 schedule_cpu_switch(X, csched --> csched2):
   //scheduler[x] is csched
   //schedule_lock[x] is csched_lock
   csched2_alloc_pdata(x)
   csched2_init_pdata(x)
     pcpu_schedule_lock(x) --> takes csched_lock
     schedule_lock[x] = csched2_lock
     spin_unlock(csched_lock)
   [1]
   pcpu_schedule_lock(x) ----> takes csched2_lock
   scheduler[X] = csched2
   pcpu_schedule_unlock(x) --> unlocks csched2_lock
   csched_free_pdata(x)

So, if anything scheduling related and involving CPU X
happens, at time [1], we will:
 - take csched2_lock,
 - operate on Credit1 functions and data structures,
which is no good!

Furthermore, if we switch cpu X from RTDS to Credit2,
we do:

 schedule_cpu_switch(X, RTDS --> csched2):
   //scheduler[x] is rtds
   //schedule_lock[x] is rtds_lock
   csched2_alloc_pdata(x)
   csched2_init_pdata(x)
     pcpu_schedule_lock(x) --> takes rtds_lock
     schedule_lock[x] = csched2_lock
     spin_unlock(rtds_lock)
   pcpu_schedule_lock(x) ----> takes csched2_lock
   scheduler[x] = csched2
   pcpu_schedule_unlock(x) --> unlocks csched2_lock
   rtds_free_pdata(x)
     spin_lock(rtds_lock)
     ASSERT(schedule_lock[x] == rtds_lock) [2]
     schedule_lock[x] = DEFAULT_SCHEDULE_LOCK [3]
     spin_unlock(rtds_lock)

Which means:
 1) the ASSERT at [2] triggers!
 2) At [3], we screw up the lock remapping we've done for
    ourself in csched2_init_pdata()!

The former problem arises because there is a window during
which the lock is already the new one, but the scheduler is
still the old one. The latter problem, becase we let other's
scheduler mess with lock (re)mapping during their freeing
path, instead of doing it ourself.

This patch, therefore, introduces the new switch_sched hook,
for Credit2, as done already (in "xen: sched: prepare
.switch_sched for Credit1") for Credit1.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
---
 xen/common/sched_credit2.c |   43 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 919ca13..25d8e85 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1968,7 +1968,8 @@ static void deactivate_runqueue(struct csched2_private *prv, int rqi)
     cpumask_clear_cpu(rqi, &prv->active_queues);
 }
 
-static void
+/* Returns the ID of the runqueue the cpu is assigned to. */
+static unsigned
 init_pdata(struct csched2_private *prv, unsigned int cpu)
 {
     unsigned rqi;
@@ -2021,7 +2022,7 @@ init_pdata(struct csched2_private *prv, unsigned int cpu)
 
     cpumask_set_cpu(cpu, &prv->initialized);
 
-    return;
+    return rqi;
 }
 
 static void
@@ -2035,6 +2036,43 @@ csched2_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
     spin_unlock_irqrestore(&prv->lock, flags);
 }
 
+/* Change the scheduler of cpu to us (Credit2). */
+static void
+csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
+                     void *pdata, void *vdata)
+{
+    struct csched2_private *prv = CSCHED2_PRIV(new_ops);
+    struct csched2_vcpu *svc = vdata;
+    spinlock_t *old_lock;
+    unsigned rqi;
+
+    ASSERT(!pdata && svc && is_idle_vcpu(svc->vcpu));
+
+    spin_lock_irq(&prv->lock);
+    /*
+     * We may be acquiring the lock of another scheduler here (the one cpu
+     * still belongs to when calling this function). That is ok as, anyone
+     * trying to schedule on this cpu will block until when we release that
+     * lock (bottom of this function). When unblocked --because of the loop
+     * implemented by schedule_lock() functions-- he will notice the lock
+     * changed, and acquire ours before being able to proceed.
+     */
+    old_lock = pcpu_schedule_lock(cpu);
+
+    idle_vcpu[cpu]->sched_priv = vdata;
+
+    rqi = init_pdata(prv, cpu);
+
+    per_cpu(scheduler, cpu) = new_ops;
+    per_cpu(schedule_data, cpu).sched_priv = NULL; /* no pdata */
+    /* (Re?)route the lock to the per pCPU lock. */
+    per_cpu(schedule_data, cpu).schedule_lock = &prv->rqd[rqi].lock;
+
+    /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
+    spin_unlock(old_lock);
+    spin_unlock_irq(&prv->lock);
+}
+
 static void
 csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
 {
@@ -2167,6 +2205,7 @@ static const struct scheduler sched_credit2_def = {
     .free_vdata     = csched2_free_vdata,
     .init_pdata     = csched2_init_pdata,
     .free_pdata     = csched2_free_pdata,
+    .switch_sched   = csched2_switch_sched,
     .alloc_domdata  = csched2_alloc_domdata,
     .free_domdata   = csched2_free_domdata,
 };


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 08/16] xen: sched: prepare a .switch_sched hook for Credit2
  2016-03-18 19:03 [PATCH 00/16] Fixes and improvement (including hard affinity!) for Credit2 Dario Faggioli
                   ` (6 preceding siblings ...)
  2016-03-18 19:04 ` [PATCH 07/16] xen: sched: prepare a .switch_sched hook for Credit2 Dario Faggioli
@ 2016-03-18 19:04 ` Dario Faggioli
  2016-03-19  2:24   ` Meng Xu
  2016-03-21 14:25   ` Jan Beulich
  2016-03-18 19:05 ` [PATCH 09/16] xen: sched: close potential races when switching scheduler to CPUs Dario Faggioli
                   ` (7 subsequent siblings)
  15 siblings, 2 replies; 61+ messages in thread
From: Dario Faggioli @ 2016-03-18 19:04 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Tianyang Chen, Meng Xu

RTDS is basically identical to Credit2, as far as scheduler
lock (re)mapping is concerned. Therefore, the same analisys
and considerations expressed for the previous patch ("xen:
sched: prepare a .switch_sched hook for Credit2"), applies
to it to.

This patch, therefore, introduces the switch_sched hook
for RTDS, as done already for Credit2 and Credit1.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Meng Xu <mengxu@cis.upenn.edu>
Cc: Tianyang Chen <tiche@seas.upenn.edu>
---
 xen/common/sched_rt.c |   31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index b6ac3ad..92be248 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -665,6 +665,36 @@ rt_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
     spin_unlock_irqrestore(old_lock, flags);
 }
 
+/* Change the scheduler of cpu to us (RTDS). */
+static void
+rt_switch_sched(struct scheduler *new_ops, unsigned int cpu,
+                void *pdata, void *vdata)
+{
+    struct rt_vcpu *svc = vdata;
+    spinlock_t *old_lock;
+
+    ASSERT(!pdata && svc && is_idle_vcpu(svc->vcpu));
+
+    /*
+     * We may be acquiring the lock of another scheduler here (the one cpu
+     * still belongs to when calling this function). That is ok as, anyone
+     * trying to schedule on this cpu will block until when we release that
+     * lock (bottom of this function). When unblocked --because of the loop
+     * implemented by schedule_lock() functions-- he will notice the lock
+     * changed, and acquire ours before being able to proceed.
+     */
+    old_lock = pcpu_schedule_lock_irq(cpu);
+
+    idle_vcpu[cpu]->sched_priv = vdata;
+    per_cpu(scheduler, cpu) = new_ops;
+    per_cpu(schedule_data, cpu).sched_priv = NULL; /* no pdata */
+    /* (Re?)route the lock to our global lock. */
+    per_cpu(schedule_data, cpu).schedule_lock = &rt_priv(new_ops)->lock;
+
+    /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
+    spin_unlock_irq(old_lock);
+}
+
 static void *
 rt_alloc_pdata(const struct scheduler *ops, int cpu)
 {
@@ -1400,6 +1430,7 @@ static const struct scheduler sched_rtds_def = {
     .alloc_pdata    = rt_alloc_pdata,
     .free_pdata     = rt_free_pdata,
     .init_pdata     = rt_init_pdata,
+    .switch_sched   = rt_switch_sched,
     .alloc_domdata  = rt_alloc_domdata,
     .free_domdata   = rt_free_domdata,
     .init_domain    = rt_dom_init,


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 09/16] xen: sched: close potential races when switching scheduler to CPUs
  2016-03-18 19:03 [PATCH 00/16] Fixes and improvement (including hard affinity!) for Credit2 Dario Faggioli
                   ` (7 preceding siblings ...)
  2016-03-18 19:04 ` [PATCH 08/16] " Dario Faggioli
@ 2016-03-18 19:05 ` Dario Faggioli
  2016-03-19  2:25   ` Meng Xu
                     ` (2 more replies)
  2016-03-18 19:05 ` [PATCH 10/16] xen: sched: improve credit2 bootparams' scope, placement and signedness Dario Faggioli
                   ` (6 subsequent siblings)
  15 siblings, 3 replies; 61+ messages in thread
From: Dario Faggioli @ 2016-03-18 19:05 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Tianyang Chen, Meng Xu

by using the sched_switch hook that we have introduced in
the various schedulers.

The key is to let the actual switch of scheduler and the
remapping of the scheduler lock for the CPU (if necessary)
happen together (in the same critical section) protected
(at least) by the old scheduler lock for the CPU.

This also means that, in Credit2 and RTDS, we can get rid
of the code that was doing the scheduler lock remapping
in csched2_free_pdata() and rt_free_pdata(), and of their
triggering ASSERT-s.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Meng Xu <mengxu@cis.upenn.edu>
Cc: Tianyang Chen <tiche@seas.upenn.edu>
---
 xen/common/sched_credit.c  |    9 +++++++++
 xen/common/sched_credit2.c |   28 ++++++++++------------------
 xen/common/sched_rt.c      |   13 -------------
 xen/common/schedule.c      |   30 +++++++++++++++++++++---------
 4 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 929ba9c..903a704 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -577,6 +577,15 @@ csched_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
 {
     unsigned long flags;
     struct csched_private *prv = CSCHED_PRIV(ops);
+    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+
+    /*
+     * This is called either during during boot, resume or hotplug, in
+     * case Credit1 is the scheduler chosen at boot. In such cases, the
+     * scheduler lock for cpu is already pointing to the default per-cpu
+     * spinlock, as Credit1 needs it, so there is no remapping to be done.
+     */
+    ASSERT(sd->schedule_lock == &sd->_lock && !spin_is_locked(&sd->_lock));
 
     spin_lock_irqsave(&prv->lock, flags);
     init_pdata(prv, pdata, cpu);
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 25d8e85..64fb028 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1974,7 +1974,6 @@ init_pdata(struct csched2_private *prv, unsigned int cpu)
 {
     unsigned rqi;
     struct csched2_runqueue_data *rqd;
-    spinlock_t *old_lock;
 
     ASSERT(spin_is_locked(&prv->lock));
     ASSERT(!cpumask_test_cpu(cpu, &prv->initialized));
@@ -2005,21 +2004,11 @@ init_pdata(struct csched2_private *prv, unsigned int cpu)
         activate_runqueue(prv, rqi);
     }
     
-    /* IRQs already disabled */
-    old_lock = pcpu_schedule_lock(cpu);
-
-    /* Move spinlock to new runq lock.  */
-    per_cpu(schedule_data, cpu).schedule_lock = &rqd->lock;
-
     /* Set the runqueue map */
     prv->runq_map[cpu] = rqi;
     
     cpumask_set_cpu(cpu, &rqd->idle);
     cpumask_set_cpu(cpu, &rqd->active);
-
-    /* _Not_ pcpu_schedule_unlock(): per_cpu().schedule_lock changed! */
-    spin_unlock(old_lock);
-
     cpumask_set_cpu(cpu, &prv->initialized);
 
     return rqi;
@@ -2029,10 +2018,19 @@ static void
 csched2_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
 {
     struct csched2_private *prv = CSCHED2_PRIV(ops);
+    spinlock_t *old_lock;
     unsigned long flags;
+    unsigned rqi;
 
     spin_lock_irqsave(&prv->lock, flags);
-    init_pdata(prv, cpu);
+    old_lock = pcpu_schedule_lock(cpu);
+
+    rqi = init_pdata(prv, cpu);
+    /* Move the scheduler lock to the new runq lock. */
+    per_cpu(schedule_data, cpu).schedule_lock = &prv->rqd[rqi].lock;
+
+    /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
+    spin_unlock(old_lock);
     spin_unlock_irqrestore(&prv->lock, flags);
 }
 
@@ -2079,7 +2077,6 @@ csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
     unsigned long flags;
     struct csched2_private *prv = CSCHED2_PRIV(ops);
     struct csched2_runqueue_data *rqd;
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     int rqi;
 
     spin_lock_irqsave(&prv->lock, flags);
@@ -2107,11 +2104,6 @@ csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
         deactivate_runqueue(prv, rqi);
     }
 
-    /* Move spinlock to the original lock.  */
-    ASSERT(sd->schedule_lock == &rqd->lock);
-    ASSERT(!spin_is_locked(&sd->_lock));
-    sd->schedule_lock = &sd->_lock;
-
     spin_unlock(&rqd->lock);
 
     cpumask_clear_cpu(cpu, &prv->initialized);
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 92be248..0564b1d 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -718,19 +718,6 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
 static void
 rt_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
 {
-    struct rt_private *prv = rt_priv(ops);
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
-    unsigned long flags;
-
-    spin_lock_irqsave(&prv->lock, flags);
-
-    /* Move spinlock back to the default lock */
-    ASSERT(sd->schedule_lock == &prv->lock);
-    ASSERT(!spin_is_locked(&sd->_lock));
-    sd->schedule_lock = &sd->_lock;
-
-    spin_unlock_irqrestore(&prv->lock, flags);
-
     free_cpumask_var(_cpumask_scratch[cpu]);
 }
 
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 1adc0e2..29582a6 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1617,7 +1617,6 @@ void __init scheduler_init(void)
 int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
 {
     struct vcpu *idle;
-    spinlock_t *lock;
     void *ppriv, *ppriv_old, *vpriv, *vpriv_old;
     struct scheduler *old_ops = per_cpu(scheduler, cpu);
     struct scheduler *new_ops = (c == NULL) ? &ops : c->sched;
@@ -1640,11 +1639,21 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     if ( old_ops == new_ops )
         goto out;
 
+    /*
+     * To setup the cpu for the new scheduler we need:
+     *  - a valid instance of per-CPU scheduler specific data, as it is
+     *    allocated by SCHED_OP(alloc_pdata). Note that we do not want to
+     *    initialize it yet (i.e., we are not calling SCHED_OP(init_pdata)).
+     *    That will be done by the target scheduler, in SCHED_OP(switch_sched),
+     *    in proper ordering and with locking.
+     *  - a valid instance of per-vCPU scheduler specific data, for the idle
+     *    vCPU of cpu. That is what the target scheduler will use for the
+     *    sched_priv field of the per-vCPU info of the idle domain.
+     */
     idle = idle_vcpu[cpu];
     ppriv = SCHED_OP(new_ops, alloc_pdata, cpu);
     if ( IS_ERR(ppriv) )
         return PTR_ERR(ppriv);
-    SCHED_OP(new_ops, init_pdata, ppriv, cpu);
     vpriv = SCHED_OP(new_ops, alloc_vdata, idle, idle->domain->sched_priv);
     if ( vpriv == NULL )
     {
@@ -1652,17 +1661,20 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
         return -ENOMEM;
     }
 
-    lock = pcpu_schedule_lock_irq(cpu);
-
     SCHED_OP(old_ops, tick_suspend, cpu);
+
+    /*
+     * The actual switch, including (if necessary) the rerouting of the
+     * scheduler lock to whatever new_ops prefers,  needs to happen in one
+     * critical section, protected by old_ops' lock, or races are possible.
+     * Since each scheduler has its own contraints and locking scheme, do
+     * that inside specific scheduler code, rather than here.
+     */
     vpriv_old = idle->sched_priv;
-    idle->sched_priv = vpriv;
-    per_cpu(scheduler, cpu) = new_ops;
     ppriv_old = per_cpu(schedule_data, cpu).sched_priv;
-    per_cpu(schedule_data, cpu).sched_priv = ppriv;
-    SCHED_OP(new_ops, tick_resume, cpu);
+    SCHED_OP(new_ops, switch_sched, cpu, ppriv, vpriv);
 
-    pcpu_schedule_unlock_irq(lock, cpu);
+    SCHED_OP(new_ops, tick_resume, cpu);
 
     SCHED_OP(old_ops, free_vdata, vpriv_old);
     SCHED_OP(old_ops, free_pdata, ppriv_old, cpu);


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 10/16] xen: sched: improve credit2 bootparams' scope, placement and signedness
  2016-03-18 19:03 [PATCH 00/16] Fixes and improvement (including hard affinity!) for Credit2 Dario Faggioli
                   ` (8 preceding siblings ...)
  2016-03-18 19:05 ` [PATCH 09/16] xen: sched: close potential races when switching scheduler to CPUs Dario Faggioli
@ 2016-03-18 19:05 ` Dario Faggioli
  2016-03-21 14:51   ` Juergen Gross
  2016-03-24 12:20   ` George Dunlap
  2016-03-18 19:05 ` [PATCH 11/16] xen: sched: on Credit2, don't reprogram the timer if idle Dario Faggioli
                   ` (5 subsequent siblings)
  15 siblings, 2 replies; 61+ messages in thread
From: Dario Faggioli @ 2016-03-18 19:05 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Juergen Gross, Jan Beulich, Uma Sharma

From: Uma Sharma <uma.sharma523@gmail.com>

and, while we are adjusting signedness of opt_load_window_shift,
make also prv->load_window_shift unsigned, as approapriate.

Signed-off-by: Uma Sharma <uma.sharma523@gmail.com>
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Jan Beulich <JBeulich@suse.com>
---
Patch has changed, so I'm not sticking any tag v1 received
(namely, Juergen's Reviewed-by:).
---
Changes from v1:
 * improve signedness as well, as requested during review.
---
 xen/common/sched_credit2.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 64fb028..2fd4175 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -162,7 +162,7 @@
 #define CSFLAG_runq_migrate_request (1<<__CSFLAG_runq_migrate_request)
 
 
-int opt_migrate_resist=500;
+static unsigned int __read_mostly opt_migrate_resist = 500;
 integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
 
 /*
@@ -185,12 +185,12 @@ integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
  *   to a load of 1.
  */
 #define LOADAVG_GRANULARITY_SHIFT (10)
-int opt_load_window_shift=18;
+static unsigned int __read_mostly opt_load_window_shift = 18;
 #define  LOADAVG_WINDOW_SHIFT_MIN 4
 integer_param("credit2_load_window_shift", opt_load_window_shift);
-int opt_underload_balance_tolerance=0;
+static int __read_mostly opt_underload_balance_tolerance = 0;
 integer_param("credit2_balance_under", opt_underload_balance_tolerance);
-int opt_overload_balance_tolerance=-3;
+static int __read_mostly opt_overload_balance_tolerance = -3;
 integer_param("credit2_balance_over", opt_overload_balance_tolerance);
 
 /*
@@ -227,7 +227,7 @@ struct csched2_private {
     cpumask_t active_queues; /* Queues which may have active cpus */
     struct csched2_runqueue_data rqd[NR_CPUS];
 
-    int load_window_shift;
+    unsigned int load_window_shift;
 };
 
 /*


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 11/16] xen: sched: on Credit2, don't reprogram the timer if idle
  2016-03-18 19:03 [PATCH 00/16] Fixes and improvement (including hard affinity!) for Credit2 Dario Faggioli
                   ` (9 preceding siblings ...)
  2016-03-18 19:05 ` [PATCH 10/16] xen: sched: improve credit2 bootparams' scope, placement and signedness Dario Faggioli
@ 2016-03-18 19:05 ` Dario Faggioli
  2016-03-24 15:03   ` George Dunlap
  2016-03-18 19:05 ` [PATCH 12/16] xen: sched: fix per-socket runqueue creation in credit2 Dario Faggioli
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 61+ messages in thread
From: Dario Faggioli @ 2016-03-18 19:05 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Tianyang Chen

as other schedulers are doing already: if the idle vcpu
is picked and scheduled, there is no need to reprogram the
scheduler timer to fire and invoke csched2_schedule()
again in future.

Tickling or external events will serve as pokes, when
necessary, but until we can, we should just stay idle.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reported-by: Tianyang Chen <tiche@seas.upenn.edu>
Suggested-by: George Dunlap <dunlapg@umich.edu>
---
Cc: George Dunlap <dunlapg@umich.edu>
Cc: Tianyang Chen <tiche@seas.upenn.edu>
---
 xen/common/sched_credit2.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 2fd4175..4ff26c9 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1540,8 +1540,12 @@ csched2_runtime(const struct scheduler *ops, int cpu, struct csched2_vcpu *snext
     struct csched2_runqueue_data *rqd = RQD(ops, cpu);
     struct list_head *runq = &rqd->runq;
 
+    /*
+     * If we're idle, just stay so. Others (or external events)
+     * will poke us when necessary.
+     */
     if ( is_idle_vcpu(snext->vcpu) )
-        return CSCHED2_MAX_TIMER;
+        return -1;
 
     /* General algorithm:
      * 1) Run until snext's credit will be 0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 12/16] xen: sched: fix per-socket runqueue creation in credit2
  2016-03-18 19:03 [PATCH 00/16] Fixes and improvement (including hard affinity!) for Credit2 Dario Faggioli
                   ` (10 preceding siblings ...)
  2016-03-18 19:05 ` [PATCH 11/16] xen: sched: on Credit2, don't reprogram the timer if idle Dario Faggioli
@ 2016-03-18 19:05 ` Dario Faggioli
  2016-03-24 12:24   ` George Dunlap
  2016-03-18 19:05 ` [PATCH 13/16] xen: sched: allow for choosing credit2 runqueues configuration at boot Dario Faggioli
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 61+ messages in thread
From: Dario Faggioli @ 2016-03-18 19:05 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Justin Weaver

The credit2 scheduler tries to setup runqueues in such
a way that there is one of them per each socket. However,
that does not work. The issue is described in bug #36
"credit2 only uses one runqueue instead of one runq per
socket" (http://bugs.xenproject.org/xen/bug/36), and a
solution has been attempted by an old patch series:

 http://lists.xen.org/archives/html/xen-devel/2014-08/msg02168.html

Here, we take advantage of the fact that now initialization
happens (for all schedulers) during CPU_STARTING, so we
have all the topology information available when necessary.

This is true for all the pCPUs _except_ the boot CPU. That
is not an issue, though. In fact, no runqueue exists yet
when the boot CPU is initialized, so we can just create
one and put the boot CPU in there.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Justin Weaver <jtweaver@hawaii.edu>
---
 xen/common/sched_credit2.c |   59 ++++++++++++++++++++++++++++++++------------
 1 file changed, 43 insertions(+), 16 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 4ff26c9..456b9ea 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -53,7 +53,6 @@
  *  http://wiki.xen.org/wiki/Credit2_Scheduler_Development
  * TODO:
  * + Multiple sockets
- *  - Detect cpu layout and make runqueue map, one per L2 (make_runq_map())
  *  - Simple load balancer / runqueue assignment
  *  - Runqueue load measurement
  *  - Load-based load balancer
@@ -1972,6 +1971,48 @@ static void deactivate_runqueue(struct csched2_private *prv, int rqi)
     cpumask_clear_cpu(rqi, &prv->active_queues);
 }
 
+static unsigned int
+cpu_to_runqueue(struct csched2_private *prv, unsigned int cpu)
+{
+    struct csched2_runqueue_data *rqd;
+    unsigned int rqi;
+
+    for ( rqi = 0; rqi < nr_cpu_ids; rqi++ )
+    {
+        unsigned int peer_cpu;
+
+        /*
+         * As soon as we come across an uninitialized runqueue, use it.
+         * In fact, either:
+         *  - we are initializing the first cpu, and we assign it to
+         *    runqueue 0. This is handy, especially if we are dealing
+         *    with the boot cpu (if credit2 is the default scheduler),
+         *    as we would not be able to use cpu_to_socket() and similar
+         *    helpers anyway (they're result of which is not reliable yet);
+         *  - we have gone through all the active runqueues, and have not
+         *    found anyone whose cpus' topology matches the one we are
+         *    dealing with, so ativating a new runqueue is what we want.
+         */
+        if ( prv->rqd[rqi].id == -1 )
+            break;
+
+        rqd = prv->rqd + rqi;
+        BUG_ON(cpumask_empty(&rqd->active));
+
+        peer_cpu = cpumask_first(&rqd->active);
+        BUG_ON(cpu_to_socket(cpu) == XEN_INVALID_SOCKET_ID ||
+               cpu_to_socket(peer_cpu) == XEN_INVALID_SOCKET_ID);
+
+        if ( cpu_to_socket(cpumask_first(&rqd->active)) == cpu_to_socket(cpu) )
+            break;
+    }
+
+    /* We really expect to be able to assign each cpu to a runqueue. */
+    BUG_ON(rqi >= nr_cpu_ids);
+
+    return rqi;
+}
+
 /* Returns the ID of the runqueue the cpu is assigned to. */
 static unsigned
 init_pdata(struct csched2_private *prv, unsigned int cpu)
@@ -1983,21 +2024,7 @@ init_pdata(struct csched2_private *prv, unsigned int cpu)
     ASSERT(!cpumask_test_cpu(cpu, &prv->initialized));
 
     /* Figure out which runqueue to put it in */
-    rqi = 0;
-
-    /* Figure out which runqueue to put it in */
-    /* NB: cpu 0 doesn't get a STARTING callback, so we hard-code it to runqueue 0. */
-    if ( cpu == 0 )
-        rqi = 0;
-    else
-        rqi = cpu_to_socket(cpu);
-
-    if ( rqi == XEN_INVALID_SOCKET_ID )
-    {
-        printk("%s: cpu_to_socket(%d) returned %d!\n",
-               __func__, cpu, rqi);
-        BUG();
-    }
+    rqi = cpu_to_runqueue(prv, cpu);
 
     rqd = prv->rqd + rqi;
 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 13/16] xen: sched: allow for choosing credit2 runqueues configuration at boot
  2016-03-18 19:03 [PATCH 00/16] Fixes and improvement (including hard affinity!) for Credit2 Dario Faggioli
                   ` (11 preceding siblings ...)
  2016-03-18 19:05 ` [PATCH 12/16] xen: sched: fix per-socket runqueue creation in credit2 Dario Faggioli
@ 2016-03-18 19:05 ` Dario Faggioli
  2016-03-22  7:46   ` Juergen Gross
  2016-03-24 12:36   ` George Dunlap
  2016-03-18 19:05 ` [PATCH 14/16] xen: sched: per-core runqueues as default in credit2 Dario Faggioli
                   ` (2 subsequent siblings)
  15 siblings, 2 replies; 61+ messages in thread
From: Dario Faggioli @ 2016-03-18 19:05 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Juergen Gross, Uma Sharma

In fact, credit2 uses CPU topology to decide how to arrange
its internal runqueues. Before this change, only 'one runqueue
per socket' was allowed. However, experiments have shown that,
for instance, having one runqueue per physical core improves
performance, especially in case hyperthreading is available.

In general, it makes sense to allow users to pick one runqueue
arrangement at boot time, so that:
 - more experiments can be easily performed to even better
   assess and improve performance;
 - one can select the best configuration for his specific
   use case and/or hardware.

This patch enables the above.

Note that, for correctly arranging runqueues to be per-core,
just checking cpu_to_core() on the host CPUs is not enough.
In fact, cores (and hyperthreads) on different sockets, can
have the same core (and thread) IDs! We, therefore, need to
check whether the full topology of two CPUs matches, for
them to be put in the same runqueue.

Note also that the default (although not functional) for
credit2, since now, has been per-socket runqueue. This patch
leaves things that way, to avoid mixing policy and technical
changes.

Finally, it would be a nice feature to be able to select
a particular runqueue arrangement, even when creating a
Credit2 cpupool. This is left as future work.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Signed-off-by: Uma Sharma <uma.sharma523@gmail.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Uma Sharma <uma.sharma523@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
---
Cahnges from v1:
 * added 'node' and 'global' runqueue arrangements, as
   suggested during review;
---
 docs/misc/xen-command-line.markdown |   19 +++++++++
 xen/common/sched_credit2.c          |   76 +++++++++++++++++++++++++++++++++--
 2 files changed, 90 insertions(+), 5 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index ca77e3b..0047f94 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -469,6 +469,25 @@ combination with the `low_crashinfo` command line option.
 ### credit2\_load\_window\_shift
 > `= <integer>`
 
+### credit2\_runqueue
+> `= core | socket | node | all`
+
+> Default: `socket`
+
+Specify how host CPUs are arranged in runqueues. Runqueues are kept
+balanced with respect to the load generated by the vCPUs running on
+them. Smaller runqueues (as in with `core`) means more accurate load
+balancing (for instance, it will deal better with hyperthreading),
+but also more overhead.
+
+Available alternatives, with their meaning, are:
+* `core`: one runqueue per each physical core of the host;
+* `socket`: one runqueue per each physical socket (which often,
+            but not always, matches a NUMA node) of the host;
+* `node`: one runqueue per each NUMA node of the host;
+* `all`: just one runqueue shared by all the logical pCPUs of
+         the host
+
 ### dbgp
 > `= ehci[ <integer> | @pci<bus>:<slot>.<func> ]`
 
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 456b9ea..c242dc4 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -81,10 +81,6 @@
  * Credits are "reset" when the next vcpu in the runqueue is less than
  * or equal to zero.  At that point, everyone's credits are "clipped"
  * to a small value, and a fixed credit is added to everyone.
- *
- * The plan is for all cores that share an L2 will share the same
- * runqueue.  At the moment, there is one global runqueue for all
- * cores.
  */
 
 /*
@@ -193,6 +189,55 @@ static int __read_mostly opt_overload_balance_tolerance = -3;
 integer_param("credit2_balance_over", opt_overload_balance_tolerance);
 
 /*
+ * Runqueue organization.
+ *
+ * The various cpus are to be assigned each one to a runqueue, and we
+ * want that to happen basing on topology. At the moment, it is possible
+ * to choose to arrange runqueues to be:
+ *
+ * - per-core: meaning that there will be one runqueue per each physical
+ *             core of the host. This will happen if the opt_runqueue
+ *             parameter is set to 'core';
+ *
+ * - per-node: meaning that there will be one runqueue per each physical
+ *             NUMA node of the host. This will happen if the opt_runqueue
+ *             parameter is set to 'node';
+ *
+ * - per-socket: meaning that there will be one runqueue per each physical
+ *               socket (AKA package, which often, but not always, also
+ *               matches a NUMA node) of the host; This will happen if
+ *               the opt_runqueue parameter is set to 'socket';
+ *
+ * - global: meaning that there will be only one runqueue to which all the
+ *           (logical) processors of the host belongs. This will happen if
+ *           the opt_runqueue parameter is set to 'all'.
+ *
+ * Depending on the value of opt_runqueue, therefore, cpus that are part of
+ * either the same physical core, or of the same physical socket, will be
+ * put together to form runqueues.
+ */
+#define OPT_RUNQUEUE_CORE   1
+#define OPT_RUNQUEUE_SOCKET 2
+#define OPT_RUNQUEUE_NODE   3
+#define OPT_RUNQUEUE_ALL    4
+static int __read_mostly opt_runqueue = OPT_RUNQUEUE_SOCKET;
+
+static void parse_credit2_runqueue(const char *s)
+{
+    if ( !strncmp(s, "core", 4) && !s[4] )
+        opt_runqueue = OPT_RUNQUEUE_CORE;
+    else if ( !strncmp(s, "socket", 6) && !s[6] )
+        opt_runqueue = OPT_RUNQUEUE_SOCKET;
+    else if ( !strncmp(s, "node", 4) && !s[4] )
+        opt_runqueue = OPT_RUNQUEUE_NODE;
+    else if ( !strncmp(s, "all", 6) && !s[6] )
+        opt_runqueue = OPT_RUNQUEUE_ALL;
+    else
+        printk("WARNING, unrecognized value of credit2_runqueue option!\n");
+}
+custom_param("credit2_runqueue", parse_credit2_runqueue);
+
+/*
  * Per-runqueue data
  */
 struct csched2_runqueue_data {
@@ -1971,6 +2016,22 @@ static void deactivate_runqueue(struct csched2_private *prv, int rqi)
     cpumask_clear_cpu(rqi, &prv->active_queues);
 }
 
+static inline bool_t same_node(unsigned int cpua, unsigned int cpub)
+{
+    return cpu_to_node(cpua) == cpu_to_node(cpub);
+}
+
+static inline bool_t same_socket(unsigned int cpua, unsigned int cpub)
+{
+    return cpu_to_socket(cpua) == cpu_to_socket(cpub);
+}
+
+static inline bool_t same_core(unsigned int cpua, unsigned int cpub)
+{
+    return same_socket(cpua, cpub) &&
+           cpu_to_core(cpua) == cpu_to_core(cpub);
+}
+
 static unsigned int
 cpu_to_runqueue(struct csched2_private *prv, unsigned int cpu)
 {
@@ -2003,7 +2064,10 @@ cpu_to_runqueue(struct csched2_private *prv, unsigned int cpu)
         BUG_ON(cpu_to_socket(cpu) == XEN_INVALID_SOCKET_ID ||
                cpu_to_socket(peer_cpu) == XEN_INVALID_SOCKET_ID);
 
-        if ( cpu_to_socket(cpumask_first(&rqd->active)) == cpu_to_socket(cpu) )
+        if ( opt_runqueue == OPT_RUNQUEUE_ALL ||
+             (opt_runqueue == OPT_RUNQUEUE_CORE && same_core(peer_cpu, cpu)) ||
+             (opt_runqueue == OPT_RUNQUEUE_SOCKET && same_socket(peer_cpu, cpu)) ||
+             (opt_runqueue == OPT_RUNQUEUE_NODE && same_node(peer_cpu, cpu)) )
             break;
     }
 
@@ -2157,6 +2221,8 @@ csched2_init(struct scheduler *ops)
     printk(" load_window_shift: %d\n", opt_load_window_shift);
     printk(" underload_balance_tolerance: %d\n", opt_underload_balance_tolerance);
     printk(" overload_balance_tolerance: %d\n", opt_overload_balance_tolerance);
+    printk(" runqueues arrangement: per-%s\n",
+           opt_runqueue == OPT_RUNQUEUE_CORE ? "core" : "socket");
 
     if ( opt_load_window_shift < LOADAVG_WINDOW_SHIFT_MIN )
     {


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 14/16] xen: sched: per-core runqueues as default in credit2
  2016-03-18 19:03 [PATCH 00/16] Fixes and improvement (including hard affinity!) for Credit2 Dario Faggioli
                   ` (12 preceding siblings ...)
  2016-03-18 19:05 ` [PATCH 13/16] xen: sched: allow for choosing credit2 runqueues configuration at boot Dario Faggioli
@ 2016-03-18 19:05 ` Dario Faggioli
  2016-03-24 12:37   ` George Dunlap
  2016-03-18 19:06 ` [PATCH 15/16] xen: sched: scratch space for cpumasks on Credit2 Dario Faggioli
  2016-03-18 19:06 ` [PATCH 16/16] xen: sched: implement vcpu hard affinity in Credit2 Dario Faggioli
  15 siblings, 1 reply; 61+ messages in thread
From: Dario Faggioli @ 2016-03-18 19:05 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Uma Sharma

Experiments have shown that arranging the scheduing
runqueues on a per-core basis yields better results,
in most cases.

Such evaluation has been done, for the first time,
by Uma Sharma, during her participation to OPW. Some
of the results she got are summarized here:

 http://lists.xen.org/archives/html/xen-devel/2015-03/msg01499.html

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Signed-off-by: Uma Sharma <uma.sharma523@gmail.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
---
 docs/misc/xen-command-line.markdown |    2 +-
 xen/common/sched_credit2.c          |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 0047f94..5d801b8 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -472,7 +472,7 @@ combination with the `low_crashinfo` command line option.
 ### credit2\_runqueue
 > `= core | socket | node | all`
 
-> Default: `socket`
+> Default: `core`
 
 Specify how host CPUs are arranged in runqueues. Runqueues are kept
 balanced with respect to the load generated by the vCPUs running on
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index c242dc4..07b8c67 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -220,7 +220,7 @@ integer_param("credit2_balance_over", opt_overload_balance_tolerance);
 #define OPT_RUNQUEUE_SOCKET 2
 #define OPT_RUNQUEUE_NODE   3
 #define OPT_RUNQUEUE_ALL    4
-static int __read_mostly opt_runqueue = OPT_RUNQUEUE_SOCKET;
+static int __read_mostly opt_runqueue = OPT_RUNQUEUE_CORE;
 
 static void parse_credit2_runqueue(const char *s)
 {


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 15/16] xen: sched: scratch space for cpumasks on Credit2
  2016-03-18 19:03 [PATCH 00/16] Fixes and improvement (including hard affinity!) for Credit2 Dario Faggioli
                   ` (13 preceding siblings ...)
  2016-03-18 19:05 ` [PATCH 14/16] xen: sched: per-core runqueues as default in credit2 Dario Faggioli
@ 2016-03-18 19:06 ` Dario Faggioli
  2016-03-18 19:27   ` Andrew Cooper
  2016-03-18 19:06 ` [PATCH 16/16] xen: sched: implement vcpu hard affinity in Credit2 Dario Faggioli
  15 siblings, 1 reply; 61+ messages in thread
From: Dario Faggioli @ 2016-03-18 19:06 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap

like what's there already in both Credit1 and RTDS. In
fact, when playing with affinity, a lot of cpumask
manipulation is necessary, inside of various functions.

To avoid having a lot of cpumask_var_t on the stack,
this patch introduces a global scratch area.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
---
 xen/common/sched_credit2.c |   50 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 07b8c67..a650216 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -238,6 +238,23 @@ static void parse_credit2_runqueue(const char *s)
 custom_param("credit2_runqueue", parse_credit2_runqueue);
 
 /*
+ * Scratch space, useful to avoid having too many cpumask_var_t on the stack.
+ *
+ * We want to only allocate the array the first time an instance of this
+ * scheduler is used, and avoid reallocating it (e.g., when more instances
+ * are activated inside new cpupools) or leaking it (e.g., when the last
+ * instance is de-inited).
+ *
+ * Counting the number of active Credit2 instances is all we need, and it
+ * does not even have to happen via atomic_t operations, as the counter
+ * only changes during during boot, or under the cpupool_lock.
+ */
+static cpumask_t **_cpumask_scratch;
+#define cpumask_scratch _cpumask_scratch[smp_processor_id()]
+
+static unsigned int nr_csched2_ops;
+
+/*
  * Per-runqueue data
  */
 struct csched2_runqueue_data {
@@ -2166,6 +2183,15 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     spin_unlock_irq(&prv->lock);
 }
 
+static void *
+csched2_alloc_pdata(const struct scheduler *ops, int cpu)
+{
+    if ( !zalloc_cpumask_var(&_cpumask_scratch[cpu]) )
+        return ERR_PTR(-ENOMEM);
+
+    return NULL;
+}
+
 static void
 csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
 {
@@ -2205,6 +2231,9 @@ csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
 
     spin_unlock_irqrestore(&prv->lock, flags);
 
+    free_cpumask_var(_cpumask_scratch[cpu]);
+    _cpumask_scratch[cpu] = NULL;
+
     return;
 }
 
@@ -2239,6 +2268,19 @@ csched2_init(struct scheduler *ops)
     if ( prv == NULL )
         return -ENOMEM;
     ops->sched_data = prv;
+
+    ASSERT( _cpumask_scratch == NULL || nr_csched2_ops > 0 );
+    if ( !_cpumask_scratch )
+    {
+        _cpumask_scratch = xmalloc_array(cpumask_var_t, nr_cpu_ids);
+        if ( !_cpumask_scratch )
+        {
+            xfree(prv);
+            return -ENOMEM;
+        }
+    }
+    nr_csched2_ops++;
+
     spin_lock_init(&prv->lock);
     INIT_LIST_HEAD(&prv->sdom);
 
@@ -2259,6 +2301,13 @@ csched2_deinit(struct scheduler *ops)
 {
     struct csched2_private *prv;
 
+    ASSERT( _cpumask_scratch && nr_csched2_ops > 0 );
+    if ( (--nr_csched2_ops) == 0 )
+    {
+        xfree(_cpumask_scratch);
+        _cpumask_scratch = NULL;
+    }
+
     prv = CSCHED2_PRIV(ops);
     ops->sched_data = NULL;
     xfree(prv);
@@ -2293,6 +2342,7 @@ static const struct scheduler sched_credit2_def = {
     .alloc_vdata    = csched2_alloc_vdata,
     .free_vdata     = csched2_free_vdata,
     .init_pdata     = csched2_init_pdata,
+    .alloc_pdata    = csched2_alloc_pdata,
     .free_pdata     = csched2_free_pdata,
     .switch_sched   = csched2_switch_sched,
     .alloc_domdata  = csched2_alloc_domdata,


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 16/16] xen: sched: implement vcpu hard affinity in Credit2
  2016-03-18 19:03 [PATCH 00/16] Fixes and improvement (including hard affinity!) for Credit2 Dario Faggioli
                   ` (14 preceding siblings ...)
  2016-03-18 19:06 ` [PATCH 15/16] xen: sched: scratch space for cpumasks on Credit2 Dario Faggioli
@ 2016-03-18 19:06 ` Dario Faggioli
  2016-03-24 15:42   ` George Dunlap
  15 siblings, 1 reply; 61+ messages in thread
From: Dario Faggioli @ 2016-03-18 19:06 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Justin Weaver

From: Justin Weaver <jtweaver@hawaii.edu>

as it was still missing.

Note that this patch "only" implements hard affinity,
i.e., the possibility of specifying on what pCPUs a
certain vCPU can run. Soft affinity (which express a
preference for vCPUs to run on certain pCPUs) is still
not supported by Credit2, even after this patch.

Signed-off-by: Justin Weaver <jtweaver@hawaii.edu>
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <dunlapg@umich.edu>
---
 xen/common/sched_credit2.c |  131 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 102 insertions(+), 29 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index a650216..3190eb3 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -327,6 +327,36 @@ struct csched2_dom {
     uint16_t nr_vcpus;
 };
 
+/*
+ * When a hard affinity change occurs, we may not be able to check some
+ * (any!) of the other runqueues, when looking for the best new processor
+ * for svc (as trylock-s in choose_cpu() can fail). If that happens, we
+ * pick, in order of decreasing preference:
+ *  - svc's current pcpu;
+ *  - another pcpu from svc's current runq;
+ *  - any cpu.
+ */
+static int get_fallback_cpu(struct csched2_vcpu *svc)
+{
+    int cpu;
+
+    if ( likely(cpumask_test_cpu(svc->vcpu->processor,
+                                 svc->vcpu->cpu_hard_affinity)) )
+        return svc->vcpu->processor;
+
+    cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
+                &svc->rqd->active);
+    cpu = cpumask_first(cpumask_scratch);
+    if ( likely(cpu < nr_cpu_ids) )
+        return cpu;
+
+    cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
+                cpupool_domain_cpumask(svc->vcpu->domain));
+
+    ASSERT(!cpumask_empty(cpumask_scratch));
+
+    return cpumask_first(cpumask_scratch);
+}
 
 /*
  * Time-to-credit, credit-to-time.
@@ -560,8 +590,9 @@ runq_tickle(const struct scheduler *ops, unsigned int cpu, struct csched2_vcpu *
         goto tickle;
     }
     
-    /* Get a mask of idle, but not tickled */
+    /* Get a mask of idle, but not tickled, that new is allowed to run on. */
     cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
+    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
     
     /* If it's not empty, choose one */
     i = cpumask_cycle(cpu, &mask);
@@ -572,9 +603,11 @@ runq_tickle(const struct scheduler *ops, unsigned int cpu, struct csched2_vcpu *
     }
 
     /* Otherwise, look for the non-idle cpu with the lowest credit,
-     * skipping cpus which have been tickled but not scheduled yet */
+     * skipping cpus which have been tickled but not scheduled yet,
+     * that new is allowed to run on. */
     cpumask_andnot(&mask, &rqd->active, &rqd->idle);
     cpumask_andnot(&mask, &mask, &rqd->tickled);
+    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
 
     for_each_cpu(i, &mask)
     {
@@ -1124,9 +1157,8 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
             d2printk("%pv -\n", svc->vcpu);
             clear_bit(__CSFLAG_runq_migrate_request, &svc->flags);
         }
-        /* Leave it where it is for now.  When we actually pay attention
-         * to affinity we'll have to figure something out... */
-        return vc->processor;
+
+        return get_fallback_cpu(svc);
     }
 
     /* First check to see if we're here because someone else suggested a place
@@ -1137,45 +1169,56 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
         {
             printk("%s: Runqueue migrate aborted because target runqueue disappeared!\n",
                    __func__);
-            /* Fall-through to normal cpu pick */
         }
         else
         {
-            d2printk("%pv +\n", svc->vcpu);
-            new_cpu = cpumask_cycle(vc->processor, &svc->migrate_rqd->active);
-            goto out_up;
+            cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
+                        &svc->migrate_rqd->active);
+            new_cpu = cpumask_any(cpumask_scratch);
+            if ( new_cpu < nr_cpu_ids )
+            {
+                d2printk("%pv +\n", svc->vcpu);
+                goto out_up;
+            }
         }
+        /* Fall-through to normal cpu pick */
     }
 
-    /* FIXME: Pay attention to cpu affinity */                                                                                      
-
     min_avgload = MAX_LOAD;
 
     /* Find the runqueue with the lowest instantaneous load */
     for_each_cpu(i, &prv->active_queues)
     {
         struct csched2_runqueue_data *rqd;
-        s_time_t rqd_avgload;
+        s_time_t rqd_avgload = MAX_LOAD;
 
         rqd = prv->rqd + i;
 
-        /* If checking a different runqueue, grab the lock,
-         * read the avg, and then release the lock.
+        /*
+         * If checking a different runqueue, grab the lock, check hard
+         * affinity, read the avg, and then release the lock.
          *
          * If on our own runqueue, don't grab or release the lock;
          * but subtract our own load from the runqueue load to simulate
-         * impartiality */
+         * impartiality.
+         *
+         * Note that, if svc's hard affinity has changed, this is the
+         * first time when we see such change, so it is indeed possible
+         * that none of the cpus in svc's current runqueue is in our
+         * (new) hard affinity!
+         */
         if ( rqd == svc->rqd )
         {
-            rqd_avgload = rqd->b_avgload - svc->avgload;
+            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
+                rqd_avgload = rqd->b_avgload - svc->avgload;
         }
         else if ( spin_trylock(&rqd->lock) )
         {
-            rqd_avgload = rqd->b_avgload;
+            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
+                rqd_avgload = rqd->b_avgload;
+
             spin_unlock(&rqd->lock);
         }
-        else
-            continue;
 
         if ( rqd_avgload < min_avgload )
         {
@@ -1184,12 +1227,14 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
         }
     }
 
-    /* We didn't find anyone (most likely because of spinlock contention); leave it where it is */
+    /* We didn't find anyone (most likely because of spinlock contention). */
     if ( min_rqi == -1 )
-        new_cpu = vc->processor;
+        new_cpu = get_fallback_cpu(svc);
     else
     {
-        new_cpu = cpumask_cycle(vc->processor, &prv->rqd[min_rqi].active);
+        cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
+                    &prv->rqd[min_rqi].active);
+        new_cpu = cpumask_any(cpumask_scratch);
         BUG_ON(new_cpu >= nr_cpu_ids);
     }
 
@@ -1269,7 +1314,12 @@ static void migrate(const struct scheduler *ops,
             on_runq=1;
         }
         __runq_deassign(svc);
-        svc->vcpu->processor = cpumask_any(&trqd->active);
+
+        cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
+                    &trqd->active);
+        svc->vcpu->processor = cpumask_any(cpumask_scratch);
+        BUG_ON(svc->vcpu->processor >= nr_cpu_ids);
+
         __runq_assign(svc, trqd);
         if ( on_runq )
         {
@@ -1283,6 +1333,17 @@ static void migrate(const struct scheduler *ops,
     }
 }
 
+/*
+ * It makes sense considering migrating svc to rqd, if:
+ *  - svc is not already flagged to migrate,
+ *  - if svc is allowed to run on at least one of the pcpus of rqd.
+ */
+static bool_t vcpu_is_migrateable(struct csched2_vcpu *svc,
+                                  struct csched2_runqueue_data *rqd)
+{
+    return !(svc->flags & CSFLAG_runq_migrate_request) &&
+           cpumask_intersects(svc->vcpu->cpu_hard_affinity, &rqd->active);
+}
 
 static void balance_load(const struct scheduler *ops, int cpu, s_time_t now)
 {
@@ -1391,8 +1452,7 @@ retry:
 
         __update_svc_load(ops, push_svc, 0, now);
 
-        /* Skip this one if it's already been flagged to migrate */
-        if ( push_svc->flags & CSFLAG_runq_migrate_request )
+        if ( !vcpu_is_migrateable(push_svc, st.orqd) )
             continue;
 
         list_for_each( pull_iter, &st.orqd->svc )
@@ -1404,8 +1464,7 @@ retry:
                 __update_svc_load(ops, pull_svc, 0, now);
             }
         
-            /* Skip this one if it's already been flagged to migrate */
-            if ( pull_svc->flags & CSFLAG_runq_migrate_request )
+            if ( !vcpu_is_migrateable(pull_svc, st.lrqd) )
                 continue;
 
             consider(&st, push_svc, pull_svc);
@@ -1421,8 +1480,7 @@ retry:
     {
         struct csched2_vcpu * pull_svc = list_entry(pull_iter, struct csched2_vcpu, rqd_elem);
         
-        /* Skip this one if it's already been flagged to migrate */
-        if ( pull_svc->flags & CSFLAG_runq_migrate_request )
+        if ( !vcpu_is_migrateable(pull_svc, st.lrqd) )
             continue;
 
         /* Consider pull only */
@@ -1461,11 +1519,22 @@ csched2_vcpu_migrate(
 
     /* Check if new_cpu is valid */
     BUG_ON(!cpumask_test_cpu(new_cpu, &CSCHED2_PRIV(ops)->initialized));
+    ASSERT(cpumask_test_cpu(new_cpu, vc->cpu_hard_affinity));
 
     trqd = RQD(ops, new_cpu);
 
+    /*
+     * Do the actual movement toward new_cpu, and update vc->processor.
+     * If we are changing runqueue, migrate() takes care of everything.
+     * If we are not changing runqueue, we need to update vc->processor
+     * here. In fact, if, for instance, we are here because the vcpu's
+     * hard affinity changed, we don't want to risk leaving vc->processor
+     * pointing to a pcpu where we can't run any longer.
+     */
     if ( trqd != svc->rqd )
         migrate(ops, svc, trqd, NOW());
+    else
+        vc->processor = new_cpu;
 }
 
 static int
@@ -1685,6 +1754,10 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     {
         struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
 
+        /* Only consider vcpus that are allowed to run on this processor. */
+        if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) )
+            continue;
+
         /* If this is on a different processor, don't pull it unless
          * its credit is at least CSCHED2_MIGRATE_RESIST higher. */
         if ( svc->vcpu->processor != cpu


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 15/16] xen: sched: scratch space for cpumasks on Credit2
  2016-03-18 19:06 ` [PATCH 15/16] xen: sched: scratch space for cpumasks on Credit2 Dario Faggioli
@ 2016-03-18 19:27   ` Andrew Cooper
  2016-03-24 12:44     ` George Dunlap
  0 siblings, 1 reply; 61+ messages in thread
From: Andrew Cooper @ 2016-03-18 19:27 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap

On 18/03/16 19:06, Dario Faggioli wrote:
> like what's there already in both Credit1 and RTDS. In
> fact, when playing with affinity, a lot of cpumask
> manipulation is necessary, inside of various functions.
>
> To avoid having a lot of cpumask_var_t on the stack,
> this patch introduces a global scratch area.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> ---

I would suggest instead going with

static DEFINE_PER_CPU(cpumask_t, csched2_cpumask_scratch);

Functions which want to use it can use

cpumask_t *scratch = &this_cpu(csched2_cpumask_scratch);


This avoids all this opencoded allocation/refcounting, the chance that
starting a scheduler would fail for memory reasons, and one extra
cpumask in the per-cpu data area won't break the bank.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 01/16] xen: sched: fix locking when allocating an RTDS pCPU
  2016-03-18 19:04 ` [PATCH 01/16] xen: sched: fix locking when allocating an RTDS pCPU Dario Faggioli
@ 2016-03-19  2:22   ` Meng Xu
  2016-03-23 15:37   ` George Dunlap
  1 sibling, 0 replies; 61+ messages in thread
From: Meng Xu @ 2016-03-19  2:22 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Tianyang Chen

On Fri, Mar 18, 2016 at 3:04 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> as doing that include changing the scheduler lock
> mapping for the pCPU itself, and the correct way
> of doing that is:
>  - take the lock that the pCPU is using right now
>    (which may be the lock of another scheduler);
>  - change the mapping of the lock to the RTDS one;
>  - release the lock (the one that has actually been
>    taken!)
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: Meng Xu <mengxu@cis.upenn.edu>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Tianyang Chen <tiche@seas.upenn.edu>

Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>
-- 
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/16] xen: sched: make implementing .alloc_pdata optional
  2016-03-18 19:04 ` [PATCH 03/16] xen: sched: make implementing .alloc_pdata optional Dario Faggioli
@ 2016-03-19  2:23   ` Meng Xu
  2016-03-21 14:22   ` Jan Beulich
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 61+ messages in thread
From: Meng Xu @ 2016-03-19  2:23 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Juergen Gross, George Dunlap, Robert VanVossen, Josh Whitehead,
	Jan Beulich, xen-devel

On Fri, Mar 18, 2016 at 3:04 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> The .alloc_pdata scheduler hook must, before this change,
> be implemented by all schedulers --even those ones that
> don't need to allocate anything.
>
> Make it possible to just use the SCHED_OP(), like for
> the other hooks, by using ERR_PTR() and IS_ERR() for
> error reporting. This:
>  - makes NULL a variant of success;
>  - allows for errors other than ENOMEM to be properly
>    communicated (if ever necessary).
>
> This, in turn, means that schedulers not needing to
> allocate any per-pCPU data, can avoid implementing the
> hook. In fact, the artificial implementation of
> .alloc_pdata in the ARINC653 is removed (and, while there,
> nuke .free_pdata too, as it is equally useless).
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Robert VanVossen <robert.vanvossen@dornerworks.com>
> Cc: Josh Whitehead <josh.whitehead@dornerworks.com>
> Cc: Meng Xu <mengxu@cis.upenn.edu>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Juergen Gross <jgross@suse.com>
> ---

>  static void
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index d98bfb6..ac8019f 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -665,7 +665,7 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
>      spin_unlock_irqrestore(old_lock, flags);
>
>      if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) )
> -        return NULL;
> +        return ERR_PTR(-ENOMEM);
>
>      if ( prv->repl_timer == NULL )
>      {
> @@ -673,13 +673,12 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
>          prv->repl_timer = xzalloc(struct timer);
>
>          if ( prv->repl_timer == NULL )
> -            return NULL;
> +            return ERR_PTR(-ENOMEM);
>
>          init_timer(prv->repl_timer, repl_timer_handler, (void *)ops, cpu);
>      }
>
> -    /* 1 indicates alloc. succeed in schedule.c */
> -    return (void *)1;
> +    return NULL;
>  }
>
>  static void
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 0627eb5..1adc0e2 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -1491,9 +1491,9 @@ static int cpu_schedule_up(unsigned int cpu)
>      if ( idle_vcpu[cpu] == NULL )
>          return -ENOMEM;
>
> -    if ( (ops.alloc_pdata != NULL) &&
> -         ((sd->sched_priv = ops.alloc_pdata(&ops, cpu)) == NULL) )
> -        return -ENOMEM;
> +    sd->sched_priv = SCHED_OP(&ops, alloc_pdata, cpu);
> +    if ( IS_ERR(sd->sched_priv) )
> +        return PTR_ERR(sd->sched_priv);
>
>      return 0;
>  }
> @@ -1503,8 +1503,7 @@ static void cpu_schedule_down(unsigned int cpu)
>      struct schedule_data *sd = &per_cpu(schedule_data, cpu);
>      struct scheduler *sched = per_cpu(scheduler, cpu);
>
> -    if ( sd->sched_priv != NULL )
> -        SCHED_OP(sched, free_pdata, sd->sched_priv, cpu);
> +    SCHED_OP(sched, free_pdata, sd->sched_priv, cpu);
>      SCHED_OP(sched, free_vdata, idle_vcpu[cpu]->sched_priv);
>
>      idle_vcpu[cpu]->sched_priv = NULL;
> @@ -1599,9 +1598,8 @@ void __init scheduler_init(void)
>      idle_domain->max_vcpus = nr_cpu_ids;
>      if ( alloc_vcpu(idle_domain, 0, 0) == NULL )
>          BUG();
> -    if ( ops.alloc_pdata &&
> -         !(this_cpu(schedule_data).sched_priv = ops.alloc_pdata(&ops, 0)) )
> -        BUG();
> +    this_cpu(schedule_data).sched_priv = SCHED_OP(&ops, alloc_pdata, 0);
> +    BUG_ON(IS_ERR(this_cpu(schedule_data).sched_priv));
>      SCHED_OP(&ops, init_pdata, this_cpu(schedule_data).sched_priv, 0);
>  }
>
> @@ -1644,8 +1642,8 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
>
>      idle = idle_vcpu[cpu];
>      ppriv = SCHED_OP(new_ops, alloc_pdata, cpu);
> -    if ( ppriv == NULL )
> -        return -ENOMEM;
> +    if ( IS_ERR(ppriv) )
> +        return PTR_ERR(ppriv);
>      SCHED_OP(new_ops, init_pdata, ppriv, cpu);
>      vpriv = SCHED_OP(new_ops, alloc_vdata, idle, idle->domain->sched_priv);
>      if ( vpriv == NULL )
> diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
> index 70c08c6..560cba5 100644
> --- a/xen/include/xen/sched-if.h
> +++ b/xen/include/xen/sched-if.h
> @@ -9,6 +9,7 @@
>  #define __XEN_SCHED_IF_H__
>
>  #include <xen/percpu.h>
> +#include <xen/err.h>
>
>  /* A global pointer to the initial cpupool (POOL0). */
>  extern struct cpupool *cpupool0;
>

For the sched_rt.c and schedule.c,
Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>

Thanks,

Meng

-- 
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 04/16] xen: sched: implement .init_pdata in all schedulers
  2016-03-18 19:04 ` [PATCH 04/16] xen: sched: implement .init_pdata in all schedulers Dario Faggioli
@ 2016-03-19  2:24   ` Meng Xu
  2016-03-22  8:03   ` Juergen Gross
  1 sibling, 0 replies; 61+ messages in thread
From: Meng Xu @ 2016-03-19  2:24 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Juergen Gross

On Fri, Mar 18, 2016 at 3:04 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> by borrowing some of the code of .alloc_pdata, i.e.,
> the bits that perform initializations, leaving only
> actual allocations in there, when any, which is the
> case for Credit1 and RTDS.

I didn't follow the commit log.
I think the reason why we "have to" implement .init_pdata in all
schedulers is missing in the commit log.

> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index ac8019f..b6ac3ad 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -649,8 +649,8 @@ rt_deinit(struct scheduler *ops)
>   * Point per_cpu spinlock to the global system lock;
>   * All cpu have same global system lock
>   */
> -static void *
> -rt_alloc_pdata(const struct scheduler *ops, int cpu)
> +static void
> +rt_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
>  {
>      struct rt_private *prv = rt_priv(ops);
>      spinlock_t *old_lock;
> @@ -663,7 +663,11 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
>
>      /* _Not_ pcpu_schedule_unlock(): per_cpu().schedule_lock changed! */
>      spin_unlock_irqrestore(old_lock, flags);
> +}
>
> +static void *
> +rt_alloc_pdata(const struct scheduler *ops, int cpu)
> +{
>      if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) )
>          return ERR_PTR(-ENOMEM);
>
> @@ -1395,6 +1399,7 @@ static const struct scheduler sched_rtds_def = {
>      .deinit         = rt_deinit,
>      .alloc_pdata    = rt_alloc_pdata,
>      .free_pdata     = rt_free_pdata,
> +    .init_pdata     = rt_init_pdata,
>      .alloc_domdata  = rt_alloc_domdata,
>      .free_domdata   = rt_free_domdata,
>      .init_domain    = rt_dom_init,
>

Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>

Thanks,

Meng

-- 
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 08/16] xen: sched: prepare a .switch_sched hook for Credit2
  2016-03-18 19:04 ` [PATCH 08/16] " Dario Faggioli
@ 2016-03-19  2:24   ` Meng Xu
  2016-03-21 14:25   ` Jan Beulich
  1 sibling, 0 replies; 61+ messages in thread
From: Meng Xu @ 2016-03-19  2:24 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Tianyang Chen

On Fri, Mar 18, 2016 at 3:04 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> RTDS is basically identical to Credit2, as far as scheduler
> lock (re)mapping is concerned. Therefore, the same analisys
> and considerations expressed for the previous patch ("xen:
> sched: prepare a .switch_sched hook for Credit2"), applies
> to it to.
>
> This patch, therefore, introduces the switch_sched hook
> for RTDS, as done already for Credit2 and Credit1.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Meng Xu <mengxu@cis.upenn.edu>
> Cc: Tianyang Chen <tiche@seas.upenn.edu>
> ---

Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>

-- 
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 09/16] xen: sched: close potential races when switching scheduler to CPUs
  2016-03-18 19:05 ` [PATCH 09/16] xen: sched: close potential races when switching scheduler to CPUs Dario Faggioli
@ 2016-03-19  2:25   ` Meng Xu
  2016-03-23 19:05   ` George Dunlap
  2016-03-24 12:14   ` George Dunlap
  2 siblings, 0 replies; 61+ messages in thread
From: Meng Xu @ 2016-03-19  2:25 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Tianyang Chen

On Fri, Mar 18, 2016 at 3:05 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> by using the sched_switch hook that we have introduced in
> the various schedulers.
>
> The key is to let the actual switch of scheduler and the
> remapping of the scheduler lock for the CPU (if necessary)
> happen together (in the same critical section) protected
> (at least) by the old scheduler lock for the CPU.
>
> This also means that, in Credit2 and RTDS, we can get rid
> of the code that was doing the scheduler lock remapping
> in csched2_free_pdata() and rt_free_pdata(), and of their
> triggering ASSERT-s.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Meng Xu <mengxu@cis.upenn.edu>
> Cc: Tianyang Chen <tiche@seas.upenn.edu>
> ---
>  xen/common/sched_credit.c  |    9 +++++++++
>  xen/common/sched_credit2.c |   28 ++++++++++------------------
>  xen/common/sched_rt.c      |   13 -------------
>  xen/common/schedule.c      |   30 +++++++++++++++++++++---------

For the sched_rt.c and schedule.c,
Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/16] xen: sched: make implementing .alloc_pdata optional
  2016-03-18 19:04 ` [PATCH 03/16] xen: sched: make implementing .alloc_pdata optional Dario Faggioli
  2016-03-19  2:23   ` Meng Xu
@ 2016-03-21 14:22   ` Jan Beulich
  2016-03-23 17:36     ` George Dunlap
  2016-03-21 14:48   ` Juergen Gross
  2016-03-23 17:38   ` George Dunlap
  3 siblings, 1 reply; 61+ messages in thread
From: Jan Beulich @ 2016-03-21 14:22 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Juergen Gross, George Dunlap, Robert VanVossen, Josh Whitehead,
	Meng Xu, xen-devel

>>> On 18.03.16 at 20:04, <dario.faggioli@citrix.com> wrote:
> --- a/xen/include/xen/sched-if.h
> +++ b/xen/include/xen/sched-if.h
> @@ -9,6 +9,7 @@
>  #define __XEN_SCHED_IF_H__
>  
>  #include <xen/percpu.h>
> +#include <xen/err.h>
>  
>  /* A global pointer to the initial cpupool (POOL0). */
>  extern struct cpupool *cpupool0;

There is no visible use in this header of what err.h defines - why
does it get included all of the sudden?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 08/16] xen: sched: prepare a .switch_sched hook for Credit2
  2016-03-18 19:04 ` [PATCH 08/16] " Dario Faggioli
  2016-03-19  2:24   ` Meng Xu
@ 2016-03-21 14:25   ` Jan Beulich
  1 sibling, 0 replies; 61+ messages in thread
From: Jan Beulich @ 2016-03-21 14:25 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Tianyang Chen, Meng Xu

>>> On 18.03.16 at 20:04, <dario.faggioli@citrix.com> wrote:
> RTDS is basically identical to Credit2, as far as scheduler
> lock (re)mapping is concerned. Therefore, the same analisys
> and considerations expressed for the previous patch ("xen:
> sched: prepare a .switch_sched hook for Credit2"), applies
> to it to.

Yet the title should still say RTDS ...

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/16] xen: sched: make implementing .alloc_pdata optional
  2016-03-18 19:04 ` [PATCH 03/16] xen: sched: make implementing .alloc_pdata optional Dario Faggioli
  2016-03-19  2:23   ` Meng Xu
  2016-03-21 14:22   ` Jan Beulich
@ 2016-03-21 14:48   ` Juergen Gross
  2016-03-21 15:07     ` Jan Beulich
  2016-03-23 17:38   ` George Dunlap
  3 siblings, 1 reply; 61+ messages in thread
From: Juergen Gross @ 2016-03-21 14:48 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: George Dunlap, Jan Beulich, Josh Whitehead, Meng Xu, Robert VanVossen

On 18/03/16 20:04, Dario Faggioli wrote:
> The .alloc_pdata scheduler hook must, before this change,
> be implemented by all schedulers --even those ones that
> don't need to allocate anything.
> 
> Make it possible to just use the SCHED_OP(), like for
> the other hooks, by using ERR_PTR() and IS_ERR() for
> error reporting. This:
>  - makes NULL a variant of success;
>  - allows for errors other than ENOMEM to be properly
>    communicated (if ever necessary).
> 
> This, in turn, means that schedulers not needing to
> allocate any per-pCPU data, can avoid implementing the
> hook. In fact, the artificial implementation of
> .alloc_pdata in the ARINC653 is removed (and, while there,
> nuke .free_pdata too, as it is equally useless).
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Robert VanVossen <robert.vanvossen@dornerworks.com>
> Cc: Josh Whitehead <josh.whitehead@dornerworks.com>
> Cc: Meng Xu <mengxu@cis.upenn.edu>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Juergen Gross <jgross@suse.com>
> ---
> Changes from v1:
>  * use IS_ERR() and friends to deal with the return value
>    of alloc_pdata, as suggested during review.
> ---
>  xen/common/sched_arinc653.c |   31 -------------------------------
>  xen/common/sched_credit.c   |    4 ++--
>  xen/common/sched_credit2.c  |    2 +-
>  xen/common/sched_rt.c       |    7 +++----
>  xen/common/schedule.c       |   18 ++++++++----------
>  xen/include/xen/sched-if.h  |    1 +
>  6 files changed, 15 insertions(+), 48 deletions(-)
> 
> diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
> index 8a11a2f..b79fcdf 100644
> --- a/xen/common/sched_arinc653.c
> +++ b/xen/common/sched_arinc653.c
> @@ -456,34 +456,6 @@ a653sched_free_vdata(const struct scheduler *ops, void *priv)
>  }
>  
>  /**
> - * This function allocates scheduler-specific data for a physical CPU
> - *
> - * We do not actually make use of any per-CPU data but the hypervisor expects
> - * a non-NULL return value
> - *
> - * @param ops       Pointer to this instance of the scheduler structure
> - *
> - * @return          Pointer to the allocated data
> - */
> -static void *
> -a653sched_alloc_pdata(const struct scheduler *ops, int cpu)
> -{
> -    /* return a non-NULL value to keep schedule.c happy */
> -    return SCHED_PRIV(ops);
> -}
> -
> -/**
> - * This function frees scheduler-specific data for a physical CPU
> - *
> - * @param ops       Pointer to this instance of the scheduler structure
> - */
> -static void
> -a653sched_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
> -{
> -    /* nop */
> -}
> -
> -/**
>   * This function allocates scheduler-specific data for a domain
>   *
>   * We do not actually make use of any per-domain data but the hypervisor
> @@ -737,9 +709,6 @@ static const struct scheduler sched_arinc653_def = {
>      .free_vdata     = a653sched_free_vdata,
>      .alloc_vdata    = a653sched_alloc_vdata,
>  
> -    .free_pdata     = a653sched_free_pdata,
> -    .alloc_pdata    = a653sched_alloc_pdata,
> -
>      .free_domdata   = a653sched_free_domdata,
>      .alloc_domdata  = a653sched_alloc_domdata,
>  
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> index 305889a..288749f 100644
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -532,12 +532,12 @@ csched_alloc_pdata(const struct scheduler *ops, int cpu)
>      /* Allocate per-PCPU info */
>      spc = xzalloc(struct csched_pcpu);
>      if ( spc == NULL )
> -        return NULL;
> +        return ERR_PTR(-ENOMEM);
>  
>      if ( !alloc_cpumask_var(&spc->balance_mask) )
>      {
>          xfree(spc);
> -        return NULL;
> +        return ERR_PTR(-ENOMEM);
>      }
>  
>      spin_lock_irqsave(&prv->lock, flags);
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 7ddad38..36dc9ee 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -2044,7 +2044,7 @@ csched2_alloc_pdata(const struct scheduler *ops, int cpu)
>          printk("%s: cpu %d not online yet, deferring initializatgion\n",
>                 __func__, cpu);
>  
> -    return (void *)1;
> +    return NULL;
>  }
>  
>  static void
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index d98bfb6..ac8019f 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -665,7 +665,7 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
>      spin_unlock_irqrestore(old_lock, flags);
>  
>      if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) )
> -        return NULL;
> +        return ERR_PTR(-ENOMEM);
>  
>      if ( prv->repl_timer == NULL )
>      {
> @@ -673,13 +673,12 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
>          prv->repl_timer = xzalloc(struct timer);
>  
>          if ( prv->repl_timer == NULL )
> -            return NULL;
> +            return ERR_PTR(-ENOMEM);
>  
>          init_timer(prv->repl_timer, repl_timer_handler, (void *)ops, cpu);
>      }
>  
> -    /* 1 indicates alloc. succeed in schedule.c */
> -    return (void *)1;
> +    return NULL;
>  }
>  
>  static void
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 0627eb5..1adc0e2 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -1491,9 +1491,9 @@ static int cpu_schedule_up(unsigned int cpu)
>      if ( idle_vcpu[cpu] == NULL )
>          return -ENOMEM;
>  
> -    if ( (ops.alloc_pdata != NULL) &&
> -         ((sd->sched_priv = ops.alloc_pdata(&ops, cpu)) == NULL) )
> -        return -ENOMEM;
> +    sd->sched_priv = SCHED_OP(&ops, alloc_pdata, cpu);
> +    if ( IS_ERR(sd->sched_priv) )
> +        return PTR_ERR(sd->sched_priv);

Calling xfree() with an IS_ERR() value might be a bad idea.
Either you need to set sd->sched_priv to NULL in error case or you
modify xfree() to return immediately not only in the NULL case, but
in the IS_ERR() case as well.


Juergen

>  
>      return 0;
>  }
> @@ -1503,8 +1503,7 @@ static void cpu_schedule_down(unsigned int cpu)
>      struct schedule_data *sd = &per_cpu(schedule_data, cpu);
>      struct scheduler *sched = per_cpu(scheduler, cpu);
>  
> -    if ( sd->sched_priv != NULL )
> -        SCHED_OP(sched, free_pdata, sd->sched_priv, cpu);
> +    SCHED_OP(sched, free_pdata, sd->sched_priv, cpu);
>      SCHED_OP(sched, free_vdata, idle_vcpu[cpu]->sched_priv);
>  
>      idle_vcpu[cpu]->sched_priv = NULL;
> @@ -1599,9 +1598,8 @@ void __init scheduler_init(void)
>      idle_domain->max_vcpus = nr_cpu_ids;
>      if ( alloc_vcpu(idle_domain, 0, 0) == NULL )
>          BUG();
> -    if ( ops.alloc_pdata &&
> -         !(this_cpu(schedule_data).sched_priv = ops.alloc_pdata(&ops, 0)) )
> -        BUG();
> +    this_cpu(schedule_data).sched_priv = SCHED_OP(&ops, alloc_pdata, 0);
> +    BUG_ON(IS_ERR(this_cpu(schedule_data).sched_priv));
>      SCHED_OP(&ops, init_pdata, this_cpu(schedule_data).sched_priv, 0);
>  }
>  
> @@ -1644,8 +1642,8 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
>  
>      idle = idle_vcpu[cpu];
>      ppriv = SCHED_OP(new_ops, alloc_pdata, cpu);
> -    if ( ppriv == NULL )
> -        return -ENOMEM;
> +    if ( IS_ERR(ppriv) )
> +        return PTR_ERR(ppriv);
>      SCHED_OP(new_ops, init_pdata, ppriv, cpu);
>      vpriv = SCHED_OP(new_ops, alloc_vdata, idle, idle->domain->sched_priv);
>      if ( vpriv == NULL )
> diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
> index 70c08c6..560cba5 100644
> --- a/xen/include/xen/sched-if.h
> +++ b/xen/include/xen/sched-if.h
> @@ -9,6 +9,7 @@
>  #define __XEN_SCHED_IF_H__
>  
>  #include <xen/percpu.h>
> +#include <xen/err.h>
>  
>  /* A global pointer to the initial cpupool (POOL0). */
>  extern struct cpupool *cpupool0;
> 
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 10/16] xen: sched: improve credit2 bootparams' scope, placement and signedness
  2016-03-18 19:05 ` [PATCH 10/16] xen: sched: improve credit2 bootparams' scope, placement and signedness Dario Faggioli
@ 2016-03-21 14:51   ` Juergen Gross
  2016-03-24 12:20   ` George Dunlap
  1 sibling, 0 replies; 61+ messages in thread
From: Juergen Gross @ 2016-03-21 14:51 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Jan Beulich, Uma Sharma

On 18/03/16 20:05, Dario Faggioli wrote:
> From: Uma Sharma <uma.sharma523@gmail.com>
> 
> and, while we are adjusting signedness of opt_load_window_shift,
> make also prv->load_window_shift unsigned, as approapriate.
> 
> Signed-off-by: Uma Sharma <uma.sharma523@gmail.com>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> ---
> Patch has changed, so I'm not sticking any tag v1 received
> (namely, Juergen's Reviewed-by:).

Adding it again:

Reviewed-by: Juergen Gross <jgross@suse.com>

> ---
> Changes from v1:
>  * improve signedness as well, as requested during review.
> ---
>  xen/common/sched_credit2.c |   10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 64fb028..2fd4175 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -162,7 +162,7 @@
>  #define CSFLAG_runq_migrate_request (1<<__CSFLAG_runq_migrate_request)
>  
>  
> -int opt_migrate_resist=500;
> +static unsigned int __read_mostly opt_migrate_resist = 500;
>  integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
>  
>  /*
> @@ -185,12 +185,12 @@ integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
>   *   to a load of 1.
>   */
>  #define LOADAVG_GRANULARITY_SHIFT (10)
> -int opt_load_window_shift=18;
> +static unsigned int __read_mostly opt_load_window_shift = 18;
>  #define  LOADAVG_WINDOW_SHIFT_MIN 4
>  integer_param("credit2_load_window_shift", opt_load_window_shift);
> -int opt_underload_balance_tolerance=0;
> +static int __read_mostly opt_underload_balance_tolerance = 0;
>  integer_param("credit2_balance_under", opt_underload_balance_tolerance);
> -int opt_overload_balance_tolerance=-3;
> +static int __read_mostly opt_overload_balance_tolerance = -3;
>  integer_param("credit2_balance_over", opt_overload_balance_tolerance);
>  
>  /*
> @@ -227,7 +227,7 @@ struct csched2_private {
>      cpumask_t active_queues; /* Queues which may have active cpus */
>      struct csched2_runqueue_data rqd[NR_CPUS];
>  
> -    int load_window_shift;
> +    unsigned int load_window_shift;
>  };
>  
>  /*
> 
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/16] xen: sched: make implementing .alloc_pdata optional
  2016-03-21 14:48   ` Juergen Gross
@ 2016-03-21 15:07     ` Jan Beulich
  2016-04-01 17:01       ` Dario Faggioli
  0 siblings, 1 reply; 61+ messages in thread
From: Jan Beulich @ 2016-03-21 15:07 UTC (permalink / raw)
  To: Dario Faggioli, Juergen Gross
  Cc: George Dunlap, xen-devel, Josh Whitehead, Meng Xu, Robert VanVossen

>>> On 21.03.16 at 15:48, <JGross@suse.com> wrote:
> On 18/03/16 20:04, Dario Faggioli wrote:
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -1491,9 +1491,9 @@ static int cpu_schedule_up(unsigned int cpu)
>>      if ( idle_vcpu[cpu] == NULL )
>>          return -ENOMEM;
>>  
>> -    if ( (ops.alloc_pdata != NULL) &&
>> -         ((sd->sched_priv = ops.alloc_pdata(&ops, cpu)) == NULL) )
>> -        return -ENOMEM;
>> +    sd->sched_priv = SCHED_OP(&ops, alloc_pdata, cpu);
>> +    if ( IS_ERR(sd->sched_priv) )
>> +        return PTR_ERR(sd->sched_priv);
> 
> Calling xfree() with an IS_ERR() value might be a bad idea.
> Either you need to set sd->sched_priv to NULL in error case or you
> modify xfree() to return immediately not only in the NULL case, but
> in the IS_ERR() case as well.

The latter option is a no-go imo.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 13/16] xen: sched: allow for choosing credit2 runqueues configuration at boot
  2016-03-18 19:05 ` [PATCH 13/16] xen: sched: allow for choosing credit2 runqueues configuration at boot Dario Faggioli
@ 2016-03-22  7:46   ` Juergen Gross
  2016-03-24 12:36   ` George Dunlap
  1 sibling, 0 replies; 61+ messages in thread
From: Juergen Gross @ 2016-03-22  7:46 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Uma Sharma

On 18/03/16 20:05, Dario Faggioli wrote:
> In fact, credit2 uses CPU topology to decide how to arrange
> its internal runqueues. Before this change, only 'one runqueue
> per socket' was allowed. However, experiments have shown that,
> for instance, having one runqueue per physical core improves
> performance, especially in case hyperthreading is available.
> 
> In general, it makes sense to allow users to pick one runqueue
> arrangement at boot time, so that:
>  - more experiments can be easily performed to even better
>    assess and improve performance;
>  - one can select the best configuration for his specific
>    use case and/or hardware.
> 
> This patch enables the above.
> 
> Note that, for correctly arranging runqueues to be per-core,
> just checking cpu_to_core() on the host CPUs is not enough.
> In fact, cores (and hyperthreads) on different sockets, can
> have the same core (and thread) IDs! We, therefore, need to
> check whether the full topology of two CPUs matches, for
> them to be put in the same runqueue.
> 
> Note also that the default (although not functional) for
> credit2, since now, has been per-socket runqueue. This patch
> leaves things that way, to avoid mixing policy and technical
> changes.
> 
> Finally, it would be a nice feature to be able to select
> a particular runqueue arrangement, even when creating a
> Credit2 cpupool. This is left as future work.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Signed-off-by: Uma Sharma <uma.sharma523@gmail.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Uma Sharma <uma.sharma523@gmail.com>
> Cc: Juergen Gross <jgross@suse.com>
> ---
> Cahnges from v1:
>  * added 'node' and 'global' runqueue arrangements, as
>    suggested during review;
> ---
>  docs/misc/xen-command-line.markdown |   19 +++++++++
>  xen/common/sched_credit2.c          |   76 +++++++++++++++++++++++++++++++++--
>  2 files changed, 90 insertions(+), 5 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index ca77e3b..0047f94 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -469,6 +469,25 @@ combination with the `low_crashinfo` command line option.
>  ### credit2\_load\_window\_shift
>  > `= <integer>`
>  
> +### credit2\_runqueue
> +> `= core | socket | node | all`
> +
> +> Default: `socket`
> +
> +Specify how host CPUs are arranged in runqueues. Runqueues are kept
> +balanced with respect to the load generated by the vCPUs running on
> +them. Smaller runqueues (as in with `core`) means more accurate load
> +balancing (for instance, it will deal better with hyperthreading),
> +but also more overhead.
> +
> +Available alternatives, with their meaning, are:
> +* `core`: one runqueue per each physical core of the host;
> +* `socket`: one runqueue per each physical socket (which often,
> +            but not always, matches a NUMA node) of the host;
> +* `node`: one runqueue per each NUMA node of the host;
> +* `all`: just one runqueue shared by all the logical pCPUs of
> +         the host
> +
>  ### dbgp
>  > `= ehci[ <integer> | @pci<bus>:<slot>.<func> ]`
>  
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 456b9ea..c242dc4 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -81,10 +81,6 @@
>   * Credits are "reset" when the next vcpu in the runqueue is less than
>   * or equal to zero.  At that point, everyone's credits are "clipped"
>   * to a small value, and a fixed credit is added to everyone.
> - *
> - * The plan is for all cores that share an L2 will share the same
> - * runqueue.  At the moment, there is one global runqueue for all
> - * cores.
>   */
>  
>  /*
> @@ -193,6 +189,55 @@ static int __read_mostly opt_overload_balance_tolerance = -3;
>  integer_param("credit2_balance_over", opt_overload_balance_tolerance);
>  
>  /*
> + * Runqueue organization.
> + *
> + * The various cpus are to be assigned each one to a runqueue, and we
> + * want that to happen basing on topology. At the moment, it is possible
> + * to choose to arrange runqueues to be:
> + *
> + * - per-core: meaning that there will be one runqueue per each physical
> + *             core of the host. This will happen if the opt_runqueue
> + *             parameter is set to 'core';
> + *
> + * - per-node: meaning that there will be one runqueue per each physical
> + *             NUMA node of the host. This will happen if the opt_runqueue
> + *             parameter is set to 'node';
> + *
> + * - per-socket: meaning that there will be one runqueue per each physical
> + *               socket (AKA package, which often, but not always, also
> + *               matches a NUMA node) of the host; This will happen if
> + *               the opt_runqueue parameter is set to 'socket';
> + *
> + * - global: meaning that there will be only one runqueue to which all the
> + *           (logical) processors of the host belongs. This will happen if
> + *           the opt_runqueue parameter is set to 'all'.
> + *
> + * Depending on the value of opt_runqueue, therefore, cpus that are part of
> + * either the same physical core, or of the same physical socket, will be
> + * put together to form runqueues.
> + */
> +#define OPT_RUNQUEUE_CORE   1
> +#define OPT_RUNQUEUE_SOCKET 2
> +#define OPT_RUNQUEUE_NODE   3
> +#define OPT_RUNQUEUE_ALL    4
> +static int __read_mostly opt_runqueue = OPT_RUNQUEUE_SOCKET;
> +
> +static void parse_credit2_runqueue(const char *s)
> +{
> +    if ( !strncmp(s, "core", 4) && !s[4] )
> +        opt_runqueue = OPT_RUNQUEUE_CORE;
> +    else if ( !strncmp(s, "socket", 6) && !s[6] )
> +        opt_runqueue = OPT_RUNQUEUE_SOCKET;
> +    else if ( !strncmp(s, "node", 4) && !s[4] )
> +        opt_runqueue = OPT_RUNQUEUE_NODE;
> +    else if ( !strncmp(s, "all", 6) && !s[6] )

The length is wrong. Should be 3 instead of 6 here.

Which poses the question: why don't you use strcmp() here? I don't see
any advantage using strncmp() in this case, especially as you've just
proven it is more error prone here.

> +        opt_runqueue = OPT_RUNQUEUE_ALL;
> +    else
> +        printk("WARNING, unrecognized value of credit2_runqueue option!\n");
> +}
> +custom_param("credit2_runqueue", parse_credit2_runqueue);
> +
> +/*
>   * Per-runqueue data
>   */
>  struct csched2_runqueue_data {
> @@ -1971,6 +2016,22 @@ static void deactivate_runqueue(struct csched2_private *prv, int rqi)
>      cpumask_clear_cpu(rqi, &prv->active_queues);
>  }
>  
> +static inline bool_t same_node(unsigned int cpua, unsigned int cpub)
> +{
> +    return cpu_to_node(cpua) == cpu_to_node(cpub);
> +}
> +
> +static inline bool_t same_socket(unsigned int cpua, unsigned int cpub)
> +{
> +    return cpu_to_socket(cpua) == cpu_to_socket(cpub);
> +}
> +
> +static inline bool_t same_core(unsigned int cpua, unsigned int cpub)
> +{
> +    return same_socket(cpua, cpub) &&
> +           cpu_to_core(cpua) == cpu_to_core(cpub);
> +}
> +
>  static unsigned int
>  cpu_to_runqueue(struct csched2_private *prv, unsigned int cpu)
>  {
> @@ -2003,7 +2064,10 @@ cpu_to_runqueue(struct csched2_private *prv, unsigned int cpu)
>          BUG_ON(cpu_to_socket(cpu) == XEN_INVALID_SOCKET_ID ||
>                 cpu_to_socket(peer_cpu) == XEN_INVALID_SOCKET_ID);
>  
> -        if ( cpu_to_socket(cpumask_first(&rqd->active)) == cpu_to_socket(cpu) )
> +        if ( opt_runqueue == OPT_RUNQUEUE_ALL ||
> +             (opt_runqueue == OPT_RUNQUEUE_CORE && same_core(peer_cpu, cpu)) ||
> +             (opt_runqueue == OPT_RUNQUEUE_SOCKET && same_socket(peer_cpu, cpu)) ||
> +             (opt_runqueue == OPT_RUNQUEUE_NODE && same_node(peer_cpu, cpu)) )
>              break;
>      }
>  
> @@ -2157,6 +2221,8 @@ csched2_init(struct scheduler *ops)
>      printk(" load_window_shift: %d\n", opt_load_window_shift);
>      printk(" underload_balance_tolerance: %d\n", opt_underload_balance_tolerance);
>      printk(" overload_balance_tolerance: %d\n", opt_overload_balance_tolerance);
> +    printk(" runqueues arrangement: per-%s\n",
> +           opt_runqueue == OPT_RUNQUEUE_CORE ? "core" : "socket");

node? all?

>  
>      if ( opt_load_window_shift < LOADAVG_WINDOW_SHIFT_MIN )
>      {

Juergen


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 04/16] xen: sched: implement .init_pdata in all schedulers
  2016-03-18 19:04 ` [PATCH 04/16] xen: sched: implement .init_pdata in all schedulers Dario Faggioli
  2016-03-19  2:24   ` Meng Xu
@ 2016-03-22  8:03   ` Juergen Gross
  2016-03-23 17:46     ` George Dunlap
  1 sibling, 1 reply; 61+ messages in thread
From: Juergen Gross @ 2016-03-22  8:03 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Meng Xu

On 18/03/16 20:04, Dario Faggioli wrote:
> by borrowing some of the code of .alloc_pdata, i.e.,
> the bits that perform initializations, leaving only
> actual allocations in there, when any, which is the
> case for Credit1 and RTDS.
> 
> On the other hand, in Credit2, since we don't really
> need any per-pCPU data allocation, everything that was
> being done in .alloc_pdata, is now done in .init_pdata.
> And the fact that now .alloc_pdata can be left undefined,
> allows us to just get rid of it.
> 
> Still for Credit2, the fact that .init_pdata is called
> during CPU_STARTING (rather than CPU_UP_PREPARE) kills
> the need for the scheduler to setup a similar callback
> itself, simplifying the code.
> 
> And thanks to such simplification, it is now also ok to
> move some of the logic meant at double checking that a
> cpu was (or was not) initialized, into ASSERTS (rather
> than an if() and a BUG_ON).
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Meng Xu <mengxu@cis.upenn.edu>
> Cc: Juergen Gross <jgross@suse.com>
> ---
>  xen/common/sched_credit.c  |   20 +++++++++---
>  xen/common/sched_credit2.c |   72 +++-----------------------------------------
>  xen/common/sched_rt.c      |    9 ++++--
>  3 files changed, 26 insertions(+), 75 deletions(-)
> 
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> index 288749f..d4a0f5e 100644
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -526,8 +526,6 @@ static void *
>  csched_alloc_pdata(const struct scheduler *ops, int cpu)
>  {
>      struct csched_pcpu *spc;
> -    struct csched_private *prv = CSCHED_PRIV(ops);
> -    unsigned long flags;
>  
>      /* Allocate per-PCPU info */
>      spc = xzalloc(struct csched_pcpu);
> @@ -540,6 +538,19 @@ csched_alloc_pdata(const struct scheduler *ops, int cpu)
>          return ERR_PTR(-ENOMEM);
>      }
>  
> +    return spc;
> +}
> +
> +static void
> +csched_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
> +{
> +    struct csched_private *prv = CSCHED_PRIV(ops);
> +    struct csched_pcpu * const spc = pdata;
> +    unsigned long flags;
> +
> +    /* cpu data needs to be allocated, but STILL uninitialized */
> +    ASSERT(spc && spc->runq.next == spc->runq.prev && spc->runq.next == NULL);

This looks weird. I'd prefer:

ASSERT(spc && spc->runq.next == NULL && spc->runq.prev == NULL);

> +
>      spin_lock_irqsave(&prv->lock, flags);
>  
>      /* Initialize/update system-wide config */
> @@ -560,16 +571,12 @@ csched_alloc_pdata(const struct scheduler *ops, int cpu)
>      INIT_LIST_HEAD(&spc->runq);
>      spc->runq_sort_last = prv->runq_sort;
>      spc->idle_bias = nr_cpu_ids - 1;
> -    if ( per_cpu(schedule_data, cpu).sched_priv == NULL )
> -        per_cpu(schedule_data, cpu).sched_priv = spc;
>  
>      /* Start off idling... */
>      BUG_ON(!is_idle_vcpu(curr_on_cpu(cpu)));
>      cpumask_set_cpu(cpu, prv->idlers);
>  
>      spin_unlock_irqrestore(&prv->lock, flags);
> -
> -    return spc;
>  }
>  
>  #ifndef NDEBUG
> @@ -2051,6 +2058,7 @@ static const struct scheduler sched_credit_def = {
>      .alloc_vdata    = csched_alloc_vdata,
>      .free_vdata     = csched_free_vdata,
>      .alloc_pdata    = csched_alloc_pdata,
> +    .init_pdata     = csched_init_pdata,
>      .free_pdata     = csched_free_pdata,
>      .alloc_domdata  = csched_alloc_domdata,
>      .free_domdata   = csched_free_domdata,
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 36dc9ee..b1642a8 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -1968,7 +1968,8 @@ static void deactivate_runqueue(struct csched2_private *prv, int rqi)
>      cpumask_clear_cpu(rqi, &prv->active_queues);
>  }
>  
> -static void init_pcpu(const struct scheduler *ops, int cpu)
> +static void
> +csched2_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
>  {
>      unsigned rqi;
>      unsigned long flags;
> @@ -1978,12 +1979,7 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
>  
>      spin_lock_irqsave(&prv->lock, flags);
>  
> -    if ( cpumask_test_cpu(cpu, &prv->initialized) )
> -    {
> -        printk("%s: Strange, cpu %d already initialized!\n", __func__, cpu);
> -        spin_unlock_irqrestore(&prv->lock, flags);
> -        return;
> -    }
> +    ASSERT(!cpumask_test_cpu(cpu, &prv->initialized));
>  
>      /* Figure out which runqueue to put it in */
>      rqi = 0;
> @@ -2033,20 +2029,6 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
>      return;
>  }
>  
> -static void *
> -csched2_alloc_pdata(const struct scheduler *ops, int cpu)
> -{
> -    /* Check to see if the cpu is online yet */
> -    /* Note: cpu 0 doesn't get a STARTING callback */

Sorry if I missed it in an other patch: Where is the init_pdata hook
being called for cpu 0?


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 02/16] xen: sched: add .init_pdata hook to the scheduler interface
  2016-03-18 19:04 ` [PATCH 02/16] xen: sched: add .init_pdata hook to the scheduler interface Dario Faggioli
@ 2016-03-22  8:08   ` Juergen Gross
  2016-03-23 17:32   ` George Dunlap
  1 sibling, 0 replies; 61+ messages in thread
From: Juergen Gross @ 2016-03-22  8:08 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap

On 18/03/16 20:04, Dario Faggioli wrote:
> with the purpose of decoupling the allocation phase and
> the initialization one, for per-pCPU data of the schedulers.
> 
> This makes it possible to perform the initialization later
> in the pCPU bringup/assignement process, when more information
> (for instance, the host CPU topology) are available. This,
> for now, is important only for Credit2, but it can well be
> useful to other schedulers.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Juergen Gross <jgross@suse.com>
> ---
> Changes from v1:
>  * in schedule_cpu_switch(), call to init_pdata() moved up,
>    close to the call to alloc_pdata() (for consistency with
>    other call sites) and prototype slightly changed.
> ---
> During v1 review, it was agreed to add ASSERTS() and comments
> to clarify the use of schedule_cpu_switch(). This can't be
> found here, but only because it has happened in another patch.
> ---
>  xen/common/schedule.c      |    7 +++++++
>  xen/include/xen/sched-if.h |    1 +
>  2 files changed, 8 insertions(+)
> 
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index e57b659..0627eb5 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -1517,10 +1517,15 @@ static int cpu_schedule_callback(
>      struct notifier_block *nfb, unsigned long action, void *hcpu)
>  {
>      unsigned int cpu = (unsigned long)hcpu;
> +    struct scheduler *sched = per_cpu(scheduler, cpu);
> +    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
>      int rc = 0;
>  
>      switch ( action )
>      {
> +    case CPU_STARTING:
> +        SCHED_OP(sched, init_pdata, sd->sched_priv, cpu);
> +        break;
>      case CPU_UP_PREPARE:
>          rc = cpu_schedule_up(cpu);
>          break;
> @@ -1597,6 +1602,7 @@ void __init scheduler_init(void)
>      if ( ops.alloc_pdata &&
>           !(this_cpu(schedule_data).sched_priv = ops.alloc_pdata(&ops, 0)) )
>          BUG();
> +    SCHED_OP(&ops, init_pdata, this_cpu(schedule_data).sched_priv, 0);

Aah, here is the init_pdata call I missed in patch 3. Sorry for the
noise.

>  }
>  
>  /*
> @@ -1640,6 +1646,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
>      ppriv = SCHED_OP(new_ops, alloc_pdata, cpu);
>      if ( ppriv == NULL )
>          return -ENOMEM;
> +    SCHED_OP(new_ops, init_pdata, ppriv, cpu);
>      vpriv = SCHED_OP(new_ops, alloc_vdata, idle, idle->domain->sched_priv);
>      if ( vpriv == NULL )
>      {
> diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
> index 825f1ad..70c08c6 100644
> --- a/xen/include/xen/sched-if.h
> +++ b/xen/include/xen/sched-if.h
> @@ -133,6 +133,7 @@ struct scheduler {
>                                      void *);
>      void         (*free_pdata)     (const struct scheduler *, void *, int);
>      void *       (*alloc_pdata)    (const struct scheduler *, int);
> +    void         (*init_pdata)     (const struct scheduler *, void *, int);
>      void         (*free_domdata)   (const struct scheduler *, void *);
>      void *       (*alloc_domdata)  (const struct scheduler *, struct domain *);

Reviewed-by: Juergen Gross <jgross@suse.com>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 01/16] xen: sched: fix locking when allocating an RTDS pCPU
  2016-03-18 19:04 ` [PATCH 01/16] xen: sched: fix locking when allocating an RTDS pCPU Dario Faggioli
  2016-03-19  2:22   ` Meng Xu
@ 2016-03-23 15:37   ` George Dunlap
  1 sibling, 0 replies; 61+ messages in thread
From: George Dunlap @ 2016-03-23 15:37 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Tianyang Chen, Meng Xu

On 18/03/16 19:04, Dario Faggioli wrote:
> as doing that include changing the scheduler lock
> mapping for the pCPU itself, and the correct way
> of doing that is:
>  - take the lock that the pCPU is using right now
>    (which may be the lock of another scheduler);
>  - change the mapping of the lock to the RTDS one;
>  - release the lock (the one that has actually been
>    taken!)
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

> ---
> Cc: Meng Xu <mengxu@cis.upenn.edu>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Tianyang Chen <tiche@seas.upenn.edu>
> ---
>  xen/common/sched_rt.c |    9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index c896a6f..d98bfb6 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -653,11 +653,16 @@ static void *
>  rt_alloc_pdata(const struct scheduler *ops, int cpu)
>  {
>      struct rt_private *prv = rt_priv(ops);
> +    spinlock_t *old_lock;
>      unsigned long flags;
>  
> -    spin_lock_irqsave(&prv->lock, flags);
> +    /* Move the scheduler lock to our global runqueue lock.  */
> +    old_lock = pcpu_schedule_lock_irqsave(cpu, &flags);
> +
>      per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
> -    spin_unlock_irqrestore(&prv->lock, flags);
> +
> +    /* _Not_ pcpu_schedule_unlock(): per_cpu().schedule_lock changed! */
> +    spin_unlock_irqrestore(old_lock, flags);
>  
>      if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) )
>          return NULL;
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 02/16] xen: sched: add .init_pdata hook to the scheduler interface
  2016-03-18 19:04 ` [PATCH 02/16] xen: sched: add .init_pdata hook to the scheduler interface Dario Faggioli
  2016-03-22  8:08   ` Juergen Gross
@ 2016-03-23 17:32   ` George Dunlap
  1 sibling, 0 replies; 61+ messages in thread
From: George Dunlap @ 2016-03-23 17:32 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Juergen Gross

On 18/03/16 19:04, Dario Faggioli wrote:
> with the purpose of decoupling the allocation phase and
> the initialization one, for per-pCPU data of the schedulers.
> 
> This makes it possible to perform the initialization later
> in the pCPU bringup/assignement process, when more information
> (for instance, the host CPU topology) are available. This,
> for now, is important only for Credit2, but it can well be
> useful to other schedulers.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

Just one note -- I found this patch harder to review than necessary, I
think, because it implemented the callback but nobody was using it.  I
had to keep switching back and forth between the patches to find out
what was going on.  I personally would have folded patches 2 and 4 together.

(Just to be clear, no action necessary.)

> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Juergen Gross <jgross@suse.com>
> ---
> Changes from v1:
>  * in schedule_cpu_switch(), call to init_pdata() moved up,
>    close to the call to alloc_pdata() (for consistency with
>    other call sites) and prototype slightly changed.
> ---
> During v1 review, it was agreed to add ASSERTS() and comments
> to clarify the use of schedule_cpu_switch(). This can't be
> found here, but only because it has happened in another patch.
> ---
>  xen/common/schedule.c      |    7 +++++++
>  xen/include/xen/sched-if.h |    1 +
>  2 files changed, 8 insertions(+)
> 
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index e57b659..0627eb5 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -1517,10 +1517,15 @@ static int cpu_schedule_callback(
>      struct notifier_block *nfb, unsigned long action, void *hcpu)
>  {
>      unsigned int cpu = (unsigned long)hcpu;
> +    struct scheduler *sched = per_cpu(scheduler, cpu);
> +    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
>      int rc = 0;
>  
>      switch ( action )
>      {
> +    case CPU_STARTING:
> +        SCHED_OP(sched, init_pdata, sd->sched_priv, cpu);
> +        break;
>      case CPU_UP_PREPARE:
>          rc = cpu_schedule_up(cpu);
>          break;
> @@ -1597,6 +1602,7 @@ void __init scheduler_init(void)
>      if ( ops.alloc_pdata &&
>           !(this_cpu(schedule_data).sched_priv = ops.alloc_pdata(&ops, 0)) )
>          BUG();
> +    SCHED_OP(&ops, init_pdata, this_cpu(schedule_data).sched_priv, 0);
>  }
>  
>  /*
> @@ -1640,6 +1646,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
>      ppriv = SCHED_OP(new_ops, alloc_pdata, cpu);
>      if ( ppriv == NULL )
>          return -ENOMEM;
> +    SCHED_OP(new_ops, init_pdata, ppriv, cpu);
>      vpriv = SCHED_OP(new_ops, alloc_vdata, idle, idle->domain->sched_priv);
>      if ( vpriv == NULL )
>      {
> diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
> index 825f1ad..70c08c6 100644
> --- a/xen/include/xen/sched-if.h
> +++ b/xen/include/xen/sched-if.h
> @@ -133,6 +133,7 @@ struct scheduler {
>                                      void *);
>      void         (*free_pdata)     (const struct scheduler *, void *, int);
>      void *       (*alloc_pdata)    (const struct scheduler *, int);
> +    void         (*init_pdata)     (const struct scheduler *, void *, int);
>      void         (*free_domdata)   (const struct scheduler *, void *);
>      void *       (*alloc_domdata)  (const struct scheduler *, struct domain *);
>  
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/16] xen: sched: make implementing .alloc_pdata optional
  2016-03-21 14:22   ` Jan Beulich
@ 2016-03-23 17:36     ` George Dunlap
  2016-03-24  9:43       ` Jan Beulich
  0 siblings, 1 reply; 61+ messages in thread
From: George Dunlap @ 2016-03-23 17:36 UTC (permalink / raw)
  To: Jan Beulich, Dario Faggioli
  Cc: Juergen Gross, George Dunlap, Robert VanVossen, Josh Whitehead,
	Meng Xu, xen-devel

On 21/03/16 14:22, Jan Beulich wrote:
>>>> On 18.03.16 at 20:04, <dario.faggioli@citrix.com> wrote:
>> --- a/xen/include/xen/sched-if.h
>> +++ b/xen/include/xen/sched-if.h
>> @@ -9,6 +9,7 @@
>>  #define __XEN_SCHED_IF_H__
>>  
>>  #include <xen/percpu.h>
>> +#include <xen/err.h>
>>  
>>  /* A global pointer to the initial cpupool (POOL0). */
>>  extern struct cpupool *cpupool0;
> 
> There is no visible use in this header of what err.h defines - why
> does it get included all of the sudden?

I'm guessing it's so that all the files that use the scheduler interface
automatically get IS_ERR and PTR_ERR without having to include xen/err.h
directly.

But of course that means files like sched_arinc653.c and sched_credit2.c
end up including xen/err.c even though they don't use those macros.
Would you prefer the other files include it directly instead?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/16] xen: sched: make implementing .alloc_pdata optional
  2016-03-18 19:04 ` [PATCH 03/16] xen: sched: make implementing .alloc_pdata optional Dario Faggioli
                     ` (2 preceding siblings ...)
  2016-03-21 14:48   ` Juergen Gross
@ 2016-03-23 17:38   ` George Dunlap
  3 siblings, 0 replies; 61+ messages in thread
From: George Dunlap @ 2016-03-23 17:38 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: Juergen Gross, George Dunlap, Robert VanVossen, Josh Whitehead,
	Meng Xu, Jan Beulich

On 18/03/16 19:04, Dario Faggioli wrote:
> The .alloc_pdata scheduler hook must, before this change,
> be implemented by all schedulers --even those ones that
> don't need to allocate anything.
> 
> Make it possible to just use the SCHED_OP(), like for
> the other hooks, by using ERR_PTR() and IS_ERR() for
> error reporting. This:
>  - makes NULL a variant of success;
>  - allows for errors other than ENOMEM to be properly
>    communicated (if ever necessary).
> 
> This, in turn, means that schedulers not needing to
> allocate any per-pCPU data, can avoid implementing the
> hook. In fact, the artificial implementation of
> .alloc_pdata in the ARINC653 is removed (and, while there,
> nuke .free_pdata too, as it is equally useless).
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

With the xfree issue Juergen pointed out fixed, this looks good to me.

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 04/16] xen: sched: implement .init_pdata in all schedulers
  2016-03-22  8:03   ` Juergen Gross
@ 2016-03-23 17:46     ` George Dunlap
  0 siblings, 0 replies; 61+ messages in thread
From: George Dunlap @ 2016-03-23 17:46 UTC (permalink / raw)
  To: Juergen Gross, Dario Faggioli, xen-devel; +Cc: George Dunlap, Meng Xu

On 22/03/16 08:03, Juergen Gross wrote:
> On 18/03/16 20:04, Dario Faggioli wrote:
>> by borrowing some of the code of .alloc_pdata, i.e.,
>> the bits that perform initializations, leaving only
>> actual allocations in there, when any, which is the
>> case for Credit1 and RTDS.
>>
>> On the other hand, in Credit2, since we don't really
>> need any per-pCPU data allocation, everything that was
>> being done in .alloc_pdata, is now done in .init_pdata.
>> And the fact that now .alloc_pdata can be left undefined,
>> allows us to just get rid of it.
>>
>> Still for Credit2, the fact that .init_pdata is called
>> during CPU_STARTING (rather than CPU_UP_PREPARE) kills
>> the need for the scheduler to setup a similar callback
>> itself, simplifying the code.
>>
>> And thanks to such simplification, it is now also ok to
>> move some of the logic meant at double checking that a
>> cpu was (or was not) initialized, into ASSERTS (rather
>> than an if() and a BUG_ON).
>>
>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>> ---
>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>> Cc: Meng Xu <mengxu@cis.upenn.edu>
>> Cc: Juergen Gross <jgross@suse.com>
>> ---
>>  xen/common/sched_credit.c  |   20 +++++++++---
>>  xen/common/sched_credit2.c |   72 +++-----------------------------------------
>>  xen/common/sched_rt.c      |    9 ++++--
>>  3 files changed, 26 insertions(+), 75 deletions(-)
>>
>> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
>> index 288749f..d4a0f5e 100644
>> --- a/xen/common/sched_credit.c
>> +++ b/xen/common/sched_credit.c
>> @@ -526,8 +526,6 @@ static void *
>>  csched_alloc_pdata(const struct scheduler *ops, int cpu)
>>  {
>>      struct csched_pcpu *spc;
>> -    struct csched_private *prv = CSCHED_PRIV(ops);
>> -    unsigned long flags;
>>  
>>      /* Allocate per-PCPU info */
>>      spc = xzalloc(struct csched_pcpu);
>> @@ -540,6 +538,19 @@ csched_alloc_pdata(const struct scheduler *ops, int cpu)
>>          return ERR_PTR(-ENOMEM);
>>      }
>>  
>> +    return spc;
>> +}
>> +
>> +static void
>> +csched_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
>> +{
>> +    struct csched_private *prv = CSCHED_PRIV(ops);
>> +    struct csched_pcpu * const spc = pdata;
>> +    unsigned long flags;
>> +
>> +    /* cpu data needs to be allocated, but STILL uninitialized */
>> +    ASSERT(spc && spc->runq.next == spc->runq.prev && spc->runq.next == NULL);
> 
> This looks weird. I'd prefer:
> 
> ASSERT(spc && spc->runq.next == NULL && spc->runq.prev == NULL);

I prefer Juergen's suggestion too.  I wouldn't say it's worth respinning
over, but since you have to make adjustments to the previous patch
anyway, you might as well change this while you're at it.

With that change:

Reviewed-by: George Dunlap <george.dunlap@citrix.com>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 05/16] xen: sched: move pCPU initialization in an helper
  2016-03-18 19:04 ` [PATCH 05/16] xen: sched: move pCPU initialization in an helper Dario Faggioli
@ 2016-03-23 17:51   ` George Dunlap
  2016-03-23 18:09     ` George Dunlap
  2016-03-24 13:21     ` Dario Faggioli
  0 siblings, 2 replies; 61+ messages in thread
From: George Dunlap @ 2016-03-23 17:51 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Juergen Gross

On 18/03/16 19:04, Dario Faggioli wrote:
> That will turn out useful in following patches, where such
> code will need to be called more than just once. Create an
> helper now, and move the code there, to avoid mixing code
> motion and functional changes later.
> 
> In Credit2, some style cleanup is also done.
> 
> No functional change intended.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> ---
>  xen/common/sched_credit.c  |   22 +++++++++++++---------
>  xen/common/sched_credit2.c |   26 ++++++++++++++++----------
>  2 files changed, 29 insertions(+), 19 deletions(-)
> 
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> index d4a0f5e..4488d7c 100644
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -542,16 +542,11 @@ csched_alloc_pdata(const struct scheduler *ops, int cpu)
>  }
>  
>  static void
> -csched_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
> +init_pdata(struct csched_private *prv, struct csched_pcpu *spc, int cpu)
>  {
> -    struct csched_private *prv = CSCHED_PRIV(ops);
> -    struct csched_pcpu * const spc = pdata;
> -    unsigned long flags;
> -
> -    /* cpu data needs to be allocated, but STILL uninitialized */
> -    ASSERT(spc && spc->runq.next == spc->runq.prev && spc->runq.next == NULL);
> -
> -    spin_lock_irqsave(&prv->lock, flags);
> +    ASSERT(spin_is_locked(&prv->lock));
> +    /* cpu data needs to be allocated, but STILL uninitialized. */
> +    ASSERT(spc && spc->runq.next == NULL && spc->runq.prev == NULL);

Actually, Juergen, looks like Dario already agrees with us. ;-)

Obviously this should be updated in the previous patch instead.

With that done:

Reviewed-by: George Dunlap <george.dunlap@citrix.com>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 05/16] xen: sched: move pCPU initialization in an helper
  2016-03-23 17:51   ` George Dunlap
@ 2016-03-23 18:09     ` George Dunlap
  2016-03-24 13:21     ` Dario Faggioli
  1 sibling, 0 replies; 61+ messages in thread
From: George Dunlap @ 2016-03-23 18:09 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Juergen Gross

On 23/03/16 17:51, George Dunlap wrote:
> On 18/03/16 19:04, Dario Faggioli wrote:
>> That will turn out useful in following patches, where such
>> code will need to be called more than just once. Create an
>> helper now, and move the code there, to avoid mixing code
>> motion and functional changes later.
>>
>> In Credit2, some style cleanup is also done.
>>
>> No functional change intended.
>>
>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>> ---
>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>> ---
>>  xen/common/sched_credit.c  |   22 +++++++++++++---------
>>  xen/common/sched_credit2.c |   26 ++++++++++++++++----------
>>  2 files changed, 29 insertions(+), 19 deletions(-)
>>
>> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
>> index d4a0f5e..4488d7c 100644
>> --- a/xen/common/sched_credit.c
>> +++ b/xen/common/sched_credit.c
>> @@ -542,16 +542,11 @@ csched_alloc_pdata(const struct scheduler *ops, int cpu)
>>  }
>>  
>>  static void
>> -csched_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
>> +init_pdata(struct csched_private *prv, struct csched_pcpu *spc, int cpu)
>>  {
>> -    struct csched_private *prv = CSCHED_PRIV(ops);
>> -    struct csched_pcpu * const spc = pdata;
>> -    unsigned long flags;
>> -
>> -    /* cpu data needs to be allocated, but STILL uninitialized */
>> -    ASSERT(spc && spc->runq.next == spc->runq.prev && spc->runq.next == NULL);
>> -
>> -    spin_lock_irqsave(&prv->lock, flags);
>> +    ASSERT(spin_is_locked(&prv->lock));
>> +    /* cpu data needs to be allocated, but STILL uninitialized. */
>> +    ASSERT(spc && spc->runq.next == NULL && spc->runq.prev == NULL);
> 
> Actually, Juergen, looks like Dario already agrees with us. ;-)
> 
> Obviously this should be updated in the previous patch instead.
> 
> With that done:
> 
> Reviewed-by: George Dunlap <george.dunlap@citrix.com>

Oops -- somehow still have Juergen's Fujitsu address in my addressbook...

 -G

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 09/16] xen: sched: close potential races when switching scheduler to CPUs
  2016-03-18 19:05 ` [PATCH 09/16] xen: sched: close potential races when switching scheduler to CPUs Dario Faggioli
  2016-03-19  2:25   ` Meng Xu
@ 2016-03-23 19:05   ` George Dunlap
  2016-04-05 16:26     ` Dario Faggioli
  2016-03-24 12:14   ` George Dunlap
  2 siblings, 1 reply; 61+ messages in thread
From: George Dunlap @ 2016-03-23 19:05 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Tianyang Chen, Meng Xu

On 18/03/16 19:05, Dario Faggioli wrote:
> by using the sched_switch hook that we have introduced in
> the various schedulers.
> 
> The key is to let the actual switch of scheduler and the
> remapping of the scheduler lock for the CPU (if necessary)
> happen together (in the same critical section) protected
> (at least) by the old scheduler lock for the CPU.
> 
> This also means that, in Credit2 and RTDS, we can get rid
> of the code that was doing the scheduler lock remapping
> in csched2_free_pdata() and rt_free_pdata(), and of their
> triggering ASSERT-s.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Similar to my comment before -- in my own tree I squashed patches 6-9
into a single commit and found it much easier to review. :-)

One important question...

> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 1adc0e2..29582a6 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -1617,7 +1617,6 @@ void __init scheduler_init(void)
>  int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
>  {
>      struct vcpu *idle;
> -    spinlock_t *lock;
>      void *ppriv, *ppriv_old, *vpriv, *vpriv_old;
>      struct scheduler *old_ops = per_cpu(scheduler, cpu);
>      struct scheduler *new_ops = (c == NULL) ? &ops : c->sched;
> @@ -1640,11 +1639,21 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
>      if ( old_ops == new_ops )
>          goto out;
>  
> +    /*
> +     * To setup the cpu for the new scheduler we need:
> +     *  - a valid instance of per-CPU scheduler specific data, as it is
> +     *    allocated by SCHED_OP(alloc_pdata). Note that we do not want to
> +     *    initialize it yet (i.e., we are not calling SCHED_OP(init_pdata)).
> +     *    That will be done by the target scheduler, in SCHED_OP(switch_sched),
> +     *    in proper ordering and with locking.
> +     *  - a valid instance of per-vCPU scheduler specific data, for the idle
> +     *    vCPU of cpu. That is what the target scheduler will use for the
> +     *    sched_priv field of the per-vCPU info of the idle domain.
> +     */
>      idle = idle_vcpu[cpu];
>      ppriv = SCHED_OP(new_ops, alloc_pdata, cpu);
>      if ( IS_ERR(ppriv) )
>          return PTR_ERR(ppriv);
> -    SCHED_OP(new_ops, init_pdata, ppriv, cpu);
>      vpriv = SCHED_OP(new_ops, alloc_vdata, idle, idle->domain->sched_priv);
>      if ( vpriv == NULL )
>      {
> @@ -1652,17 +1661,20 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
>          return -ENOMEM;
>      }
>  
> -    lock = pcpu_schedule_lock_irq(cpu);
> -
>      SCHED_OP(old_ops, tick_suspend, cpu);
> +
> +    /*
> +     * The actual switch, including (if necessary) the rerouting of the
> +     * scheduler lock to whatever new_ops prefers,  needs to happen in one
> +     * critical section, protected by old_ops' lock, or races are possible.
> +     * Since each scheduler has its own contraints and locking scheme, do
> +     * that inside specific scheduler code, rather than here.
> +     */
>      vpriv_old = idle->sched_priv;
> -    idle->sched_priv = vpriv;
> -    per_cpu(scheduler, cpu) = new_ops;
>      ppriv_old = per_cpu(schedule_data, cpu).sched_priv;
> -    per_cpu(schedule_data, cpu).sched_priv = ppriv;
> -    SCHED_OP(new_ops, tick_resume, cpu);
> +    SCHED_OP(new_ops, switch_sched, cpu, ppriv, vpriv);

Is it really safe to read sched_priv without the lock held?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/16] xen: sched: make implementing .alloc_pdata optional
  2016-03-23 17:36     ` George Dunlap
@ 2016-03-24  9:43       ` Jan Beulich
  2016-03-24 13:14         ` Dario Faggioli
  0 siblings, 1 reply; 61+ messages in thread
From: Jan Beulich @ 2016-03-24  9:43 UTC (permalink / raw)
  To: Dario Faggioli, George Dunlap
  Cc: Juergen Gross, George Dunlap, Robert VanVossen, Josh Whitehead,
	Meng Xu, xen-devel

>>> On 23.03.16 at 18:36, <george.dunlap@citrix.com> wrote:
> On 21/03/16 14:22, Jan Beulich wrote:
>>>>> On 18.03.16 at 20:04, <dario.faggioli@citrix.com> wrote:
>>> --- a/xen/include/xen/sched-if.h
>>> +++ b/xen/include/xen/sched-if.h
>>> @@ -9,6 +9,7 @@
>>>  #define __XEN_SCHED_IF_H__
>>>  
>>>  #include <xen/percpu.h>
>>> +#include <xen/err.h>
>>>  
>>>  /* A global pointer to the initial cpupool (POOL0). */
>>>  extern struct cpupool *cpupool0;
>> 
>> There is no visible use in this header of what err.h defines - why
>> does it get included all of the sudden?
> 
> I'm guessing it's so that all the files that use the scheduler interface
> automatically get IS_ERR and PTR_ERR without having to include xen/err.h
> directly.
> 
> But of course that means files like sched_arinc653.c and sched_credit2.c
> end up including xen/err.c even though they don't use those macros.
> Would you prefer the other files include it directly instead?

Since there are other files than just xen/common/sched*.c which
include this header - yes, I'd prefer that. In general I think that
avoiding the need to include headers needed in multiple sources
is fine for private headers (i.e. such outside of xen/include/),
but isn't in ones available for general consumption.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 09/16] xen: sched: close potential races when switching scheduler to CPUs
  2016-03-18 19:05 ` [PATCH 09/16] xen: sched: close potential races when switching scheduler to CPUs Dario Faggioli
  2016-03-19  2:25   ` Meng Xu
  2016-03-23 19:05   ` George Dunlap
@ 2016-03-24 12:14   ` George Dunlap
  2016-04-05 17:37     ` Dario Faggioli
  2 siblings, 1 reply; 61+ messages in thread
From: George Dunlap @ 2016-03-24 12:14 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Tianyang Chen, Meng Xu

On 18/03/16 19:05, Dario Faggioli wrote:
> by using the sched_switch hook that we have introduced in
> the various schedulers.
> 
> The key is to let the actual switch of scheduler and the
> remapping of the scheduler lock for the CPU (if necessary)
> happen together (in the same critical section) protected
> (at least) by the old scheduler lock for the CPU.

Thanks for trying to sort this out -- I've been looking this since
yesterday afternoon and it certainly makes my head hurt. :-)

It looks like you want to do the locking inside the sched_switch()
callback, rather than outside of it, so that you can get the locking
order right (global private before per-cpu scheduler lock).  Otherwise
you could just have schedule_cpu_switch grab and release the lock, and
let the sched_switch() callback set the lock as needed (knowing that the
correct lock is already held and will be released).

But the ordering between prv->lock and the scheduler lock only needs to
be between the prv lock and scheduler lock *of a specific instance* of
the credit2 scheduler -- i.e., between prv->lock and prv->rqd[].lock.

And, critically, if we're calling sched_switch, then we already know
that the current pcpu lock is *not* one of the prv->rqd[].lock's because
we check that at the top of schedule_cpu_switch().

So I think there should be no problem with:
1. Grabbing the pcpu schedule lock in schedule_cpu_switch()
2. Grabbing prv->lock in csched2_switch_sched()
3. Setting the per_cpu schedule lock as the very last thing in
csched2_switch_sched()
4. Releasing the (old) pcpu schedule lock in schedule_cpu_switch().

What do you think?

That would allow us to read ppriv_old and vpriv_old with the
schedule_lock held.

Unfortunately I can't off the top of my head think of a good assertion
to put in at #2 to assert that the per-pcpu lock is *not* one of
runqueue locks in prv, because we don't yet know which runqueue this cpu
will be assigned to.  But we could check when we actually do the lock
assignment to make sure that it's not already equal.  That way we'll
either deadlock or ASSERT (which is not as good as always ASSERTing, but
is better than either deadlocking or working fine).

As an aside -- it seems to me that as soon as we change the scheduler
lock, there's a risk that something else may come along and try to grab
it / access the data.  Does that mean we really ought to use memory
barriers to make sure that the lock is written only after all changes to
the scheduler data have been appropriately made?

> This also means that, in Credit2 and RTDS, we can get rid
> of the code that was doing the scheduler lock remapping
> in csched2_free_pdata() and rt_free_pdata(), and of their
> triggering ASSERT-s.

Right -- so to put it a different way, *all* schedulers must now set the
locking scheme they wish to use, even if they want to use the default
per-cpu locks.  I think that means we have to do that for arinc653 too,
right?

At first I thought we could look at having schedule_cpu_switch() always
reset the lock before calling the switch_sched() callback; but if my
comment about memory barriers is accurate, then that won't work either.
 In any case, there are only 4 schedulers, so it's not that hard to just
have them all set the locking scheme they want.

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 10/16] xen: sched: improve credit2 bootparams' scope, placement and signedness
  2016-03-18 19:05 ` [PATCH 10/16] xen: sched: improve credit2 bootparams' scope, placement and signedness Dario Faggioli
  2016-03-21 14:51   ` Juergen Gross
@ 2016-03-24 12:20   ` George Dunlap
  1 sibling, 0 replies; 61+ messages in thread
From: George Dunlap @ 2016-03-24 12:20 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: George Dunlap, Juergen Gross, Jan Beulich, Uma Sharma

On 18/03/16 19:05, Dario Faggioli wrote:
> From: Uma Sharma <uma.sharma523@gmail.com>
> 
> and, while we are adjusting signedness of opt_load_window_shift,
> make also prv->load_window_shift unsigned, as approapriate.
> 
> Signed-off-by: Uma Sharma <uma.sharma523@gmail.com>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Acked-by: George Dunlap <george.dunlap@citrix.com>

> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> ---
> Patch has changed, so I'm not sticking any tag v1 received
> (namely, Juergen's Reviewed-by:).
> ---
> Changes from v1:
>  * improve signedness as well, as requested during review.
> ---
>  xen/common/sched_credit2.c |   10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 64fb028..2fd4175 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -162,7 +162,7 @@
>  #define CSFLAG_runq_migrate_request (1<<__CSFLAG_runq_migrate_request)
>  
>  
> -int opt_migrate_resist=500;
> +static unsigned int __read_mostly opt_migrate_resist = 500;
>  integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
>  
>  /*
> @@ -185,12 +185,12 @@ integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
>   *   to a load of 1.
>   */
>  #define LOADAVG_GRANULARITY_SHIFT (10)
> -int opt_load_window_shift=18;
> +static unsigned int __read_mostly opt_load_window_shift = 18;
>  #define  LOADAVG_WINDOW_SHIFT_MIN 4
>  integer_param("credit2_load_window_shift", opt_load_window_shift);
> -int opt_underload_balance_tolerance=0;
> +static int __read_mostly opt_underload_balance_tolerance = 0;
>  integer_param("credit2_balance_under", opt_underload_balance_tolerance);
> -int opt_overload_balance_tolerance=-3;
> +static int __read_mostly opt_overload_balance_tolerance = -3;
>  integer_param("credit2_balance_over", opt_overload_balance_tolerance);
>  
>  /*
> @@ -227,7 +227,7 @@ struct csched2_private {
>      cpumask_t active_queues; /* Queues which may have active cpus */
>      struct csched2_runqueue_data rqd[NR_CPUS];
>  
> -    int load_window_shift;
> +    unsigned int load_window_shift;
>  };
>  
>  /*
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 12/16] xen: sched: fix per-socket runqueue creation in credit2
  2016-03-18 19:05 ` [PATCH 12/16] xen: sched: fix per-socket runqueue creation in credit2 Dario Faggioli
@ 2016-03-24 12:24   ` George Dunlap
  0 siblings, 0 replies; 61+ messages in thread
From: George Dunlap @ 2016-03-24 12:24 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Justin Weaver

On 18/03/16 19:05, Dario Faggioli wrote:
> The credit2 scheduler tries to setup runqueues in such
> a way that there is one of them per each socket. However,
> that does not work. The issue is described in bug #36
> "credit2 only uses one runqueue instead of one runq per
> socket" (http://bugs.xenproject.org/xen/bug/36), and a
> solution has been attempted by an old patch series:
> 
>  http://lists.xen.org/archives/html/xen-devel/2014-08/msg02168.html
> 
> Here, we take advantage of the fact that now initialization
> happens (for all schedulers) during CPU_STARTING, so we
> have all the topology information available when necessary.
> 
> This is true for all the pCPUs _except_ the boot CPU. That
> is not an issue, though. In fact, no runqueue exists yet
> when the boot CPU is initialized, so we can just create
> one and put the boot CPU in there.

The loop is a genius idea, Dario! Well done.

One minor nit...

> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Justin Weaver <jtweaver@hawaii.edu>
> ---
>  xen/common/sched_credit2.c |   59 ++++++++++++++++++++++++++++++++------------
>  1 file changed, 43 insertions(+), 16 deletions(-)
> 
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 4ff26c9..456b9ea 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -53,7 +53,6 @@
>   *  http://wiki.xen.org/wiki/Credit2_Scheduler_Development
>   * TODO:
>   * + Multiple sockets
> - *  - Detect cpu layout and make runqueue map, one per L2 (make_runq_map())
>   *  - Simple load balancer / runqueue assignment
>   *  - Runqueue load measurement
>   *  - Load-based load balancer
> @@ -1972,6 +1971,48 @@ static void deactivate_runqueue(struct csched2_private *prv, int rqi)
>      cpumask_clear_cpu(rqi, &prv->active_queues);
>  }
>  
> +static unsigned int
> +cpu_to_runqueue(struct csched2_private *prv, unsigned int cpu)
> +{
> +    struct csched2_runqueue_data *rqd;
> +    unsigned int rqi;
> +
> +    for ( rqi = 0; rqi < nr_cpu_ids; rqi++ )
> +    {
> +        unsigned int peer_cpu;
> +
> +        /*
> +         * As soon as we come across an uninitialized runqueue, use it.
> +         * In fact, either:
> +         *  - we are initializing the first cpu, and we assign it to
> +         *    runqueue 0. This is handy, especially if we are dealing
> +         *    with the boot cpu (if credit2 is the default scheduler),
> +         *    as we would not be able to use cpu_to_socket() and similar
> +         *    helpers anyway (they're result of which is not reliable yet);
> +         *  - we have gone through all the active runqueues, and have not
> +         *    found anyone whose cpus' topology matches the one we are
> +         *    dealing with, so ativating a new runqueue is what we want.
> +         */

*activating

With that fixed:

Reviewed-by: George Dunlap <george.dunlap@citrix.com>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 13/16] xen: sched: allow for choosing credit2 runqueues configuration at boot
  2016-03-18 19:05 ` [PATCH 13/16] xen: sched: allow for choosing credit2 runqueues configuration at boot Dario Faggioli
  2016-03-22  7:46   ` Juergen Gross
@ 2016-03-24 12:36   ` George Dunlap
  1 sibling, 0 replies; 61+ messages in thread
From: George Dunlap @ 2016-03-24 12:36 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Juergen Gross, Uma Sharma

On 18/03/16 19:05, Dario Faggioli wrote:
> In fact, credit2 uses CPU topology to decide how to arrange
> its internal runqueues. Before this change, only 'one runqueue
> per socket' was allowed. However, experiments have shown that,
> for instance, having one runqueue per physical core improves
> performance, especially in case hyperthreading is available.
> 
> In general, it makes sense to allow users to pick one runqueue
> arrangement at boot time, so that:
>  - more experiments can be easily performed to even better
>    assess and improve performance;
>  - one can select the best configuration for his specific
>    use case and/or hardware.
> 
> This patch enables the above.
> 
> Note that, for correctly arranging runqueues to be per-core,
> just checking cpu_to_core() on the host CPUs is not enough.
> In fact, cores (and hyperthreads) on different sockets, can
> have the same core (and thread) IDs! We, therefore, need to
> check whether the full topology of two CPUs matches, for
> them to be put in the same runqueue.
> 
> Note also that the default (although not functional) for
> credit2, since now, has been per-socket runqueue. This patch
> leaves things that way, to avoid mixing policy and technical
> changes.
> 
> Finally, it would be a nice feature to be able to select
> a particular runqueue arrangement, even when creating a
> Credit2 cpupool. This is left as future work.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Signed-off-by: Uma Sharma <uma.sharma523@gmail.com>

Looks good, apart from the two errors Juergen pointed out.  Thanks,

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 14/16] xen: sched: per-core runqueues as default in credit2
  2016-03-18 19:05 ` [PATCH 14/16] xen: sched: per-core runqueues as default in credit2 Dario Faggioli
@ 2016-03-24 12:37   ` George Dunlap
  0 siblings, 0 replies; 61+ messages in thread
From: George Dunlap @ 2016-03-24 12:37 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Juergen Gross, George Dunlap, Uma Sharma

On 18/03/16 19:05, Dario Faggioli wrote:
> Experiments have shown that arranging the scheduing
> runqueues on a per-core basis yields better results,
> in most cases.
> 
> Such evaluation has been done, for the first time,
> by Uma Sharma, during her participation to OPW. Some
> of the results she got are summarized here:
> 
>  http://lists.xen.org/archives/html/xen-devel/2015-03/msg01499.html
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Signed-off-by: Uma Sharma <uma.sharma523@gmail.com>
> Reviewed-by: Juergen Gross <jgross@suse.com>

Acked-by: George Dunlap <george.dunlap@citrix.com>

> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> ---
>  docs/misc/xen-command-line.markdown |    2 +-
>  xen/common/sched_credit2.c          |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index 0047f94..5d801b8 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -472,7 +472,7 @@ combination with the `low_crashinfo` command line option.
>  ### credit2\_runqueue
>  > `= core | socket | node | all`
>  
> -> Default: `socket`
> +> Default: `core`
>  
>  Specify how host CPUs are arranged in runqueues. Runqueues are kept
>  balanced with respect to the load generated by the vCPUs running on
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index c242dc4..07b8c67 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -220,7 +220,7 @@ integer_param("credit2_balance_over", opt_overload_balance_tolerance);
>  #define OPT_RUNQUEUE_SOCKET 2
>  #define OPT_RUNQUEUE_NODE   3
>  #define OPT_RUNQUEUE_ALL    4
> -static int __read_mostly opt_runqueue = OPT_RUNQUEUE_SOCKET;
> +static int __read_mostly opt_runqueue = OPT_RUNQUEUE_CORE;
>  
>  static void parse_credit2_runqueue(const char *s)
>  {
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 15/16] xen: sched: scratch space for cpumasks on Credit2
  2016-03-18 19:27   ` Andrew Cooper
@ 2016-03-24 12:44     ` George Dunlap
  2016-03-24 12:56       ` Andrew Cooper
  2016-03-24 13:10       ` Dario Faggioli
  0 siblings, 2 replies; 61+ messages in thread
From: George Dunlap @ 2016-03-24 12:44 UTC (permalink / raw)
  To: Andrew Cooper, Dario Faggioli, xen-devel; +Cc: George Dunlap

On 18/03/16 19:27, Andrew Cooper wrote:
> On 18/03/16 19:06, Dario Faggioli wrote:
>> like what's there already in both Credit1 and RTDS. In
>> fact, when playing with affinity, a lot of cpumask
>> manipulation is necessary, inside of various functions.
>>
>> To avoid having a lot of cpumask_var_t on the stack,
>> this patch introduces a global scratch area.
>>
>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>> ---
>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>> ---
> 
> I would suggest instead going with
> 
> static DEFINE_PER_CPU(cpumask_t, csched2_cpumask_scratch);
> 
> Functions which want to use it can use
> 
> cpumask_t *scratch = &this_cpu(csched2_cpumask_scratch);
> 
> 
> This avoids all this opencoded allocation/refcounting, the chance that
> starting a scheduler would fail for memory reasons, and one extra
> cpumask in the per-cpu data area won't break the bank.

Going a bit further, since (according to the changelog) both credit1 and
rtds also do this, would it make sense to have schedule.c define these,
and allow any of the schedulers to use them?

(Assuming that both credit1 and rtds allocate exactly one mask per cpu.)

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 15/16] xen: sched: scratch space for cpumasks on Credit2
  2016-03-24 12:44     ` George Dunlap
@ 2016-03-24 12:56       ` Andrew Cooper
  2016-03-24 13:10       ` Dario Faggioli
  1 sibling, 0 replies; 61+ messages in thread
From: Andrew Cooper @ 2016-03-24 12:56 UTC (permalink / raw)
  To: George Dunlap, Dario Faggioli, xen-devel; +Cc: George Dunlap

On 24/03/16 12:44, George Dunlap wrote:
> On 18/03/16 19:27, Andrew Cooper wrote:
>> On 18/03/16 19:06, Dario Faggioli wrote:
>>> like what's there already in both Credit1 and RTDS. In
>>> fact, when playing with affinity, a lot of cpumask
>>> manipulation is necessary, inside of various functions.
>>>
>>> To avoid having a lot of cpumask_var_t on the stack,
>>> this patch introduces a global scratch area.
>>>
>>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>>> ---
>>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>>> ---
>> I would suggest instead going with
>>
>> static DEFINE_PER_CPU(cpumask_t, csched2_cpumask_scratch);
>>
>> Functions which want to use it can use
>>
>> cpumask_t *scratch = &this_cpu(csched2_cpumask_scratch);
>>
>>
>> This avoids all this opencoded allocation/refcounting, the chance that
>> starting a scheduler would fail for memory reasons, and one extra
>> cpumask in the per-cpu data area won't break the bank.
> Going a bit further, since (according to the changelog) both credit1 and
> rtds also do this, would it make sense to have schedule.c define these,
> and allow any of the schedulers to use them?
>
> (Assuming that both credit1 and rtds allocate exactly one mask per cpu.)

If more than one scheduler needs scratch space, then yes.  That would be
better than having one scratch per scheduler per cpu.

After all, we have things like keyhandler_scratch, for a similar reason.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 15/16] xen: sched: scratch space for cpumasks on Credit2
  2016-03-24 12:44     ` George Dunlap
  2016-03-24 12:56       ` Andrew Cooper
@ 2016-03-24 13:10       ` Dario Faggioli
  1 sibling, 0 replies; 61+ messages in thread
From: Dario Faggioli @ 2016-03-24 13:10 UTC (permalink / raw)
  To: George Dunlap, Andrew Cooper, xen-devel; +Cc: George Dunlap


[-- Attachment #1.1: Type: text/plain, Size: 1354 bytes --]

On Thu, 2016-03-24 at 12:44 +0000, George Dunlap wrote:
> On 18/03/16 19:27, Andrew Cooper wrote:
> > This avoids all this opencoded allocation/refcounting, the chance
> > that
> > starting a scheduler would fail for memory reasons, and one extra
> > cpumask in the per-cpu data area won't break the bank.
> Going a bit further, since (according to the changelog) both credit1
> and
> rtds also do this, would it make sense to have schedule.c define
> these,
> and allow any of the schedulers to use them?
> 
Yeah, well, I guess it could.

> (Assuming that both credit1 and rtds allocate exactly one mask per
> cpu.)
> 
They do.

The only difference between credit1 and the other two is that credit1
already has a per-cpu private scheduler structure (csched_pcpu, where
also the runqueue is), and the mask has been put there, so it's handled
a little bit differently.

But it should be no problem turning it into using a mask from
schedule.c, in the same way that RTDS and Credit2 will do.

I like the idea, and I'm up for it for v2.

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/16] xen: sched: make implementing .alloc_pdata optional
  2016-03-24  9:43       ` Jan Beulich
@ 2016-03-24 13:14         ` Dario Faggioli
  0 siblings, 0 replies; 61+ messages in thread
From: Dario Faggioli @ 2016-03-24 13:14 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap
  Cc: Juergen Gross, George Dunlap, Robert VanVossen, Josh Whitehead,
	Meng Xu, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 765 bytes --]

On Thu, 2016-03-24 at 03:43 -0600, Jan Beulich wrote:
> > > > On 23.03.16 at 18:36, <george.dunlap@citrix.com> wrote:
> > 
> > But of course that means files like sched_arinc653.c and
> > sched_credit2.c
> > end up including xen/err.c even though they don't use those macros.
> > Would you prefer the other files include it directly instead?
> Since there are other files than just xen/common/sched*.c which
> include this header - yes, I'd prefer that. 
>
Ok.

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 05/16] xen: sched: move pCPU initialization in an helper
  2016-03-23 17:51   ` George Dunlap
  2016-03-23 18:09     ` George Dunlap
@ 2016-03-24 13:21     ` Dario Faggioli
  1 sibling, 0 replies; 61+ messages in thread
From: Dario Faggioli @ 2016-03-24 13:21 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: George Dunlap, Juergen Gross


[-- Attachment #1.1: Type: text/plain, Size: 1683 bytes --]

On Wed, 2016-03-23 at 17:51 +0000, George Dunlap wrote:
> On 18/03/16 19:04, Dario Faggioli wrote:
> > --- a/xen/common/sched_credit.c
> > +++ b/xen/common/sched_credit.c
> > @@ -542,16 +542,11 @@ csched_alloc_pdata(const struct scheduler
> > *ops, int cpu)
> >  }
> >  
> >  static void
> > -csched_init_pdata(const struct scheduler *ops, void *pdata, int
> > cpu)
> > +init_pdata(struct csched_private *prv, struct csched_pcpu *spc,
> > int cpu)
> >  {
> > -    struct csched_private *prv = CSCHED_PRIV(ops);
> > -    struct csched_pcpu * const spc = pdata;
> > -    unsigned long flags;
> > -
> > -    /* cpu data needs to be allocated, but STILL uninitialized */
> > -    ASSERT(spc && spc->runq.next == spc->runq.prev && spc-
> > >runq.next == NULL);
> > -
> > -    spin_lock_irqsave(&prv->lock, flags);
> > +    ASSERT(spin_is_locked(&prv->lock));
> > +    /* cpu data needs to be allocated, but STILL uninitialized. */
> > +    ASSERT(spc && spc->runq.next == NULL && spc->runq.prev ==
> > NULL);
> Actually, Juergen, looks like Dario already agrees with us. ;-)
> 
Indeed I do! :-)

> Obviously this should be updated in the previous patch instead.
> 
Yeah, sorry. Rebasing / patch-splitting gone wrong! :-/

> With that done:
> 
> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
> 
I'll do that.

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 11/16] xen: sched: on Credit2, don't reprogram the timer if idle
  2016-03-18 19:05 ` [PATCH 11/16] xen: sched: on Credit2, don't reprogram the timer if idle Dario Faggioli
@ 2016-03-24 15:03   ` George Dunlap
  0 siblings, 0 replies; 61+ messages in thread
From: George Dunlap @ 2016-03-24 15:03 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: xen-devel, Tianyang Chen

On Fri, Mar 18, 2016 at 7:05 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> as other schedulers are doing already: if the idle vcpu
> is picked and scheduled, there is no need to reprogram the
> scheduler timer to fire and invoke csched2_schedule()
> again in future.
>
> Tickling or external events will serve as pokes, when
> necessary, but until we can, we should just stay idle.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Reported-by: Tianyang Chen <tiche@seas.upenn.edu>
> Suggested-by: George Dunlap <dunlapg@umich.edu>

This should be my work address (george.dunlap@citrix.com).  With that change:

Acked-by: George Dunlap <george.dunlap@citrix.com>

> ---
> Cc: George Dunlap <dunlapg@umich.edu>
> Cc: Tianyang Chen <tiche@seas.upenn.edu>
> ---
>  xen/common/sched_credit2.c |    6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 2fd4175..4ff26c9 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -1540,8 +1540,12 @@ csched2_runtime(const struct scheduler *ops, int cpu, struct csched2_vcpu *snext
>      struct csched2_runqueue_data *rqd = RQD(ops, cpu);
>      struct list_head *runq = &rqd->runq;
>
> +    /*
> +     * If we're idle, just stay so. Others (or external events)
> +     * will poke us when necessary.
> +     */
>      if ( is_idle_vcpu(snext->vcpu) )
> -        return CSCHED2_MAX_TIMER;
> +        return -1;
>
>      /* General algorithm:
>       * 1) Run until snext's credit will be 0
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 16/16] xen: sched: implement vcpu hard affinity in Credit2
  2016-03-18 19:06 ` [PATCH 16/16] xen: sched: implement vcpu hard affinity in Credit2 Dario Faggioli
@ 2016-03-24 15:42   ` George Dunlap
  2016-04-05 16:50     ` Dario Faggioli
  0 siblings, 1 reply; 61+ messages in thread
From: George Dunlap @ 2016-03-24 15:42 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: xen-devel, Justin Weaver

On Fri, Mar 18, 2016 at 7:06 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> From: Justin Weaver <jtweaver@hawaii.edu>
>
> as it was still missing.
>
> Note that this patch "only" implements hard affinity,
> i.e., the possibility of specifying on what pCPUs a
> certain vCPU can run. Soft affinity (which express a
> preference for vCPUs to run on certain pCPUs) is still
> not supported by Credit2, even after this patch.
>
> Signed-off-by: Justin Weaver <jtweaver@hawaii.edu>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Just checking, are the main changes between this patch and the v4 that
Justin posted:

1) Moving the "scratch_mask" to a different patch
2) The code-cleanups you listed?

One rather tangential question...

> ---
> Cc: George Dunlap <dunlapg@umich.edu>
> ---
>  xen/common/sched_credit2.c |  131 ++++++++++++++++++++++++++++++++++----------
>  1 file changed, 102 insertions(+), 29 deletions(-)
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index a650216..3190eb3 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -327,6 +327,36 @@ struct csched2_dom {
>      uint16_t nr_vcpus;
>  };
>
> +/*
> + * When a hard affinity change occurs, we may not be able to check some
> + * (any!) of the other runqueues, when looking for the best new processor
> + * for svc (as trylock-s in choose_cpu() can fail). If that happens, we
> + * pick, in order of decreasing preference:
> + *  - svc's current pcpu;
> + *  - another pcpu from svc's current runq;
> + *  - any cpu.
> + */
> +static int get_fallback_cpu(struct csched2_vcpu *svc)
> +{
> +    int cpu;
> +
> +    if ( likely(cpumask_test_cpu(svc->vcpu->processor,
> +                                 svc->vcpu->cpu_hard_affinity)) )
> +        return svc->vcpu->processor;
> +
> +    cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
> +                &svc->rqd->active);
> +    cpu = cpumask_first(cpumask_scratch);
> +    if ( likely(cpu < nr_cpu_ids) )
> +        return cpu;
> +
> +    cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
> +                cpupool_domain_cpumask(svc->vcpu->domain));
> +
> +    ASSERT(!cpumask_empty(cpumask_scratch));
> +
> +    return cpumask_first(cpumask_scratch);
> +}
>
>  /*
>   * Time-to-credit, credit-to-time.
> @@ -560,8 +590,9 @@ runq_tickle(const struct scheduler *ops, unsigned int cpu, struct csched2_vcpu *
>          goto tickle;
>      }
>
> -    /* Get a mask of idle, but not tickled */
> +    /* Get a mask of idle, but not tickled, that new is allowed to run on. */
>      cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
> +    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);

It looks like this uses a cpumask_t on the stack -- can we use
scratch_mask here, or is there some reason we need to use the local
variable?

But that's really something to either add to the previous patch, or to
do in yet a different patch.

Acked-by: George Dunlap <george.dunlap@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/16] xen: sched: make implementing .alloc_pdata optional
  2016-03-21 15:07     ` Jan Beulich
@ 2016-04-01 17:01       ` Dario Faggioli
  2016-04-04  4:21         ` Juergen Gross
  2016-04-04  6:13         ` Jan Beulich
  0 siblings, 2 replies; 61+ messages in thread
From: Dario Faggioli @ 2016-04-01 17:01 UTC (permalink / raw)
  To: Jan Beulich, Juergen Gross
  Cc: George Dunlap, xen-devel, Josh Whitehead, Meng Xu, Robert VanVossen


[-- Attachment #1.1: Type: text/plain, Size: 1954 bytes --]

On Mon, 2016-03-21 at 09:07 -0600, Jan Beulich wrote:
> > > > On 21.03.16 at 15:48, <JGross@suse.com> wrote:
> > On 18/03/16 20:04, Dario Faggioli wrote:
> > > @@ -1491,9 +1491,9 @@ static int cpu_schedule_up(unsigned int
> > > cpu)
> > >      if ( idle_vcpu[cpu] == NULL )
> > >          return -ENOMEM;
> > >  
> > > -    if ( (ops.alloc_pdata != NULL) &&
> > > -         ((sd->sched_priv = ops.alloc_pdata(&ops, cpu)) == NULL)
> > > )
> > > -        return -ENOMEM;
> > > +    sd->sched_priv = SCHED_OP(&ops, alloc_pdata, cpu);
> > > +    if ( IS_ERR(sd->sched_priv) )
> > > +        return PTR_ERR(sd->sched_priv);
> > Calling xfree() with an IS_ERR() value might be a bad idea.
> > Either you need to set sd->sched_priv to NULL in error case or you
> > modify xfree() to return immediately not only in the NULL case, but
> > in the IS_ERR() case as well.
> The latter option is a no-go imo.
> 
Ok, I'll do:

    sd->sched_priv = SCHED_OP(&ops, alloc_pdata, cpu);
    if ( IS_ERR(sd->sched_priv) )
    {
        int err = PTR_ERR(sd->sched_priv);

        sd->sched_priv = NULL;
        return err;
    }

Is this ok?

And, just to be sure I got your point (Juergen), you're referring to
the situation where cpu_schedule_up() fails, we go to CPU_UP_CANCELLED,
which calls cpu_schedule_donw(), which calls free_pdata on
sd->sched_priv (inside which we may reach an xfree()), aren't you?

In fact, alloc_pdata is also called in schedule_cpu_switch(), but in
that case, I don't see anyone calling xfree() if alloc_pdata fails...
Am I missing it?

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/16] xen: sched: make implementing .alloc_pdata optional
  2016-04-01 17:01       ` Dario Faggioli
@ 2016-04-04  4:21         ` Juergen Gross
  2016-04-04  6:13         ` Jan Beulich
  1 sibling, 0 replies; 61+ messages in thread
From: Juergen Gross @ 2016-04-04  4:21 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: George Dunlap, xen-devel, Josh Whitehead, Meng Xu, Robert VanVossen

On 01/04/16 19:01, Dario Faggioli wrote:
> On Mon, 2016-03-21 at 09:07 -0600, Jan Beulich wrote:
>>>>> On 21.03.16 at 15:48, <JGross@suse.com> wrote:
>>> On 18/03/16 20:04, Dario Faggioli wrote:
>>>> @@ -1491,9 +1491,9 @@ static int cpu_schedule_up(unsigned int
>>>> cpu)
>>>>      if ( idle_vcpu[cpu] == NULL )
>>>>          return -ENOMEM;
>>>>  
>>>> -    if ( (ops.alloc_pdata != NULL) &&
>>>> -         ((sd->sched_priv = ops.alloc_pdata(&ops, cpu)) == NULL)
>>>> )
>>>> -        return -ENOMEM;
>>>> +    sd->sched_priv = SCHED_OP(&ops, alloc_pdata, cpu);
>>>> +    if ( IS_ERR(sd->sched_priv) )
>>>> +        return PTR_ERR(sd->sched_priv);
>>> Calling xfree() with an IS_ERR() value might be a bad idea.
>>> Either you need to set sd->sched_priv to NULL in error case or you
>>> modify xfree() to return immediately not only in the NULL case, but
>>> in the IS_ERR() case as well.
>> The latter option is a no-go imo.
>>
> Ok, I'll do:
> 
>     sd->sched_priv = SCHED_OP(&ops, alloc_pdata, cpu);
>     if ( IS_ERR(sd->sched_priv) )
>     {
>         int err = PTR_ERR(sd->sched_priv);
> 
>         sd->sched_priv = NULL;
>         return err;
>     }
> 
> Is this ok?

Yes, that's what I would prefer.

> And, just to be sure I got your point (Juergen), you're referring to
> the situation where cpu_schedule_up() fails, we go to CPU_UP_CANCELLED,
> which calls cpu_schedule_donw(), which calls free_pdata on
> sd->sched_priv (inside which we may reach an xfree()), aren't you?

Basically, yes.

> In fact, alloc_pdata is also called in schedule_cpu_switch(), but in
> that case, I don't see anyone calling xfree() if alloc_pdata fails...
> Am I missing it?

I just want to avoid the situation where sched_priv would contain a
non-NULL value not pointing to an allocated area. That's always
dangerous, even if with current code nothing bad might happen.

Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/16] xen: sched: make implementing .alloc_pdata optional
  2016-04-01 17:01       ` Dario Faggioli
  2016-04-04  4:21         ` Juergen Gross
@ 2016-04-04  6:13         ` Jan Beulich
  2016-04-05 16:01           ` Dario Faggioli
  1 sibling, 1 reply; 61+ messages in thread
From: Jan Beulich @ 2016-04-04  6:13 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Juergen Gross, George Dunlap, Robert VanVossen, Josh Whitehead,
	Meng Xu, xen-devel

>>> On 01.04.16 at 19:01, <dario.faggioli@citrix.com> wrote:
> On Mon, 2016-03-21 at 09:07 -0600, Jan Beulich wrote:
>> > > > On 21.03.16 at 15:48, <JGross@suse.com> wrote:
>> > On 18/03/16 20:04, Dario Faggioli wrote:
>> > > @@ -1491,9 +1491,9 @@ static int cpu_schedule_up(unsigned int
>> > > cpu)
>> > >      if ( idle_vcpu[cpu] == NULL )
>> > >          return -ENOMEM;
>> > >  
>> > > -    if ( (ops.alloc_pdata != NULL) &&
>> > > -         ((sd->sched_priv = ops.alloc_pdata(&ops, cpu)) == NULL)
>> > > )
>> > > -        return -ENOMEM;
>> > > +    sd->sched_priv = SCHED_OP(&ops, alloc_pdata, cpu);
>> > > +    if ( IS_ERR(sd->sched_priv) )
>> > > +        return PTR_ERR(sd->sched_priv);
>> > Calling xfree() with an IS_ERR() value might be a bad idea.
>> > Either you need to set sd->sched_priv to NULL in error case or you
>> > modify xfree() to return immediately not only in the NULL case, but
>> > in the IS_ERR() case as well.
>> The latter option is a no-go imo.
>> 
> Ok, I'll do:
> 
>     sd->sched_priv = SCHED_OP(&ops, alloc_pdata, cpu);
>     if ( IS_ERR(sd->sched_priv) )
>     {
>         int err = PTR_ERR(sd->sched_priv);
> 
>         sd->sched_priv = NULL;
>         return err;
>     }
> 
> Is this ok?

Depends: Can ->sched_priv be looked at by another CPU between
the first and second assignments? If yes, you need to use an
intermediary local variable.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/16] xen: sched: make implementing .alloc_pdata optional
  2016-04-04  6:13         ` Jan Beulich
@ 2016-04-05 16:01           ` Dario Faggioli
  0 siblings, 0 replies; 61+ messages in thread
From: Dario Faggioli @ 2016-04-05 16:01 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, George Dunlap, Robert VanVossen, Josh Whitehead,
	Meng Xu, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1029 bytes --]

On Mon, 2016-04-04 at 00:13 -0600, Jan Beulich wrote:
> > > > On 01.04.16 at 19:01, <dario.faggioli@citrix.com> wrote:
> > Ok, I'll do:
> > 
> >     sd->sched_priv = SCHED_OP(&ops, alloc_pdata, cpu);
> >     if ( IS_ERR(sd->sched_priv) )
> >     {
> >         int err = PTR_ERR(sd->sched_priv);
> > 
> >         sd->sched_priv = NULL;
> >         return err;
> >     }
> > 
> > Is this ok?
> Depends: Can ->sched_priv be looked at by another CPU between
> the first and second assignments? If yes, you need to use an
> intermediary local variable.
> 
Mmm... good point. I don't see this happening, but using a local
variable is more "defensive programming oriented", so I'm going for it.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 09/16] xen: sched: close potential races when switching scheduler to CPUs
  2016-03-23 19:05   ` George Dunlap
@ 2016-04-05 16:26     ` Dario Faggioli
  2016-04-06 15:51       ` Dario Faggioli
  0 siblings, 1 reply; 61+ messages in thread
From: Dario Faggioli @ 2016-04-05 16:26 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: Tianyang Chen, Meng Xu


[-- Attachment #1.1: Type: text/plain, Size: 4247 bytes --]

On Wed, 2016-03-23 at 19:05 +0000, George Dunlap wrote:
> On 18/03/16 19:05, Dario Faggioli wrote:
> > 
> > by using the sched_switch hook that we have introduced in
> > the various schedulers.
> > 
> > The key is to let the actual switch of scheduler and the
> > remapping of the scheduler lock for the CPU (if necessary)
> > happen together (in the same critical section) protected
> > (at least) by the old scheduler lock for the CPU.
> > 
> > This also means that, in Credit2 and RTDS, we can get rid
> > of the code that was doing the scheduler lock remapping
> > in csched2_free_pdata() and rt_free_pdata(), and of their
> > triggering ASSERT-s.
> > 
> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Similar to my comment before -- in my own tree I squashed patches 6-9
> into a single commit and found it much easier to review. :-)
> 
I understand your point.

I'll consider doing something like this for v2 (that I'm just finishing
putting together), but I'm not sure I like it.

For instance, although the issue has the same roots and similar
consequences for all schedulers, the actual race is different between
Credit1 and Credit2 (RTDS is the same as Credit2), and having distinct
patches for each scheduler allows me to describe both the situations in
details, in their respective changelog, without the changelogs
themselves becoming too long (they're actually quite long already!!).

> One important question...
> > 
> > --- a/xen/common/schedule.c
> > +++ b/xen/common/schedule.c
> > 
> > @@ -1652,17 +1661,20 @@ int schedule_cpu_switch(unsigned int cpu,
> > struct cpupool *c)
> >          return -ENOMEM;
> >      }
> >  
> > -    lock = pcpu_schedule_lock_irq(cpu);
> > -
> >      SCHED_OP(old_ops, tick_suspend, cpu);
> > +
> > +    /*
> > +     * The actual switch, including (if necessary) the rerouting
> > of the
> > +     * scheduler lock to whatever new_ops prefers,  needs to
> > happen in one
> > +     * critical section, protected by old_ops' lock, or races are
> > possible.
> > +     * Since each scheduler has its own contraints and locking
> > scheme, do
> > +     * that inside specific scheduler code, rather than here.
> > +     */
> >      vpriv_old = idle->sched_priv;
> > -    idle->sched_priv = vpriv;
> > -    per_cpu(scheduler, cpu) = new_ops;
> >      ppriv_old = per_cpu(schedule_data, cpu).sched_priv;
> > -    per_cpu(schedule_data, cpu).sched_priv = ppriv;
> > -    SCHED_OP(new_ops, tick_resume, cpu);
> > +    SCHED_OP(new_ops, switch_sched, cpu, ppriv, vpriv);
> Is it really safe to read sched_priv without the lock held?
> 
So, you put down a lot more reasoning on this issue in another email,
and I'll reply in more length to that one.

But just about this specific thing. We're in schedule_cpu_switch(), and
schedule_cpu_switch() is indeed the only function that changes the
content of sd->sched_priv, when the system is _live_. It both reads the
old pointer, stash it, allocate the new one, assign it, and free the
old one. It's therefore only because of this function that a race can
happen.

In fact, the only other situation where sched_priv changes is during
cpu bringup (CPU_UP_PREPARE phase), or teardown. But those cases are
not of much concern (and, in fact, there's no locking in there,
independently from this series).

Now, schedule_cpu_switch is called by:

1 cpupool.c  cpupool_assign_cpu_locked    268 ret = schedule_cpu_switch(cpu, c);
2 cpupool.c  cpupool_unassign_cpu_helper  325 ret = schedule_cpu_switch(cpu, NULL);

to move the cpu inside or outside from a cpupool, and in both cases, we
have taken the cpupool_lock spinlock already when calling it.

So, yes, it looks to me that sched_priv is safe to be manipulated like
the patch is doing... Am I overlooking something?

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 16/16] xen: sched: implement vcpu hard affinity in Credit2
  2016-03-24 15:42   ` George Dunlap
@ 2016-04-05 16:50     ` Dario Faggioli
  0 siblings, 0 replies; 61+ messages in thread
From: Dario Faggioli @ 2016-04-05 16:50 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Justin Weaver


[-- Attachment #1.1: Type: text/plain, Size: 2191 bytes --]

On Thu, 2016-03-24 at 15:42 +0000, George Dunlap wrote:
> On Fri, Mar 18, 2016 at 7:06 PM, Dario Faggioli
> <dario.faggioli@citrix.com> wrote:
> > 
> > From: Justin Weaver <jtweaver@hawaii.edu>
> > 
> > Signed-off-by: Justin Weaver <jtweaver@hawaii.edu>
> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Just checking, are the main changes between this patch and the v4
> that
> Justin posted:
> 
> 1) Moving the "scratch_mask" to a different patch
> 2) The code-cleanups you listed?
> 
Yes, indeed! Sorry for not pointing that out more clearly (e.g., in the
cover letter).

> One rather tangential question...

> > --- a/xen/common/sched_credit2.c
> > +++ b/xen/common/sched_credit2.c
> > @@ -560,8 +590,9 @@ runq_tickle(const struct scheduler *ops,
> > unsigned int cpu, struct csched2_vcpu *
> >          goto tickle;
> >      }
> > 
> > -    /* Get a mask of idle, but not tickled */
> > +    /* Get a mask of idle, but not tickled, that new is allowed to
> > run on. */
> >      cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
> > +    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
> It looks like this uses a cpumask_t on the stack -- can we use
> scratch_mask here, or is there some reason we need to use the local
> variable?
> 
> But that's really something to either add to the previous patch, or
> to
> do in yet a different patch.
> 
It's because that cpumask stack variable is there already, and the
primary purpose of the scratch mask is to avoid _adding_more_ of them.
This is how it has been introduced and used so far in Credit1 and RTDS.

That said, I do agree it can well be used to get rid of some of the
already existing stack variable, but, as you suggest yourself, I'm
thinking about doing this in another patch.

> Acked-by: George Dunlap <george.dunlap@citrix.com>
>
Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 09/16] xen: sched: close potential races when switching scheduler to CPUs
  2016-03-24 12:14   ` George Dunlap
@ 2016-04-05 17:37     ` Dario Faggioli
  2016-04-06 16:21       ` Dario Faggioli
  0 siblings, 1 reply; 61+ messages in thread
From: Dario Faggioli @ 2016-04-05 17:37 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: George Dunlap, Tianyang Chen, Meng Xu


[-- Attachment #1.1: Type: text/plain, Size: 5615 bytes --]

On Thu, 2016-03-24 at 12:14 +0000, George Dunlap wrote:
> On 18/03/16 19:05, Dario Faggioli wrote:
> > 
> > by using the sched_switch hook that we have introduced in
> > the various schedulers.
> > 
> > The key is to let the actual switch of scheduler and the
> > remapping of the scheduler lock for the CPU (if necessary)
> > happen together (in the same critical section) protected
> > (at least) by the old scheduler lock for the CPU.
> Thanks for trying to sort this out -- I've been looking this since
> yesterday afternoon and it certainly makes my head hurt. :-)
> 
> It looks like you want to do the locking inside the sched_switch()
> callback, rather than outside of it, so that you can get the locking
> order right (global private before per-cpu scheduler lock).  
>
Yes, especially considering that such locking order varies with the
scheduler. In fact, in Credit1, it's per-cpu first, global second; in
Credit2, it's the other way round: global first, per-runq second. In
RTDS there is only one global lock and in ARINC only the per-cpu locks.

> Otherwise
> you could just have schedule_cpu_switch grab and release the lock,
> and
> let the sched_switch() callback set the lock as needed (knowing that
> the
> correct lock is already held and will be released).
> 
Yeah, that's after all what is being done even right now. Of course,
right now it's buggy, but even if it probably can be made to work, in
the way you suggest, I thought bringing anything that has to do with
the locking order required by a specific scheduler _inside_ the
specific scheduler code would be a good thing.

I actually still think that, but I recognize the ugliness of accessing
ppriv_old and vpriv_old outside of scheduler locks (although, as I said
in the other email, it should be safe, as switching scheduler is
serialized by cpupool_lock anyway... perhaps I should put this in a
comment), not to mention how ugly it is the interface provided by
sched_switch() (talking about the last two parameters :-/).

> But the ordering between prv->lock and the scheduler lock only needs
> to
> be between the prv lock and scheduler lock *of a specific instance*
> of
> the credit2 scheduler -- i.e., between prv->lock and prv->rqd[].lock.
> 
True...

> And, critically, if we're calling sched_switch, then we already know
> that the current pcpu lock is *not* one of the prv->rqd[].lock's
> because
> we check that at the top of schedule_cpu_switch().
> 
... True again...

> So I think there should be no problem with:
> 1. Grabbing the pcpu schedule lock in schedule_cpu_switch()
> 2. Grabbing prv->lock in csched2_switch_sched()
> 3. Setting the per_cpu schedule lock as the very last thing in
> csched2_switch_sched()
> 4. Releasing the (old) pcpu schedule lock in schedule_cpu_switch().
> 
> What do you think?
> 
I think it should work. We'll be doing the scheduler lock manipulation
protected by the old and wrong (as it's the one of another scheduler)
per-cpu/runq lock, and the correct global private lock. It would look
like the ordering between the two locks is the wrong one, in Credit2,
but it's not because of the fact that the per-runq lock is the other
scheduler's one.

Tricky, but everything is in here! :-/

I'm not sure whether it's better or worse wrt what I have. I'm trying
coding it up, to see how it looks like (and if it allows me to get rid
of the ugliness in sched_switch()).

> That would allow us to read ppriv_old and vpriv_old with the
> schedule_lock held.
> 
Also good. :-)

> Unfortunately I can't off the top of my head think of a good
> assertion
> to put in at #2 to assert that the per-pcpu lock is *not* one of
> runqueue locks in prv, because we don't yet know which runqueue this
> cpu
> will be assigned to.  But we could check when we actually do the lock
> assignment to make sure that it's not already equal.  That way we'll
> either deadlock or ASSERT (which is not as good as always ASSERTing,
> but
> is better than either deadlocking or working fine).
> 
Yep, I don't think we can do much better than that.

> As an aside -- it seems to me that as soon as we change the scheduler
> lock, there's a risk that something else may come along and try to
> grab
> it / access the data.  Does that mean we really ought to use memory
> barriers to make sure that the lock is written only after all changes
> to
> the scheduler data have been appropriately made?
> 
Yes, if it were only for this code, I think you're right, barriers
would be necessary. I once again think this is actually safe, because
it's serialized elsewhere, but thinking more about it, I can well add
both barriers (and a comment).

> > 
> > This also means that, in Credit2 and RTDS, we can get rid
> > of the code that was doing the scheduler lock remapping
> > in csched2_free_pdata() and rt_free_pdata(), and of their
> > triggering ASSERT-s.
> Right -- so to put it a different way, *all* schedulers must now set
> the
> locking scheme they wish to use, even if they want to use the default
> per-cpu locks.  
>
Exactly.

> I think that means we have to do that for arinc653 too,
> right?
> 
Mmm... right, I'll have a look at that.

Thanks for looking at this, and sorry for the late reply.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 09/16] xen: sched: close potential races when switching scheduler to CPUs
  2016-04-05 16:26     ` Dario Faggioli
@ 2016-04-06 15:51       ` Dario Faggioli
  0 siblings, 0 replies; 61+ messages in thread
From: Dario Faggioli @ 2016-04-06 15:51 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: Tianyang Chen, Meng Xu


[-- Attachment #1.1: Type: text/plain, Size: 1406 bytes --]

On Tue, 2016-04-05 at 18:26 +0200, Dario Faggioli wrote:
> On Wed, 2016-03-23 at 19:05 +0000, George Dunlap wrote:
> > On 18/03/16 19:05, Dario Faggioli wrote:
> > > This also means that, in Credit2 and RTDS, we can get rid
> > > of the code that was doing the scheduler lock remapping
> > > in csched2_free_pdata() and rt_free_pdata(), and of their
> > > triggering ASSERT-s.
> > > 
> > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> > Similar to my comment before -- in my own tree I squashed patches
> > 6-9
> > into a single commit and found it much easier to review. :-)
> > 
> I understand your point.
> 
> I'll consider doing something like this for v2 (that I'm just
> finishing
> putting together), but I'm not sure I like it.
> 
BTW, I changed my mind and, in the patch series I'm about to send, I
did as you suggest and squashed these patches into one. :-)

The changelog is indeed rather long, but still fine, IMO, and there's a
lot less duplication, both in `git log' and in code (if one looks at it
'patches after patches', rather than just the final result).

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 09/16] xen: sched: close potential races when switching scheduler to CPUs
  2016-04-05 17:37     ` Dario Faggioli
@ 2016-04-06 16:21       ` Dario Faggioli
  0 siblings, 0 replies; 61+ messages in thread
From: Dario Faggioli @ 2016-04-06 16:21 UTC (permalink / raw)
  To: George Dunlap, xen-devel
  Cc: George Dunlap, Tianyang Chen, Josh Whitehead, Meng Xu, Robert VanVossen


[-- Attachment #1.1: Type: text/plain, Size: 3636 bytes --]

On Tue, 2016-04-05 at 19:37 +0200, Dario Faggioli wrote:
> On Thu, 2016-03-24 at 12:14 +0000, George Dunlap wrote:
> > On 18/03/16 19:05, Dario Faggioli wrote:
> > So I think there should be no problem with:
> > 1. Grabbing the pcpu schedule lock in schedule_cpu_switch()
> > 2. Grabbing prv->lock in csched2_switch_sched()
> > 3. Setting the per_cpu schedule lock as the very last thing in
> > csched2_switch_sched()
> > 4. Releasing the (old) pcpu schedule lock in schedule_cpu_switch().
> > 
> > What do you think?
> > 
> I think it should work. We'll be doing the scheduler lock
> manipulation
> protected by the old and wrong (as it's the one of another scheduler)
> per-cpu/runq lock, and the correct global private lock. It would look
> like the ordering between the two locks is the wrong one, in Credit2,
> but it's not because of the fact that the per-runq lock is the other
> scheduler's one.
> 
> Tricky, but everything is in here! :-/
> 
I've done it as you suggest above.

The new .switch_sched hook is still there, and still does look the
same. But I do indeed like the final look of the code better, and it
appears to be working ok.

Have a look. ;-)

> > As an aside -- it seems to me that as soon as we change the
> > scheduler
> > lock, there's a risk that something else may come along and try to
> > grab
> > it / access the data.  Does that mean we really ought to use memory
> > barriers to make sure that the lock is written only after all
> > changes
> > to
> > the scheduler data have been appropriately made?
> > 
> Yes, if it were only for this code, I think you're right, barriers
> would be necessary. I once again think this is actually safe, because
> it's serialized elsewhere, but thinking more about it, I can well add
> both barriers (and a comment).
> 
And I've added smp_mb()-s too.

> > > This also means that, in Credit2 and RTDS, we can get rid
> > > of the code that was doing the scheduler lock remapping
> > > in csched2_free_pdata() and rt_free_pdata(), and of their
> > > triggering ASSERT-s.
> > Right -- so to put it a different way, *all* schedulers must now
> > set
> > the
> > locking scheme they wish to use, even if they want to use the
> > default
> > per-cpu locks.  
> > 
> Exactly.
> 
> > 
> > I think that means we have to do that for arinc653 too,
> > right?
> > 
> Mmm... right, I'll have a look at that.
> 
And, finally, I did have a look at this too, and I actually don't think
ARINC needs any of this.

In fact, ARINC brings the idea of "doing its own locking" much further
than the other schedulers we have. They have their lock and they use it
in such a way that they don't even care to what
{v,p}cpu_schedule_lock() and friends points to.

As an example, check a653sched_do_schedule(). It's called from
schedule(), right after taking the runqueue lock,
with pcpu_schedule_lock_irq(), and yet it does this:

 spin_lock_irqsave(&sched_priv->lock, flags);

So, I actually better _not_ add anything to this series that re-maps
sd->schedule_lock to point to their sched_priv->lock, or we'd deadlock!

I'm not sure the design behind all this is the best possible one, but
that's a different issue, to be dealt with with another series in
another moment. :-)

In any case, I've added Robert and Josh.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2016-04-06 16:21 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-18 19:03 [PATCH 00/16] Fixes and improvement (including hard affinity!) for Credit2 Dario Faggioli
2016-03-18 19:04 ` [PATCH 01/16] xen: sched: fix locking when allocating an RTDS pCPU Dario Faggioli
2016-03-19  2:22   ` Meng Xu
2016-03-23 15:37   ` George Dunlap
2016-03-18 19:04 ` [PATCH 02/16] xen: sched: add .init_pdata hook to the scheduler interface Dario Faggioli
2016-03-22  8:08   ` Juergen Gross
2016-03-23 17:32   ` George Dunlap
2016-03-18 19:04 ` [PATCH 03/16] xen: sched: make implementing .alloc_pdata optional Dario Faggioli
2016-03-19  2:23   ` Meng Xu
2016-03-21 14:22   ` Jan Beulich
2016-03-23 17:36     ` George Dunlap
2016-03-24  9:43       ` Jan Beulich
2016-03-24 13:14         ` Dario Faggioli
2016-03-21 14:48   ` Juergen Gross
2016-03-21 15:07     ` Jan Beulich
2016-04-01 17:01       ` Dario Faggioli
2016-04-04  4:21         ` Juergen Gross
2016-04-04  6:13         ` Jan Beulich
2016-04-05 16:01           ` Dario Faggioli
2016-03-23 17:38   ` George Dunlap
2016-03-18 19:04 ` [PATCH 04/16] xen: sched: implement .init_pdata in all schedulers Dario Faggioli
2016-03-19  2:24   ` Meng Xu
2016-03-22  8:03   ` Juergen Gross
2016-03-23 17:46     ` George Dunlap
2016-03-18 19:04 ` [PATCH 05/16] xen: sched: move pCPU initialization in an helper Dario Faggioli
2016-03-23 17:51   ` George Dunlap
2016-03-23 18:09     ` George Dunlap
2016-03-24 13:21     ` Dario Faggioli
2016-03-18 19:04 ` [PATCH 06/16] xen: sched: prepare a .switch_sched hook for Credit1 Dario Faggioli
2016-03-18 19:04 ` [PATCH 07/16] xen: sched: prepare a .switch_sched hook for Credit2 Dario Faggioli
2016-03-18 19:04 ` [PATCH 08/16] " Dario Faggioli
2016-03-19  2:24   ` Meng Xu
2016-03-21 14:25   ` Jan Beulich
2016-03-18 19:05 ` [PATCH 09/16] xen: sched: close potential races when switching scheduler to CPUs Dario Faggioli
2016-03-19  2:25   ` Meng Xu
2016-03-23 19:05   ` George Dunlap
2016-04-05 16:26     ` Dario Faggioli
2016-04-06 15:51       ` Dario Faggioli
2016-03-24 12:14   ` George Dunlap
2016-04-05 17:37     ` Dario Faggioli
2016-04-06 16:21       ` Dario Faggioli
2016-03-18 19:05 ` [PATCH 10/16] xen: sched: improve credit2 bootparams' scope, placement and signedness Dario Faggioli
2016-03-21 14:51   ` Juergen Gross
2016-03-24 12:20   ` George Dunlap
2016-03-18 19:05 ` [PATCH 11/16] xen: sched: on Credit2, don't reprogram the timer if idle Dario Faggioli
2016-03-24 15:03   ` George Dunlap
2016-03-18 19:05 ` [PATCH 12/16] xen: sched: fix per-socket runqueue creation in credit2 Dario Faggioli
2016-03-24 12:24   ` George Dunlap
2016-03-18 19:05 ` [PATCH 13/16] xen: sched: allow for choosing credit2 runqueues configuration at boot Dario Faggioli
2016-03-22  7:46   ` Juergen Gross
2016-03-24 12:36   ` George Dunlap
2016-03-18 19:05 ` [PATCH 14/16] xen: sched: per-core runqueues as default in credit2 Dario Faggioli
2016-03-24 12:37   ` George Dunlap
2016-03-18 19:06 ` [PATCH 15/16] xen: sched: scratch space for cpumasks on Credit2 Dario Faggioli
2016-03-18 19:27   ` Andrew Cooper
2016-03-24 12:44     ` George Dunlap
2016-03-24 12:56       ` Andrew Cooper
2016-03-24 13:10       ` Dario Faggioli
2016-03-18 19:06 ` [PATCH 16/16] xen: sched: implement vcpu hard affinity in Credit2 Dario Faggioli
2016-03-24 15:42   ` George Dunlap
2016-04-05 16:50     ` Dario Faggioli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).