All of lore.kernel.org
* [PATCH v2 0/5] The 'null' Scheduler
@ 2017-04-07  0:33 Dario Faggioli
  2017-04-07  0:33 ` [PATCH v2 1/5] xen: sched: improve robustness (and rename) DOM2OP() Dario Faggioli
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Dario Faggioli @ 2017-04-07  0:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Jonathan Davies, George Dunlap, Marcus Granado,
	Stefano Stabellini, Julien Grall

Hello,

here comes v2 of this series about the 'null' scheduler.

v1, with many more details about the idea and the goals of this scheduler, is here:
 https://lists.xen.org/archives/html/xen-devel/2017-03/msg02316.html

In this version, I'm basically addressing the review comments I received.

There are 5 patches now, instead of 3, because:
 - I've added "xen: sched: improve robustness (and rename) DOM2OP()" at the
   beginning of the series. It is totally independent from the series itself;
   I just needed to repost it, and did so here for convenience;
 - patch 2 is new, as it contains what was a hunk of another patch in v1,
   but really wanted to live in its own separate patch (that was also
   one of the review comments).

Finally, about the last patch: Wei and Stefano acked it in v1, so I've kept it
as it was, and added their Acked-by tags. Since then, there has been some
discussion with George about changing the behavior of libxl's function for
getting the scheduling parameters.

If what George is proposing is considered better, I'm happy to do it and
resend. Just let me know.

A git branch is available here:
 git://xenbits.xen.org/people/dariof/xen.git  rel/sched/null-sched-v2
 https://travis-ci.org/fdario/xen/builds/219505940  [*]

Thanks and Regards,
Dario

[*] Clang builds are failing, but that's a known issue in current staging,
    independent of this series.
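For reference, with this series applied, using the new scheduler inside a
cpupool would look something like this (pool name, CPU number and domain
name are of course just illustrative):

```sh
# Create a pool that uses the null scheduler, move a pCPU into it,
# and then move a guest there (requires a Xen host with this series).
xl cpupool-create name=\"pool-null\" sched=\"null\"
xl cpupool-cpu-remove Pool-0 2
xl cpupool-cpu-add pool-null 2
xl cpupool-migrate mydomain pool-null
```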
---
Dario Faggioli (5):
      xen: sched: improve robustness (and rename) DOM2OP()
      xen: sched: make sure a pCPU added to a pool runs the scheduler ASAP
      xen: sched: introduce the 'null' semi-static scheduler
      xen: sched_null: support for hard affinity
      tools: sched: add support for 'null' scheduler

 docs/misc/xen-command-line.markdown |    2 
 tools/libxl/libxl.h                 |    6 
 tools/libxl/libxl_sched.c           |   24 +
 tools/libxl/libxl_types.idl         |    1 
 xen/common/Kconfig                  |   11 
 xen/common/Makefile                 |    1 
 xen/common/sched_null.c             |  804 +++++++++++++++++++++++++++++++++++
 xen/common/schedule.c               |   59 ++-
 xen/include/public/domctl.h         |    1 
 9 files changed, 889 insertions(+), 20 deletions(-)
 create mode 100644 xen/common/sched_null.c
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* [PATCH v2 1/5] xen: sched: improve robustness (and rename) DOM2OP()
  2017-04-07  0:33 [PATCH v2 0/5] The 'null' Scheduler Dario Faggioli
@ 2017-04-07  0:33 ` Dario Faggioli
  2017-04-07  8:44   ` George Dunlap
  2017-04-07  0:34 ` [PATCH v2 2/5] xen: sched: make sure a pCPU added to a pool runs the scheduler ASAP Dario Faggioli
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Dario Faggioli @ 2017-04-07  0:33 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Jan Beulich

Clarify and enforce (with ASSERTs) when the function
is called on the idle domain, and explain in comments
what it means and when it is ok to do so.

While there, change the name of the function to a more
self-explanatory one, and do the same to VCPU2OP.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
Changes from v1:
 - new patch;
 - renamed VCPU2OP, as suggested during v1's review of patch 1.

Changes from v1 of the null scheduler series:
 - renamed the helpers to dom_scheduler() and vcpu_scheduler().
---
 xen/common/schedule.c |   56 ++++++++++++++++++++++++++++++++-----------------
 1 file changed, 37 insertions(+), 19 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index d344b7c..d67227f 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -77,8 +77,25 @@ static struct scheduler __read_mostly ops;
          (( (opsptr)->fn != NULL ) ? (opsptr)->fn(opsptr, ##__VA_ARGS__ )  \
           : (typeof((opsptr)->fn(opsptr, ##__VA_ARGS__)))0 )
 
-#define DOM2OP(_d)    (((_d)->cpupool == NULL) ? &ops : ((_d)->cpupool->sched))
-static inline struct scheduler *VCPU2OP(const struct vcpu *v)
+static inline struct scheduler *dom_scheduler(const struct domain *d)
+{
+    if ( likely(d->cpupool != NULL) )
+        return d->cpupool->sched;
+
+    /*
+     * If d->cpupool is NULL, this is the idle domain. This is special
+     * because the idle domain does not really belong to any cpupool, and,
+     * hence, does not really have a scheduler.
+     *
+     * This is (should be!) only called like this when allocating the idle
+     * vCPUs for the first time, during boot, in which case what we want
+     * is the default scheduler, i.e., the one chosen at boot.
+     */
+    ASSERT(is_idle_domain(d));
+    return &ops;
+}
+
+static inline struct scheduler *vcpu_scheduler(const struct vcpu *v)
 {
     struct domain *d = v->domain;
 
@@ -260,7 +277,8 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     init_timer(&v->poll_timer, poll_timer_fn,
                v, v->processor);
 
-    v->sched_priv = SCHED_OP(DOM2OP(d), alloc_vdata, v, d->sched_priv);
+    v->sched_priv = SCHED_OP(dom_scheduler(d), alloc_vdata, v,
+		             d->sched_priv);
     if ( v->sched_priv == NULL )
         return 1;
 
@@ -272,7 +290,7 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     }
     else
     {
-        SCHED_OP(DOM2OP(d), insert_vcpu, v);
+        SCHED_OP(dom_scheduler(d), insert_vcpu, v);
     }
 
     return 0;
@@ -326,7 +344,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
     domain_pause(d);
 
-    old_ops = DOM2OP(d);
+    old_ops = dom_scheduler(d);
     old_domdata = d->sched_priv;
 
     for_each_vcpu ( d, v )
@@ -389,8 +407,8 @@ void sched_destroy_vcpu(struct vcpu *v)
     kill_timer(&v->poll_timer);
     if ( test_and_clear_bool(v->is_urgent) )
         atomic_dec(&per_cpu(schedule_data, v->processor).urgent_count);
-    SCHED_OP(VCPU2OP(v), remove_vcpu, v);
-    SCHED_OP(VCPU2OP(v), free_vdata, v->sched_priv);
+    SCHED_OP(vcpu_scheduler(v), remove_vcpu, v);
+    SCHED_OP(vcpu_scheduler(v), free_vdata, v->sched_priv);
 }
 
 int sched_init_domain(struct domain *d, int poolid)
@@ -404,7 +422,7 @@ int sched_init_domain(struct domain *d, int poolid)
 
     SCHED_STAT_CRANK(dom_init);
     TRACE_1D(TRC_SCHED_DOM_ADD, d->domain_id);
-    return SCHED_OP(DOM2OP(d), init_domain, d);
+    return SCHED_OP(dom_scheduler(d), init_domain, d);
 }
 
 void sched_destroy_domain(struct domain *d)
@@ -413,7 +431,7 @@ void sched_destroy_domain(struct domain *d)
 
     SCHED_STAT_CRANK(dom_destroy);
     TRACE_1D(TRC_SCHED_DOM_REM, d->domain_id);
-    SCHED_OP(DOM2OP(d), destroy_domain, d);
+    SCHED_OP(dom_scheduler(d), destroy_domain, d);
 
     cpupool_rm_domain(d);
 }
@@ -432,7 +450,7 @@ void vcpu_sleep_nosync(struct vcpu *v)
         if ( v->runstate.state == RUNSTATE_runnable )
             vcpu_runstate_change(v, RUNSTATE_offline, NOW());
 
-        SCHED_OP(VCPU2OP(v), sleep, v);
+        SCHED_OP(vcpu_scheduler(v), sleep, v);
     }
 
     vcpu_schedule_unlock_irqrestore(lock, flags, v);
@@ -461,7 +479,7 @@ void vcpu_wake(struct vcpu *v)
     {
         if ( v->runstate.state >= RUNSTATE_blocked )
             vcpu_runstate_change(v, RUNSTATE_runnable, NOW());
-        SCHED_OP(VCPU2OP(v), wake, v);
+        SCHED_OP(vcpu_scheduler(v), wake, v);
     }
     else if ( !(v->pause_flags & VPF_blocked) )
     {
@@ -516,8 +534,8 @@ static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
      * Actual CPU switch to new CPU.  This is safe because the lock
      * pointer cant' change while the current lock is held.
      */
-    if ( VCPU2OP(v)->migrate )
-        SCHED_OP(VCPU2OP(v), migrate, v, new_cpu);
+    if ( vcpu_scheduler(v)->migrate )
+        SCHED_OP(vcpu_scheduler(v), migrate, v, new_cpu);
     else
         v->processor = new_cpu;
 }
@@ -583,7 +601,7 @@ static void vcpu_migrate(struct vcpu *v)
                 break;
 
             /* Select a new CPU. */
-            new_cpu = SCHED_OP(VCPU2OP(v), pick_cpu, v);
+            new_cpu = SCHED_OP(vcpu_scheduler(v), pick_cpu, v);
             if ( (new_lock == per_cpu(schedule_data, new_cpu).schedule_lock) &&
                  cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) )
                 break;
@@ -685,7 +703,7 @@ void restore_vcpu_affinity(struct domain *d)
         spin_unlock_irq(lock);;
 
         lock = vcpu_schedule_lock_irq(v);
-        v->processor = SCHED_OP(VCPU2OP(v), pick_cpu, v);
+        v->processor = SCHED_OP(vcpu_scheduler(v), pick_cpu, v);
         spin_unlock_irq(lock);
     }
 
@@ -975,7 +993,7 @@ long vcpu_yield(void)
     struct vcpu * v=current;
     spinlock_t *lock = vcpu_schedule_lock_irq(v);
 
-    SCHED_OP(VCPU2OP(v), yield, v);
+    SCHED_OP(vcpu_scheduler(v), yield, v);
     vcpu_schedule_unlock_irq(lock, v);
 
     SCHED_STAT_CRANK(vcpu_yield);
@@ -1288,7 +1306,7 @@ long sched_adjust(struct domain *d, struct xen_domctl_scheduler_op *op)
     if ( ret )
         return ret;
 
-    if ( op->sched_id != DOM2OP(d)->sched_id )
+    if ( op->sched_id != dom_scheduler(d)->sched_id )
         return -EINVAL;
 
     switch ( op->cmd )
@@ -1304,7 +1322,7 @@ long sched_adjust(struct domain *d, struct xen_domctl_scheduler_op *op)
 
     /* NB: the pluggable scheduler code needs to take care
      * of locking by itself. */
-    if ( (ret = SCHED_OP(DOM2OP(d), adjust, d, op)) == 0 )
+    if ( (ret = SCHED_OP(dom_scheduler(d), adjust, d, op)) == 0 )
         TRACE_1D(TRC_SCHED_ADJDOM, d->domain_id);
 
     return ret;
@@ -1482,7 +1500,7 @@ void context_saved(struct vcpu *prev)
     /* Check for migration request /after/ clearing running flag. */
     smp_mb();
 
-    SCHED_OP(VCPU2OP(prev), context_saved, prev);
+    SCHED_OP(vcpu_scheduler(prev), context_saved, prev);
 
     if ( unlikely(prev->pause_flags & VPF_migrating) )
         vcpu_migrate(prev);




* [PATCH v2 2/5] xen: sched: make sure a pCPU added to a pool runs the scheduler ASAP
  2017-04-07  0:33 [PATCH v2 0/5] The 'null' Scheduler Dario Faggioli
  2017-04-07  0:33 ` [PATCH v2 1/5] xen: sched: improve robustness (and rename) DOM2OP() Dario Faggioli
@ 2017-04-07  0:34 ` Dario Faggioli
  2017-04-07  8:47   ` George Dunlap
  2017-04-07  0:34 ` [PATCH v2 3/5] xen: sched: introduce the 'null' semi-static scheduler Dario Faggioli
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Dario Faggioli @ 2017-04-07  0:34 UTC (permalink / raw)
  To: xen-devel; +Cc: Stefano Stabellini, George Dunlap

When a pCPU is added to a cpupool, the pool's scheduler
should start running on it immediately so that, for instance,
any runnable but not yet running vCPU can start executing there.

This currently does not happen. Make it happen by raising
the scheduler softirq directly from the function that
sets up the new scheduler for the pCPU.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
---
 xen/common/schedule.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index d67227f..4aae423 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1823,6 +1823,9 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
 
  out:
     per_cpu(cpupool, cpu) = c;
+    /* When a cpu is added to a pool, trigger it to go pick up some work */
+    if ( c != NULL )
+        cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
 
     return 0;
 }




* [PATCH v2 3/5] xen: sched: introduce the 'null' semi-static scheduler
  2017-04-07  0:33 [PATCH v2 0/5] The 'null' Scheduler Dario Faggioli
  2017-04-07  0:33 ` [PATCH v2 1/5] xen: sched: improve robustness (and rename) DOM2OP() Dario Faggioli
  2017-04-07  0:34 ` [PATCH v2 2/5] xen: sched: make sure a pCPU added to a pool runs the scheduler ASAP Dario Faggioli
@ 2017-04-07  0:34 ` Dario Faggioli
  2017-04-07  7:24   ` Alan Robinson
  2017-04-07  9:17   ` George Dunlap
  2017-04-07  0:34 ` [PATCH v2 4/5] xen: sched_null: support for hard affinity Dario Faggioli
  2017-04-07  0:34 ` [PATCH v2 5/5] tools: sched: add support for 'null' scheduler Dario Faggioli
  4 siblings, 2 replies; 13+ messages in thread
From: Dario Faggioli @ 2017-04-07  0:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Jonathan Davies, Julien Grall, Stefano Stabellini, George Dunlap,
	Marcus Granado

In cases where one is absolutely sure that there will be
fewer vCPUs than pCPUs, having to pay the cost, mostly in
terms of overhead, of an advanced scheduler may not be
desirable.

The simple scheduler implemented here could be a solution.
Here is how it works:
 - each vCPU is statically assigned to a pCPU;
 - if there are pCPUs without any vCPU assigned, they
   stay idle (as in, they run their idle vCPU);
 - if there are vCPUs which are not assigned to any
   pCPU (e.g., because there are more vCPUs than pCPUs),
   they *don't* run until they get assigned;
 - if a vCPU assigned to a pCPU goes away, one of the
   vCPUs waiting to be assigned, if any, gets assigned
   to the pCPU and can run there.

This scheduler, therefore, if used in configurations
where every vCPU can be assigned to a pCPU, guarantees
low overhead, low latency, and consistent performance.

If used as the default scheduler at Xen boot, it is
recommended to limit the number of Dom0 vCPUs (e.g., with
'dom0_max_vcpus=x'). Otherwise, every pCPU will have one
of Dom0's vCPUs assigned to it, and there won't be room for
running any guest efficiently (if at all).
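For instance (the value is purely illustrative), the Xen boot command
line could include something like:

```
sched=null dom0_max_vcpus=2
```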

Target use cases are embedded and HPC, but it may well
be interesting in other circumstances too.

Kconfig and documentation are updated accordingly.

While there, also document the availability of sched=rtds
as boot parameter, which apparently had been forgotten.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Jonathan Davies <Jonathan.Davies@citrix.com>
Cc: Marcus Granado <marcus.granado@citrix.com>
---
Changed from v1:
- coding style fixes (removed some hard tabs);
- removed the null_vcpu->ndom up pointer, as suggested during review. It's
  actually a pattern to have it in the schedulers, but in this one, it was
  basically never used (only exception was NULL_VCPU_CHECK());
- removed the redundant pcpu field in null_vcpu, as suggested during review;
- factored common out code from null_vcpu_remove() and null_vcpu_migrate(),
  as suggested during review;
- use dprintk() for logging (as we don't care who current is), but with
  XENLOG_G_<xxx>, because we do want rate limiting;
- add a warning when a vCPU is put in the waitqueue;
- fixed null_vcpu_insert() locking, so that we change the value of
  v->processor while holding the lock on the original pCPU, as it should
  be done;
- triggering SCHEDULE_SOFTIRQ in schedule.c:schedule_cpu_switch() moved
  to its own patch.
---
 docs/misc/xen-command-line.markdown |    2 
 xen/common/Kconfig                  |   11 
 xen/common/Makefile                 |    1 
 xen/common/sched_null.c             |  786 +++++++++++++++++++++++++++++++++++
 xen/include/public/domctl.h         |    1 
 5 files changed, 800 insertions(+), 1 deletion(-)
 create mode 100644 xen/common/sched_null.c

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 5815d87..4c8fe2f 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1423,7 +1423,7 @@ Map the HPET page as read only in Dom0. If disabled the page will be mapped
 with read and write permissions.
 
 ### sched
-> `= credit | credit2 | arinc653`
+> `= credit | credit2 | arinc653 | rtds | null`
 
 > Default: `sched=credit`
 
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index afbc0e9..dc8e876 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -187,6 +187,14 @@ config SCHED_ARINC653
 	  The ARINC653 scheduler is a hard real-time scheduler for single
 	  cores, targeted for avionics, drones, and medical devices.
 
+config SCHED_NULL
+	bool "Null scheduler support (EXPERIMENTAL)"
+	default y
+	---help---
+	  The null scheduler is a static, zero overhead scheduler,
+	  for when there are always fewer vCPUs than pCPUs, typically
+	  in embedded or HPC scenarios.
+
 choice
 	prompt "Default Scheduler?"
 	default SCHED_CREDIT_DEFAULT
@@ -199,6 +207,8 @@ choice
 		bool "RT Scheduler" if SCHED_RTDS
 	config SCHED_ARINC653_DEFAULT
 		bool "ARINC653 Scheduler" if SCHED_ARINC653
+	config SCHED_NULL_DEFAULT
+		bool "Null Scheduler" if SCHED_NULL
 endchoice
 
 config SCHED_DEFAULT
@@ -207,6 +217,7 @@ config SCHED_DEFAULT
 	default "credit2" if SCHED_CREDIT2_DEFAULT
 	default "rtds" if SCHED_RTDS_DEFAULT
 	default "arinc653" if SCHED_ARINC653_DEFAULT
+	default "null" if SCHED_NULL_DEFAULT
 	default "credit"
 
 endmenu
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 0fed30b..26c5a64 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -40,6 +40,7 @@ obj-$(CONFIG_SCHED_ARINC653) += sched_arinc653.o
 obj-$(CONFIG_SCHED_CREDIT) += sched_credit.o
 obj-$(CONFIG_SCHED_CREDIT2) += sched_credit2.o
 obj-$(CONFIG_SCHED_RTDS) += sched_rt.o
+obj-$(CONFIG_SCHED_NULL) += sched_null.o
 obj-y += schedule.o
 obj-y += shutdown.o
 obj-y += softirq.o
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
new file mode 100644
index 0000000..c2c4182
--- /dev/null
+++ b/xen/common/sched_null.c
@@ -0,0 +1,786 @@
+/*
+ * xen/common/sched_null.c
+ *
+ *  Copyright (c) 2017, Dario Faggioli, Citrix Ltd
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * The 'null' scheduler always chooses to run, on each pCPU, either nothing
+ * (i.e., the pCPU stays idle) or always the same vCPU.
+ *
+ * It is aimed at supporting static scenarios, where there are always
+ * fewer vCPUs than pCPUs (and the vCPUs don't need to move among pCPUs
+ * for any reason), with the least possible overhead.
+ *
+ * Typical use cases are embedded applications, but also HPC, especially
+ * if the scheduler is used inside a cpupool.
+ */
+
+#include <xen/sched.h>
+#include <xen/sched-if.h>
+#include <xen/softirq.h>
+#include <xen/keyhandler.h>
+
+
+/*
+ * Locking:
+ * - Scheduler-lock (a.k.a. runqueue lock):
+ *  + is per-pCPU;
+ *  + serializes assignment and deassignment of vCPUs to a pCPU.
+ * - Private data lock (a.k.a. private scheduler lock):
+ *  + is scheduler-wide;
+ *  + serializes accesses to the list of domains in this scheduler.
+ * - Waitqueue lock:
+ *  + is scheduler-wide;
+ *  + serializes accesses to the list of vCPUs waiting to be assigned
+ *    to pCPUs.
+ *
+ * Ordering is: private lock, runqueue lock, waitqueue lock. Put the other
+ * way round, the waitqueue lock nests inside the runqueue lock, which
+ * nests inside the private lock. More specifically:
+ *  + if we need both the runqueue and the private lock, we must acquire
+ *    the private lock first;
+ *  + if we need both the runqueue and the waitqueue lock, we must
+ *    acquire the runqueue lock first;
+ *  + if we need both the private and the waitqueue lock, we must
+ *    acquire the private lock first;
+ *  + if we already own a runqueue lock, we must never acquire
+ *    the private lock;
+ *  + if we already own the waitqueue lock, we must never acquire
+ *    the runqueue lock or the private lock.
+ */
+
+/*
+ * System-wide private data
+ */
+struct null_private {
+    spinlock_t lock;        /* scheduler lock; nests inside cpupool_lock */
+    struct list_head ndom;  /* Domains of this scheduler                 */
+    struct list_head waitq; /* vCPUs not assigned to any pCPU            */
+    spinlock_t waitq_lock;  /* serializes waitq; nests inside runq locks */
+    cpumask_t cpus_free;    /* CPUs without a vCPU associated to them    */
+};
+
+/*
+ * Physical CPU
+ */
+struct null_pcpu {
+    struct vcpu *vcpu;
+};
+DEFINE_PER_CPU(struct null_pcpu, npc);
+
+/*
+ * Virtual CPU
+ */
+struct null_vcpu {
+    struct list_head waitq_elem;
+    struct vcpu *vcpu;
+};
+
+/*
+ * Domain
+ */
+struct null_dom {
+    struct list_head ndom_elem;
+    struct domain *dom;
+};
+
+/*
+ * Accessor helper functions
+ */
+static inline struct null_private *null_priv(const struct scheduler *ops)
+{
+    return ops->sched_data;
+}
+
+static inline struct null_vcpu *null_vcpu(const struct vcpu *v)
+{
+    return v->sched_priv;
+}
+
+static inline struct null_dom *null_dom(const struct domain *d)
+{
+    return d->sched_priv;
+}
+
+static int null_init(struct scheduler *ops)
+{
+    struct null_private *prv;
+
+    printk("Initializing null scheduler\n"
+           "WARNING: This is experimental software in development.\n"
+           "Use at your own risk.\n");
+
+    prv = xzalloc(struct null_private);
+    if ( prv == NULL )
+        return -ENOMEM;
+
+    spin_lock_init(&prv->lock);
+    spin_lock_init(&prv->waitq_lock);
+    INIT_LIST_HEAD(&prv->ndom);
+    INIT_LIST_HEAD(&prv->waitq);
+
+    ops->sched_data = prv;
+
+    return 0;
+}
+
+static void null_deinit(struct scheduler *ops)
+{
+    xfree(ops->sched_data);
+    ops->sched_data = NULL;
+}
+
+static void init_pdata(struct null_private *prv, unsigned int cpu)
+{
+    /* Mark the pCPU as free, and with no vCPU assigned */
+    cpumask_set_cpu(cpu, &prv->cpus_free);
+    per_cpu(npc, cpu).vcpu = NULL;
+}
+
+static void null_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
+{
+    struct null_private *prv = null_priv(ops);
+    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+
+    /* alloc_pdata is not implemented, so we want this to be NULL. */
+    ASSERT(!pdata);
+
+    /*
+     * The scheduler lock points already to the default per-cpu spinlock,
+     * so there is no remapping to be done.
+     */
+    ASSERT(sd->schedule_lock == &sd->_lock && !spin_is_locked(&sd->_lock));
+
+    init_pdata(prv, cpu);
+}
+
+static void null_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
+{
+    struct null_private *prv = null_priv(ops);
+
+    /* alloc_pdata not implemented, so this must have stayed NULL */
+    ASSERT(!pcpu);
+
+    cpumask_clear_cpu(cpu, &prv->cpus_free);
+    per_cpu(npc, cpu).vcpu = NULL;
+}
+
+static void *null_alloc_vdata(const struct scheduler *ops,
+                              struct vcpu *v, void *dd)
+{
+    struct null_vcpu *nvc;
+
+    nvc = xzalloc(struct null_vcpu);
+    if ( nvc == NULL )
+        return NULL;
+
+    INIT_LIST_HEAD(&nvc->waitq_elem);
+    nvc->vcpu = v;
+
+    SCHED_STAT_CRANK(vcpu_alloc);
+
+    return nvc;
+}
+
+static void null_free_vdata(const struct scheduler *ops, void *priv)
+{
+    struct null_vcpu *nvc = priv;
+
+    xfree(nvc);
+}
+
+static void * null_alloc_domdata(const struct scheduler *ops,
+                                 struct domain *d)
+{
+    struct null_private *prv = null_priv(ops);
+    struct null_dom *ndom;
+    unsigned long flags;
+
+    ndom = xzalloc(struct null_dom);
+    if ( ndom == NULL )
+        return NULL;
+
+    ndom->dom = d;
+
+    spin_lock_irqsave(&prv->lock, flags);
+    list_add_tail(&ndom->ndom_elem, &null_priv(ops)->ndom);
+    spin_unlock_irqrestore(&prv->lock, flags);
+
+    return (void*)ndom;
+}
+
+static void null_free_domdata(const struct scheduler *ops, void *data)
+{
+    unsigned long flags;
+    struct null_dom *ndom = data;
+    struct null_private *prv = null_priv(ops);
+
+    spin_lock_irqsave(&prv->lock, flags);
+    list_del_init(&ndom->ndom_elem);
+    spin_unlock_irqrestore(&prv->lock, flags);
+
+    xfree(data);
+}
+
+static int null_dom_init(const struct scheduler *ops, struct domain *d)
+{
+    struct null_dom *ndom;
+
+    if ( is_idle_domain(d) )
+        return 0;
+
+    ndom = null_alloc_domdata(ops, d);
+    if ( ndom == NULL )
+        return -ENOMEM;
+
+    d->sched_priv = ndom;
+
+    return 0;
+}
+static void null_dom_destroy(const struct scheduler *ops, struct domain *d)
+{
+    null_free_domdata(ops, null_dom(d));
+}
+
+/*
+ * vCPU to pCPU assignment and placement. This _only_ happens:
+ *  - on insert,
+ *  - on migrate.
+ *
+ * Insert occurs when a vCPU joins this scheduler for the first time
+ * (e.g., when the domain it's part of is moved to the scheduler's
+ * cpupool).
+ *
+ * Migration may be necessary if a pCPU (with a vCPU assigned to it)
+ * is removed from the scheduler's cpupool.
+ *
+ * So this is not part of any hot path.
+ */
+static unsigned int pick_cpu(struct null_private *prv, struct vcpu *v)
+{
+    unsigned int cpu = v->processor, new_cpu;
+    cpumask_t *cpus = cpupool_domain_cpumask(v->domain);
+
+    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+
+    /*
+     * If our processor is free, or we are assigned to it, and it is
+     * also still valid, just go for it.
+     */
+    if ( likely((per_cpu(npc, cpu).vcpu == NULL || per_cpu(npc, cpu).vcpu == v)
+                && cpumask_test_cpu(cpu, cpus)) )
+        return cpu;
+
+    /* If not, just go for a valid free pCPU, if any */
+    cpumask_and(cpumask_scratch_cpu(cpu), &prv->cpus_free, cpus);
+    new_cpu = cpumask_first(cpumask_scratch_cpu(cpu));
+
+    if ( likely(new_cpu != nr_cpu_ids) )
+        return new_cpu;
+
+    /*
+     * If we didn't find any free pCPU, just pick any valid pcpu, even if
+     * it has another vCPU assigned. This will happen during shutdown and
+     * suspend/resume, but it may also happen during "normal operation", if
+     * all the pCPUs are busy.
+     *
+     * In fact, there must always be something sane in v->processor, or
+     * vcpu_schedule_lock() and friends won't work. This is not a problem,
+     * as we will actually assign the vCPU to the pCPU we return from here
+     * only if that pCPU is free.
+     */
+    return cpumask_any(cpus);
+}
+
+static void vcpu_assign(struct null_private *prv, struct vcpu *v,
+                        unsigned int cpu)
+{
+    per_cpu(npc, cpu).vcpu = v;
+    v->processor = cpu;
+    cpumask_clear_cpu(cpu, &prv->cpus_free);
+
+    dprintk(XENLOG_G_INFO, "%d <-- d%dv%d\n", cpu, v->domain->domain_id, v->vcpu_id);
+}
+
+static void vcpu_deassign(struct null_private *prv, struct vcpu *v,
+                          unsigned int cpu)
+{
+    per_cpu(npc, cpu).vcpu = NULL;
+    cpumask_set_cpu(cpu, &prv->cpus_free);
+
+    dprintk(XENLOG_G_INFO, "%d <-- NULL (d%dv%d)\n", cpu, v->domain->domain_id, v->vcpu_id);
+}
+
+/* Change the scheduler of cpu to us (null). */
+static void null_switch_sched(struct scheduler *new_ops, unsigned int cpu,
+                              void *pdata, void *vdata)
+{
+    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct null_private *prv = null_priv(new_ops);
+    struct null_vcpu *nvc = vdata;
+
+    ASSERT(nvc && is_idle_vcpu(nvc->vcpu));
+
+    idle_vcpu[cpu]->sched_priv = vdata;
+
+    /*
+     * We are holding the runqueue lock already (it's been taken in
+     * schedule_cpu_switch()). It actually may or may not be the 'right'
+     * one for this cpu, but that is ok for preventing races.
+     */
+    ASSERT(!local_irq_is_enabled());
+
+    init_pdata(prv, cpu);
+
+    per_cpu(scheduler, cpu) = new_ops;
+    per_cpu(schedule_data, cpu).sched_priv = pdata;
+
+    /*
+     * (Re?)route the lock to the per-pCPU lock as the /last/ thing. In
+     * fact, if it is free (and it can be), we want anyone that manages to
+     * take it to find all the initializations we've done above in place.
+     */
+    smp_mb();
+    sd->schedule_lock = &sd->_lock;
+}
+
+static void null_vcpu_insert(const struct scheduler *ops, struct vcpu *v)
+{
+    struct null_private *prv = null_priv(ops);
+    struct null_vcpu *nvc = null_vcpu(v);
+    spinlock_t *lock;
+
+    ASSERT(!is_idle_vcpu(v));
+
+    lock = vcpu_schedule_lock_irq(v);
+ retry:
+
+    v->processor = pick_cpu(prv, v);
+
+    spin_unlock(lock);
+
+    lock = vcpu_schedule_lock(v);
+
+    /* If the pCPU is free, we assign v to it */
+    if ( likely(per_cpu(npc, v->processor).vcpu == NULL) )
+    {
+        /*
+         * Insert is followed by vcpu_wake(), so there's no need to poke
+         * the pcpu with the SCHEDULE_SOFTIRQ, as wake will do that.
+         */
+        vcpu_assign(prv, v, v->processor);
+    }
+    else if ( cpumask_intersects(&prv->cpus_free,
+                                 cpupool_domain_cpumask(v->domain)) )
+    {
+        /*
+         * If the pCPU is not free (e.g., because we raced with another
+         * insert or a migrate), but there are other free pCPUs, we can
+         * try to pick again.
+         */
+         goto retry;
+    }
+    else
+    {
+        /*
+         * If the pCPU is not free, and there aren't any (valid) others,
+         * we have no alternatives than to go into the waitqueue.
+         */
+        spin_lock(&prv->waitq_lock);
+        list_add_tail(&nvc->waitq_elem, &prv->waitq);
+        dprintk(XENLOG_G_WARNING, "WARNING: d%dv%d not assigned to any CPU!\n",
+                v->domain->domain_id, v->vcpu_id);
+        spin_unlock(&prv->waitq_lock);
+    }
+    spin_unlock_irq(lock);
+
+    SCHED_STAT_CRANK(vcpu_insert);
+}
+
+static void _vcpu_remove(struct null_private *prv, struct vcpu *v)
+{
+    unsigned int cpu = v->processor;
+    struct domain *d = v->domain;
+    struct null_vcpu *wvc;
+
+    ASSERT(list_empty(&null_vcpu(v)->waitq_elem));
+
+    spin_lock(&prv->waitq_lock);
+
+    /*
+     * If v is assigned to a pCPU, let's see if there is someone waiting.
+     * If yes, we assign it to cpu, in place of v.
+     */
+    wvc = list_first_entry_or_null(&prv->waitq, struct null_vcpu, waitq_elem);
+    if ( wvc && cpumask_test_cpu(cpu, cpupool_domain_cpumask(d)) )
+    {
+        list_del_init(&wvc->waitq_elem);
+        vcpu_assign(prv, wvc->vcpu, cpu);
+        cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
+    }
+    else
+    {
+        vcpu_deassign(prv, v, cpu);
+    }
+
+    spin_unlock(&prv->waitq_lock);
+}
+
+static void null_vcpu_remove(const struct scheduler *ops, struct vcpu *v)
+{
+    struct null_private *prv = null_priv(ops);
+    struct null_vcpu *nvc = null_vcpu(v);
+    spinlock_t *lock;
+
+    ASSERT(!is_idle_vcpu(v));
+
+    lock = vcpu_schedule_lock_irq(v);
+
+    /* If v is in waitqueue, just get it out of there and bail */
+    if ( unlikely(!list_empty(&nvc->waitq_elem)) )
+    {
+        spin_lock(&prv->waitq_lock);
+        list_del_init(&nvc->waitq_elem);
+        spin_unlock(&prv->waitq_lock);
+
+        goto out;
+    }
+
+    ASSERT(per_cpu(npc, v->processor).vcpu == v);
+    ASSERT(!cpumask_test_cpu(v->processor, &prv->cpus_free));
+
+    _vcpu_remove(prv, v);
+
+ out:
+    vcpu_schedule_unlock_irq(lock, v);
+
+    SCHED_STAT_CRANK(vcpu_remove);
+}
+
+static void null_vcpu_wake(const struct scheduler *ops, struct vcpu *v)
+{
+    ASSERT(!is_idle_vcpu(v));
+
+    if ( unlikely(curr_on_cpu(v->processor) == v) )
+    {
+        SCHED_STAT_CRANK(vcpu_wake_running);
+        return;
+    }
+
+    if ( unlikely(!list_empty(&null_vcpu(v)->waitq_elem)) )
+    {
+        /* Not exactly "on runq", but close enough for reusing the counter */
+        SCHED_STAT_CRANK(vcpu_wake_onrunq);
+        return;
+    }
+
+    if ( likely(vcpu_runnable(v)) )
+        SCHED_STAT_CRANK(vcpu_wake_runnable);
+    else
+        SCHED_STAT_CRANK(vcpu_wake_not_runnable);
+
+    /* Note that we get here only for vCPUs assigned to a pCPU */
+    cpu_raise_softirq(v->processor, SCHEDULE_SOFTIRQ);
+}
+
+static void null_vcpu_sleep(const struct scheduler *ops, struct vcpu *v)
+{
+    ASSERT(!is_idle_vcpu(v));
+
+    /* If v is not assigned to a pCPU, or is not running, no need to bother */
+    if ( curr_on_cpu(v->processor) == v )
+        cpu_raise_softirq(v->processor, SCHEDULE_SOFTIRQ);
+
+    SCHED_STAT_CRANK(vcpu_sleep);
+}
+
+static int null_cpu_pick(const struct scheduler *ops, struct vcpu *v)
+{
+    ASSERT(!is_idle_vcpu(v));
+    return pick_cpu(null_priv(ops), v);
+}
+
+static void null_vcpu_migrate(const struct scheduler *ops, struct vcpu *v,
+                              unsigned int new_cpu)
+{
+    struct null_private *prv = null_priv(ops);
+    struct null_vcpu *nvc = null_vcpu(v);
+
+    ASSERT(!is_idle_vcpu(v));
+
+    if ( v->processor == new_cpu )
+        return;
+
+    /*
+     * v is either assigned to a pCPU, or in the waitqueue.
+     *
+     * In the former case, the pCPU to which it was assigned would
+     * become free, and we, therefore, should check whether there is
+     * anyone in the waitqueue that can be assigned to it.
+     *
+     * In the latter, there is just nothing to do.
+     */
+    if ( likely(list_empty(&nvc->waitq_elem)) )
+    {
+        _vcpu_remove(prv, v);
+        SCHED_STAT_CRANK(migrate_running);
+    }
+    else
+        SCHED_STAT_CRANK(migrate_on_runq);
+
+    SCHED_STAT_CRANK(migrated);
+
+    /*
+     * Let's now consider new_cpu, which is where v is being sent. It can be
+     * either free, or have a vCPU already assigned to it.
+     *
+     * In the former case, we should assign v to it, and try to get it to run.
+     *
+     * In latter, all we can do is to park v in the waitqueue.
+     */
+    if ( per_cpu(npc, new_cpu).vcpu == NULL )
+    {
+        /* v might have been in the waitqueue, so remove it */
+        spin_lock(&prv->waitq_lock);
+        list_del_init(&nvc->waitq_elem);
+        spin_unlock(&prv->waitq_lock);
+
+        vcpu_assign(prv, v, new_cpu);
+    }
+    else
+    {
+        /* Put v in the waitqueue, if it wasn't there already */
+        spin_lock(&prv->waitq_lock);
+        if ( list_empty(&nvc->waitq_elem) )
+        {
+            list_add_tail(&nvc->waitq_elem, &prv->waitq);
+            dprintk(XENLOG_G_WARNING, "WARNING: d%dv%d not assigned to any CPU!\n",
+                    v->domain->domain_id, v->vcpu_id);
+        }
+        spin_unlock(&prv->waitq_lock);
+    }
+
+    /*
+     * Whatever all the above, we always at least override v->processor.
+     * This is especially important for shutdown or suspend/resume paths,
+     * when it is important to let our caller (cpu_disable_scheduler())
+     * know that the migration did happen, to the best of our possibilities,
+     * at least. In case of suspend, any temporary inconsistency caused
+     * by this, will be fixed-up during resume.
+     */
+    v->processor = new_cpu;
+}
+
+#ifndef NDEBUG
+static inline void null_vcpu_check(struct vcpu *v)
+{
+    struct null_vcpu * const nvc = null_vcpu(v);
+    struct null_dom * const ndom = null_dom(v->domain);
+
+    BUG_ON(nvc->vcpu != v);
+
+    if ( ndom )
+        BUG_ON(is_idle_vcpu(v));
+    else
+        BUG_ON(!is_idle_vcpu(v));
+
+    SCHED_STAT_CRANK(vcpu_check);
+}
+#define NULL_VCPU_CHECK(v)  (null_vcpu_check(v))
+#else
+#define NULL_VCPU_CHECK(v)
+#endif
+
+
+/*
+ * The most simple scheduling function of all times! We either return:
+ *  - the vCPU assigned to the pCPU, if there's one and it can run;
+ *  - the idle vCPU, otherwise.
+ */
+static struct task_slice null_schedule(const struct scheduler *ops,
+                                       s_time_t now,
+                                       bool_t tasklet_work_scheduled)
+{
+    const unsigned int cpu = smp_processor_id();
+    struct null_private *prv = null_priv(ops);
+    struct null_vcpu *wvc;
+    struct task_slice ret;
+
+    SCHED_STAT_CRANK(schedule);
+    NULL_VCPU_CHECK(current);
+
+    ret.task = per_cpu(npc, cpu).vcpu;
+    ret.migrated = 0;
+    ret.time = -1;
+
+    /*
+     * We may be new in the cpupool, or just coming back online. In which
+     * case, there may be vCPUs in the waitqueue that we can assign to us
+     * and run.
+     */
+    if ( unlikely(ret.task == NULL) )
+    {
+        spin_lock(&prv->waitq_lock);
+        wvc = list_first_entry_or_null(&prv->waitq, struct null_vcpu, waitq_elem);
+        if ( wvc )
+        {
+            vcpu_assign(prv, wvc->vcpu, cpu);
+            list_del_init(&wvc->waitq_elem);
+            ret.task = wvc->vcpu;
+        }
+        spin_unlock(&prv->waitq_lock);
+    }
+
+    if ( unlikely(tasklet_work_scheduled ||
+                  ret.task == NULL ||
+                  !vcpu_runnable(ret.task)) )
+        ret.task = idle_vcpu[cpu];
+
+    NULL_VCPU_CHECK(ret.task);
+    return ret;
+}
+
+static inline void dump_vcpu(struct null_private *prv, struct null_vcpu *nvc)
+{
+    printk("[%i.%i] pcpu=%d", nvc->vcpu->domain->domain_id,
+            nvc->vcpu->vcpu_id, list_empty(&nvc->waitq_elem) ?
+                                nvc->vcpu->processor : -1);
+}
+
+static void null_dump_pcpu(const struct scheduler *ops, int cpu)
+{
+    struct null_private *prv = null_priv(ops);
+    struct null_vcpu *nvc;
+    spinlock_t *lock;
+    unsigned long flags;
+#define cpustr keyhandler_scratch
+
+    lock = pcpu_schedule_lock_irqsave(cpu, &flags);
+
+    cpumask_scnprintf(cpustr, sizeof(cpustr), per_cpu(cpu_sibling_mask, cpu));
+    printk("CPU[%02d] sibling=%s, ", cpu, cpustr);
+    cpumask_scnprintf(cpustr, sizeof(cpustr), per_cpu(cpu_core_mask, cpu));
+    printk("core=%s", cpustr);
+    if ( per_cpu(npc, cpu).vcpu != NULL )
+        printk(", vcpu=d%dv%d", per_cpu(npc, cpu).vcpu->domain->domain_id,
+               per_cpu(npc, cpu).vcpu->vcpu_id);
+    printk("\n");
+
+    /* current VCPU (nothing to say if that's the idle vcpu) */
+    nvc = null_vcpu(curr_on_cpu(cpu));
+    if ( nvc && !is_idle_vcpu(nvc->vcpu) )
+    {
+        printk("\trun: ");
+        dump_vcpu(prv, nvc);
+        printk("\n");
+    }
+
+    pcpu_schedule_unlock_irqrestore(lock, flags, cpu);
+#undef cpustr
+}
+
+static void null_dump(const struct scheduler *ops)
+{
+    struct null_private *prv = null_priv(ops);
+    struct list_head *iter;
+    unsigned long flags;
+    unsigned int loop;
+#define cpustr keyhandler_scratch
+
+    spin_lock_irqsave(&prv->lock, flags);
+
+    cpulist_scnprintf(cpustr, sizeof(cpustr), &prv->cpus_free);
+    printk("\tcpus_free = %s\n", cpustr);
+
+    printk("Domain info:\n");
+    loop = 0;
+    list_for_each( iter, &prv->ndom )
+    {
+        struct null_dom *ndom;
+        struct vcpu *v;
+
+        ndom = list_entry(iter, struct null_dom, ndom_elem);
+
+        printk("\tDomain: %d\n", ndom->dom->domain_id);
+        for_each_vcpu( ndom->dom, v )
+        {
+            struct null_vcpu * const nvc = null_vcpu(v);
+            spinlock_t *lock;
+
+            lock = vcpu_schedule_lock(nvc->vcpu);
+
+            printk("\t%3d: ", ++loop);
+            dump_vcpu(prv, nvc);
+            printk("\n");
+
+            vcpu_schedule_unlock(lock, nvc->vcpu);
+        }
+    }
+
+    printk("Waitqueue: ");
+    loop = 0;
+    spin_lock(&prv->waitq_lock);
+    list_for_each( iter, &prv->waitq )
+    {
+        struct null_vcpu *nvc = list_entry(iter, struct null_vcpu, waitq_elem);
+
+        if ( loop++ != 0 )
+            printk(", ");
+        if ( loop % 24 == 0 )
+            printk("\n\t");
+        printk("d%dv%d", nvc->vcpu->domain->domain_id, nvc->vcpu->vcpu_id);
+    }
+    printk("\n");
+    spin_unlock(&prv->waitq_lock);
+
+    spin_unlock_irqrestore(&prv->lock, flags);
+#undef cpustr
+}
+
+const struct scheduler sched_null_def = {
+    .name           = "null Scheduler",
+    .opt_name       = "null",
+    .sched_id       = XEN_SCHEDULER_NULL,
+    .sched_data     = NULL,
+
+    .init           = null_init,
+    .deinit         = null_deinit,
+    .init_pdata     = null_init_pdata,
+    .switch_sched   = null_switch_sched,
+    .deinit_pdata   = null_deinit_pdata,
+
+    .alloc_vdata    = null_alloc_vdata,
+    .free_vdata     = null_free_vdata,
+    .alloc_domdata  = null_alloc_domdata,
+    .free_domdata   = null_free_domdata,
+
+    .init_domain    = null_dom_init,
+    .destroy_domain = null_dom_destroy,
+
+    .insert_vcpu    = null_vcpu_insert,
+    .remove_vcpu    = null_vcpu_remove,
+
+    .wake           = null_vcpu_wake,
+    .sleep          = null_vcpu_sleep,
+    .pick_cpu       = null_cpu_pick,
+    .migrate        = null_vcpu_migrate,
+    .do_schedule    = null_schedule,
+
+    .dump_cpu_state = null_dump_pcpu,
+    .dump_settings  = null_dump,
+};
+
+REGISTER_SCHEDULER(sched_null_def);
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 9e3ce21..477f8f3 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -346,6 +346,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
 #define XEN_SCHEDULER_CREDIT2  6
 #define XEN_SCHEDULER_ARINC653 7
 #define XEN_SCHEDULER_RTDS     8
+#define XEN_SCHEDULER_NULL     9
 
 typedef struct xen_domctl_sched_credit {
     uint16_t weight;


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 4/5] xen: sched_null: support for hard affinity
  2017-04-07  0:33 [PATCH v2 0/5] The 'null' Scheduler Dario Faggioli
                   ` (2 preceding siblings ...)
  2017-04-07  0:34 ` [PATCH v2 3/5] xen: sched: introduce the 'null' semi-static scheduler Dario Faggioli
@ 2017-04-07  0:34 ` Dario Faggioli
  2017-04-07 10:08   ` George Dunlap
  2017-04-07  0:34 ` [PATCH v2 5/5] tools: sched: add support for 'null' scheduler Dario Faggioli
  4 siblings, 1 reply; 13+ messages in thread
From: Dario Faggioli @ 2017-04-07  0:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Jonathan Davies, Julien Grall, George Dunlap,
	Marcus Granado

As a (rudimentary) way of directing and affecting the
placement logic implemented by the scheduler, support
vCPU hard affinity.

Basically, a vCPU will now be assigned only to a pCPU
that is part of its own hard affinity. If such pCPU(s)
are busy, the vCPU waits, as happens when there are
no free pCPUs.
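The rule above (a vCPU may only be assigned a pCPU that is both free and within its hard affinity) can be sketched outside of Xen with plain bitmasks. Note this is a simplified standalone model: `pick_free_cpu` and the mask parameters are illustrative stand-ins for the scheduler's cpumask logic, not the actual Xen API.

```c
#include <stdint.h>

/*
 * Illustrative model: each uint64_t is a mask of pCPUs (bit n = pCPU n),
 * standing in for Xen's cpumask_t. Returns the first eligible pCPU,
 * or -1 if there is none (i.e., the vCPU would go to the waitqueue).
 */
static int pick_free_cpu(uint64_t cpus_free, uint64_t hard_affinity,
                         uint64_t cpupool_cpus)
{
    /* Eligible = free AND in the vCPU's pool AND in its hard affinity. */
    uint64_t eligible = cpus_free & cpupool_cpus & hard_affinity;

    if ( eligible == 0 )
        return -1;

    /* Like cpumask_first(): index of the lowest set bit. */
    return __builtin_ctzll(eligible);
}
```

For instance, with pCPUs 1-2 free (mask 0x6) and hard affinity on pCPUs 2-3 (mask 0xc), only pCPU 2 is eligible and gets picked; if the affinity and the free mask do not intersect, the vCPU waits.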

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Stefano Stabellini <stefano@aporeto.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Jonathan Davies <Jonathan.Davies@citrix.com>
Cc: Marcus Granado <marcus.granado@citrix.com>
---
Changes from v1:
- coding style fixes (removed some hard tabs);
- better signature for check_nvc_affinity() (also renamed in
  vcpu_check_affinity());
- fixed bug in null_vcpu_remove() using uninitialized cpumask.
---
 xen/common/sched_null.c |   50 ++++++++++++++++++++++++++++++++---------------
 1 file changed, 34 insertions(+), 16 deletions(-)

diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index c2c4182..96652a0 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -115,6 +115,14 @@ static inline struct null_dom *null_dom(const struct domain *d)
     return d->sched_priv;
 }
 
+static inline bool vcpu_check_affinity(struct vcpu *v, unsigned int cpu)
+{
+    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                cpupool_domain_cpumask(v->domain));
+
+    return cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu));
+}
+
 static int null_init(struct scheduler *ops)
 {
     struct null_private *prv;
@@ -276,16 +284,22 @@ static unsigned int pick_cpu(struct null_private *prv, struct vcpu *v)
 
     ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
 
+    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity, cpus);
+
     /*
-     * If our processor is free, or we are assigned to it, and it is
-     * also still valid, just go for it.
+     * If our processor is free, or we are assigned to it, and it is also
+     * still valid and part of our affinity, just go for it.
+     * (Note that we may call vcpu_check_affinity(), but we deliberately
+     * don't, so we get to keep in the scratch cpumask what we have just
+     * put in it.)
      */
     if ( likely((per_cpu(npc, cpu).vcpu == NULL || per_cpu(npc, cpu).vcpu == v)
-                && cpumask_test_cpu(cpu, cpus)) )
+                && cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
         return cpu;
 
-    /* If not, just go for a valid free pCPU, if any */
-    cpumask_and(cpumask_scratch_cpu(cpu), &prv->cpus_free, cpus);
+    /* If not, just go for a free pCPU, within our affinity, if any */
+    cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
+                &prv->cpus_free);
     new_cpu = cpumask_first(cpumask_scratch_cpu(cpu));
 
     if ( likely(new_cpu != nr_cpu_ids) )
@@ -302,7 +316,8 @@ static unsigned int pick_cpu(struct null_private *prv, struct vcpu *v)
      * as we will actually assign the vCPU to the pCPU we return from here,
      * only if the pCPU is free.
      */
-    return cpumask_any(cpus);
+    cpumask_and(cpumask_scratch_cpu(cpu), cpus, v->cpu_hard_affinity);
+    return cpumask_any(cpumask_scratch_cpu(cpu));
 }
 
 static void vcpu_assign(struct null_private *prv, struct vcpu *v,
@@ -361,6 +376,7 @@ static void null_vcpu_insert(const struct scheduler *ops, struct vcpu *v)
 {
     struct null_private *prv = null_priv(ops);
     struct null_vcpu *nvc = null_vcpu(v);
+    unsigned int cpu;
     spinlock_t *lock;
 
     ASSERT(!is_idle_vcpu(v));
@@ -368,23 +384,25 @@ static void null_vcpu_insert(const struct scheduler *ops, struct vcpu *v)
     lock = vcpu_schedule_lock_irq(v);
  retry:
 
-    v->processor = pick_cpu(prv, v);
+    cpu = v->processor = pick_cpu(prv, v);
 
     spin_unlock(lock);
 
     lock = vcpu_schedule_lock(v);
 
+    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                cpupool_domain_cpumask(v->domain));
+
     /* If the pCPU is free, we assign v to it */
-    if ( likely(per_cpu(npc, v->processor).vcpu == NULL) )
+    if ( likely(per_cpu(npc, cpu).vcpu == NULL) )
     {
         /*
          * Insert is followed by vcpu_wake(), so there's no need to poke
          * the pcpu with the SCHEDULE_SOFTIRQ, as wake will do that.
          */
-        vcpu_assign(prv, v, v->processor);
+        vcpu_assign(prv, v, cpu);
     }
-    else if ( cpumask_intersects(&prv->cpus_free,
-                                 cpupool_domain_cpumask(v->domain)) )
+    else if ( cpumask_intersects(&prv->cpus_free, cpumask_scratch_cpu(cpu)) )
     {
         /*
          * If the pCPU is not free (e.g., because we raced with another
@@ -413,7 +431,6 @@ static void null_vcpu_insert(const struct scheduler *ops, struct vcpu *v)
 static void _vcpu_remove(struct null_private *prv, struct vcpu *v)
 {
     unsigned int cpu = v->processor;
-    struct domain *d = v->domain;
     struct null_vcpu *wvc;
 
     ASSERT(list_empty(&null_vcpu(v)->waitq_elem));
@@ -425,7 +442,7 @@ static void _vcpu_remove(struct null_private *prv, struct vcpu *v)
      * If yes, we assign it to cpu, in spite of v.
      */
     wvc = list_first_entry_or_null(&prv->waitq, struct null_vcpu, waitq_elem);
-    if ( wvc && cpumask_test_cpu(cpu, cpupool_domain_cpumask(d)) )
+    if ( wvc && vcpu_check_affinity(wvc->vcpu, cpu) )
     {
         list_del_init(&wvc->waitq_elem);
         vcpu_assign(prv, wvc->vcpu, cpu);
@@ -547,11 +564,12 @@ static void null_vcpu_migrate(const struct scheduler *ops, struct vcpu *v,
      * Let's now consider new_cpu, which is where v is being sent. It can be
      * either free, or have a vCPU already assigned to it.
      *
-     * In the former case, we should assign v to it, and try to get it to run.
+     * In the former case, we should assign v to it, and try to get it to run,
+     * if possible, according to affinity.
      *
      * In latter, all we can do is to park v in the waitqueue.
      */
-    if ( per_cpu(npc, new_cpu).vcpu == NULL )
+    if ( per_cpu(npc, new_cpu).vcpu == NULL && vcpu_check_affinity(v, new_cpu) )
     {
         /* v might have been in the waitqueue, so remove it */
         spin_lock(&prv->waitq_lock);
@@ -635,7 +653,7 @@ static struct task_slice null_schedule(const struct scheduler *ops,
     {
         spin_lock(&prv->waitq_lock);
         wvc = list_first_entry_or_null(&prv->waitq, struct null_vcpu, waitq_elem);
-        if ( wvc )
+        if ( wvc && vcpu_check_affinity(wvc->vcpu, cpu) )
         {
             vcpu_assign(prv, wvc->vcpu, cpu);
             list_del_init(&wvc->waitq_elem);




* [PATCH v2 5/5] tools: sched: add support for 'null' scheduler
  2017-04-07  0:33 [PATCH v2 0/5] The 'null' Scheduler Dario Faggioli
                   ` (3 preceding siblings ...)
  2017-04-07  0:34 ` [PATCH v2 4/5] xen: sched_null: support for hard affinity Dario Faggioli
@ 2017-04-07  0:34 ` Dario Faggioli
  4 siblings, 0 replies; 13+ messages in thread
From: Dario Faggioli @ 2017-04-07  0:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Ian Jackson, Julien Grall, Wei Liu, George Dunlap

Being very basic also means this scheduler does not
need much support at the tools level (for now).

Basically, all that is needed is the definition of the
scheduler's symbol itself and a couple of stubs.
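The "couple of stubs" pattern can be sketched as a standalone model (the names and the error value below are illustrative stand-ins, not the actual libxl code): since the null scheduler has no per-domain parameters, setting them is a no-op that succeeds, while getting them reports an error.

```c
/* Illustrative stand-in for libxl's negative error-code convention. */
#define MODEL_ERROR_INVAL (-6)

/* No domain-specific parameters to set: accept and do nothing. */
static int model_null_domain_set(unsigned int domid)
{
    (void)domid;
    return 0;
}

/* Nothing to get either: report an error to the caller. */
static int model_null_domain_get(unsigned int domid)
{
    (void)domid;
    return MODEL_ERROR_INVAL;
}
```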

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Stefano Stabellini <stefano@aporeto.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
---
 tools/libxl/libxl.h         |    6 ++++++
 tools/libxl/libxl_sched.c   |   24 ++++++++++++++++++++++++
 tools/libxl/libxl_types.idl |    1 +
 3 files changed, 31 insertions(+)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index a402236..cf8687a 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -210,6 +210,12 @@
 #define LIBXL_HAVE_SCHED_RTDS 1
 
 /*
+ * LIBXL_HAVE_SCHED_NULL indicates that the 'null' static scheduler
+ * is available.
+ */
+#define LIBXL_HAVE_SCHED_NULL 1
+
+/*
  * libxl_domain_build_info has u.hvm.viridian_enable and _disable bitmaps
  * of the specified width.
  */
diff --git a/tools/libxl/libxl_sched.c b/tools/libxl/libxl_sched.c
index 84d3837..d44fbe1 100644
--- a/tools/libxl/libxl_sched.c
+++ b/tools/libxl/libxl_sched.c
@@ -178,6 +178,20 @@ static int sched_arinc653_domain_set(libxl__gc *gc, uint32_t domid,
     return 0;
 }
 
+static int sched_null_domain_set(libxl__gc *gc, uint32_t domid,
+                                 const libxl_domain_sched_params *scinfo)
+{
+    /* The null scheduler doesn't take any domain-specific parameters. */
+    return 0;
+}
+
+static int sched_null_domain_get(libxl__gc *gc, uint32_t domid,
+                               libxl_domain_sched_params *scinfo)
+{
+    /* The null scheduler doesn't have any domain-specific parameters. */
+    return ERROR_INVAL;
+}
+
 static int sched_credit_domain_get(libxl__gc *gc, uint32_t domid,
                                    libxl_domain_sched_params *scinfo)
 {
@@ -730,6 +744,9 @@ int libxl_domain_sched_params_set(libxl_ctx *ctx, uint32_t domid,
     case LIBXL_SCHEDULER_RTDS:
         ret=sched_rtds_domain_set(gc, domid, scinfo);
         break;
+    case LIBXL_SCHEDULER_NULL:
+        ret=sched_null_domain_set(gc, domid, scinfo);
+        break;
     default:
         LOGD(ERROR, domid, "Unknown scheduler");
         ret=ERROR_INVAL;
@@ -758,6 +775,7 @@ int libxl_vcpu_sched_params_set(libxl_ctx *ctx, uint32_t domid,
     case LIBXL_SCHEDULER_CREDIT:
     case LIBXL_SCHEDULER_CREDIT2:
     case LIBXL_SCHEDULER_ARINC653:
+    case LIBXL_SCHEDULER_NULL:
         LOGD(ERROR, domid, "per-VCPU parameter setting not supported for this scheduler");
         rc = ERROR_INVAL;
         break;
@@ -792,6 +810,7 @@ int libxl_vcpu_sched_params_set_all(libxl_ctx *ctx, uint32_t domid,
     case LIBXL_SCHEDULER_CREDIT:
     case LIBXL_SCHEDULER_CREDIT2:
     case LIBXL_SCHEDULER_ARINC653:
+    case LIBXL_SCHEDULER_NULL:
         LOGD(ERROR, domid, "per-VCPU parameter setting not supported for this scheduler");
         rc = ERROR_INVAL;
         break;
@@ -832,6 +851,9 @@ int libxl_domain_sched_params_get(libxl_ctx *ctx, uint32_t domid,
     case LIBXL_SCHEDULER_RTDS:
         ret=sched_rtds_domain_get(gc, domid, scinfo);
         break;
+    case LIBXL_SCHEDULER_NULL:
+        ret=sched_null_domain_get(gc, domid, scinfo);
+        break;
     default:
         LOGD(ERROR, domid, "Unknown scheduler");
         ret=ERROR_INVAL;
@@ -858,6 +880,7 @@ int libxl_vcpu_sched_params_get(libxl_ctx *ctx, uint32_t domid,
     case LIBXL_SCHEDULER_CREDIT:
     case LIBXL_SCHEDULER_CREDIT2:
     case LIBXL_SCHEDULER_ARINC653:
+    case LIBXL_SCHEDULER_NULL:
         LOGD(ERROR, domid, "per-VCPU parameter getting not supported for this scheduler");
         rc = ERROR_INVAL;
         break;
@@ -890,6 +913,7 @@ int libxl_vcpu_sched_params_get_all(libxl_ctx *ctx, uint32_t domid,
     case LIBXL_SCHEDULER_CREDIT:
     case LIBXL_SCHEDULER_CREDIT2:
     case LIBXL_SCHEDULER_ARINC653:
+    case LIBXL_SCHEDULER_NULL:
         LOGD(ERROR, domid, "per-VCPU parameter getting not supported for this scheduler");
         rc = ERROR_INVAL;
         break;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index d970284..d42f6a1 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -191,6 +191,7 @@ libxl_scheduler = Enumeration("scheduler", [
     (6, "credit2"),
     (7, "arinc653"),
     (8, "rtds"),
+    (9, "null"),
     ])
 
 # Consistent with SHUTDOWN_* in sched.h (apart from UNKNOWN)




* Re: [PATCH v2 3/5] xen: sched: introduce the 'null' semi-static scheduler
  2017-04-07  0:34 ` [PATCH v2 3/5] xen: sched: introduce the 'null' semi-static scheduler Dario Faggioli
@ 2017-04-07  7:24   ` Alan Robinson
  2017-04-07  9:17   ` George Dunlap
  1 sibling, 0 replies; 13+ messages in thread
From: Alan Robinson @ 2017-04-07  7:24 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Jonathan Davies, Stefano Stabellini, George Dunlap,
	Marcus Granado, Julien Grall, xen-devel

On Fri, Apr 07, 2017 at 02:34:07AM +0200, Dario Faggioli wrote:
> In cases where one is absolutely sure that there will be
> less vCPUs than pCPUs, having to pay the cose, mostly in
s/cose/cost/


Alan

-- 
Alan Robinson
Fujitsu, Enterprise Platform Services, Germany



* Re: [PATCH v2 1/5] xen: sched: improve robustness (and rename) DOM2OP()
  2017-04-07  0:33 ` [PATCH v2 1/5] xen: sched: improve robustness (and rename) DOM2OP() Dario Faggioli
@ 2017-04-07  8:44   ` George Dunlap
  2017-04-07  9:05     ` Dario Faggioli
  0 siblings, 1 reply; 13+ messages in thread
From: George Dunlap @ 2017-04-07  8:44 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Juergen Gross, Jan Beulich

On 07/04/17 01:33, Dario Faggioli wrote:
> Clarify and enforce (with ASSERTs) when the function
> is called on the idle domain, and explain in comments
> what it means and when it is ok to do so.
> 
> While there, change the name of the function to a more
> self-explanatory one, and do the same to VCPU2OP.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Acked-by: George Dunlap <george.dunlap@citrix.com>

With one nit...

> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> ---
> Changes from v1:
>  - new patch;
>  - renamed VCPU2OP, as suggested during v1's review of patch 1.
> 
> Changes from v1 of the null scheduler series:
>  - renamed the helpers to dom_scheduler() and vcpu_scheduler().
> ---
>  xen/common/schedule.c |   56 ++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 37 insertions(+), 19 deletions(-)
> 
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index d344b7c..d67227f 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -77,8 +77,25 @@ static struct scheduler __read_mostly ops;
>           (( (opsptr)->fn != NULL ) ? (opsptr)->fn(opsptr, ##__VA_ARGS__ )  \
>            : (typeof((opsptr)->fn(opsptr, ##__VA_ARGS__)))0 )
>  
> -#define DOM2OP(_d)    (((_d)->cpupool == NULL) ? &ops : ((_d)->cpupool->sched))
> -static inline struct scheduler *VCPU2OP(const struct vcpu *v)
> +static inline struct scheduler *dom_scheduler(const struct domain *d)
> +{
> +    if ( likely(d->cpupool != NULL) )
> +        return d->cpupool->sched;
> +
> +    /*
> +     * If d->cpupool is NULL, this is the idle domain. This is special
> +     * because the idle domain does not really bolong to any cpupool, and,

*belong

I can fix this up on check-in if need be.




* Re: [PATCH v2 2/5] xen: sched: make sure a pCPU added to a pool runs the scheduler ASAP
  2017-04-07  0:34 ` [PATCH v2 2/5] xen: sched: make sure a pCPU added to a pool runs the scheduler ASAP Dario Faggioli
@ 2017-04-07  8:47   ` George Dunlap
  0 siblings, 0 replies; 13+ messages in thread
From: George Dunlap @ 2017-04-07  8:47 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Stefano Stabellini

On 07/04/17 01:34, Dario Faggioli wrote:
> When a pCPU is added to a cpupool, the pool's scheduler
> should immediately run on it so, for instance, any runnable
> but not running vCPU can start executing there.
> 
> This currently does not happen. Make it happen by raising
> the scheduler softirq directly from the function that
> sets up the new scheduler for the pCPU.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Reviewed-by: George Dunlap <george.dunlap@citrix.com>




* Re: [PATCH v2 1/5] xen: sched: improve robustness (and rename) DOM2OP()
  2017-04-07  8:44   ` George Dunlap
@ 2017-04-07  9:05     ` Dario Faggioli
  0 siblings, 0 replies; 13+ messages in thread
From: Dario Faggioli @ 2017-04-07  9:05 UTC (permalink / raw)
  To: George Dunlap, xen-devel



On Fri, 2017-04-07 at 09:44 +0100, George Dunlap wrote:
> On 07/04/17 01:33, Dario Faggioli wrote:
> > Clarify and enforce (with ASSERTs) when the function
> > is called on the idle domain, and explain in comments
> > what it means and when it is ok to do so.
> > 
> > While there, change the name of the function to a more
> > self-explanatory one, and do the same to VCPU2OP.
> > 
> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> 
> Acked-by: George Dunlap <george.dunlap@citrix.com>
> 
> With one nit...
> 
> > --- a/xen/common/schedule.c
> > +++ b/xen/common/schedule.c
> > @@ -77,8 +77,25 @@ static struct scheduler __read_mostly ops;
> >           (( (opsptr)->fn != NULL ) ? (opsptr)->fn(opsptr,
> > ##__VA_ARGS__ )  \
> >            : (typeof((opsptr)->fn(opsptr, ##__VA_ARGS__)))0 )
> >  
> > -#define DOM2OP(_d)    (((_d)->cpupool == NULL) ? &ops : ((_d)-
> > >cpupool->sched))
> > -static inline struct scheduler *VCPU2OP(const struct vcpu *v)
> > +static inline struct scheduler *dom_scheduler(const struct domain
> > *d)
> > +{
> > +    if ( likely(d->cpupool != NULL) )
> > +        return d->cpupool->sched;
> > +
> > +    /*
> > +     * If d->cpupool is NULL, this is the idle domain. This is
> > special
> > +     * because the idle domain does not really bolong to any
> > cpupool, and,
> 
> *belong
> 
Ah. Sorry! :-(

> I can fix this up on check-in if need be.
> 
Yes, feel free.

And the same for the other typo reported by Alan in 3/5, if you're up
for it (and it's the case that there aren't any other reason to resend,
of course).

Thanks,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH v2 3/5] xen: sched: introduce the 'null' semi-static scheduler
  2017-04-07  0:34 ` [PATCH v2 3/5] xen: sched: introduce the 'null' semi-static scheduler Dario Faggioli
  2017-04-07  7:24   ` Alan Robinson
@ 2017-04-07  9:17   ` George Dunlap
  1 sibling, 0 replies; 13+ messages in thread
From: George Dunlap @ 2017-04-07  9:17 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: Jonathan Davies, Julien Grall, Stefano Stabellini, Marcus Granado

On 07/04/17 01:34, Dario Faggioli wrote:
> In cases where one is absolutely sure that there will be
> less vCPUs than pCPUs, having to pay the cose, mostly in
> terms of overhead, of an advanced scheduler may be not
> desirable.
> 
> The simple scheduler implemented here could be a solution.
> Here how it works:
>  - each vCPU is statically assigned to a pCPU;
>  - if there are pCPUs without any vCPU assigned, they
>    stay idle (as in, they run their idle vCPU);
>  - if there are vCPUs which are not assigned to any
>    pCPU (e.g., because there are more vCPUs than pCPUs)
>    they *don't* run, until they get assigned;
>  - if a vCPU assigned to a pCPU goes away, one of the
>    vCPUs waiting to be assigned, if any, gets assigned
>    to the pCPU and can run there.
> 
> This scheduler, therefore, if used in configurations
> where every vCPU can be assigned to a pCPU, guarantees
> low overhead, low latency, and consistent performance.
> 
> If used as the default scheduler at Xen boot, it is
> recommended to limit the number of Dom0 vCPUs (e.g., with
> 'dom0_max_vcpus=x'). Otherwise, all the pCPUs will have
> one of Dom0's vCPUs assigned, and there won't be room for
> running any guest efficiently (if at all).
> 
> Target use cases are embedded and HPC, but it may well
> be interesting in other circumstances too.
> 
> Kconfig and documentation are updated accordingly.
> 
> While there, also document the availability of sched=rtds
> as boot parameter, which apparently had been forgotten.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Reviewed-by: George Dunlap <george.dunlap@citrix.com>


* Re: [PATCH v2 4/5] xen: sched_null: support for hard affinity
  2017-04-07  0:34 ` [PATCH v2 4/5] xen: sched_null: support for hard affinity Dario Faggioli
@ 2017-04-07 10:08   ` George Dunlap
  2017-04-07 10:11     ` Dario Faggioli
  0 siblings, 1 reply; 13+ messages in thread
From: George Dunlap @ 2017-04-07 10:08 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: Stefano Stabellini, Jonathan Davies, Julien Grall, Marcus Granado

On 07/04/17 01:34, Dario Faggioli wrote:
> As a (rudimentary) way of directing and affecting the
> placement logic implemented by the scheduler, support
> vCPU hard affinity.
> 
> Basically, a vCPU will now be assigned only to a pCPU
> that is part of its own hard affinity. If such pCPU(s)
> is (are) busy, the vCPU will wait, as happens when
> there are no free pCPUs.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

[snip]

> @@ -413,7 +431,6 @@ static void null_vcpu_insert(const struct scheduler *ops, struct vcpu *v)
>  static void _vcpu_remove(struct null_private *prv, struct vcpu *v)
>  {
>      unsigned int cpu = v->processor;
> -    struct domain *d = v->domain;
>      struct null_vcpu *wvc;
>  
>      ASSERT(list_empty(&null_vcpu(v)->waitq_elem));
> @@ -425,7 +442,7 @@ static void _vcpu_remove(struct null_private *prv, struct vcpu *v)
>       * If yes, we assign it to cpu, in spite of v.
>       */
>      wvc = list_first_entry_or_null(&prv->waitq, struct null_vcpu, waitq_elem);
> -    if ( wvc && cpumask_test_cpu(cpu, cpupool_domain_cpumask(d)) )
> +    if ( wvc && vcpu_check_affinity(wvc->vcpu, cpu) )

Hmm, actually I just noticed that this only checks the first item on the
list.  If there are two vcpus on the list, and the first one doesn't
have affinity with the vcpu in question, the second one won't even be
considered.  This was probably OK in the previous case, where the only
time the test could fail is during suspend/resume, but it's not really
OK anymore, I don't think.

Everything else looks OK to me.

 -George


* Re: [PATCH v2 4/5] xen: sched_null: support for hard affinity
  2017-04-07 10:08   ` George Dunlap
@ 2017-04-07 10:11     ` Dario Faggioli
  0 siblings, 0 replies; 13+ messages in thread
From: Dario Faggioli @ 2017-04-07 10:11 UTC (permalink / raw)
  To: George Dunlap, xen-devel
  Cc: Stefano Stabellini, Jonathan Davies, Julien Grall, Marcus Granado


[-- Attachment #1.1: Type: text/plain, Size: 1724 bytes --]

On Fri, 2017-04-07 at 11:08 +0100, George Dunlap wrote:
> On 07/04/17 01:34, Dario Faggioli wrote:
> > @@ -413,7 +431,6 @@ static void null_vcpu_insert(const struct
> > scheduler *ops, struct vcpu *v)
> >  static void _vcpu_remove(struct null_private *prv, struct vcpu *v)
> >  {
> >      unsigned int cpu = v->processor;
> > -    struct domain *d = v->domain;
> >      struct null_vcpu *wvc;
> >  
> >      ASSERT(list_empty(&null_vcpu(v)->waitq_elem));
> > @@ -425,7 +442,7 @@ static void _vcpu_remove(struct null_private
> > *prv, struct vcpu *v)
> >       * If yes, we assign it to cpu, in spite of v.
> >       */
> >      wvc = list_first_entry_or_null(&prv->waitq, struct null_vcpu,
> > waitq_elem);
> > -    if ( wvc && cpumask_test_cpu(cpu, cpupool_domain_cpumask(d)) )
> > +    if ( wvc && vcpu_check_affinity(wvc->vcpu, cpu) )
> 
> Hmm, actually I just noticed that this only checks the first item on
> the
> list.  If there are two vcpus on the list, and the first one doesn't
> have affinity with the vcpu in question, the second one won't even be
> considered.  This was probably OK in the previous case, where the
> only
> time the test could fail is during suspend/resume, but it's not
> really
> OK anymore, I don't think.
> 
Good point. I need to scan the waitqueue. Will do.

> Everything else looks OK to me.
> 
Good to hear. :-)

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


end of thread, other threads:[~2017-04-07 10:11 UTC | newest]

Thread overview: 13+ messages
2017-04-07  0:33 [PATCH v2 0/5] The 'null' Scheduler Dario Faggioli
2017-04-07  0:33 ` [PATCH v2 1/5] xen: sched: improve robustness (and rename) DOM2OP() Dario Faggioli
2017-04-07  8:44   ` George Dunlap
2017-04-07  9:05     ` Dario Faggioli
2017-04-07  0:34 ` [PATCH v2 2/5] xen: sched: make sure a pCPU added to a pool runs the scheduler ASAP Dario Faggioli
2017-04-07  8:47   ` George Dunlap
2017-04-07  0:34 ` [PATCH v2 3/5] xen: sched: introduce the 'null' semi-static scheduler Dario Faggioli
2017-04-07  7:24   ` Alan Robinson
2017-04-07  9:17   ` George Dunlap
2017-04-07  0:34 ` [PATCH v2 4/5] xen: sched_null: support for hard affinity Dario Faggioli
2017-04-07 10:08   ` George Dunlap
2017-04-07 10:11     ` Dario Faggioli
2017-04-07  0:34 ` [PATCH v2 5/5] tools: sched: add support for 'null' scheduler Dario Faggioli
