From: Stefano Stabellini <sstabellini@kernel.org>
To: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Jonathan Davies <Jonathan.Davies@citrix.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	George Dunlap <george.dunlap@citrix.com>,
	Marcus Granado <marcus.granado@citrix.com>,
	Julien Grall <julien.grall@arm.com>,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH 1/3] xen: sched: introduce the 'null' semi-static scheduler
Date: Mon, 20 Mar 2017 16:21:58 -0700 (PDT)	[thread overview]
Message-ID: <alpine.DEB.2.10.1703201524020.11533@sstabellini-ThinkPad-X260> (raw)
In-Reply-To: <148977617833.29510.4160128186395621610.stgit@Palanthas.fritz.box>

On Fri, 17 Mar 2017, Dario Faggioli wrote:
> In cases where one is absolutely sure that there will be
> fewer vCPUs than pCPUs, having to pay the cost, mostly in
> terms of overhead, of an advanced scheduler may not be
> desirable.
> 
> The simple scheduler implemented here could be a solution.
> Here is how it works:
>  - each vCPU is statically assigned to a pCPU;
>  - if there are pCPUs without any vCPU assigned, they
>    stay idle (as in, they run their idle vCPU);
>  - if there are vCPUs which are not assigned to any
>    pCPU (e.g., because there are more vCPUs than pCPUs)
>    they *don't* run, until they get assigned;
>  - if a vCPU assigned to a pCPU goes away, one of the
>    vCPUs waiting to be assigned, if any, gets assigned
>    to the pCPU and can run there.
> 
> This scheduler, therefore, if used in configurations
> where every vCPU can be assigned to a pCPU, guarantees
> low overhead, low latency, and consistent performance.
> 
> If used as the default scheduler at Xen boot, it is
> recommended to limit the number of Dom0 vCPUs (e.g., with
> 'dom0_max_vcpus=x'). Otherwise, all the pCPUs will have
> one of Dom0's vCPUs assigned, and there won't be room for
> running any guest efficiently (if at all).
> 
> Target use cases are embedded and HPC, but it may well
> be interesting in other circumstances too.
> 
> Kconfig and documentation are updated accordingly.
> 
> While there, also document the availability of sched=rtds
> as boot parameter, which apparently had been forgotten.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Julien Grall <julien.grall@arm.com>
> Cc: Jonathan Davies <Jonathan.Davies@citrix.com>
> Cc: Marcus Granado <marcus.granado@citrix.com>
> ---
>  docs/misc/xen-command-line.markdown |    2 
>  xen/common/Kconfig                  |   11 
>  xen/common/Makefile                 |    1 
>  xen/common/sched_null.c             |  816 +++++++++++++++++++++++++++++++++++
>  xen/common/schedule.c               |    2 
>  xen/include/public/domctl.h         |    1 
>  6 files changed, 832 insertions(+), 1 deletion(-)
>  create mode 100644 xen/common/sched_null.c
> 
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index 4daf5b5..ad6a5ca 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1434,7 +1434,7 @@ Map the HPET page as read only in Dom0. If disabled the page will be mapped
>  with read and write permissions.
>  
>  ### sched
> -> `= credit | credit2 | arinc653`
> +> `= credit | credit2 | arinc653 | rtds | null`
>  
>  > Default: `sched=credit`
>  
> diff --git a/xen/common/Kconfig b/xen/common/Kconfig
> index f2ecbc4..518520e 100644
> --- a/xen/common/Kconfig
> +++ b/xen/common/Kconfig
> @@ -187,6 +187,14 @@ config SCHED_ARINC653
>  	  The ARINC653 scheduler is a hard real-time scheduler for single
>  	  cores, targeted for avionics, drones, and medical devices.
>  
> +config SCHED_NULL
> +	bool "Null scheduler support (EXPERIMENTAL)"
> +	default y
> +	---help---
> +	  The null scheduler is a static, zero overhead scheduler,
> +	  for when there always are less vCPUs than pCPUs, typically
> +	  in embedded or HPC scenarios.
> +
>  choice
>  	prompt "Default Scheduler?"
>  	default SCHED_CREDIT_DEFAULT
> @@ -199,6 +207,8 @@ choice
>  		bool "RT Scheduler" if SCHED_RTDS
>  	config SCHED_ARINC653_DEFAULT
>  		bool "ARINC653 Scheduler" if SCHED_ARINC653
> +	config SCHED_NULL_DEFAULT
> +		bool "Null Scheduler" if SCHED_NULL
>  endchoice
>  
>  config SCHED_DEFAULT
> @@ -207,6 +217,7 @@ config SCHED_DEFAULT
>  	default "credit2" if SCHED_CREDIT2_DEFAULT
>  	default "rtds" if SCHED_RTDS_DEFAULT
>  	default "arinc653" if SCHED_ARINC653_DEFAULT
> +	default "null" if SCHED_NULL_DEFAULT
>  	default "credit"
>  
>  endmenu
> diff --git a/xen/common/Makefile b/xen/common/Makefile
> index 0fed30b..26c5a64 100644
> --- a/xen/common/Makefile
> +++ b/xen/common/Makefile
> @@ -40,6 +40,7 @@ obj-$(CONFIG_SCHED_ARINC653) += sched_arinc653.o
>  obj-$(CONFIG_SCHED_CREDIT) += sched_credit.o
>  obj-$(CONFIG_SCHED_CREDIT2) += sched_credit2.o
>  obj-$(CONFIG_SCHED_RTDS) += sched_rt.o
> +obj-$(CONFIG_SCHED_NULL) += sched_null.o
>  obj-y += schedule.o
>  obj-y += shutdown.o
>  obj-y += softirq.o
> diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
> new file mode 100644
> index 0000000..6a13308
> --- /dev/null
> +++ b/xen/common/sched_null.c
> @@ -0,0 +1,816 @@
> +/*
> + * xen/common/sched_null.c
> + *
> + *  Copyright (c) 2017, Dario Faggioli, Citrix Ltd
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public
> + * License v2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +/*
> + * The 'null' scheduler always chooses to run, on each pCPU, either nothing
> + * (i.e., the pCPU stays idle) or always the same vCPU.
> + *
> + * It is aimed at supporting static scenarios, where there always are
> + * fewer vCPUs than pCPUs (and the vCPUs don't need to move among pCPUs
> + * for any reason) with the least possible overhead.
> + *
> + * Typical use cases are embedded applications, but also HPC, especially
> + * if the scheduler is used inside a cpupool.
> + */
> +
> +#include <xen/sched.h>
> +#include <xen/sched-if.h>
> +#include <xen/softirq.h>
> +#include <xen/keyhandler.h>
> +
> +
> +/*
> + * Locking:
> + * - Scheduler-lock (a.k.a. runqueue lock):
> + *  + is per-pCPU;
> + *  + serializes assignment and deassignment of vCPUs to a pCPU.
> + * - Private data lock (a.k.a. private scheduler lock):
> + *  + is scheduler-wide;
> + *  + serializes accesses to the list of domains in this scheduler.
> + * - Waitqueue lock:
> + *  + is scheduler-wide;
> + *  + serializes accesses to the list of vCPUs waiting to be assigned
> + *    to pCPUs.
> + *
> + * Ordering is: private lock, runqueue lock, waitqueue lock. In other
> + * words, the waitqueue lock nests inside the runqueue lock, which nests
> + * inside the private lock. More specifically:
> + *  + if we need both runqueue and private locks, we must acquire the
> + *    private lock first;
> + *  + if we need both runqueue and waitqueue locks, we must acquire
> + *    the runqueue lock first;
> + *  + if we need both private and waitqueue locks, we must acquire
> + *    the private lock first;
> + *  + if we already own a runqueue lock, we must never acquire
> + *    the private lock;
> + *  + if we already own the waitqueue lock, we must never acquire
> + *    the runqueue lock or the private lock.
> + */
> +
> +/*
> + * System-wide private data
> + */
> +struct null_private {
> +    spinlock_t lock;        /* scheduler lock; nests inside cpupool_lock */
> +    struct list_head ndom;  /* Domains of this scheduler                 */
> +    struct list_head waitq; /* vCPUs not assigned to any pCPU            */
> +    spinlock_t waitq_lock;  /* serializes waitq; nests inside runq locks */
> +    cpumask_t cpus_free;    /* CPUs without a vCPU associated to them    */
> +};
> +
> +/*
> + * Physical CPU
> + */
> +struct null_pcpu {
> +    struct vcpu *vcpu;
> +};
> +DEFINE_PER_CPU(struct null_pcpu, npc);
> +
> +/*
> + * Virtual CPU
> + */
> +struct null_vcpu {
> +    struct list_head waitq_elem;
> +    struct null_dom *ndom;

This field is redundant, given that from struct vcpu you can get struct
domain and from struct domain you can get struct null_dom. I would
remove it.
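
For instance, a small helper along these lines (untested sketch, name
made up, just to illustrate what I mean) would reach the same
information without the extra field:

    /* Get to the null_dom of a vcpu via the generic vcpu -> domain link */
    static inline struct null_dom *vcpu_ndom(const struct vcpu *v)
    {
        return null_dom(v->domain);
    }

Then e.g. null_vcpu_check() below can use vcpu_ndom(v) instead of
nvc->ndom, and null_alloc_vdata() no longer needs to store dd.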


> +    struct vcpu *vcpu;
> +    int pcpu;          /* To what pCPU the vCPU is assigned (-1 if none) */

Isn't this always the same as struct vcpu->processor?
If it's only different when the vcpu is waiting in the waitq, then you
can just remove this field and replace the pcpu == -1 test with
list_empty(waitq_elem).
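
I.e., something like this (rough sketch, assuming waitq_elem is always
kept list_del_init()-ed whenever the vcpu is not actually queued, which
AFAICT is what the code already does):

    /* v is waiting for a pCPU iff its waitq_elem is linked into the waitq */
    static inline bool_t vcpu_on_waitq(const struct null_vcpu *nvc)
    {
        return !list_empty(&nvc->waitq_elem);
    }

so the "nvc->pcpu == -1" checks become vcpu_on_waitq(nvc), and the pCPU
a vcpu is assigned to is just v->processor.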


> +};
> +
> +/*
> + * Domain
> + */
> +struct null_dom {
> +    struct list_head ndom_elem;
> +    struct domain *dom;
> +};
> +
> +/*
> + * Accessor helpers functions
> + */
> +static inline struct null_private *null_priv(const struct scheduler *ops)
> +{
> +    return ops->sched_data;
> +}
> +
> +static inline struct null_vcpu *null_vcpu(const struct vcpu *v)
> +{
> +    return v->sched_priv;
> +}
> +
> +static inline struct null_dom *null_dom(const struct domain *d)
> +{
> +    return d->sched_priv;
> +}
> +
> +static int null_init(struct scheduler *ops)
> +{
> +    struct null_private *prv;
> +
> +    printk("Initializing null scheduler\n"
> +           "WARNING: This is experimental software in development.\n"
> +           "Use at your own risk.\n");
> +
> +    prv = xzalloc(struct null_private);
> +    if ( prv == NULL )
> +        return -ENOMEM;
> +
> +    spin_lock_init(&prv->lock);
> +    spin_lock_init(&prv->waitq_lock);
> +    INIT_LIST_HEAD(&prv->ndom);
> +    INIT_LIST_HEAD(&prv->waitq);
> +
> +    ops->sched_data = prv;
> +
> +    return 0;
> +}
> +
> +static void null_deinit(struct scheduler *ops)
> +{
> +    xfree(ops->sched_data);
> +    ops->sched_data = NULL;
> +}
> +
> +static void init_pdata(struct null_private *prv, unsigned int cpu)
> +{
> +    /* Mark the pCPU as free, and with no vCPU assigned */
> +    cpumask_set_cpu(cpu, &prv->cpus_free);
> +    per_cpu(npc, cpu).vcpu = NULL;
> +}
> +
> +static void null_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
> +{
> +    struct null_private *prv = null_priv(ops);
> +    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
> +
> +    /* alloc_pdata is not implemented, so we want this to be NULL. */
> +    ASSERT(!pdata);
> +
> +    /*
> +     * The scheduler lock points already to the default per-cpu spinlock,
> +     * so there is no remapping to be done.
> +     */
> +    ASSERT(sd->schedule_lock == &sd->_lock && !spin_is_locked(&sd->_lock));
> +
> +    init_pdata(prv, cpu);
> +}
> +
> +static void null_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
> +{
> +    struct null_private *prv = null_priv(ops);
> +
> +    /* alloc_pdata not implemented, so this must have stayed NULL */
> +    ASSERT(!pcpu);
> +
> +    cpumask_clear_cpu(cpu, &prv->cpus_free);
> +    per_cpu(npc, cpu).vcpu = NULL;
> +}
> +
> +static void *null_alloc_vdata(const struct scheduler *ops,
> +                              struct vcpu *v, void *dd)
> +{
> +    struct null_vcpu *nvc;
> +
> +    nvc = xzalloc(struct null_vcpu);
> +    if ( nvc == NULL )
> +        return NULL;
> +
> +    INIT_LIST_HEAD(&nvc->waitq_elem);
> +
> +    /* Not assigned to any pCPU */
> +    nvc->pcpu = -1;
> +    /* Up pointers */
> +    nvc->ndom = dd;
> +    nvc->vcpu = v;
> +
> +    SCHED_STAT_CRANK(vcpu_alloc);
> +
> +    return nvc;
> +}
> +
> +static void null_free_vdata(const struct scheduler *ops, void *priv)
> +{
> +    struct null_vcpu *nvc = priv;
> +
> +    xfree(nvc);
> +}
> +
> +static void * null_alloc_domdata(const struct scheduler *ops,
> +                                 struct domain *d)
> +{
> +    struct null_private *prv = null_priv(ops);
> +    struct null_dom *ndom;
> +    unsigned long flags;
> +
> +    ndom = xzalloc(struct null_dom);
> +    if ( ndom == NULL )
> +        return NULL;
> +
> +    INIT_LIST_HEAD(&ndom->ndom_elem);

This is not needed, given the following list_add_tail.


> +    ndom->dom = d;
> +
> +    spin_lock_irqsave(&prv->lock, flags);
> +    list_add_tail(&ndom->ndom_elem, &null_priv(ops)->ndom);
> +    spin_unlock_irqrestore(&prv->lock, flags);
> +
> +    return (void*)ndom;
> +}
> +
> +static void null_free_domdata(const struct scheduler *ops, void *data)
> +{
> +    unsigned long flags;
> +    struct null_dom *ndom = data;
> +    struct null_private *prv = null_priv(ops);
> +
> +    spin_lock_irqsave(&prv->lock, flags);
> +    list_del_init(&ndom->ndom_elem);
> +    spin_unlock_irqrestore(&prv->lock, flags);
> +
> +    xfree(data);
> +}
> +
> +static int null_dom_init(const struct scheduler *ops, struct domain *d)
> +{
> +    struct null_dom *ndom;
> +
> +    if ( is_idle_domain(d) )
> +        return 0;
> +
> +    ndom = null_alloc_domdata(ops, d);
> +    if ( ndom == NULL )
> +        return -ENOMEM;
> +
> +    d->sched_priv = ndom;
> +
> +    return 0;
> +}
> +static void null_dom_destroy(const struct scheduler *ops, struct domain *d)
> +{
> +    null_free_domdata(ops, null_dom(d));
> +}
> +
> +/*
> + * vCPU to pCPU assignment and placement. This _only_ happens:
> + *  - on insert,
> + *  - on migrate.
> + *
> + * Insert occurs when a vCPU joins this scheduler for the first time
> + * (e.g., when the domain it's part of is moved to the scheduler's
> + * cpupool).
> + *
> + * Migration may be necessary if a pCPU (with a vCPU assigned to it)
> + * is removed from the scheduler's cpupool.
> + *
> + * So this is not part of any hot path.
> + */
> +static unsigned int pick_cpu(struct null_private *prv, struct vcpu *v)
> +{
> +    unsigned int cpu = v->processor;
> +    cpumask_t *cpus = cpupool_domain_cpumask(v->domain);
> +
> +    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
> +
> +    /*
> +     * If our processor is free, or we are assigned to it, and it is
> +     * also still valid, just go for it.
> +     */
> +    if ( likely((per_cpu(npc, cpu).vcpu == NULL || per_cpu(npc, cpu).vcpu == v)
> +                && cpumask_test_cpu(cpu, cpus)) )
> +        return cpu;
> +
> +    /* If not, just go for a valid free pCPU, if any */
> +    cpumask_and(cpumask_scratch_cpu(cpu), &prv->cpus_free, cpus);
> +    cpu = cpumask_first(cpumask_scratch_cpu(cpu));
> +
> +    /*
> +     * If we didn't find any free pCPU, just pick any valid pcpu, even if
> +     * it has another vCPU assigned. This will happen during shutdown and
> +     * suspend/resume, but it may also happen during "normal operation", if
> +     * all the pCPUs are busy.
> +     *
> +     * In fact, there must always be something sane in v->processor, or
> +     * vcpu_schedule_lock() and friends won't work. This is not a problem,
> +     * as we will actually assign the vCPU to the pCPU we return from here,
> +     * only if the pCPU is free.
> +     */
> +    if ( unlikely(cpu == nr_cpu_ids) )
> +        cpu = cpumask_any(cpus);
> +
> +    return cpu;
> +}
> +
> +static void vcpu_assign(struct null_private *prv, struct vcpu *v,
> +                        unsigned int cpu)
> +{
> +    ASSERT(null_vcpu(v)->pcpu == -1);
> +
> +    per_cpu(npc, cpu).vcpu = v;
> +    v->processor = null_vcpu(v)->pcpu = cpu;
> +    cpumask_clear_cpu(cpu, &prv->cpus_free);
> +
> +    gdprintk(XENLOG_INFO, "%d <-- d%dv%d\n", cpu, v->domain->domain_id, v->vcpu_id);
> +}
> +
> +static void vcpu_deassign(struct null_private *prv, struct vcpu *v,
> +                          unsigned int cpu)
> +{
> +    ASSERT(null_vcpu(v)->pcpu == cpu);
> +
> +    null_vcpu(v)->pcpu = -1;
> +    per_cpu(npc, cpu).vcpu = NULL;
> +    cpumask_set_cpu(cpu, &prv->cpus_free);
> +
> +    gdprintk(XENLOG_INFO, "%d <-- NULL (d%dv%d)\n", cpu, v->domain->domain_id, v->vcpu_id);
> +}
> +
> +/* Change the scheduler of cpu to us (null). */
> +static void null_switch_sched(struct scheduler *new_ops, unsigned int cpu,
> +                              void *pdata, void *vdata)
> +{
> +    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
> +    struct null_private *prv = null_priv(new_ops);
> +    struct null_vcpu *nvc = vdata;
> +
> +    ASSERT(nvc && is_idle_vcpu(nvc->vcpu));
> +
> +    idle_vcpu[cpu]->sched_priv = vdata;
> +
> +    /*
> +     * We are holding the runqueue lock already (it's been taken in
> +     * schedule_cpu_switch()). It actually may or may not be the 'right'
> +     * one for this cpu, but that is ok for preventing races.
> +     */
> +    ASSERT(!local_irq_is_enabled());
> +
> +    init_pdata(prv, cpu);
> +
> +    per_cpu(scheduler, cpu) = new_ops;
> +    per_cpu(schedule_data, cpu).sched_priv = pdata;
> +
> +    /*
> +     * (Re?)route the lock to the per-pCPU lock as the /last/ thing. In
> +     * fact, if it is free (and it can be), we want anyone that manages to
> +     * take it to find all the initializations we've done above in place.
> +     */
> +    smp_mb();
> +    sd->schedule_lock = &sd->_lock;
> +}
> +
> +static void null_vcpu_insert(const struct scheduler *ops, struct vcpu *v)
> +{
> +    struct null_private *prv = null_priv(ops);
> +    struct null_vcpu *nvc = null_vcpu(v);
> +    unsigned int cpu;
> +    spinlock_t *lock;
> +
> +    ASSERT(!is_idle_vcpu(v));
> +
> + retry:
> +    lock = vcpu_schedule_lock_irq(v);
> +
> +    cpu = pick_cpu(prv, v);
> +
> +    /* We hold v->processor's runq lock, but we need cpu's one */
> +    if ( cpu != v->processor )
> +    {
> +        spin_unlock(lock);
> +        lock = pcpu_schedule_lock(cpu);
> +    }
> +
> +    /*
> +     * If the pCPU is free, we assign v to it.
> +     *
> +     * If it is not free (e.g., because we raced with another insert
> +     * or migrate), but there are free pCPUs, we try to pick again.
> +     *
> +     * If the pCPU is not free, and there aren't any (valid) others,
> +     * we have no alternative but to go into the waitqueue.
> +     */
> +    if ( likely(per_cpu(npc, cpu).vcpu == NULL) )
> +    {
> +        /*
> +         * Insert is followed by vcpu_wake(), so there's no need to poke
> +         * the pcpu with the SCHEDULE_SOFTIRQ, as wake will do that.
> +         */
> +        vcpu_assign(prv, v, cpu);
> +    }
> +    else if ( cpumask_intersects(&prv->cpus_free,
> +                                 cpupool_domain_cpumask(v->domain)) )
> +    {
> +        spin_unlock(lock);
> +        goto retry;
> +    }
> +    else
> +    {
> +        spin_lock(&prv->waitq_lock);
> +        list_add_tail(&nvc->waitq_elem, &prv->waitq);
> +        spin_unlock(&prv->waitq_lock);
> +    }
> +    spin_unlock_irq(lock);
> +
> +    SCHED_STAT_CRANK(vcpu_insert);
> +}
> +
> +static void null_vcpu_remove(const struct scheduler *ops, struct vcpu *v)
> +{
> +    struct null_private *prv = null_priv(ops);
> +    struct null_vcpu *wvc, *nvc = null_vcpu(v);
> +    unsigned int cpu;
> +    spinlock_t *lock;
> +
> +    ASSERT(!is_idle_vcpu(v));
> +
> +    lock = vcpu_schedule_lock_irq(v);
> +
> +    cpu = v->processor;
> +
> +    /* If v is in waitqueue, just get it out of there and bail */
> +    if ( unlikely(nvc->pcpu == -1) )
> +    {
> +        spin_lock(&prv->waitq_lock);
> +
> +        ASSERT(!list_empty(&null_vcpu(v)->waitq_elem));
> +        list_del_init(&nvc->waitq_elem);
> +
> +        spin_unlock(&prv->waitq_lock);
> +
> +        goto out;
> +    }
> +
> +    /*
> +     * If v is assigned to a pCPU, let's see if there is someone waiting.
> +     * If yes, we assign it to cpu, in place of v. If not, we just set
> +     * cpu free.
> +     */
> +
> +    ASSERT(per_cpu(npc, cpu).vcpu == v);
> +    ASSERT(!cpumask_test_cpu(cpu, &prv->cpus_free));
> +
> +    spin_lock(&prv->waitq_lock);
> +    wvc = list_first_entry_or_null(&prv->waitq, struct null_vcpu, waitq_elem);
> +    if ( wvc )
> +    {
> +        vcpu_assign(prv, wvc->vcpu, cpu);

The vcpu_assign in null_vcpu_insert is protected by the pcpu runq lock,
while this call is protected only by waitq_lock. Is that safe?


> +        list_del_init(&wvc->waitq_elem);
> +        cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
> +    }
> +    else
> +    {
> +        vcpu_deassign(prv, v, cpu);
> +    }
> +    spin_unlock(&prv->waitq_lock);
> +
> + out:
> +    vcpu_schedule_unlock_irq(lock, v);
> +
> +    SCHED_STAT_CRANK(vcpu_remove);
> +}
> +
> +static void null_vcpu_wake(const struct scheduler *ops, struct vcpu *v)
> +{
> +    ASSERT(!is_idle_vcpu(v));
> +
> +    if ( unlikely(curr_on_cpu(v->processor) == v) )
> +    {
> +        SCHED_STAT_CRANK(vcpu_wake_running);
> +        return;
> +    }
> +
> +    if ( null_vcpu(v)->pcpu == -1 )
> +    {
> +	/* Not exactly "on runq", but close enough for reusing the counter */
> +        SCHED_STAT_CRANK(vcpu_wake_onrunq);
> +	return;

coding style


> +    }
> +
> +    if ( likely(vcpu_runnable(v)) )
> +        SCHED_STAT_CRANK(vcpu_wake_runnable);
> +    else
> +        SCHED_STAT_CRANK(vcpu_wake_not_runnable);
> +
> +    /* Note that we get here only for vCPUs assigned to a pCPU */
> +    cpu_raise_softirq(v->processor, SCHEDULE_SOFTIRQ);
> +}
> +
> +static void null_vcpu_sleep(const struct scheduler *ops, struct vcpu *v)
> +{
> +    ASSERT(!is_idle_vcpu(v));
> +
> +    /* If v is not assigned to a pCPU, or is not running, no need to bother */
> +    if ( curr_on_cpu(v->processor) == v )
> +        cpu_raise_softirq(v->processor, SCHEDULE_SOFTIRQ);
> +
> +    SCHED_STAT_CRANK(vcpu_sleep);
> +}
> +
> +static int null_cpu_pick(const struct scheduler *ops, struct vcpu *v)
> +{
> +    ASSERT(!is_idle_vcpu(v));
> +    return pick_cpu(null_priv(ops), v);
> +}
> +
> +static void null_vcpu_migrate(const struct scheduler *ops, struct vcpu *v,
> +                              unsigned int new_cpu)
> +{
> +    struct null_private *prv = null_priv(ops);
> +    struct null_vcpu *nvc = null_vcpu(v);
> +    unsigned int cpu = v->processor;
> +
> +    ASSERT(!is_idle_vcpu(v));
> +
> +    if ( v->processor == new_cpu )
> +        return;
> +
> +    /*
> +     * v is either in the waitqueue, or assigned to a pCPU.
> +     *
> +     * In the former case, there is nothing to do.
> +     *
> +     * In the latter, the pCPU to which it was assigned would become free,
> +     * and we, therefore, should check whether there is anyone in the
> +     * waitqueue that can be assigned to it.
> +     */
> +    if ( likely(nvc->pcpu != -1) )
> +    {
> +        struct null_vcpu *wvc;
> +
> +        spin_lock(&prv->waitq_lock);
> +        wvc = list_first_entry_or_null(&prv->waitq, struct null_vcpu, waitq_elem);
> +        if ( wvc && cpumask_test_cpu(cpu, cpupool_domain_cpumask(v->domain)) )
> +        {
> +            vcpu_assign(prv, wvc->vcpu, cpu);
> +            list_del_init(&wvc->waitq_elem);
> +            cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
> +        }
> +        else
> +        {
> +            vcpu_deassign(prv, v, cpu);
> +        }
> +        spin_unlock(&prv->waitq_lock);

This looks very similar to null_vcpu_remove, maybe you want to refactor
the code and call a single shared service function.
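
Something like the below, perhaps (just a sketch of the idea, helper
name made up; note that I am checking the waiter's own cpupool mask
here, which I _think_ is what both call sites actually want, but please
double check):

    /*
     * Deassign v from cpu and, if there is a suitable vCPU waiting for
     * a pCPU, assign that one to cpu instead and kick the pCPU.
     */
    static void vcpu_deassign_or_replace(struct null_private *prv,
                                         struct vcpu *v, unsigned int cpu)
    {
        struct null_vcpu *wvc;

        spin_lock(&prv->waitq_lock);
        wvc = list_first_entry_or_null(&prv->waitq, struct null_vcpu,
                                       waitq_elem);
        if ( wvc &&
             cpumask_test_cpu(cpu, cpupool_domain_cpumask(wvc->vcpu->domain)) )
        {
            vcpu_assign(prv, wvc->vcpu, cpu);
            list_del_init(&wvc->waitq_elem);
            cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
        }
        else
            vcpu_deassign(prv, v, cpu);
        spin_unlock(&prv->waitq_lock);
    }

which both null_vcpu_remove() and this function could then call.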


> +	SCHED_STAT_CRANK(migrate_running);

coding style


> +    }
> +    else
> +        SCHED_STAT_CRANK(migrate_on_runq);
> +
> +    SCHED_STAT_CRANK(migrated);
> +
> +    /*
> +     * Let's now consider new_cpu, which is where v is being sent. It can be
> +     * either free, or have a vCPU already assigned to it.
> +     *
> +     * In the former case, we should assign v to it, and try to get it to run.
> +     *
> +     * In the latter case, all we can do is park v in the waitqueue.
> +     */
> +    if ( per_cpu(npc, new_cpu).vcpu == NULL )
> +    {
> +        /* We don't know whether v was in the waitqueue. If yes, remove it */
> +        spin_lock(&prv->waitq_lock);
> +        list_del_init(&nvc->waitq_elem);
> +        spin_unlock(&prv->waitq_lock);
> +
> +        vcpu_assign(prv, v, new_cpu);

This vcpu_assign call seems to be unprotected. Should it be within a
spin_lock'ed area?
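
FWIW, I was thinking of something along these lines (just a sketch; I
have not checked whether waitq_lock is really the appropriate lock here,
it may well be that what is actually needed is new_cpu's runqueue lock):

    spin_lock(&prv->waitq_lock);
    /* We don't know whether v was in the waitqueue. If yes, remove it */
    list_del_init(&nvc->waitq_elem);
    vcpu_assign(prv, v, new_cpu);
    spin_unlock(&prv->waitq_lock);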


> +    }
> +    else
> +    {
> +        /* We don't know whether v was in the waitqueue. If no, put it there */
> +        spin_lock(&prv->waitq_lock);
> +        if ( list_empty(&nvc->waitq_elem) )
> +        {
> +            list_add_tail(&nvc->waitq_elem, &prv->waitq);
> +            nvc->pcpu = -1;
> +        }
> +        else
> +            ASSERT(nvc->pcpu == -1);
> +        spin_unlock(&prv->waitq_lock);
> +    }
> +
> +    /*
> +     * Regardless of all the above, we always at least override v->processor.
> +     * This is especially important for the shutdown and suspend/resume paths,
> +     * when it is important to let our caller (cpu_disable_scheduler())
> +     * know that the migration did happen, at least to the best of our
> +     * ability. In case of suspend, any temporary inconsistency caused
> +     * by this will be fixed up during resume.
> +     */
> +    v->processor = new_cpu;
> +}
> +
> +#ifndef NDEBUG
> +static inline void null_vcpu_check(struct vcpu *v)
> +{
> +    struct null_vcpu * const nvc = null_vcpu(v);
> +    struct null_dom * const ndom = nvc->ndom;
> +
> +    BUG_ON(nvc->vcpu != v);
> +    BUG_ON(ndom != null_dom(v->domain));
> +    if ( ndom )
> +    {
> +        BUG_ON(is_idle_vcpu(v));
> +        BUG_ON(ndom->dom != v->domain);
> +    }
> +    else
> +    {
> +        BUG_ON(!is_idle_vcpu(v));
> +    }
> +    SCHED_STAT_CRANK(vcpu_check);
> +}
> +#define NULL_VCPU_CHECK(v)  (null_vcpu_check(v))
> +#else
> +#define NULL_VCPU_CHECK(v)
> +#endif
> +
> +
> +/*
> + * The simplest scheduling function of all time! We either return:
> + *  - the vCPU assigned to the pCPU, if there's one and it can run;
> + *  - the idle vCPU, otherwise.
> + */
> +static struct task_slice null_schedule(const struct scheduler *ops,
> +                                       s_time_t now,
> +                                       bool_t tasklet_work_scheduled)
> +{
> +    const unsigned int cpu = smp_processor_id();
> +    struct null_private *prv = null_priv(ops);
> +    struct null_vcpu *wvc;
> +    struct task_slice ret;
> +
> +    SCHED_STAT_CRANK(schedule);
> +    NULL_VCPU_CHECK(current);
> +
> +    ret.task = per_cpu(npc, cpu).vcpu;
> +    ret.migrated = 0;
> +    ret.time = -1;
> +
> +    /*
> +     * We may be new in the cpupool, or just coming back online, in which
> +     * case there may be vCPUs in the waitqueue that we can assign to this
> +     * pCPU and run.
> +     */
> +    if ( unlikely(ret.task == NULL) )
> +    {
> +        spin_lock(&prv->waitq_lock);
> +        wvc = list_first_entry_or_null(&prv->waitq, struct null_vcpu, waitq_elem);
> +        if ( wvc )
> +        {
> +            vcpu_assign(prv, wvc->vcpu, cpu);
> +            list_del_init(&wvc->waitq_elem);
> +            ret.task = wvc->vcpu;
> +        }
> +        spin_unlock(&prv->waitq_lock);
> +    }
> +
> +    if ( unlikely(tasklet_work_scheduled ||
> +                  ret.task == NULL ||
> +                  !vcpu_runnable(ret.task)) )
> +        ret.task = idle_vcpu[cpu];
> +
> +    NULL_VCPU_CHECK(ret.task);
> +    return ret;
> +}
> +
> +static inline void dump_vcpu(struct null_private *prv, struct null_vcpu *nvc)
> +{
> +    printk("[%i.%i] pcpu=%d", nvc->vcpu->domain->domain_id,
> +            nvc->vcpu->vcpu_id, nvc->pcpu);
> +}
> +
> +static void null_dump_pcpu(const struct scheduler *ops, int cpu)
> +{
> +    struct null_private *prv = null_priv(ops);
> +    struct null_vcpu *nvc;
> +    spinlock_t *lock;
> +    unsigned long flags;
> +#define cpustr keyhandler_scratch
> +
> +    lock = pcpu_schedule_lock_irqsave(cpu, &flags);
> +
> +    cpumask_scnprintf(cpustr, sizeof(cpustr), per_cpu(cpu_sibling_mask, cpu));
> +    printk("CPU[%02d] sibling=%s, ", cpu, cpustr);
> +    cpumask_scnprintf(cpustr, sizeof(cpustr), per_cpu(cpu_core_mask, cpu));
> +    printk("core=%s", cpustr);
> +    if ( per_cpu(npc, cpu).vcpu != NULL )
> +        printk(", vcpu=d%dv%d", per_cpu(npc, cpu).vcpu->domain->domain_id,
> +               per_cpu(npc, cpu).vcpu->vcpu_id);
> +    printk("\n");
> +
> +    /* current VCPU (nothing to say if that's the idle vcpu) */
> +    nvc = null_vcpu(curr_on_cpu(cpu));
> +    if ( nvc && !is_idle_vcpu(nvc->vcpu) )
> +    {
> +        printk("\trun: ");
> +        dump_vcpu(prv, nvc);
> +        printk("\n");
> +    }
> +
> +    pcpu_schedule_unlock_irqrestore(lock, flags, cpu);
> +#undef cpustr
> +}
> +
> +static void null_dump(const struct scheduler *ops)
> +{
> +    struct null_private *prv = null_priv(ops);
> +    struct list_head *iter;
> +    unsigned long flags;
> +    unsigned int loop;
> +#define cpustr keyhandler_scratch
> +
> +    spin_lock_irqsave(&prv->lock, flags);
> +
> +    cpulist_scnprintf(cpustr, sizeof(cpustr), &prv->cpus_free);
> +    printk("\tcpus_free = %s\n", cpustr);
> +
> +    printk("Domain info:\n");
> +    loop = 0;
> +    list_for_each( iter, &prv->ndom )
> +    {
> +        struct null_dom *ndom;
> +        struct vcpu *v;
> +
> +        ndom = list_entry(iter, struct null_dom, ndom_elem);
> +
> +        printk("\tDomain: %d\n", ndom->dom->domain_id);
> +        for_each_vcpu( ndom->dom, v )
> +        {
> +            struct null_vcpu * const nvc = null_vcpu(v);
> +            spinlock_t *lock;
> +
> +            lock = vcpu_schedule_lock(nvc->vcpu);
> +
> +            printk("\t%3d: ", ++loop);
> +            dump_vcpu(prv, nvc);
> +            printk("\n");
> +
> +            vcpu_schedule_unlock(lock, nvc->vcpu);
> +        }
> +    }
> +
> +    printk("Waitqueue: ");
> +    loop = 0;
> +    spin_lock(&prv->waitq_lock);
> +    list_for_each( iter, &prv->waitq )
> +    {
> +        struct null_vcpu *nvc = list_entry(iter, struct null_vcpu, waitq_elem);
> +
> +        if ( loop++ != 0 )
> +            printk(", ");
> +        if ( loop % 24 == 0 )
> +            printk("\n\t");
> +        printk("d%dv%d", nvc->vcpu->domain->domain_id, nvc->vcpu->vcpu_id);
> +    }
> +    printk("\n");
> +    spin_unlock(&prv->waitq_lock);
> +
> +    spin_unlock_irqrestore(&prv->lock, flags);
> +#undef cpustr
> +}
> +
> +const struct scheduler sched_null_def = {
> +    .name           = "null Scheduler",
> +    .opt_name       = "null",
> +    .sched_id       = XEN_SCHEDULER_NULL,
> +    .sched_data     = NULL,
> +
> +    .init           = null_init,
> +    .deinit         = null_deinit,
> +    .init_pdata     = null_init_pdata,
> +    .switch_sched   = null_switch_sched,
> +    .deinit_pdata   = null_deinit_pdata,
> +
> +    .alloc_vdata    = null_alloc_vdata,
> +    .free_vdata     = null_free_vdata,
> +    .alloc_domdata  = null_alloc_domdata,
> +    .free_domdata   = null_free_domdata,
> +
> +    .init_domain    = null_dom_init,
> +    .destroy_domain = null_dom_destroy,
> +
> +    .insert_vcpu    = null_vcpu_insert,
> +    .remove_vcpu    = null_vcpu_remove,
> +
> +    .wake           = null_vcpu_wake,
> +    .sleep          = null_vcpu_sleep,
> +    .pick_cpu       = null_cpu_pick,
> +    .migrate        = null_vcpu_migrate,
> +    .do_schedule    = null_schedule,
> +
> +    .dump_cpu_state = null_dump_pcpu,
> +    .dump_settings  = null_dump,
> +};
> +
> +REGISTER_SCHEDULER(sched_null_def);
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 223a120..b482037 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -1785,6 +1785,8 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
>  
>   out:
>      per_cpu(cpupool, cpu) = c;
> +    /* Trigger a reschedule so the CPU can pick up some work ASAP. */
> +    cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
>  
>      return 0;
>  }

Why? This change is not explained in the commit message.


> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 85cbb7c..32b578d 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -352,6 +352,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
>  #define XEN_SCHEDULER_CREDIT2  6
>  #define XEN_SCHEDULER_ARINC653 7
>  #define XEN_SCHEDULER_RTDS     8
> +#define XEN_SCHEDULER_NULL     9
>  
>  typedef struct xen_domctl_sched_credit {
>      uint16_t weight;
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Thread overview: 22+ messages
2017-03-17 18:42 [PATCH 0/3] The 'null' Scheduler Dario Faggioli
2017-03-17 18:42 ` [PATCH 1/3] xen: sched: introduce the 'null' semi-static scheduler Dario Faggioli
2017-03-20 23:21   ` Stefano Stabellini [this message]
2017-03-21  8:26     ` Dario Faggioli
2017-03-27 10:31   ` George Dunlap
2017-03-27 10:48     ` George Dunlap
2017-04-06 14:43       ` Dario Faggioli
2017-04-06 15:07     ` Dario Faggioli
2017-03-17 18:43 ` [PATCH 2/3] xen: sched_null: support for hard affinity Dario Faggioli
2017-03-20 23:46   ` Stefano Stabellini
2017-03-21  8:47     ` Dario Faggioli
2017-03-17 18:43 ` [PATCH 3/3] tools: sched: add support for 'null' scheduler Dario Faggioli
2017-03-20 22:28   ` Stefano Stabellini
2017-03-21 17:09   ` Wei Liu
2017-03-27 10:50   ` George Dunlap
2017-04-06 10:49     ` Dario Faggioli
2017-04-06 13:59       ` George Dunlap
2017-04-06 15:18         ` Dario Faggioli
2017-04-07  9:42           ` Wei Liu
2017-04-07 10:05             ` Dario Faggioli
2017-04-07 10:13               ` Wei Liu
2017-03-20 22:23 ` [PATCH 0/3] The 'null' Scheduler Stefano Stabellini
