xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [Xen-devel] [PATCH 0/3] xen/sched: use new idle scheduler for free cpus
@ 2019-08-02 13:07 Juergen Gross
  2019-08-02 13:07 ` [Xen-devel] [PATCH 1/3] xen/sched: populate cpupool0 only after all cpus are up Juergen Gross
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Juergen Gross @ 2019-08-02 13:07 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

These three patches have been carved out from my core scheduling series
as they are sufficiently independent to be applied even without the big
series.

Without this little series there are messages like the following to be
seen on the console when booting with smt=0:

(XEN) Adding cpu 1 to runqueue 0
(XEN) CPU 1 still not dead...
(XEN) CPU 1 still not dead...
(XEN) CPU 1 still not dead...
(XEN) CPU 1 still not dead...

By assigning cpus to Cpupool-0 only after all cpus are up and by not
using the more complicated credit2 scheduler for cpus not in any
cpupool this situation can simply no longer happen, as parking the not
to be started threads is done before.

Juergen Gross (3):
  xen/sched: populate cpupool0 only after all cpus are up
  xen/sched: remove cpu from pool0 before removing it
  xen/sched: add minimalistic idle scheduler for free cpus

 xen/common/cpupool.c       | 194 ++++++++++++++++++++++++++++-----------------
 xen/common/sched_credit.c  |   9 ---
 xen/common/sched_null.c    |   7 --
 xen/common/schedule.c      | 172 +++++++++++++++++++++-------------------
 xen/include/xen/sched-if.h |   2 +
 5 files changed, 213 insertions(+), 171 deletions(-)

-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Xen-devel] [PATCH 1/3] xen/sched: populate cpupool0 only after all cpus are up
  2019-08-02 13:07 [Xen-devel] [PATCH 0/3] xen/sched: use new idle scheduler for free cpus Juergen Gross
@ 2019-08-02 13:07 ` Juergen Gross
  2019-08-13 16:07   ` Dario Faggioli
  2019-08-14 16:15   ` George Dunlap
  2019-08-02 13:07 ` [Xen-devel] [PATCH 2/3] xen/sched: remove cpu from pool0 before removing it Juergen Gross
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 15+ messages in thread
From: Juergen Gross @ 2019-08-02 13:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Dario Faggioli

With core or socket scheduling we need to know the number of siblings
per scheduling unit before we can setup the scheduler properly. In
order to prepare that do cpupool0 population only after all cpus are
up.

With that in place there is no need to create cpupool0 earlier, so
do that just before assigning the cpus. Initialize free cpus with all
online cpus at that time in order to be able to add the cpu notifier
late, too.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V1: new patch
---
 xen/common/cpupool.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index f90e496eda..caea5bd8b3 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -762,18 +762,28 @@ static struct notifier_block cpu_nfb = {
     .notifier_call = cpu_callback
 };
 
-static int __init cpupool_presmp_init(void)
+static int __init cpupool_init(void)
 {
+    unsigned int cpu;
     int err;
-    void *cpu = (void *)(long)smp_processor_id();
+
     cpupool0 = cpupool_create(0, 0, &err);
     BUG_ON(cpupool0 == NULL);
     cpupool_put(cpupool0);
-    cpu_callback(&cpu_nfb, CPU_ONLINE, cpu);
     register_cpu_notifier(&cpu_nfb);
+
+    spin_lock(&cpupool_lock);
+
+    cpumask_copy(&cpupool_free_cpus, &cpu_online_map);
+
+    for_each_cpu ( cpu, &cpupool_free_cpus )
+        cpupool_assign_cpu_locked(cpupool0, cpu);
+
+    spin_unlock(&cpupool_lock);
+
     return 0;
 }
-presmp_initcall(cpupool_presmp_init);
+__initcall(cpupool_init);
 
 /*
  * Local variables:
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Xen-devel] [PATCH 2/3] xen/sched: remove cpu from pool0 before removing it
  2019-08-02 13:07 [Xen-devel] [PATCH 0/3] xen/sched: use new idle scheduler for free cpus Juergen Gross
  2019-08-02 13:07 ` [Xen-devel] [PATCH 1/3] xen/sched: populate cpupool0 only after all cpus are up Juergen Gross
@ 2019-08-02 13:07 ` Juergen Gross
  2019-08-13 17:11   ` Dario Faggioli
  2019-08-02 13:07 ` [Xen-devel] [PATCH 3/3] xen/sched: add minimalistic idle scheduler for free cpus Juergen Gross
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Juergen Gross @ 2019-08-02 13:07 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

Today a cpu which is removed from the system is taken directly from
Pool0 to the offline state. This will conflict with the new idle
scheduler, so remove it from Pool0 first. Additionally accept removing
a free cpu instead of requiring it to be in Pool0.

For the resume failed case we need to call the scheduler code for that
situation after the cpupool handling, so move the scheduler code into
a function and call it from cpupool_cpu_remove_forced() and remove the
CPU_RESUME_FAILED case from cpu_schedule_callback().

Note that we are calling now schedule_cpu_switch() in stop_machine
context so we need to switch from spinlock_irq to spinlock_irqsave.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/cpupool.c       | 176 +++++++++++++++++++++++++++------------------
 xen/common/schedule.c      |  27 ++++---
 xen/include/xen/sched-if.h |   2 +
 3 files changed, 128 insertions(+), 77 deletions(-)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index caea5bd8b3..e0f8eec57b 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -282,22 +282,14 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu)
     return 0;
 }
 
-static long cpupool_unassign_cpu_helper(void *info)
+static int cpupool_unassign_cpu_epilogue(struct cpupool *c)
 {
     int cpu = cpupool_moving_cpu;
-    struct cpupool *c = info;
     struct domain *d;
-    long ret;
-
-    cpupool_dprintk("cpupool_unassign_cpu(pool=%d,cpu=%d)\n",
-                    cpupool_cpu_moving->cpupool_id, cpu);
+    int ret;
 
-    spin_lock(&cpupool_lock);
     if ( c != cpupool_cpu_moving )
-    {
-        ret = -EADDRNOTAVAIL;
-        goto out;
-    }
+        return -EADDRNOTAVAIL;
 
     /*
      * We need this for scanning the domain list, both in
@@ -332,39 +324,19 @@ static long cpupool_unassign_cpu_helper(void *info)
         domain_update_node_affinity(d);
     }
     rcu_read_unlock(&domlist_read_lock);
-out:
-    spin_unlock(&cpupool_lock);
-    cpupool_dprintk("cpupool_unassign_cpu ret=%ld\n", ret);
+
     return ret;
 }
 
-/*
- * unassign a specific cpu from a cpupool
- * we must be sure not to run on the cpu to be unassigned! to achieve this
- * the main functionality is performed via continue_hypercall_on_cpu on a
- * specific cpu.
- * if the cpu to be removed is the last one of the cpupool no active domain
- * must be bound to the cpupool. dying domains are moved to cpupool0 as they
- * might be zombies.
- * possible failures:
- * - last cpu and still active domains in cpupool
- * - cpu just being unplugged
- */
-static int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu)
+static int cpupool_unassign_cpu_prologue(struct cpupool *c, unsigned int cpu)
 {
-    int work_cpu;
     int ret;
     struct domain *d;
 
-    cpupool_dprintk("cpupool_unassign_cpu(pool=%d,cpu=%d)\n",
-                    c->cpupool_id, cpu);
-
     spin_lock(&cpupool_lock);
     ret = -EADDRNOTAVAIL;
     if ( (cpupool_moving_cpu != -1) && (cpu != cpupool_moving_cpu) )
         goto out;
-    if ( cpumask_test_cpu(cpu, &cpupool_locked_cpus) )
-        goto out;
 
     ret = 0;
     if ( !cpumask_test_cpu(cpu, c->cpu_valid) && (cpu != cpupool_moving_cpu) )
@@ -376,7 +348,7 @@ static int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu)
         rcu_read_lock(&domlist_read_lock);
         for_each_domain_in_cpupool(d, c)
         {
-            if ( !d->is_dying )
+            if ( !d->is_dying && system_state == SYS_STATE_active )
             {
                 ret = -EBUSY;
                 break;
@@ -393,8 +365,58 @@ static int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu)
     atomic_inc(&c->refcnt);
     cpupool_cpu_moving = c;
     cpumask_clear_cpu(cpu, c->cpu_valid);
+
+out:
     spin_unlock(&cpupool_lock);
 
+    return ret;
+}
+
+static long cpupool_unassign_cpu_helper(void *info)
+{
+    struct cpupool *c = info;
+    long ret;
+
+    cpupool_dprintk("cpupool_unassign_cpu(pool=%d,cpu=%d)\n",
+                    cpupool_cpu_moving->cpupool_id, cpu);
+    spin_lock(&cpupool_lock);
+
+    ret = cpupool_unassign_cpu_epilogue(c);
+
+    spin_unlock(&cpupool_lock);
+    cpupool_dprintk("cpupool_unassign_cpu ret=%ld\n", ret);
+
+    return ret;
+}
+
+/*
+ * unassign a specific cpu from a cpupool
+ * we must be sure not to run on the cpu to be unassigned! to achieve this
+ * the main functionality is performed via continue_hypercall_on_cpu on a
+ * specific cpu.
+ * if the cpu to be removed is the last one of the cpupool no active domain
+ * must be bound to the cpupool. dying domains are moved to cpupool0 as they
+ * might be zombies.
+ * possible failures:
+ * - last cpu and still active domains in cpupool
+ * - cpu just being unplugged
+ */
+static int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu)
+{
+    int work_cpu;
+    int ret;
+
+    cpupool_dprintk("cpupool_unassign_cpu(pool=%d,cpu=%d)\n",
+                    c->cpupool_id, cpu);
+
+    ret = cpupool_unassign_cpu_prologue(c, cpu);
+    if ( ret )
+    {
+        cpupool_dprintk("cpupool_unassign_cpu(pool=%d,cpu=%d) ret %d\n",
+                        c->cpupool_id, cpu, ret);
+        return ret;
+    }
+
     work_cpu = smp_processor_id();
     if ( work_cpu == cpu )
     {
@@ -403,12 +425,6 @@ static int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu)
             work_cpu = cpumask_next(cpu, cpupool0->cpu_valid);
     }
     return continue_hypercall_on_cpu(work_cpu, cpupool_unassign_cpu_helper, c);
-
-out:
-    spin_unlock(&cpupool_lock);
-    cpupool_dprintk("cpupool_unassign_cpu(pool=%d,cpu=%d) ret %d\n",
-                    c->cpupool_id, cpu, ret);
-    return ret;
 }
 
 /*
@@ -492,30 +508,54 @@ static int cpupool_cpu_add(unsigned int cpu)
 }
 
 /*
- * Called to remove a CPU from a pool. The CPU is locked, to forbid removing
- * it from pool0. In fact, if we want to hot-unplug a CPU, it must belong to
- * pool0, or we fail.
+ * This function is called in stop_machine context, so we can be sure no
+ * non-idle vcpu is active on the system.
  */
-static int cpupool_cpu_remove(unsigned int cpu)
+static void cpupool_cpu_remove(unsigned int cpu)
 {
-    int ret = -ENODEV;
+    int ret;
 
-    spin_lock(&cpupool_lock);
+    ASSERT(is_idle_vcpu(current));
 
-    if ( cpumask_test_cpu(cpu, cpupool0->cpu_valid) )
+    if ( !cpumask_test_cpu(cpu, &cpupool_free_cpus) )
     {
-        /*
-         * If we are not suspending, we are hot-unplugging cpu, and that is
-         * allowed only for CPUs in pool0.
-         */
-        cpumask_clear_cpu(cpu, cpupool0->cpu_valid);
-        ret = 0;
+        ret = cpupool_unassign_cpu_epilogue(cpupool0);
+        BUG_ON(ret);
     }
+}
 
-    if ( !ret )
+/*
+ * Called before a CPU is being removed from the system.
+ * Removing a CPU is allowed for free CPUs or CPUs in Pool-0 (those are moved
+ * to free cpus actually before removing them).
+ * The CPU is locked, to forbid adding it again to another cpupool.
+ */
+static int cpupool_cpu_remove_prologue(unsigned int cpu)
+{
+    int ret = 0;
+
+    spin_lock(&cpupool_lock);
+
+    if ( cpumask_test_cpu(cpu, &cpupool_locked_cpus) )
+        ret = -EBUSY;
+    else
         cpumask_set_cpu(cpu, &cpupool_locked_cpus);
+
     spin_unlock(&cpupool_lock);
 
+    if ( ret )
+        return  ret;
+
+    if ( cpumask_test_cpu(cpu, cpupool0->cpu_valid) )
+    {
+        /* Cpupool0 is populated only after all cpus are up. */
+        ASSERT(system_state == SYS_STATE_active);
+
+        ret = cpupool_unassign_cpu_prologue(cpupool0, cpu);
+    }
+    else if ( !cpumask_test_cpu(cpu, &cpupool_free_cpus) )
+        ret = -ENODEV;
+
     return ret;
 }
 
@@ -523,13 +563,13 @@ static int cpupool_cpu_remove(unsigned int cpu)
  * Called during resume for all cpus which didn't come up again. The cpu must
  * be removed from the cpupool it is assigned to. In case a cpupool will be
  * left without cpu we move all domains of that cpupool to cpupool0.
+ * As we are called with all domains still frozen there is no need to take the
+ * cpupool lock here.
  */
 static void cpupool_cpu_remove_forced(unsigned int cpu)
 {
     struct cpupool **c;
-    struct domain *d;
-
-    spin_lock(&cpupool_lock);
+    int ret;
 
     if ( cpumask_test_cpu(cpu, &cpupool_free_cpus) )
         cpumask_clear_cpu(cpu, &cpupool_free_cpus);
@@ -539,19 +579,13 @@ static void cpupool_cpu_remove_forced(unsigned int cpu)
         {
             if ( cpumask_test_cpu(cpu, (*c)->cpu_valid) )
             {
-                cpumask_clear_cpu(cpu, (*c)->cpu_valid);
-                if ( cpumask_weight((*c)->cpu_valid) == 0 )
-                {
-                    if ( *c == cpupool0 )
-                        panic("No cpu left in cpupool0\n");
-                    for_each_domain_in_cpupool(d, *c)
-                        cpupool_move_domain_locked(d, cpupool0);
-                }
+                ret = cpupool_unassign_cpu(*c, cpu);
+                BUG_ON(ret);
             }
         }
     }
 
-    spin_unlock(&cpupool_lock);
+    sched_rm_cpu(cpu);
 }
 
 /*
@@ -619,7 +653,8 @@ int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op)
         if ( cpu >= nr_cpu_ids )
             goto addcpu_out;
         ret = -ENODEV;
-        if ( !cpumask_test_cpu(cpu, &cpupool_free_cpus) )
+        if ( !cpumask_test_cpu(cpu, &cpupool_free_cpus) ||
+             cpumask_test_cpu(cpu, &cpupool_locked_cpus) )
             goto addcpu_out;
         c = cpupool_find_by_id(op->cpupool_id);
         ret = -ENOENT;
@@ -746,7 +781,12 @@ static int cpu_callback(
     case CPU_DOWN_PREPARE:
         /* Suspend/Resume don't change assignments of cpus to cpupools. */
         if ( system_state <= SYS_STATE_active )
-            rc = cpupool_cpu_remove(cpu);
+            rc = cpupool_cpu_remove_prologue(cpu);
+        break;
+    case CPU_DYING:
+        /* Suspend/Resume don't change assignments of cpus to cpupools. */
+        if ( system_state <= SYS_STATE_active )
+            cpupool_cpu_remove(cpu);
         break;
     case CPU_RESUME_FAILED:
         cpupool_cpu_remove_forced(cpu);
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 7b71581756..93164c64f6 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1654,6 +1654,20 @@ static void cpu_schedule_down(unsigned int cpu)
     kill_timer(&sd->s_timer);
 }
 
+void sched_rm_cpu(unsigned int cpu)
+{
+    int rc;
+    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct scheduler *sched = per_cpu(scheduler, cpu);
+
+    rcu_read_lock(&domlist_read_lock);
+    rc = cpu_disable_scheduler(cpu);
+    BUG_ON(rc);
+    rcu_read_unlock(&domlist_read_lock);
+    sched_deinit_pdata(sched, sd->sched_priv, cpu);
+    cpu_schedule_down(cpu);
+}
+
 static int cpu_schedule_callback(
     struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
@@ -1709,16 +1723,10 @@ static int cpu_schedule_callback(
         rc = cpu_disable_scheduler_check(cpu);
         rcu_read_unlock(&domlist_read_lock);
         break;
-    case CPU_RESUME_FAILED:
     case CPU_DEAD:
         if ( system_state == SYS_STATE_suspend )
             break;
-        rcu_read_lock(&domlist_read_lock);
-        rc = cpu_disable_scheduler(cpu);
-        BUG_ON(rc);
-        rcu_read_unlock(&domlist_read_lock);
-        sched_deinit_pdata(sched, sd->sched_priv, cpu);
-        cpu_schedule_down(cpu);
+        sched_rm_cpu(cpu);
         break;
     case CPU_UP_CANCELED:
         if ( system_state != SYS_STATE_resume )
@@ -1841,6 +1849,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     struct cpupool *old_pool = per_cpu(cpupool, cpu);
     struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     spinlock_t *old_lock, *new_lock;
+    unsigned long flags;
 
     /*
      * pCPUs only move from a valid cpupool to free (i.e., out of any pool),
@@ -1895,7 +1904,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
      * that the lock itself changed, and retry acquiring the new one (which
      * will be the correct, remapped one, at that point).
      */
-    old_lock = pcpu_schedule_lock_irq(cpu);
+    old_lock = pcpu_schedule_lock_irqsave(cpu, &flags);
 
     vpriv_old = idle->sched_priv;
     ppriv_old = sd->sched_priv;
@@ -1913,7 +1922,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     sd->schedule_lock = new_lock;
 
     /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
-    spin_unlock_irq(old_lock);
+    spin_unlock_irqrestore(old_lock, flags);
 
     sched_do_tick_resume(new_ops, cpu);
 
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index d82ead586a..dc255b064b 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -437,4 +437,6 @@ affinity_balance_cpumask(const struct vcpu *v, int step, cpumask_t *mask)
         cpumask_copy(mask, v->cpu_hard_affinity);
 }
 
+void sched_rm_cpu(unsigned int cpu);
+
 #endif /* __XEN_SCHED_IF_H__ */
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Xen-devel] [PATCH 3/3] xen/sched: add minimalistic idle scheduler for free cpus
  2019-08-02 13:07 [Xen-devel] [PATCH 0/3] xen/sched: use new idle scheduler for free cpus Juergen Gross
  2019-08-02 13:07 ` [Xen-devel] [PATCH 1/3] xen/sched: populate cpupool0 only after all cpus are up Juergen Gross
  2019-08-02 13:07 ` [Xen-devel] [PATCH 2/3] xen/sched: remove cpu from pool0 before removing it Juergen Gross
@ 2019-08-02 13:07 ` Juergen Gross
  2019-08-13 17:07   ` Dario Faggioli
  2019-08-09  9:47 ` [Xen-devel] [PATCH 0/3] xen/sched: use new " Juergen Gross
  2019-08-13 15:51 ` Dario Faggioli
  4 siblings, 1 reply; 15+ messages in thread
From: Juergen Gross @ 2019-08-02 13:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

Instead of having a full blown scheduler running for the free cpus
add a very minimalistic scheduler for that purpose only ever scheduling
the related idle vcpu. This has the big advantage of not needing any
per-cpu, per-domain or per-scheduling unit data for free cpus and in
turn simplifying moving cpus to and from cpupools a lot.

This new scheduler will just use a common lock for all free cpus.

As this new scheduler is not user selectable don't register it as an
official scheduler, but just include it in schedule.c.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_credit.c |   9 ---
 xen/common/sched_null.c   |   7 ---
 xen/common/schedule.c     | 153 +++++++++++++++++++++++-----------------------
 3 files changed, 75 insertions(+), 94 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 81dee5e472..70fe718127 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -617,15 +617,6 @@ csched_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
 {
     unsigned long flags;
     struct csched_private *prv = CSCHED_PRIV(ops);
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
-
-    /*
-     * This is called either during during boot, resume or hotplug, in
-     * case Credit1 is the scheduler chosen at boot. In such cases, the
-     * scheduler lock for cpu is already pointing to the default per-cpu
-     * spinlock, as Credit1 needs it, so there is no remapping to be done.
-     */
-    ASSERT(sd->schedule_lock == &sd->_lock && !spin_is_locked(&sd->_lock));
 
     spin_lock_irqsave(&prv->lock, flags);
     init_pdata(prv, pdata, cpu);
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index 5aec9f17bd..47330eedf4 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -167,17 +167,10 @@ static void init_pdata(struct null_private *prv, unsigned int cpu)
 static void null_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
 {
     struct null_private *prv = null_priv(ops);
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
 
     /* alloc_pdata is not implemented, so we want this to be NULL. */
     ASSERT(!pdata);
 
-    /*
-     * The scheduler lock points already to the default per-cpu spinlock,
-     * so there is no remapping to be done.
-     */
-    ASSERT(sd->schedule_lock == &sd->_lock && !spin_is_locked(&sd->_lock));
-
     init_pdata(prv, cpu);
 }
 
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 93164c64f6..1106698fb4 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -54,6 +54,10 @@ boolean_param("sched_smt_power_savings", sched_smt_power_savings);
  * */
 int sched_ratelimit_us = SCHED_DEFAULT_RATELIMIT_US;
 integer_param("sched_ratelimit_us", sched_ratelimit_us);
+
+/* Common lock for free cpus. */
+static DEFINE_SPINLOCK(sched_free_cpu_lock);
+
 /* Various timer handlers. */
 static void s_timer_fn(void *unused);
 static void vcpu_periodic_timer_fn(void *data);
@@ -73,6 +77,58 @@ extern const struct scheduler *__start_schedulers_array[], *__end_schedulers_arr
 
 static struct scheduler __read_mostly ops;
 
+static spinlock_t *
+sched_idle_switch_sched(struct scheduler *new_ops, unsigned int cpu,
+                        void *pdata, void *vdata)
+{
+    idle_vcpu[cpu]->sched_priv = NULL;
+
+    return &sched_free_cpu_lock;
+}
+
+static int
+sched_idle_cpu_pick(const struct scheduler *ops, struct vcpu *v)
+{
+    return v->processor;
+}
+
+static void *
+sched_idle_alloc_vdata(const struct scheduler *ops, struct vcpu *v,
+                       void *dd)
+{
+    /* Any non-NULL pointer is fine here. */
+    return (void *)1UL;
+}
+
+static void
+sched_idle_free_vdata(const struct scheduler *ops, void *priv)
+{
+}
+
+static struct task_slice sched_idle_schedule(
+    const struct scheduler *ops, s_time_t now,
+    bool tasklet_work_scheduled)
+{
+    const unsigned int cpu = smp_processor_id();
+    struct task_slice ret = { .time = -1 };
+
+    ret.task = idle_vcpu[cpu];
+    return ret;
+}
+
+static struct scheduler sched_idle_ops = {
+    .name           = "Idle Scheduler",
+    .opt_name       = "idle",
+    .sched_data     = NULL,
+
+    .pick_cpu       = sched_idle_cpu_pick,
+    .do_schedule    = sched_idle_schedule,
+
+    .alloc_vdata    = sched_idle_alloc_vdata,
+    .free_vdata     = sched_idle_free_vdata,
+    .switch_sched   = sched_idle_switch_sched,
+};
+
 static inline struct scheduler *dom_scheduler(const struct domain *d)
 {
     if ( likely(d->cpupool != NULL) )
@@ -1587,12 +1643,10 @@ static void poll_timer_fn(void *data)
 static int cpu_schedule_up(unsigned int cpu)
 {
     struct schedule_data *sd = &per_cpu(schedule_data, cpu);
-    void *sched_priv;
 
-    per_cpu(scheduler, cpu) = &ops;
+    per_cpu(scheduler, cpu) = &sched_idle_ops;
     spin_lock_init(&sd->_lock);
-    sd->schedule_lock = &sd->_lock;
-    sd->curr = idle_vcpu[cpu];
+    sd->schedule_lock = &sched_free_cpu_lock;
     init_timer(&sd->s_timer, s_timer_fn, NULL, cpu);
     atomic_set(&sd->urgent_count, 0);
 
@@ -1602,40 +1656,19 @@ static int cpu_schedule_up(unsigned int cpu)
 
     if ( idle_vcpu[cpu] == NULL )
         vcpu_create(idle_vcpu[0]->domain, cpu, cpu);
-    else
-    {
-        struct vcpu *idle = idle_vcpu[cpu];
-
-        /*
-         * During (ACPI?) suspend the idle vCPU for this pCPU is not freed,
-         * while its scheduler specific data (what is pointed by sched_priv)
-         * is. Also, at this stage of the resume path, we attach the pCPU
-         * to the default scheduler, no matter in what cpupool it was before
-         * suspend. To avoid inconsistency, let's allocate default scheduler
-         * data for the idle vCPU here. If the pCPU was in a different pool
-         * with a different scheduler, it is schedule_cpu_switch(), invoked
-         * later, that will set things up as appropriate.
-         */
-        ASSERT(idle->sched_priv == NULL);
 
-        idle->sched_priv = sched_alloc_vdata(&ops, idle,
-                                             idle->domain->sched_priv);
-        if ( idle->sched_priv == NULL )
-            return -ENOMEM;
-    }
     if ( idle_vcpu[cpu] == NULL )
         return -ENOMEM;
 
     /*
-     * We don't want to risk calling xfree() on an sd->sched_priv
-     * (e.g., inside free_pdata, from cpu_schedule_down() called
-     * during CPU_UP_CANCELLED) that contains an IS_ERR value.
+     * No need to allocate any scheduler data, as cpus coming online are
+     * free initially and the idle scheduler doesn't need any data areas
+     * allocated.
      */
-    sched_priv = sched_alloc_pdata(&ops, cpu);
-    if ( IS_ERR(sched_priv) )
-        return PTR_ERR(sched_priv);
 
-    sd->sched_priv = sched_priv;
+    sd->curr = idle_vcpu[cpu];
+
+    sd->sched_priv = NULL;
 
     return 0;
 }
@@ -1643,13 +1676,6 @@ static int cpu_schedule_up(unsigned int cpu)
 static void cpu_schedule_down(unsigned int cpu)
 {
     struct schedule_data *sd = &per_cpu(schedule_data, cpu);
-    struct scheduler *sched = per_cpu(scheduler, cpu);
-
-    sched_free_pdata(sched, sd->sched_priv, cpu);
-    sched_free_vdata(sched, idle_vcpu[cpu]->sched_priv);
-
-    idle_vcpu[cpu]->sched_priv = NULL;
-    sd->sched_priv = NULL;
 
     kill_timer(&sd->s_timer);
 }
@@ -1657,14 +1683,11 @@ static void cpu_schedule_down(unsigned int cpu)
 void sched_rm_cpu(unsigned int cpu)
 {
     int rc;
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
-    struct scheduler *sched = per_cpu(scheduler, cpu);
 
     rcu_read_lock(&domlist_read_lock);
     rc = cpu_disable_scheduler(cpu);
     BUG_ON(rc);
     rcu_read_unlock(&domlist_read_lock);
-    sched_deinit_pdata(sched, sd->sched_priv, cpu);
     cpu_schedule_down(cpu);
 }
 
@@ -1672,8 +1695,6 @@ static int cpu_schedule_callback(
     struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
     unsigned int cpu = (unsigned long)hcpu;
-    struct scheduler *sched = per_cpu(scheduler, cpu);
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     int rc = 0;
 
     /*
@@ -1681,39 +1702,25 @@ static int cpu_schedule_callback(
      * allocating and initializing the per-pCPU scheduler specific data,
      * as well as "registering" this pCPU to the scheduler (which may
      * involve modifying some scheduler wide data structures).
-     * This happens by calling the alloc_pdata and init_pdata hooks, in
-     * this order. A scheduler that does not need to allocate any per-pCPU
-     * data can avoid implementing alloc_pdata. init_pdata may, however, be
-     * necessary/useful in this case too (e.g., it can contain the "register
-     * the pCPU to the scheduler" part). alloc_pdata (if present) is called
-     * during CPU_UP_PREPARE. init_pdata (if present) is called during
-     * CPU_STARTING.
+     * As new pCPUs always start as "free" cpus with the minimal idle
+     * scheduler being in charge, we don't need any of that.
      *
      * On the other hand, at teardown, we need to reverse what has been done
-     * during initialization, and then free the per-pCPU specific data. This
-     * happens by calling the deinit_pdata and free_pdata hooks, in this
+     * during initialization, and then free the per-pCPU specific data. A
+     * pCPU brought down is not forced through "free" cpus, so here we need to
+     * use the appropriate hooks.
+     *
+     * This happens by calling the deinit_pdata and free_pdata hooks, in this
      * order. If no per-pCPU memory was allocated, there is no need to
      * provide an implementation of free_pdata. deinit_pdata may, however,
      * be necessary/useful in this case too (e.g., it can undo something done
      * on scheduler wide data structure during init_pdata). Both deinit_pdata
      * and free_pdata are called during CPU_DEAD.
      *
-     * If someting goes wrong during bringup, we go to CPU_UP_CANCELLED
-     * *before* having called init_pdata. In this case, as there is no
-     * initialization needing undoing, only free_pdata should be called.
-     * This means it is possible to call free_pdata just after alloc_pdata,
-     * without a init_pdata/deinit_pdata "cycle" in between the two.
-     *
-     * So, in summary, the usage pattern should look either
-     *  - alloc_pdata-->init_pdata-->deinit_pdata-->free_pdata, or
-     *  - alloc_pdata-->free_pdata.
+     * If someting goes wrong during bringup, we go to CPU_UP_CANCELLED.
      */
     switch ( action )
     {
-    case CPU_STARTING:
-        if ( system_state != SYS_STATE_resume )
-            sched_init_pdata(sched, sd->sched_priv, cpu);
-        break;
     case CPU_UP_PREPARE:
         if ( system_state != SYS_STATE_resume )
             rc = cpu_schedule_up(cpu);
@@ -1824,9 +1831,7 @@ void __init scheduler_init(void)
     idle_domain->max_vcpus = nr_cpu_ids;
     if ( vcpu_create(idle_domain, 0, 0) == NULL )
         BUG();
-    this_cpu(schedule_data).sched_priv = sched_alloc_pdata(&ops, 0);
-    BUG_ON(IS_ERR(this_cpu(schedule_data).sched_priv));
-    sched_init_pdata(&ops, this_cpu(schedule_data).sched_priv, 0);
+    this_cpu(schedule_data).curr = idle_vcpu[0];
 }
 
 /*
@@ -1834,18 +1839,14 @@ void __init scheduler_init(void)
  * cpupool, or subject it to the scheduler of a new cpupool.
  *
  * For the pCPUs that are removed from their cpupool, their scheduler becomes
- * &ops (the default scheduler, selected at boot, which also services the
- * default cpupool). However, as these pCPUs are not really part of any pool,
- * there won't be any scheduling event on them, not even from the default
- * scheduler. Basically, they will just sit idle until they are explicitly
- * added back to a cpupool.
+ * &sched_idle_ops (the idle scheduler).
  */
 int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
 {
     struct vcpu *idle;
     void *ppriv, *ppriv_old, *vpriv, *vpriv_old;
     struct scheduler *old_ops = per_cpu(scheduler, cpu);
-    struct scheduler *new_ops = (c == NULL) ? &ops : c->sched;
+    struct scheduler *new_ops = (c == NULL) ? &sched_idle_ops : c->sched;
     struct cpupool *old_pool = per_cpu(cpupool, cpu);
     struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     spinlock_t *old_lock, *new_lock;
@@ -1865,9 +1866,6 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     ASSERT((c == NULL && !cpumask_test_cpu(cpu, old_pool->cpu_valid)) ||
            (c != NULL && !cpumask_test_cpu(cpu, c->cpu_valid)));
 
-    if ( old_ops == new_ops )
-        goto out;
-
     /*
      * To setup the cpu for the new scheduler we need:
      *  - a valid instance of per-CPU scheduler specific data, as it is
@@ -1931,7 +1929,6 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     sched_free_vdata(old_ops, vpriv_old);
     sched_free_pdata(old_ops, ppriv_old, cpu);
 
- out:
     per_cpu(cpupool, cpu) = c;
     /* When a cpu is added to a pool, trigger it to go pick up some work */
     if ( c != NULL )
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [Xen-devel] [PATCH 0/3] xen/sched: use new idle scheduler for free cpus
  2019-08-02 13:07 [Xen-devel] [PATCH 0/3] xen/sched: use new idle scheduler for free cpus Juergen Gross
                   ` (2 preceding siblings ...)
  2019-08-02 13:07 ` [Xen-devel] [PATCH 3/3] xen/sched: add minimalistic idle scheduler for free cpus Juergen Gross
@ 2019-08-09  9:47 ` Juergen Gross
  2019-08-09 10:35   ` Dario Faggioli
  2019-08-13 15:51 ` Dario Faggioli
  4 siblings, 1 reply; 15+ messages in thread
From: Juergen Gross @ 2019-08-09  9:47 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Tim Deegan, Dario Faggioli,
	Julien Grall, Jan Beulich, Ian Jackson

On 02.08.19 15:07, Juergen Gross wrote:
> These three patches have been carved out from my core scheduling series
> as they are sufficiently independent to be applied even without the big
> series.
> 
> Without this little series there are messages like the following to be
> seen on the console when booting with smt=0:
> 
> (XEN) Adding cpu 1 to runqueue 0
> (XEN) CPU 1 still not dead...
> (XEN) CPU 1 still not dead...
> (XEN) CPU 1 still not dead...
> (XEN) CPU 1 still not dead...
> 
> By assigning cpus to Cpupool-0 only after all cpus are up and by not
> using the more complicated credit2 scheduler for cpus not in any
> cpupool this situation can simply no longer happen, as parking the not
> to be started threads is done before.
> 
> Juergen Gross (3):
>    xen/sched: populate cpupool0 only after all cpus are up
>    xen/sched: remove cpu from pool0 before removing it
>    xen/sched: add minimalistic idle scheduler for free cpus

Would it be possible to have some feedback on this series?


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Xen-devel] [PATCH 0/3] xen/sched: use new idle scheduler for free cpus
  2019-08-09  9:47 ` [Xen-devel] [PATCH 0/3] xen/sched: use new " Juergen Gross
@ 2019-08-09 10:35   ` Dario Faggioli
  0 siblings, 0 replies; 15+ messages in thread
From: Dario Faggioli @ 2019-08-09 10:35 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Tim Deegan, Julien Grall,
	Jan Beulich, Ian Jackson


[-- Attachment #1.1: Type: text/plain, Size: 891 bytes --]

On Fri, 2019-08-09 at 11:47 +0200, Juergen Gross wrote:
> On 02.08.19 15:07, Juergen Gross wrote:
> > 
> > Juergen Gross (3):
> >    xen/sched: populate cpupool0 only after all cpus are up
> >    xen/sched: remove cpu from pool0 before removing it
> >    xen/sched: add minimalistic idle scheduler for free cpus
> 
> Would it be possible to have some feedback on this series?
>
Mmm... Sorry. 

I like the idea. In fact, while looking at the core-scheduling series,
I was indeed thinking that something like this could be done. But then
I somehow missed this series.

I'll have a look today.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Xen-devel] [PATCH 0/3] xen/sched: use new idle scheduler for free cpus
  2019-08-02 13:07 [Xen-devel] [PATCH 0/3] xen/sched: use new idle scheduler for free cpus Juergen Gross
                   ` (3 preceding siblings ...)
  2019-08-09  9:47 ` [Xen-devel] [PATCH 0/3] xen/sched: use new " Juergen Gross
@ 2019-08-13 15:51 ` Dario Faggioli
  2019-08-26  8:34   ` Juergen Gross
  4 siblings, 1 reply; 15+ messages in thread
From: Dario Faggioli @ 2019-08-13 15:51 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: sstabellini, wl, konrad.wilk, George.Dunlap, ian.jackson, tim,
	julien.grall, Jan Beulich, andrew.cooper3


[-- Attachment #1.1: Type: text/plain, Size: 1798 bytes --]

On Fri, 2019-08-02 at 15:07 +0200, Juergen Gross wrote:
> These three patches have been carved out from my core scheduling
> series
> as they are sufficiently independent to be applied even without the
> big
> series.
> 
> Without this little series there are messages like the following to
> be
> seen on the console when booting with smt=0:
> 
> (XEN) Adding cpu 1 to runqueue 0
> (XEN) CPU 1 still not dead...
> (XEN) CPU 1 still not dead...
> (XEN) CPU 1 still not dead...
> (XEN) CPU 1 still not dead...
> 
> By assigning cpus to Cpupool-0 only after all cpus are up and by not
> using the more complicated credit2 scheduler for cpus not in any
> cpupool this situation can simply no longer happen, as parking the
> not
> to be started threads is done before.
> 
And this is not the only problem, solved by this series (or an
equivalent approach).

Right now, CPUs that are not in any pool, still belong to Pool-0's
scheduler. This forces us to make, within the scheduler, extra effort
to avoid actually running vCPUs on those.

In the case of Credit1, this also cause issue to weights
(re)distribution, as the number of CPUs available to the scheduler is
wrong.

So, we absolutely want something like this.

This is described in the changelog of commit e7191920261d ("xen:
credit2: never consider CPUs outside of our cpupool").

Would it be possible to mention this in one of the changelogs of the
series (patch 3 would be the best place, I think).

Thanks and Regards,
Dario
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] xen/sched: populate cpupool0 only after all cpus are up
  2019-08-02 13:07 ` [Xen-devel] [PATCH 1/3] xen/sched: populate cpupool0 only after all cpus are up Juergen Gross
@ 2019-08-13 16:07   ` Dario Faggioli
  2019-08-26  8:35     ` Juergen Gross
  2019-08-14 16:15   ` George Dunlap
  1 sibling, 1 reply; 15+ messages in thread
From: Dario Faggioli @ 2019-08-13 16:07 UTC (permalink / raw)
  To: Juergen Gross, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1673 bytes --]

On Fri, 2019-08-02 at 15:07 +0200, Juergen Gross wrote:
> With core or socket scheduling we need to know the number of siblings
> per scheduling unit before we can setup the scheduler properly. In
> order to prepare that do cpupool0 population only after all cpus are
> up.
> 
> With that in place there is no need to create cpupool0 earlier, so
> do that just before assigning the cpus. Initialize free cpus with all
> online cpus at that time in order to be able to add the cpu notifier
> late, too.
> 
So, now that this series has been made independent, I think that
mentions to the core-scheduling one should be dropped.

I mean, it is at least possible that this series would go in, while the
core-scheduling one never will. And at that point, it would be very
hard, for someone doing archaeology, to understand what went on.

It seems to me that, this patch, simplifies cpupool initialization (as,
e.g., the direct call to the CPU_ONLINE notifier for the BSP was IMO
rather convoluted). And that is made possible by moving the
initialization itself to a later point, making all the online CPUs look
like free CPUs, and using the standard (internal) API directly (i.e.,
cpupool_assign_cpu_locked()) to add them to Pool-0.

So, I'd kill the very first sentence and rearrange the rest to include
at least a quick mention to the simplification that we achieve.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Xen-devel] [PATCH 3/3] xen/sched: add minimalistic idle scheduler for free cpus
  2019-08-02 13:07 ` [Xen-devel] [PATCH 3/3] xen/sched: add minimalistic idle scheduler for free cpus Juergen Gross
@ 2019-08-13 17:07   ` Dario Faggioli
  0 siblings, 0 replies; 15+ messages in thread
From: Dario Faggioli @ 2019-08-13 17:07 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: george.dunlap


[-- Attachment #1.1: Type: text/plain, Size: 1114 bytes --]

On Fri, 2019-08-02 at 15:07 +0200, Juergen Gross wrote:
> Instead of having a full blown scheduler running for the free cpus
> add a very minimalistic scheduler for that purpose only ever
> scheduling
> the related idle vcpu. This has the big advantage of not needing any
> per-cpu, per-domain or per-scheduling unit data for free cpus and in
> turn simplifying moving cpus to and from cpupools a lot.
> 
> This new scheduler will just use a common lock for all free cpus.
> 
> As this new scheduler is not user selectable don't register it as an
> official scheduler, but just include it in schedule.c.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
>
This patch looks fine to me.

With the changelog adapted as I mentioned in my reply to patch 0, this
is:

Acked-by: Dario Faggioli <dfaggioli@suse.com>

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Xen-devel] [PATCH 2/3] xen/sched: remove cpu from pool0 before removing it
  2019-08-02 13:07 ` [Xen-devel] [PATCH 2/3] xen/sched: remove cpu from pool0 before removing it Juergen Gross
@ 2019-08-13 17:11   ` Dario Faggioli
  2019-08-26  8:37     ` Juergen Gross
  0 siblings, 1 reply; 15+ messages in thread
From: Dario Faggioli @ 2019-08-13 17:11 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: sstabellini, wl, konrad.wilk, George.Dunlap, ian.jackson, tim,
	julien.grall, Jan Beulich, andrew.cooper3


[-- Attachment #1.1: Type: text/plain, Size: 1794 bytes --]

On Fri, 2019-08-02 at 15:07 +0200, Juergen Gross wrote:
> Today a cpu which is removed from the system is taken directly from
> Pool0 to the offline state. This will conflict with the new idle
> scheduler, so remove it from Pool0 first. Additionally accept
> removing
> a free cpu instead of requiring it to be in Pool0.
> 
> For the resume failed case we need to call the scheduler code for
> that
> situation after the cpupool handling, so move the scheduler code into
> a function and call it from cpupool_cpu_remove_forced() and remove
> the
> CPU_RESUME_FAILED case from cpu_schedule_callback().
> 
> Note that we are calling now schedule_cpu_switch() in stop_machine
> context so we need to switch from spinlock_irq to spinlock_irqsave.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> 
> --- a/xen/common/cpupool.c
> +++ b/xen/common/cpupool.c
> @@ -282,22 +282,14 @@ static int cpupool_assign_cpu_locked(struct
> cpupool *c, unsigned int cpu)
>      return 0;
>  }
>  
> -static long cpupool_unassign_cpu_helper(void *info)
> +static int cpupool_unassign_cpu_epilogue(struct cpupool *c)
>
in schedule.c, for a similar situation, we have used '_start' and
'_finish' as suffixes. What do you think about using those here too?

It's certainly a minor thing, I know, but I (personally) like them
better (especially than 'epilogue') and I think it gives us some
consistency (yes, sure, different files.. but scheduling and cpupools
are quite tightly related).

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] xen/sched: populate cpupool0 only after all cpus are up
  2019-08-02 13:07 ` [Xen-devel] [PATCH 1/3] xen/sched: populate cpupool0 only after all cpus are up Juergen Gross
  2019-08-13 16:07   ` Dario Faggioli
@ 2019-08-14 16:15   ` George Dunlap
  2019-08-14 16:58     ` Dario Faggioli
  1 sibling, 1 reply; 15+ messages in thread
From: George Dunlap @ 2019-08-14 16:15 UTC (permalink / raw)
  To: Juergen Gross; +Cc: xen-devel, Dario Faggioli

On Fri, Aug 2, 2019 at 2:08 PM Juergen Gross <jgross@suse.com> wrote:
>
> With core or socket scheduling we need to know the number of siblings
> per scheduling unit before we can setup the scheduler properly. In
> order to prepare that do cpupool0 population only after all cpus are
> up.
>
> With that in place there is no need to create cpupool0 earlier, so
> do that just before assigning the cpus. Initialize free cpus with all
> online cpus at that time in order to be able to add the cpu notifier
> late, too.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> V1: new patch
> ---
>  xen/common/cpupool.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
> index f90e496eda..caea5bd8b3 100644
> --- a/xen/common/cpupool.c
> +++ b/xen/common/cpupool.c
> @@ -762,18 +762,28 @@ static struct notifier_block cpu_nfb = {
>      .notifier_call = cpu_callback
>  };
>
> -static int __init cpupool_presmp_init(void)
> +static int __init cpupool_init(void)
>  {
> +    unsigned int cpu;
>      int err;
> -    void *cpu = (void *)(long)smp_processor_id();
> +
>      cpupool0 = cpupool_create(0, 0, &err);
>      BUG_ON(cpupool0 == NULL);
>      cpupool_put(cpupool0);
> -    cpu_callback(&cpu_nfb, CPU_ONLINE, cpu);
>      register_cpu_notifier(&cpu_nfb);
> +
> +    spin_lock(&cpupool_lock);
> +
> +    cpumask_copy(&cpupool_free_cpus, &cpu_online_map);
> +
> +    for_each_cpu ( cpu, &cpupool_free_cpus )
> +        cpupool_assign_cpu_locked(cpupool0, cpu);
> +
> +    spin_unlock(&cpupool_lock);

Just to make sure I understand what's happening here -- cpu_callback()
now won't get called with CPU_ONLINE early in the boot process; but it
will still be called with CPU_ONLINE in other circumstances (e.g.,
hot-plug / suspend / whatever)?

If not, then it's probably best to remove CPU_ONLINE from that switch statement.

Sorry that's an overly-basic question; I don't have a good picture for
the cpu state machine.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] xen/sched: populate cpupool0 only after all cpus are up
  2019-08-14 16:15   ` George Dunlap
@ 2019-08-14 16:58     ` Dario Faggioli
  0 siblings, 0 replies; 15+ messages in thread
From: Dario Faggioli @ 2019-08-14 16:58 UTC (permalink / raw)
  To: George Dunlap, Juergen Gross; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2161 bytes --]

On Wed, 2019-08-14 at 17:15 +0100, George Dunlap wrote:
> On Fri, Aug 2, 2019 at 2:08 PM Juergen Gross <jgross@suse.com> wrote:
> > --- a/xen/common/cpupool.c
> > +++ b/xen/common/cpupool.c
> > @@ -762,18 +762,28 @@ static struct notifier_block cpu_nfb = {
> >      .notifier_call = cpu_callback
> >  };
> > 
> > -static int __init cpupool_presmp_init(void)
> > +static int __init cpupool_init(void)
> >  {
> > +    unsigned int cpu;
> >      int err;
> > -    void *cpu = (void *)(long)smp_processor_id();
> > +
> >      cpupool0 = cpupool_create(0, 0, &err);
> >      BUG_ON(cpupool0 == NULL);
> >      cpupool_put(cpupool0);
> > -    cpu_callback(&cpu_nfb, CPU_ONLINE, cpu);
> >      register_cpu_notifier(&cpu_nfb);
> > +
> > +    spin_lock(&cpupool_lock);
> > +
> > +    cpumask_copy(&cpupool_free_cpus, &cpu_online_map);
> > +
> > +    for_each_cpu ( cpu, &cpupool_free_cpus )
> > +        cpupool_assign_cpu_locked(cpupool0, cpu);
> > +
> > +    spin_unlock(&cpupool_lock);
> 
> Just to make sure I understand what's happening here --
> cpu_callback()
> now won't get called with CPU_ONLINE early in the boot process; but
> it
> will still be called with CPU_ONLINE in other circumstances (e.g.,
> hot-plug / suspend / whatever)?
> 
Exactly.

It is not used for resume (from suspend) any longer, since commit
6870ea9d1fad ("xen/cpupool: simplify suspend/resume handling).

But it is used for putting the various CPUs in Pool-0, as they come-up, 
during boot.

This patch remove the "hack" of calling it directly, during cpupool
initialization, for the BSP.

> Sorry that's an overly-basic question; I don't have a good picture
> for
> the cpu state machine.
> 
Well, I used to... I tried to quickly double check things, and what I
said above should be still valid, even after the latest changes (or so
I hope :-) ).

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Xen-devel] [PATCH 0/3] xen/sched: use new idle scheduler for free cpus
  2019-08-13 15:51 ` Dario Faggioli
@ 2019-08-26  8:34   ` Juergen Gross
  0 siblings, 0 replies; 15+ messages in thread
From: Juergen Gross @ 2019-08-26  8:34 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: sstabellini, wl, konrad.wilk, George.Dunlap, ian.jackson, tim,
	julien.grall, Jan Beulich, andrew.cooper3

On 13.08.19 17:51, Dario Faggioli wrote:
> On Fri, 2019-08-02 at 15:07 +0200, Juergen Gross wrote:
>> These three patches have been carved out from my core scheduling
>> series
>> as they are sufficiently independent to be applied even without the
>> big
>> series.
>>
>> Without this little series there are messages like the following to
>> be
>> seen on the console when booting with smt=0:
>>
>> (XEN) Adding cpu 1 to runqueue 0
>> (XEN) CPU 1 still not dead...
>> (XEN) CPU 1 still not dead...
>> (XEN) CPU 1 still not dead...
>> (XEN) CPU 1 still not dead...
>>
>> By assigning cpus to Cpupool-0 only after all cpus are up and by not
>> using the more complicated credit2 scheduler for cpus not in any
>> cpupool this situation can simply no longer happen, as parking the
>> not
>> to be started threads is done before.
>>
> And this is not the only problem, solved by this series (or an
> equivalent approach).
> 
> Right now, CPUs that are not in any pool, still belong to Pool-0's
> scheduler. This forces us to make, within the scheduler, extra effort
> to avoid actually running vCPUs on those.
> 
> In the case of Credit1, this also cause issue to weights
> (re)distribution, as the number of CPUs available to the scheduler is
> wrong.
> 
> So, we absolutely want something like this.
> 
> This is described in the changelog of commit e7191920261d ("xen:
> credit2: never consider CPUs outside of our cpupool").
> 
> Would it be possible to mention this in one of the changelogs of the
> series (patch 3 would be the best place, I think).

Okay.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] xen/sched: populate cpupool0 only after all cpus are up
  2019-08-13 16:07   ` Dario Faggioli
@ 2019-08-26  8:35     ` Juergen Gross
  0 siblings, 0 replies; 15+ messages in thread
From: Juergen Gross @ 2019-08-26  8:35 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel

On 13.08.19 18:07, Dario Faggioli wrote:
> On Fri, 2019-08-02 at 15:07 +0200, Juergen Gross wrote:
>> With core or socket scheduling we need to know the number of siblings
>> per scheduling unit before we can setup the scheduler properly. In
>> order to prepare that do cpupool0 population only after all cpus are
>> up.
>>
>> With that in place there is no need to create cpupool0 earlier, so
>> do that just before assigning the cpus. Initialize free cpus with all
>> online cpus at that time in order to be able to add the cpu notifier
>> late, too.
>>
> So, now that this series has been made independent, I think that
> mentions to the core-scheduling one should be dropped.
> 
> I mean, it is at least possible that this series would go in, while the
> core-scheduling one never will. And at that point, it would be very
> hard, for someone doing archaeology, to understand what went on.
> 
> It seems to me that, this patch, simplifies cpupool initialization (as,
> e.g., the direct call to the CPU_ONLINE notifier for the BSP was IMO
> rather convoluted). And that is made possible by moving the
> initialization itself to a later point, making all the online CPUs look
> like free CPUs, and using the standard (internal) API directly (i.e.,
> cpupool_assign_cpu_locked()) to add them to Pool-0.
> 
> So, I'd kill the very first sentence and rearrange the rest to include
> at least a quick mention to the simplification that we achieve.

Fine with me.


Juergen


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Xen-devel] [PATCH 2/3] xen/sched: remove cpu from pool0 before removing it
  2019-08-13 17:11   ` Dario Faggioli
@ 2019-08-26  8:37     ` Juergen Gross
  0 siblings, 0 replies; 15+ messages in thread
From: Juergen Gross @ 2019-08-26  8:37 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: sstabellini, wl, konrad.wilk, George.Dunlap, ian.jackson, tim,
	julien.grall, Jan Beulich, andrew.cooper3

On 13.08.19 19:11, Dario Faggioli wrote:
> On Fri, 2019-08-02 at 15:07 +0200, Juergen Gross wrote:
>> Today a cpu which is removed from the system is taken directly from
>> Pool0 to the offline state. This will conflict with the new idle
>> scheduler, so remove it from Pool0 first. Additionally accept
>> removing
>> a free cpu instead of requiring it to be in Pool0.
>>
>> For the resume failed case we need to call the scheduler code for
>> that
>> situation after the cpupool handling, so move the scheduler code into
>> a function and call it from cpupool_cpu_remove_forced() and remove
>> the
>> CPU_RESUME_FAILED case from cpu_schedule_callback().
>>
>> Note that we are calling now schedule_cpu_switch() in stop_machine
>> context so we need to switch from spinlock_irq to spinlock_irqsave.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>
>> --- a/xen/common/cpupool.c
>> +++ b/xen/common/cpupool.c
>> @@ -282,22 +282,14 @@ static int cpupool_assign_cpu_locked(struct
>> cpupool *c, unsigned int cpu)
>>       return 0;
>>   }
>>   
>> -static long cpupool_unassign_cpu_helper(void *info)
>> +static int cpupool_unassign_cpu_epilogue(struct cpupool *c)
>>
> in schedule.c, for a similar situation, we have used '_start' and
> '_finish' as suffixes. What do you think about using those here too?
> 
> It's certainly a minor thing, I know, but I (personally) like them
> better (especially than 'epilogue') and I think it gives us some
> consistency (yes, sure, different files.. but scheduling and cpupools
> are quite tightly related).

Okay, will rename.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-08-26  8:37 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-02 13:07 [Xen-devel] [PATCH 0/3] xen/sched: use new idle scheduler for free cpus Juergen Gross
2019-08-02 13:07 ` [Xen-devel] [PATCH 1/3] xen/sched: populate cpupool0 only after all cpus are up Juergen Gross
2019-08-13 16:07   ` Dario Faggioli
2019-08-26  8:35     ` Juergen Gross
2019-08-14 16:15   ` George Dunlap
2019-08-14 16:58     ` Dario Faggioli
2019-08-02 13:07 ` [Xen-devel] [PATCH 2/3] xen/sched: remove cpu from pool0 before removing it Juergen Gross
2019-08-13 17:11   ` Dario Faggioli
2019-08-26  8:37     ` Juergen Gross
2019-08-02 13:07 ` [Xen-devel] [PATCH 3/3] xen/sched: add minimalistic idle scheduler for free cpus Juergen Gross
2019-08-13 17:07   ` Dario Faggioli
2019-08-09  9:47 ` [Xen-devel] [PATCH 0/3] xen/sched: use new " Juergen Gross
2019-08-09 10:35   ` Dario Faggioli
2019-08-13 15:51 ` Dario Faggioli
2019-08-26  8:34   ` Juergen Gross

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).