* [PATCH 0/5] xen: sched: scheduling (mostly, Credit2) and cpupool fixes and improvements
@ 2017-01-17 17:26 Dario Faggioli
  2017-01-17 17:26 ` [PATCH 1/5] xen: credit2: use the correct scratch cpumask Dario Faggioli
                   ` (4 more replies)
  0 siblings, 5 replies; 28+ messages in thread
From: Dario Faggioli @ 2017-01-17 17:26 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Juergen Gross, Anshul Makkar, Jan Beulich

Hello,

This series fixes a few issues related to Credit2 and, more generally, to the
interactions between scheduling and cpupools.

The first 3 patches cure (symptoms of) bugs in Credit2, and should be
backported to 4.8 (it should not be too hard to do so, and I can help with
that, if necessary).

In fact, patch 1 ("xen: credit2: use the correct scratch cpumask.") fixes a
buggy behavior identified by Jan here [1]. No Oops or ASSERT was triggering,
but there was the risk of incurring suboptimal or unpredictable scheduling
behavior when multiple cpupools, with different schedulers, are used.

Patch 2 ("xen: credit2: never consider CPUs outside of our cpupool.") is
necessary because I thought we were already taking all the proper measures to
have Credit2 vCPUs live in their cpupool, but that wasn't the case. The patch
cures a potential crash, so it's important, IMO, and should also be backported.
As noted in the extended changelog, while working on this I identified some
less than ideal aspects of the interface, and of the interactions between
cpupools and the scheduler. Fixing that properly will require more work, if
not a rethink of said interface.

Patch 3 ("xen: credit2: fix shutdown/suspend when playing with cpupools.") also
fixes a bug, which manifests itself when the host is shut down or suspended
while the BSP (CPU 0, as of now) does not belong to cpupool0, as it does by
default. This again manifests only when Credit2 is involved (see the patch
description for more details), but the issue is more general, and could
potentially affect any scheduler that does runqueue lock remapping and
management similar to what Credit2 does in that department. This is probably
the most 'invasive' patch (it affects schedule.c), but I think it should also
be backported.

The last 2 patches, OTOH, are improvements rather than bugfixes, and so they're
not backport candidates.

There is a git branch with the patches applied available here:

 * git://xenbits.xen.org/people/dariof/xen.git rel/sched/fix-credit2-cpupool
 * http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/rel/sched/fix-credit2-cpupool
 * https://travis-ci.org/fdario/xen/builds/192726171

Thanks and Regards,
Dario

---
Dario Faggioli (5):
      xen: credit2: use the correct scratch cpumask.
      xen: credit2: never consider CPUs outside of our cpupool.
      xen: credit2: fix shutdown/suspend when playing with cpupools.
      xen: sched: improve use of cpumask scratch space in Credit1.
      xen: sched: simplify ACPI S3 resume path.

 xen/common/sched_credit.c  |    5 +-
 xen/common/sched_credit2.c |  110 ++++++++++++++++++++++++++++++++------------
 xen/common/schedule.c      |   48 ++++++++++++-------
 xen/include/xen/sched-if.h |    7 +++
 4 files changed, 118 insertions(+), 52 deletions(-)
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/5] xen: credit2: use the correct scratch cpumask.
  2017-01-17 17:26 [PATCH 0/5] xen: sched: scheduling (mostly, Credit2) and cpupool fixes and improvements Dario Faggioli
@ 2017-01-17 17:26 ` Dario Faggioli
  2017-01-19 12:22   ` George Dunlap
  2017-01-17 17:26 ` [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool Dario Faggioli
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 28+ messages in thread
From: Dario Faggioli @ 2017-01-17 17:26 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Jan Beulich

In fact, there is one scratch mask for each CPU. When
you use the one of a CPU, it must be true that:
 - the CPU belongs to your cpupool and scheduler,
 - you own the runqueue lock (the one you take via
   {v,p}cpu_schedule_lock()) for that CPU.

This was not the case within the following functions:

get_fallback_cpu() and csched2_cpu_pick(): we can't be
sure that we either are on, or hold the lock for, the CPU
that is in the vCPU's 'v->processor'.

migrate(): it's ok, when called from balance_load(),
because that comes from csched2_schedule(), which takes
the runqueue lock of the CPU where it executes. But it is
not ok when we come from csched2_vcpu_migrate(), which
can be called from other places.

The fix is to explicitly use the scratch space of the
CPUs for which we know we hold the runqueue lock.
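
IOW, the pattern we are after is something like the following (just a
sketch, for illustration; in the hunks below the lock is typically
already held by the caller, or taken via the _irqsave variants):

    /* We hold the runqueue lock of 'cpu', so its scratch mask is ours. */
    lock = pcpu_schedule_lock_irq(cpu);

    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
                &svc->rqd->active);
    new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));

    pcpu_schedule_unlock_irq(lock, cpu);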

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reported-by: Jan Beulich <JBeulich@suse.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
This is a bugfix, and should be backported to 4.8.
---
 xen/common/sched_credit2.c |   39 ++++++++++++++++++++-------------------
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ef8e0d8..523922e 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -510,24 +510,23 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
  */
 static int get_fallback_cpu(struct csched2_vcpu *svc)
 {
-    int cpu;
+    int fallback_cpu, cpu = svc->vcpu->processor;
 
-    if ( likely(cpumask_test_cpu(svc->vcpu->processor,
-                                 svc->vcpu->cpu_hard_affinity)) )
-        return svc->vcpu->processor;
+    if ( likely(cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity)) )
+        return cpu;
 
-    cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
+    cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
                 &svc->rqd->active);
-    cpu = cpumask_first(cpumask_scratch);
-    if ( likely(cpu < nr_cpu_ids) )
-        return cpu;
+    fallback_cpu = cpumask_first(cpumask_scratch_cpu(cpu));
+    if ( likely(fallback_cpu < nr_cpu_ids) )
+        return fallback_cpu;
 
     cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
                 cpupool_domain_cpumask(svc->vcpu->domain));
 
-    ASSERT(!cpumask_empty(cpumask_scratch));
+    ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
 
-    return cpumask_first(cpumask_scratch);
+    return cpumask_first(cpumask_scratch_cpu(cpu));
 }
 
 /*
@@ -1492,7 +1491,7 @@ static int
 csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
 {
     struct csched2_private *prv = CSCHED2_PRIV(ops);
-    int i, min_rqi = -1, new_cpu;
+    int i, min_rqi = -1, new_cpu, cpu = vc->processor;
     struct csched2_vcpu *svc = CSCHED2_VCPU(vc);
     s_time_t min_avgload = MAX_LOAD;
 
@@ -1512,7 +1511,7 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
      * just grab the prv lock.  Instead, we'll have to trylock, and
      * do something else reasonable if we fail.
      */
-    ASSERT(spin_is_locked(per_cpu(schedule_data, vc->processor).schedule_lock));
+    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
 
     if ( !read_trylock(&prv->lock) )
     {
@@ -1539,9 +1538,9 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         }
         else
         {
-            cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
+            cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
                         &svc->migrate_rqd->active);
-            new_cpu = cpumask_any(cpumask_scratch);
+            new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
             if ( new_cpu < nr_cpu_ids )
                 goto out_up;
         }
@@ -1598,9 +1597,9 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         goto out_up;
     }
 
-    cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
+    cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
                 &prv->rqd[min_rqi].active);
-    new_cpu = cpumask_any(cpumask_scratch);
+    new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
     BUG_ON(new_cpu >= nr_cpu_ids);
 
  out_up:
@@ -1675,6 +1674,8 @@ static void migrate(const struct scheduler *ops,
                     struct csched2_runqueue_data *trqd, 
                     s_time_t now)
 {
+    int cpu = svc->vcpu->processor;
+
     if ( unlikely(tb_init_done) )
     {
         struct {
@@ -1696,7 +1697,7 @@ static void migrate(const struct scheduler *ops,
         svc->migrate_rqd = trqd;
         __set_bit(_VPF_migrating, &svc->vcpu->pause_flags);
         __set_bit(__CSFLAG_runq_migrate_request, &svc->flags);
-        cpu_raise_softirq(svc->vcpu->processor, SCHEDULE_SOFTIRQ);
+        cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
         SCHED_STAT_CRANK(migrate_requested);
     }
     else
@@ -1711,9 +1712,9 @@ static void migrate(const struct scheduler *ops,
         }
         __runq_deassign(svc);
 
-        cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
+        cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
                     &trqd->active);
-        svc->vcpu->processor = cpumask_any(cpumask_scratch);
+        svc->vcpu->processor = cpumask_any(cpumask_scratch_cpu(cpu));
         ASSERT(svc->vcpu->processor < nr_cpu_ids);
 
         __runq_assign(svc, trqd);


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool.
  2017-01-17 17:26 [PATCH 0/5] xen: sched: scheduling (mostly, Credit2) and cpupool fixes and improvements Dario Faggioli
  2017-01-17 17:26 ` [PATCH 1/5] xen: credit2: use the correct scratch cpumask Dario Faggioli
@ 2017-01-17 17:26 ` Dario Faggioli
  2017-01-19  8:08   ` Juergen Gross
                     ` (2 more replies)
  2017-01-17 17:26 ` [PATCH 3/5] xen: credit2: fix shutdown/suspend when playing with cpupools Dario Faggioli
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 28+ messages in thread
From: Dario Faggioli @ 2017-01-17 17:26 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Juergen Gross, Jan Beulich

In fact, relying on the mask of what pCPUs belong to
which Credit2 runqueue is not enough. If we only do that,
when Credit2 is the boot scheduler, we may ASSERT() or
panic when moving a pCPU from Pool-0 to another cpupool.

This is because pCPUs outside of any pool are considered
part of cpupool0. This puts us at risk of a crash when those
same pCPUs are added to another pool and something
other than the idle domain is found to be running
on them.

Note that, even if we prevent the above from happening
(which is the purpose of this patch), things are still
pretty bad. In fact, when we remove a pCPU from Pool-0:
- in Credit1, we do *not* update prv->ncpus and
  prv->credit, which means we're considering the wrong
  total credits when doing accounting;
- in Credit2, the pCPU remains part of one runqueue,
  and is hence at least considered during load balancing,
  even if no vCPU should really run there.

In Credit1, this "only" causes skewed accounting and
no crashes because there is a lot of `cpumask_and`ing
going on with the cpumask of the domains' cpupool
(which, BTW, comes at a price).

A quick and not too involved (and easily backportable)
solution for Credit2 is to do exactly the same.
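
That is, roughly speaking (just a sketch; the real thing is in the hunks
below), every time we are about to use a vCPU's hard affinity mask, we
first restrict it to the pCPUs of the domain's cpupool:

    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
                cpupool_domain_cpumask(v->domain));
    /* ...and only then intersect it with runqueue, idle, etc., masks. */
    cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                &rqd->active);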

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
This is a bugfix, and should be backported to 4.8.
---
The proper solution would mean calling deinit_pdata() when removing a pCPU from
cpupool0, as well as a bit more code reshuffling.

And, although worth doing, it certainly will take more work, more time, and
will probably be hard (well, surely harder than this) to backport.

Therefore, I'd argue in favor of both taking and backporting this change, which
at least enables using Credit2 as default scheduler without risking a crash
when creating a second cpupool.

Afterwards, a proper solution would be proposed for Xen 4.9.

Finally, given the number of issues similar to this one that I've found and
fixed during the last release cycle, I think it would be good to take a stab
at whether the interface between cpupools and the schedulers can be
simplified. :-O

Regards,
Dario
---
 xen/common/sched_credit2.c |   59 ++++++++++++++++++++++++++++----------------
 1 file changed, 38 insertions(+), 21 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 523922e..ce0e146 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -510,19 +510,22 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
  */
 static int get_fallback_cpu(struct csched2_vcpu *svc)
 {
-    int fallback_cpu, cpu = svc->vcpu->processor;
+    struct vcpu *v = svc->vcpu;
+    int cpu = v->processor;
 
-    if ( likely(cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity)) )
-        return cpu;
+    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                cpupool_domain_cpumask(v->domain));
 
-    cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
-                &svc->rqd->active);
-    fallback_cpu = cpumask_first(cpumask_scratch_cpu(cpu));
-    if ( likely(fallback_cpu < nr_cpu_ids) )
-        return fallback_cpu;
+    if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
+        return cpu;
 
-    cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
-                cpupool_domain_cpumask(svc->vcpu->domain));
+    if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
+                                   &svc->rqd->active)) )
+    {
+        cpumask_and(cpumask_scratch_cpu(cpu), &svc->rqd->active,
+                    cpumask_scratch_cpu(cpu));
+        return cpumask_first(cpumask_scratch_cpu(cpu));
+    }
 
     ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
 
@@ -940,6 +943,9 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
                     (unsigned char *)&d);
     }
 
+    cpumask_and(cpumask_scratch_cpu(cpu), new->vcpu->cpu_hard_affinity,
+                cpupool_domain_cpumask(new->vcpu->domain));
+
     /*
      * First of all, consider idle cpus, checking if we can just
      * re-use the pcpu where we were running before.
@@ -952,7 +958,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
         cpumask_andnot(&mask, &rqd->idle, &rqd->smt_idle);
     else
         cpumask_copy(&mask, &rqd->smt_idle);
-    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
+    cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
     i = cpumask_test_or_cycle(cpu, &mask);
     if ( i < nr_cpu_ids )
     {
@@ -967,7 +973,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
      * gone through the scheduler yet.
      */
     cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
-    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
+    cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
     i = cpumask_test_or_cycle(cpu, &mask);
     if ( i < nr_cpu_ids )
     {
@@ -983,7 +989,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
      */
     cpumask_andnot(&mask, &rqd->active, &rqd->idle);
     cpumask_andnot(&mask, &mask, &rqd->tickled);
-    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
+    cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
     if ( cpumask_test_cpu(cpu, &mask) )
     {
         cur = CSCHED2_VCPU(curr_on_cpu(cpu));
@@ -1525,6 +1531,9 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         goto out;
     }
 
+    cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
+                cpupool_domain_cpumask(vc->domain));
+
     /*
      * First check to see if we're here because someone else suggested a place
      * for us to move.
@@ -1536,13 +1545,13 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
             printk(XENLOG_WARNING "%s: target runqueue disappeared!\n",
                    __func__);
         }
-        else
+        else if ( cpumask_intersects(cpumask_scratch_cpu(cpu),
+                                     &svc->migrate_rqd->active) )
         {
-            cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
+            cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                         &svc->migrate_rqd->active);
             new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
-            if ( new_cpu < nr_cpu_ids )
-                goto out_up;
+            goto out_up;
         }
         /* Fall-through to normal cpu pick */
     }
@@ -1570,12 +1579,12 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
          */
         if ( rqd == svc->rqd )
         {
-            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
+            if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
                 rqd_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0);
         }
         else if ( spin_trylock(&rqd->lock) )
         {
-            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
+            if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
                 rqd_avgload = rqd->b_avgload;
 
             spin_unlock(&rqd->lock);
@@ -1597,7 +1606,7 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         goto out_up;
     }
 
-    cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
+    cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                 &prv->rqd[min_rqi].active);
     new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
     BUG_ON(new_cpu >= nr_cpu_ids);
@@ -1713,6 +1722,8 @@ static void migrate(const struct scheduler *ops,
         __runq_deassign(svc);
 
         cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
+                    cpupool_domain_cpumask(svc->vcpu->domain));
+        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                     &trqd->active);
         svc->vcpu->processor = cpumask_any(cpumask_scratch_cpu(cpu));
         ASSERT(svc->vcpu->processor < nr_cpu_ids);
@@ -1738,8 +1749,14 @@ static void migrate(const struct scheduler *ops,
 static bool_t vcpu_is_migrateable(struct csched2_vcpu *svc,
                                   struct csched2_runqueue_data *rqd)
 {
+    struct vcpu *v = svc->vcpu;
+    int cpu = svc->vcpu->processor;
+
+    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                cpupool_domain_cpumask(v->domain));
+
     return !(svc->flags & CSFLAG_runq_migrate_request) &&
-           cpumask_intersects(svc->vcpu->cpu_hard_affinity, &rqd->active);
+           cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active);
 }
 
 static void balance_load(const struct scheduler *ops, int cpu, s_time_t now)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 3/5] xen: credit2: fix shutdown/suspend when playing with cpupools.
  2017-01-17 17:26 [PATCH 0/5] xen: sched: scheduling (mostly, Credit2) and cpupool fixes and improvements Dario Faggioli
  2017-01-17 17:26 ` [PATCH 1/5] xen: credit2: use the correct scratch cpumask Dario Faggioli
  2017-01-17 17:26 ` [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool Dario Faggioli
@ 2017-01-17 17:26 ` Dario Faggioli
  2017-01-23 15:42   ` George Dunlap
  2017-01-17 17:27 ` [PATCH 4/5] xen: sched: improve use of cpumask scratch space in Credit1 Dario Faggioli
  2017-01-17 17:27 ` [PATCH 5/5] xen: sched: simplify ACPI S3 resume path Dario Faggioli
  4 siblings, 1 reply; 28+ messages in thread
From: Dario Faggioli @ 2017-01-17 17:26 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap

In fact, during shutdown/suspend, we temporarily move all
the vCPUs to the BSP (i.e., pCPU 0, as of now). For Credit2
domains, we call csched2_vcpu_migrate(), which expects to find
the target pCPU in the domain's pool.

Therefore, if Credit2 is the default scheduler and we have
removed pCPU 0 from cpupool0, shutdown/suspend fails like
this:

 RIP:    e008:[<ffff82d08012906d>] sched_credit2.c#migrate+0x274/0x2d1
 Xen call trace:
    [<ffff82d08012906d>] sched_credit2.c#migrate+0x274/0x2d1
    [<ffff82d080129138>] sched_credit2.c#csched2_vcpu_migrate+0x6e/0x86
    [<ffff82d08012c468>] schedule.c#vcpu_move_locked+0x69/0x6f
    [<ffff82d08012ec14>] cpu_disable_scheduler+0x3d7/0x430
    [<ffff82d08019669b>] __cpu_disable+0x299/0x2b0
    [<ffff82d0801012f8>] cpu.c#take_cpu_down+0x2f/0x38
    [<ffff82d0801312d8>] stop_machine.c#stopmachine_action+0x7f/0x8d
    [<ffff82d0801330b8>] tasklet.c#do_tasklet_work+0x74/0xab
    [<ffff82d0801333ed>] do_tasklet+0x66/0x8b
    [<ffff82d080166a73>] domain.c#idle_loop+0x3b/0x5e

 ****************************************
 Panic on CPU 8:
 Assertion 'svc->vcpu->processor < nr_cpu_ids' failed at sched_credit2.c:1729
 ****************************************

On the other hand, if Credit2 is the scheduler of another
pool, when trying (still during shutdown/suspend) to move
the vCPUs of the Credit2 domains to pCPU 0, it figures
out that pCPU 0 is not a Credit2 pCPU, and fails like this:

 RIP:    e008:[<ffff82d08012916b>] sched_credit2.c#csched2_vcpu_migrate+0xa1/0x107
 Xen call trace:
    [<ffff82d08012916b>] sched_credit2.c#csched2_vcpu_migrate+0xa1/0x107
    [<ffff82d08012c4e9>] schedule.c#vcpu_move_locked+0x69/0x6f
    [<ffff82d08012edfc>] cpu_disable_scheduler+0x3d7/0x430
    [<ffff82d08019687b>] __cpu_disable+0x299/0x2b0
    [<ffff82d0801012f8>] cpu.c#take_cpu_down+0x2f/0x38
    [<ffff82d0801314c0>] stop_machine.c#stopmachine_action+0x7f/0x8d
    [<ffff82d0801332a0>] tasklet.c#do_tasklet_work+0x74/0xab
    [<ffff82d0801335d5>] do_tasklet+0x66/0x8b
    [<ffff82d080166c53>] domain.c#idle_loop+0x3b/0x5e

The solution is to recognise this specific situation inside
csched2_vcpu_migrate() and, considering it is something temporary
that only happens during shutdown/suspend, to deal with it quickly.

Then, in the resume path, in restore_vcpu_affinity(), things
are set back to normal, and a new v->processor is chosen, for
each vCPU, from the proper set of pCPUs (i.e., the ones of
the proper cpupool).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
---
This is a bugfix, and should be backported to 4.8.

Note that Credit2 being used, either as the default scheduler or in a cpupool,
is what triggers the bug, but it's actually a more general issue, which would
affect any scheduler that remaps the runqueue locks.
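
By "remapping the runqueue locks" I mean what, e.g., Credit2 does when
initializing a pCPU, i.e., roughly this (sketch, not the actual code):

    /* Make the pCPU's scheduler lock point at the per-runqueue lock. */
    per_cpu(schedule_data, cpu).schedule_lock = &prv->rqd[rqi].lock;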
---
 xen/common/sched_credit2.c |   32 +++++++++++++++++++++++++++++++-
 xen/common/schedule.c      |   25 ++++++++++++++++++++++---
 2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ce0e146..2ce738d 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1946,13 +1946,43 @@ static void
 csched2_vcpu_migrate(
     const struct scheduler *ops, struct vcpu *vc, unsigned int new_cpu)
 {
+    struct domain *d = vc->domain;
     struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
     struct csched2_runqueue_data *trqd;
+    s_time_t now = NOW();
 
     /* Check if new_cpu is valid */
     ASSERT(cpumask_test_cpu(new_cpu, &CSCHED2_PRIV(ops)->initialized));
     ASSERT(cpumask_test_cpu(new_cpu, vc->cpu_hard_affinity));
 
+    /*
+     * Being passed a target pCPU which is outside of our cpupool is only
+     * valid if we are shutting down (or doing ACPI suspend), and we are
+     * moving everyone to BSP, no matter whether or not BSP is inside our
+     * cpupool.
+     *
+     * And since there indeed is the chance that it is not part of it, all
+     * we must do is remove _and_ unassign the vCPU from any runqueue, as
+     * well as updating v->processor with the target, so that the suspend
+     * process can continue.
+     *
+     * It will then be during resume that a new, meaningful, value for
+     * v->processor will be chosen, and during actual domain unpause that
+     * the vCPU will be assigned to and added to the proper runqueue.
+     */
+    if ( unlikely(!cpumask_test_cpu(new_cpu, cpupool_domain_cpumask(d))) )
+    {
+        ASSERT(system_state == SYS_STATE_suspend);
+        if ( __vcpu_on_runq(svc) )
+        {
+            __runq_remove(svc);
+            update_load(ops, svc->rqd, NULL, -1, now);
+        }
+        __runq_deassign(svc);
+        vc->processor = new_cpu;
+        return;
+    }
+
     trqd = RQD(ops, new_cpu);
 
     /*
@@ -1964,7 +1994,7 @@ csched2_vcpu_migrate(
      * pointing to a pcpu where we can't run any longer.
      */
     if ( trqd != svc->rqd )
-        migrate(ops, svc, trqd, NOW());
+        migrate(ops, svc, trqd, now);
     else
         vc->processor = new_cpu;
 }
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 5b444c4..36ff2e9 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -633,8 +633,11 @@ void vcpu_force_reschedule(struct vcpu *v)
 
 void restore_vcpu_affinity(struct domain *d)
 {
+    unsigned int cpu = smp_processor_id();
     struct vcpu *v;
 
+    ASSERT(system_state == SYS_STATE_resume);
+
     for_each_vcpu ( d, v )
     {
         spinlock_t *lock = vcpu_schedule_lock_irq(v);
@@ -643,18 +646,34 @@ void restore_vcpu_affinity(struct domain *d)
         {
             cpumask_copy(v->cpu_hard_affinity, v->cpu_hard_affinity_saved);
             v->affinity_broken = 0;
+
         }
 
-        if ( v->processor == smp_processor_id() )
+        /*
+         * During suspend (in cpu_disable_scheduler()), we moved every vCPU
+         * to BSP (which, as of now, is pCPU 0), as a temporary measure to
+         * allow the nonboot processors to have their data structure freed
+         * and go to sleep. But nothing guarantees that the BSP is a valid
+         * pCPU for a particular domain.
+         *
+         * Therefore, here, before actually unpausing the domains, we should
+         * set v->processor of each of their vCPUs to something that will
+         * make sense for the scheduler of the cpupool they are in.
+         */
+        cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                    cpupool_domain_cpumask(v->domain));
+        v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
+
+        if ( v->processor == cpu )
         {
             set_bit(_VPF_migrating, &v->pause_flags);
-            vcpu_schedule_unlock_irq(lock, v);
+            spin_unlock_irq(lock);;
             vcpu_sleep_nosync(v);
             vcpu_migrate(v);
         }
         else
         {
-            vcpu_schedule_unlock_irq(lock, v);
+            spin_unlock_irq(lock);
         }
     }
 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 4/5] xen: sched: improve use of cpumask scratch space in Credit1.
  2017-01-17 17:26 [PATCH 0/5] xen: sched: scheduling (mostly, Credit2) and cpupool fixes and improvements Dario Faggioli
                   ` (2 preceding siblings ...)
  2017-01-17 17:26 ` [PATCH 3/5] xen: credit2: fix shutdown/suspend when playing with cpupools Dario Faggioli
@ 2017-01-17 17:27 ` Dario Faggioli
  2017-01-18  9:45   ` Jan Beulich
  2017-01-23 15:47   ` George Dunlap
  2017-01-17 17:27 ` [PATCH 5/5] xen: sched: simplify ACPI S3 resume path Dario Faggioli
  4 siblings, 2 replies; 28+ messages in thread
From: Dario Faggioli @ 2017-01-17 17:27 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Jan Beulich

It is ok to use just cpumask_scratch in csched_runq_steal().
In fact, the cpu parameter comes from the cpu local variable
in csched_load_balance(), which in turn comes from cpu in
csched_schedule(), which is smp_processor_id().
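
IOW (just a sketch, not part of the patch; cpumask_scratch and
cpumask_scratch_cpu() are the macros from sched-if.h touched below):

    /* csched_schedule() --> csched_load_balance() --> csched_runq_steal() */
    ASSERT(cpu == smp_processor_id());
    /* ...and hence these two expressions name the very same mask: */
    ASSERT(cpumask_scratch == cpumask_scratch_cpu(cpu));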

While there, also:
 - move the comment about cpumask_scratch to the header
   where the scratch space is declared;
 - spell out more clearly (in that same comment) what
   the serialization rules are.

No functional change intended.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 xen/common/sched_credit.c  |    5 ++---
 xen/common/schedule.c      |    7 +------
 xen/include/xen/sched-if.h |    7 +++++++
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index dfe8545..ad20819 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1636,9 +1636,8 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
                  && !__vcpu_has_soft_affinity(vc, vc->cpu_hard_affinity) )
                 continue;
 
-            csched_balance_cpumask(vc, balance_step, cpumask_scratch_cpu(cpu));
-            if ( __csched_vcpu_is_migrateable(vc, cpu,
-                                              cpumask_scratch_cpu(cpu)) )
+            csched_balance_cpumask(vc, balance_step, cpumask_scratch);
+            if ( __csched_vcpu_is_migrateable(vc, cpu, cpumask_scratch) )
             {
                 /* We got a candidate. Grab it! */
                 TRACE_3D(TRC_CSCHED_STOLEN_VCPU, peer_cpu,
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 36ff2e9..bee5d1f 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -65,12 +65,7 @@ static void poll_timer_fn(void *data);
 DEFINE_PER_CPU(struct schedule_data, schedule_data);
 DEFINE_PER_CPU(struct scheduler *, scheduler);
 
-/*
- * Scratch space, for avoiding having too many cpumask_var_t on the stack.
- * Properly serializing access, if necessary, is responsibility of each
- * scheduler (typically, one can expect this to be protected by the per pCPU
- * or per runqueue lock).
- */
+/* Scratch space for cpumasks. */
 DEFINE_PER_CPU(cpumask_t, cpumask_scratch);
 
 extern const struct scheduler *__start_schedulers_array[], *__end_schedulers_array[];
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index bc0e794..c25cda6 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -47,6 +47,13 @@ DECLARE_PER_CPU(struct schedule_data, schedule_data);
 DECLARE_PER_CPU(struct scheduler *, scheduler);
 DECLARE_PER_CPU(struct cpupool *, cpupool);
 
+/*
+ * Scratch space, for avoiding having too many cpumask_var_t on the stack.
+ * Within each scheduler, when using the scratch mask of one pCPU:
+ * - the pCPU must belong to the scheduler,
+ * - the caller must own the per-pCPU scheduler lock (a.k.a. runqueue
+ *   lock).
+ */
 DECLARE_PER_CPU(cpumask_t, cpumask_scratch);
 #define cpumask_scratch        (&this_cpu(cpumask_scratch))
 #define cpumask_scratch_cpu(c) (&per_cpu(cpumask_scratch, c))


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 5/5] xen: sched: simplify ACPI S3 resume path.
  2017-01-17 17:26 [PATCH 0/5] xen: sched: scheduling (mostly, Credit2) and cpupool fixes and improvements Dario Faggioli
                   ` (3 preceding siblings ...)
  2017-01-17 17:27 ` [PATCH 4/5] xen: sched: improve use of cpumask scratch space in Credit1 Dario Faggioli
@ 2017-01-17 17:27 ` Dario Faggioli
  2017-01-23 15:52   ` George Dunlap
  4 siblings, 1 reply; 28+ messages in thread
From: Dario Faggioli @ 2017-01-17 17:27 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap

In fact, when domains are being unpaused:
 - it's not necessary to put the vcpus to sleep, as
   they are all already paused;
 - it is not necessary to call vcpu_migrate(), as
   the vcpus are still paused, and therefore won't
   wake up anyway.

Basically, the only important thing is to call
pick_cpu, to let the scheduler run and figure out
what the best initial placement (i.e., the value
stored in v->processor) would be for the vcpus, as
they come back up, one after another.

Note that this is consistent with what was happening
before this change, as vcpu_migrate() calls pick_cpu.
But it is much simpler and quicker.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
---
 xen/common/schedule.c |   22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index bee5d1f..43b5b99 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -635,7 +635,11 @@ void restore_vcpu_affinity(struct domain *d)
 
     for_each_vcpu ( d, v )
     {
-        spinlock_t *lock = vcpu_schedule_lock_irq(v);
+        spinlock_t *lock;
+
+        ASSERT(!vcpu_runnable(v));
+
+        lock = vcpu_schedule_lock_irq(v);
 
         if ( v->affinity_broken )
         {
@@ -659,17 +663,11 @@ void restore_vcpu_affinity(struct domain *d)
                     cpupool_domain_cpumask(v->domain));
         v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
 
-        if ( v->processor == cpu )
-        {
-            set_bit(_VPF_migrating, &v->pause_flags);
-            spin_unlock_irq(lock);;
-            vcpu_sleep_nosync(v);
-            vcpu_migrate(v);
-        }
-        else
-        {
-            spin_unlock_irq(lock);
-        }
+        spin_unlock_irq(lock);;
+
+        lock = vcpu_schedule_lock_irq(v);
+        v->processor = SCHED_OP(VCPU2OP(v), pick_cpu, v);
+        spin_unlock_irq(lock);
     }
 
     domain_update_node_affinity(d);


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/5] xen: sched: improve use of cpumask scratch space in Credit1.
  2017-01-17 17:27 ` [PATCH 4/5] xen: sched: improve use of cpumask scratch space in Credit1 Dario Faggioli
@ 2017-01-18  9:45   ` Jan Beulich
  2017-01-18  9:54     ` Dario Faggioli
  2017-01-23 15:47   ` George Dunlap
  1 sibling, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2017-01-18  9:45 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel

>>> On 17.01.17 at 18:27, <dario.faggioli@citrix.com> wrote:
> --- a/xen/include/xen/sched-if.h
> +++ b/xen/include/xen/sched-if.h
> @@ -47,6 +47,13 @@ DECLARE_PER_CPU(struct schedule_data, schedule_data);
>  DECLARE_PER_CPU(struct scheduler *, scheduler);
>  DECLARE_PER_CPU(struct cpupool *, cpupool);
>  
> +/*
> + * Scratch space, for avoiding having too many cpumask_var_t on the stack.

Mind dropping the stray / misleading _var infix from here? There's
no problem having many cpumask_var_t-s on the stack, as they're
small. cpumask_t instances are problematic.
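
For reference, a sketch of the relevant definitions (check
xen/include/xen/cpumask.h for the authoritative ones): a cpumask_t is
the full NR_CPUS-bit bitmap, while, for large NR_CPUS, a cpumask_var_t
is just a pointer to one.

    typedef struct cpumask { DECLARE_BITMAP(bits, NR_CPUS); } cpumask_t;
    /* ... */
    typedef cpumask_t *cpumask_var_t;    /* when NR_CPUS is large */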

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/5] xen: sched: improve use of cpumask scratch space in Credit1.
  2017-01-18  9:45   ` Jan Beulich
@ 2017-01-18  9:54     ` Dario Faggioli
  0 siblings, 0 replies; 28+ messages in thread
From: Dario Faggioli @ 2017-01-18  9:54 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, xen-devel


On Wed, 2017-01-18 at 02:45 -0700, Jan Beulich wrote:
> > > > On 17.01.17 at 18:27, <dario.faggioli@citrix.com> wrote:
> > --- a/xen/include/xen/sched-if.h
> > +++ b/xen/include/xen/sched-if.h
> > @@ -47,6 +47,13 @@ DECLARE_PER_CPU(struct schedule_data,
> > schedule_data);
> >  DECLARE_PER_CPU(struct scheduler *, scheduler);
> >  DECLARE_PER_CPU(struct cpupool *, cpupool);
> >  
> > +/*
> > + * Scratch space, for avoiding having too many cpumask_var_t on
> > the stack.
> 
> Mind dropping the stray / misleading _var infix from here? 
>
Ah, sure!

> There's
> no problem having many cpumask_var_t-s on the stack, as they're
> small. cpumask_t instances are problematic.
> 
Of course. Will do.

Thanks,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool.
  2017-01-17 17:26 ` [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool Dario Faggioli
@ 2017-01-19  8:08   ` Juergen Gross
  2017-01-19  8:22     ` Dario Faggioli
  2017-01-23 14:40     ` George Dunlap
  2017-01-23 15:20   ` George Dunlap
  2017-02-03  8:41   ` Jan Beulich
  2 siblings, 2 replies; 28+ messages in thread
From: Juergen Gross @ 2017-01-19  8:08 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Jan Beulich

On 17/01/17 18:26, Dario Faggioli wrote:
> In fact, relying on the mask of what pCPUs belong to
> which Credit2 runqueue is not enough. If we only do that,
> when Credit2 is the boot scheduler, we may ASSERT() or
> panic when moving a pCPU from Pool-0 to another cpupool.
> 
> This is because pCPUs outside of any pool are considered
> part of cpupool0. This puts us at risk of crash when those
> same pCPUs are added to another pool and something
> different than the idle domain is found to be running
> on them.
> 
> Note that, even if we prevent the above to happen (which
> is the purpose of this patch), this is still pretty bad,
> in fact, when we remove a pCPU from Pool-0:
> - in Credit1, as we do *not* update prv->ncpus and
>   prv->credit, which means we're considering the wrong
>   total credits when doing accounting;
> - in Credit2, the pCPU remains part of one runqueue,
>   and is hence at least considered during load balancing,
>   even if no vCPU should really run there.
> 
> In Credit1, this "only" causes skewed accounting and
> no crashes because there is a lot of `cpumask_and`ing
> going on with the cpumask of the domains' cpupool
> (which, BTW, comes at a price).
> 
> A quick and not to involved (and easily backportable)
> solution for Credit2, is to do exactly the same.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> ---
> This is a bugfix, and should be backported to 4.8.
> ---
> The proper solution would mean calling deinit_pdata() when removing a pCPU from
> cpupool0, as well as a bit more of code reshuffling.
> 
> And, although worth doing, it certainly will take more work, more time, and
> will probably be hard (well, surely harder than this) to backport.
> 
> Therefore, I'd argue in favor of both taking and backporting this change, which
> at least enables using Credit2 as default scheduler without risking a crash
> when creating a second cpupool.
> 
> Afterwards, a proper solution would be proposed for Xen 4.9.
> 
> Finally, given the wide number of issues similar to this that I've found and
> fixed in the last release cycle, I think it would be good to take a stab at
> whether the interface between cpupools and the schedulers could not be
> simplified. :-O

The first patch version of cpupool support that I sent used its own
scheduler instance for the free cpus. Keir merged this instance with
the one for Pool-0.

I think it might be a good idea to check whether some of the problems
would go away with the free cpu scheduler idea again. Maybe it would
even be a good idea to add a "null-scheduler" for this purpose
supporting only one vcpu per pcpu. This scheduler could be used for
high performance domains, too.


Juergen


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool.
  2017-01-19  8:08   ` Juergen Gross
@ 2017-01-19  8:22     ` Dario Faggioli
  2017-01-23 14:40     ` George Dunlap
  1 sibling, 0 replies; 28+ messages in thread
From: Dario Faggioli @ 2017-01-19  8:22 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: George Dunlap, anshul.makkar, Jan Beulich


[Adding Anshul which also had to deal with something similar to this
 recently]

On Thu, 2017-01-19 at 09:08 +0100, Juergen Gross wrote:
> On 17/01/17 18:26, Dario Faggioli wrote:
> > Finally, given the wide number of issues similar to this that I've
> > found and
> > fixed in the last release cycle, I think it would be good to take a
> > stab at
> > whether the interface between cpupools and the schedulers could not
> > be
> > simplified. :-O
> 
> The first patch version for support of cpupools I sent used an own
> scheduler instance for the free cpus. Keir merged this instance with
> the one for Pool-0.
> 
Really? Because I more than once thought that something like that would
be a good idea. I even started to implement it once... I then dropped
it, because I was able to deal with that problem in another way, and
had more pressing priorities, but I always liked the concept.

> I think it might be a good idea to check whether some of the problems
> are going away with the free cpu scheduler idea again. 
>
I think some of them will.

AFAICT, off the top of my head, that alone won't avoid having to
modify when and how a couple of scheduling hooks are called (the pCPU
data allocation and initialization ones), but most likely will simplify
making that change.

> Maybe it would
> even be a good idea to add a "null-scheduler" for this purpose
> supporting only one vcpu per pcpu. This scheduler could be used for
> high performance domains, too.
> 
Oh, I guess it can. At first, I was actually planning for such a
scheduler to always and only select the appropriate idle vCPU... But it
can be extended to do something different, if wanted.

Let's see if others want to chime in. If not, I'll dust off the half-baked
patches I have for this and see if I can make them work.

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/5] xen: credit2: use the correct scratch cpumask.
  2017-01-17 17:26 ` [PATCH 1/5] xen: credit2: use the correct scratch cpumask Dario Faggioli
@ 2017-01-19 12:22   ` George Dunlap
  0 siblings, 0 replies; 28+ messages in thread
From: George Dunlap @ 2017-01-19 12:22 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Jan Beulich

On 17/01/17 17:26, Dario Faggioli wrote:
> In fact, there is one scratch mask per each CPU. When
> you use the one of a CPU, it must be true that:
>  - the CPU belongs to your cpupool and scheduler,
>  - you own the runqueue lock (the one you take via
>    {v,p}cpu_schedule_lock()) for that CPU.
> 
> This was not the case within the following functions:
> 
> get_fallback_cpu(), csched2_cpu_pick(): as we can't be
> sure we either are on, or hold the lock for, the CPU
> that is in the vCPU's 'v->processor'.
> 
> migrate(): it's ok, when called from balance_load(),
> because that comes from csched2_schedule(), which takes
> the runqueue lock of the CPU where it executes. But it is
> not ok when we come from csched2_vcpu_migrate(), which
> can be called from other places.
> 
> The fix is to explicitly use the scratch space of the
> CPUs for which we know we hold the runqueue lock.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Reported-by: Jan Beulich <JBeulich@suse.com>

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

And queued.

> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> ---
> This is a bugfix, and should be backported to 4.8.
> ---
>  xen/common/sched_credit2.c |   39 ++++++++++++++++++++-------------------
>  1 file changed, 20 insertions(+), 19 deletions(-)
> 
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index ef8e0d8..523922e 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -510,24 +510,23 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
>   */
>  static int get_fallback_cpu(struct csched2_vcpu *svc)
>  {
> -    int cpu;
> +    int fallback_cpu, cpu = svc->vcpu->processor;
>  
> -    if ( likely(cpumask_test_cpu(svc->vcpu->processor,
> -                                 svc->vcpu->cpu_hard_affinity)) )
> -        return svc->vcpu->processor;
> +    if ( likely(cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity)) )
> +        return cpu;
>  
> -    cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
> +    cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
>                  &svc->rqd->active);
> -    cpu = cpumask_first(cpumask_scratch);
> -    if ( likely(cpu < nr_cpu_ids) )
> -        return cpu;
> +    fallback_cpu = cpumask_first(cpumask_scratch_cpu(cpu));
> +    if ( likely(fallback_cpu < nr_cpu_ids) )
> +        return fallback_cpu;
>  
>      cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
>                  cpupool_domain_cpumask(svc->vcpu->domain));
>  
> -    ASSERT(!cpumask_empty(cpumask_scratch));
> +    ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
>  
> -    return cpumask_first(cpumask_scratch);
> +    return cpumask_first(cpumask_scratch_cpu(cpu));
>  }
>  
>  /*
> @@ -1492,7 +1491,7 @@ static int
>  csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
>  {
>      struct csched2_private *prv = CSCHED2_PRIV(ops);
> -    int i, min_rqi = -1, new_cpu;
> +    int i, min_rqi = -1, new_cpu, cpu = vc->processor;
>      struct csched2_vcpu *svc = CSCHED2_VCPU(vc);
>      s_time_t min_avgload = MAX_LOAD;
>  
> @@ -1512,7 +1511,7 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
>       * just grab the prv lock.  Instead, we'll have to trylock, and
>       * do something else reasonable if we fail.
>       */
> -    ASSERT(spin_is_locked(per_cpu(schedule_data, vc->processor).schedule_lock));
> +    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
>  
>      if ( !read_trylock(&prv->lock) )
>      {
> @@ -1539,9 +1538,9 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
>          }
>          else
>          {
> -            cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
> +            cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
>                          &svc->migrate_rqd->active);
> -            new_cpu = cpumask_any(cpumask_scratch);
> +            new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
>              if ( new_cpu < nr_cpu_ids )
>                  goto out_up;
>          }
> @@ -1598,9 +1597,9 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
>          goto out_up;
>      }
>  
> -    cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
> +    cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
>                  &prv->rqd[min_rqi].active);
> -    new_cpu = cpumask_any(cpumask_scratch);
> +    new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
>      BUG_ON(new_cpu >= nr_cpu_ids);
>  
>   out_up:
> @@ -1675,6 +1674,8 @@ static void migrate(const struct scheduler *ops,
>                      struct csched2_runqueue_data *trqd, 
>                      s_time_t now)
>  {
> +    int cpu = svc->vcpu->processor;
> +
>      if ( unlikely(tb_init_done) )
>      {
>          struct {
> @@ -1696,7 +1697,7 @@ static void migrate(const struct scheduler *ops,
>          svc->migrate_rqd = trqd;
>          __set_bit(_VPF_migrating, &svc->vcpu->pause_flags);
>          __set_bit(__CSFLAG_runq_migrate_request, &svc->flags);
> -        cpu_raise_softirq(svc->vcpu->processor, SCHEDULE_SOFTIRQ);
> +        cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
>          SCHED_STAT_CRANK(migrate_requested);
>      }
>      else
> @@ -1711,9 +1712,9 @@ static void migrate(const struct scheduler *ops,
>          }
>          __runq_deassign(svc);
>  
> -        cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
> +        cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
>                      &trqd->active);
> -        svc->vcpu->processor = cpumask_any(cpumask_scratch);
> +        svc->vcpu->processor = cpumask_any(cpumask_scratch_cpu(cpu));
>          ASSERT(svc->vcpu->processor < nr_cpu_ids);
>  
>          __runq_assign(svc, trqd);
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool.
  2017-01-19  8:08   ` Juergen Gross
  2017-01-19  8:22     ` Dario Faggioli
@ 2017-01-23 14:40     ` George Dunlap
  2017-01-24 12:35       ` Juergen Gross
  1 sibling, 1 reply; 28+ messages in thread
From: George Dunlap @ 2017-01-23 14:40 UTC (permalink / raw)
  To: Juergen Gross; +Cc: xen-devel, Dario Faggioli, Jan Beulich

On Thu, Jan 19, 2017 at 8:08 AM, Juergen Gross <jgross@suse.com> wrote:
> On 17/01/17 18:26, Dario Faggioli wrote:
>> In fact, relying on the mask of what pCPUs belong to
>> which Credit2 runqueue is not enough. If we only do that,
>> when Credit2 is the boot scheduler, we may ASSERT() or
>> panic when moving a pCPU from Pool-0 to another cpupool.
>>
>> This is because pCPUs outside of any pool are considered
>> part of cpupool0. This puts us at risk of crash when those
>> same pCPUs are added to another pool and something
>> different than the idle domain is found to be running
>> on them.
>>
>> Note that, even if we prevent the above to happen (which
>> is the purpose of this patch), this is still pretty bad,
>> in fact, when we remove a pCPU from Pool-0:
>> - in Credit1, as we do *not* update prv->ncpus and
>>   prv->credit, which means we're considering the wrong
>>   total credits when doing accounting;
>> - in Credit2, the pCPU remains part of one runqueue,
>>   and is hence at least considered during load balancing,
>>   even if no vCPU should really run there.
>>
>> In Credit1, this "only" causes skewed accounting and
>> no crashes because there is a lot of `cpumask_and`ing
>> going on with the cpumask of the domains' cpupool
>> (which, BTW, comes at a price).
>>
>> A quick and not to involved (and easily backportable)
>> solution for Credit2, is to do exactly the same.
>>
>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com
>> ---
>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>> Cc: Juergen Gross <jgross@suse.com>
>> Cc: Jan Beulich <jbeulich@suse.com>
>> ---
>> This is a bugfix, and should be backported to 4.8.
>> ---
>> The proper solution would mean calling deinit_pdata() when removing a pCPU from
>> cpupool0, as well as a bit more of code reshuffling.
>>
>> And, although worth doing, it certainly will take more work, more time, and
>> will probably be hard (well, surely harder than this) to backport.
>>
>> Therefore, I'd argue in favor of both taking and backporting this change, which
>> at least enables using Credit2 as default scheduler without risking a crash
>> when creating a second cpupool.
>>
>> Afterwards, a proper solution would be proposed for Xen 4.9.
>>
>> Finally, given the wide number of issues similar to this that I've found and
>> fixed in the last release cycle, I think it would be good to take a stab at
>> whether the interface between cpupools and the schedulers could not be
>> simplified. :-O
>
> The first patch version for support of cpupools I sent used an own
> scheduler instance for the free cpus. Keir merged this instance with
> the one for Pool-0.

You mean he just made the change unilaterally without posting it to
the list?  I just went back and skimmed through the old cpupools
threads and didn't see this discussed.

Having a "cpupool-remove" operation that doesn't actually remove the
cpu from the pool is a bit mad...

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool.
  2017-01-17 17:26 ` [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool Dario Faggioli
  2017-01-19  8:08   ` Juergen Gross
@ 2017-01-23 15:20   ` George Dunlap
  2017-02-03  8:41   ` Jan Beulich
  2 siblings, 0 replies; 28+ messages in thread
From: George Dunlap @ 2017-01-23 15:20 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Juergen Gross, xen-devel, Jan Beulich

On Tue, Jan 17, 2017 at 5:26 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> In fact, relying on the mask of what pCPUs belong to
> which Credit2 runqueue is not enough. If we only do that,
> when Credit2 is the boot scheduler, we may ASSERT() or
> panic when moving a pCPU from Pool-0 to another cpupool.
>
> This is because pCPUs outside of any pool are considered
> part of cpupool0. This puts us at risk of crash when those
> same pCPUs are added to another pool and something
> different than the idle domain is found to be running
> on them.
>
> Note that, even if we prevent the above to happen (which
> is the purpose of this patch), this is still pretty bad,
> in fact, when we remove a pCPU from Pool-0:
> - in Credit1, as we do *not* update prv->ncpus and
>   prv->credit, which means we're considering the wrong
>   total credits when doing accounting;
> - in Credit2, the pCPU remains part of one runqueue,
>   and is hence at least considered during load balancing,
>   even if no vCPU should really run there.
>
> In Credit1, this "only" causes skewed accounting and
> no crashes because there is a lot of `cpumask_and`ing
> going on with the cpumask of the domains' cpupool
> (which, BTW, comes at a price).
>
> A quick and not to involved (and easily backportable)
> solution for Credit2, is to do exactly the same.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com

Blech.  But I agree we need a fix we can backport:

Acked-by: George Dunlap <george.dunlap@citrix.com>

> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> ---
> This is a bugfix, and should be backported to 4.8.
> ---
> The proper solution would mean calling deinit_pdata() when removing a pCPU from
> cpupool0, as well as a bit more of code reshuffling.
>
> And, although worth doing, it certainly will take more work, more time, and
> will probably be hard (well, surely harder than this) to backport.
>
> Therefore, I'd argue in favor of both taking and backporting this change, which
> at least enables using Credit2 as default scheduler without risking a crash
> when creating a second cpupool.
>
> Afterwards, a proper solution would be proposed for Xen 4.9.
>
> Finally, given the wide number of issues similar to this that I've found and
> fixed in the last release cycle, I think it would be good to take a stab at
> whether the interface between cpupools and the schedulers could not be
> simplified. :-O
>
> Regards,
> Dario
> ---
>  xen/common/sched_credit2.c |   59 ++++++++++++++++++++++++++++----------------
>  1 file changed, 38 insertions(+), 21 deletions(-)
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 523922e..ce0e146 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -510,19 +510,22 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
>   */
>  static int get_fallback_cpu(struct csched2_vcpu *svc)
>  {
> -    int fallback_cpu, cpu = svc->vcpu->processor;
> +    struct vcpu *v = svc->vcpu;
> +    int cpu = v->processor;
>
> -    if ( likely(cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity)) )
> -        return cpu;
> +    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
> +                cpupool_domain_cpumask(v->domain));
>
> -    cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
> -                &svc->rqd->active);
> -    fallback_cpu = cpumask_first(cpumask_scratch_cpu(cpu));
> -    if ( likely(fallback_cpu < nr_cpu_ids) )
> -        return fallback_cpu;
> +    if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
> +        return cpu;
>
> -    cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
> -                cpupool_domain_cpumask(svc->vcpu->domain));
> +    if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
> +                                   &svc->rqd->active)) )
> +    {
> +        cpumask_and(cpumask_scratch_cpu(cpu), &svc->rqd->active,
> +                    cpumask_scratch_cpu(cpu));
> +        return cpumask_first(cpumask_scratch_cpu(cpu));
> +    }
>
>      ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
>
> @@ -940,6 +943,9 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
>                      (unsigned char *)&d);
>      }
>
> +    cpumask_and(cpumask_scratch_cpu(cpu), new->vcpu->cpu_hard_affinity,
> +                cpupool_domain_cpumask(new->vcpu->domain));
> +
>      /*
>       * First of all, consider idle cpus, checking if we can just
>       * re-use the pcpu where we were running before.
> @@ -952,7 +958,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
>          cpumask_andnot(&mask, &rqd->idle, &rqd->smt_idle);
>      else
>          cpumask_copy(&mask, &rqd->smt_idle);
> -    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
> +    cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
>      i = cpumask_test_or_cycle(cpu, &mask);
>      if ( i < nr_cpu_ids )
>      {
> @@ -967,7 +973,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
>       * gone through the scheduler yet.
>       */
>      cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
> -    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
> +    cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
>      i = cpumask_test_or_cycle(cpu, &mask);
>      if ( i < nr_cpu_ids )
>      {
> @@ -983,7 +989,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
>       */
>      cpumask_andnot(&mask, &rqd->active, &rqd->idle);
>      cpumask_andnot(&mask, &mask, &rqd->tickled);
> -    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
> +    cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
>      if ( cpumask_test_cpu(cpu, &mask) )
>      {
>          cur = CSCHED2_VCPU(curr_on_cpu(cpu));
> @@ -1525,6 +1531,9 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
>          goto out;
>      }
>
> +    cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
> +                cpupool_domain_cpumask(vc->domain));
> +
>      /*
>       * First check to see if we're here because someone else suggested a place
>       * for us to move.
> @@ -1536,13 +1545,13 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
>              printk(XENLOG_WARNING "%s: target runqueue disappeared!\n",
>                     __func__);
>          }
> -        else
> +        else if ( cpumask_intersects(cpumask_scratch_cpu(cpu),
> +                                     &svc->migrate_rqd->active) )
>          {
> -            cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
> +            cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
>                          &svc->migrate_rqd->active);
>              new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
> -            if ( new_cpu < nr_cpu_ids )
> -                goto out_up;
> +            goto out_up;
>          }
>          /* Fall-through to normal cpu pick */
>      }
> @@ -1570,12 +1579,12 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
>           */
>          if ( rqd == svc->rqd )
>          {
> -            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
> +            if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
>                  rqd_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0);
>          }
>          else if ( spin_trylock(&rqd->lock) )
>          {
> -            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
> +            if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
>                  rqd_avgload = rqd->b_avgload;
>
>              spin_unlock(&rqd->lock);
> @@ -1597,7 +1606,7 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
>          goto out_up;
>      }
>
> -    cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
> +    cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
>                  &prv->rqd[min_rqi].active);
>      new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
>      BUG_ON(new_cpu >= nr_cpu_ids);
> @@ -1713,6 +1722,8 @@ static void migrate(const struct scheduler *ops,
>          __runq_deassign(svc);
>
>          cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
> +                    cpupool_domain_cpumask(svc->vcpu->domain));
> +        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
>                      &trqd->active);
>          svc->vcpu->processor = cpumask_any(cpumask_scratch_cpu(cpu));
>          ASSERT(svc->vcpu->processor < nr_cpu_ids);
> @@ -1738,8 +1749,14 @@ static void migrate(const struct scheduler *ops,
>  static bool_t vcpu_is_migrateable(struct csched2_vcpu *svc,
>                                    struct csched2_runqueue_data *rqd)
>  {
> +    struct vcpu *v = svc->vcpu;
> +    int cpu = svc->vcpu->processor;
> +
> +    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
> +                cpupool_domain_cpumask(v->domain));
> +
>      return !(svc->flags & CSFLAG_runq_migrate_request) &&
> -           cpumask_intersects(svc->vcpu->cpu_hard_affinity, &rqd->active);
> +           cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active);
>  }
>
>  static void balance_load(const struct scheduler *ops, int cpu, s_time_t now)
>
>
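
To illustrate the idea outside of Xen: below is a minimal, self-contained
sketch (plain C, with 64-bit bitmasks standing in for cpumask_t; the CPU
numbers and masks are invented for the example, this is not Xen code) of the
pattern the hunks above apply everywhere, i.e. intersecting the vCPU's hard
affinity with its cpupool's CPUs before consulting any runqueue mask.

    #include <stdint.h>
    #include <stdio.h>

    /* Lowest set bit, or -1 if the mask is empty (models cpumask_first()). */
    static int mask_first(uint64_t m)
    {
        return m ? __builtin_ctzll(m) : -1;
    }

    int main(void)
    {
        uint64_t hard_affinity = 0x0f; /* vCPU may run on CPUs 0-3      */
        uint64_t pool_cpus     = 0x0c; /* but its cpupool only has 2-3  */
        uint64_t rqd_active    = 0x03; /* old runqueue still covers 0-1 */

        /* Runqueue-only reasoning: can pick a CPU outside the pool. */
        int bad = mask_first(hard_affinity & rqd_active);

        /* Pool-restricted reasoning: intersect with the pool first. */
        uint64_t scratch = hard_affinity & pool_cpus;
        int good = (scratch & rqd_active) ? mask_first(scratch & rqd_active)
                                          : mask_first(scratch);

        printf("runqueue-only pick:   CPU %d (outside the pool!)\n", bad);
        printf("pool-restricted pick: CPU %d\n", good);
        return 0;
    }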

* Re: [PATCH 3/5] xen: credit2: fix shutdown/suspend when playing with cpupools.
  2017-01-17 17:26 ` [PATCH 3/5] xen: credit2: fix shutdown/suspend when playing with cpupools Dario Faggioli
@ 2017-01-23 15:42   ` George Dunlap
  0 siblings, 0 replies; 28+ messages in thread
From: George Dunlap @ 2017-01-23 15:42 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: xen-devel

On Tue, Jan 17, 2017 at 5:26 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> In fact, during shutdown/suspend, we temporarily move all
> the vCPUs to the BSP (i.e., pCPU 0, as of now). For Credit2
> domains, we call csched2_vcpu_migrate(), expects to find the
> target pCPU in the domain's pool
>
> Therefore, if Credit2 is the default scheduler and we have
> removed pCPU 0 from cpupool0, shutdown/suspend fails like
> this:
>
>  RIP:    e008:[<ffff82d08012906d>] sched_credit2.c#migrate+0x274/0x2d1
>  Xen call trace:
>     [<ffff82d08012906d>] sched_credit2.c#migrate+0x274/0x2d1
>     [<ffff82d080129138>] sched_credit2.c#csched2_vcpu_migrate+0x6e/0x86
>     [<ffff82d08012c468>] schedule.c#vcpu_move_locked+0x69/0x6f
>     [<ffff82d08012ec14>] cpu_disable_scheduler+0x3d7/0x430
>     [<ffff82d08019669b>] __cpu_disable+0x299/0x2b0
>     [<ffff82d0801012f8>] cpu.c#take_cpu_down+0x2f/0x38
>     [<ffff82d0801312d8>] stop_machine.c#stopmachine_action+0x7f/0x8d
>     [<ffff82d0801330b8>] tasklet.c#do_tasklet_work+0x74/0xab
>     [<ffff82d0801333ed>] do_tasklet+0x66/0x8b
>     [<ffff82d080166a73>] domain.c#idle_loop+0x3b/0x5e
>
>  ****************************************
>  Panic on CPU 8:
>  Assertion 'svc->vcpu->processor < nr_cpu_ids' failed at sched_credit2.c:1729
>  ****************************************
>
> On the other hand, if Credit2 is the scheduler of another
> pool, when trying (still during shutdown/suspend) to move
> the vCPUs of the Credit2 domains to pCPU 0, it figures
> out that pCPU 0 is not a Credit2 pCPU, and fails like this:
>
>  RIP:    e008:[<ffff82d08012916b>] sched_credit2.c#csched2_vcpu_migrate+0xa1/0x107
>  Xen call trace:
>     [<ffff82d08012916b>] sched_credit2.c#csched2_vcpu_migrate+0xa1/0x107
>     [<ffff82d08012c4e9>] schedule.c#vcpu_move_locked+0x69/0x6f
>     [<ffff82d08012edfc>] cpu_disable_scheduler+0x3d7/0x430
>     [<ffff82d08019687b>] __cpu_disable+0x299/0x2b0
>     [<ffff82d0801012f8>] cpu.c#take_cpu_down+0x2f/0x38
>     [<ffff82d0801314c0>] stop_machine.c#stopmachine_action+0x7f/0x8d
>     [<ffff82d0801332a0>] tasklet.c#do_tasklet_work+0x74/0xab
>     [<ffff82d0801335d5>] do_tasklet+0x66/0x8b
>     [<ffff82d080166c53>] domain.c#idle_loop+0x3b/0x5e
>
> The solution is to recognise this specific situation inside
> csched2_vcpu_migrate() and, since it is something temporary that
> only happens during shutdown/suspend, deal with it quickly.
>
> Then, in the resume path, in restore_vcpu_affinity(), things
> are set back to normal, and a new v->processor is chosen, for
> each vCPU, from the proper set of pCPUs (i.e., the ones of
> the proper cpupool).
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Acked-by: George Dunlap <george.dunlap@citrix.com>
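
For readers without the patch at hand, here is a tiny, self-contained model
(plain C; the structures, masks and CPU numbers are invented for the
illustration, this is not the actual Xen code) of the behaviour the changelog
describes: during shutdown/suspend every vCPU is temporarily parked on the
BSP even if that CPU is outside its cpupool, and on resume a processor is
picked again from the pool's own CPUs, as restore_vcpu_affinity() is said to
do.

    #include <stdint.h>
    #include <stdio.h>

    struct vcpu { int processor; uint64_t pool_cpus; };

    /* Suspend path: park everything on the BSP (CPU 0), even if it is
     * outside the vCPU's cpupool; tolerate that instead of asserting. */
    static void park_on_bsp(struct vcpu *v)
    {
        v->processor = 0;
    }

    /* Resume path: pick a valid processor again from the pool's CPUs. */
    static void repick_from_pool(struct vcpu *v)
    {
        v->processor = __builtin_ctzll(v->pool_cpus);
    }

    int main(void)
    {
        struct vcpu v = { .processor = 5, .pool_cpus = 0x30 }; /* pool: CPUs 4-5 */

        park_on_bsp(&v);
        printf("suspended: parked on CPU %d (outside its pool)\n", v.processor);

        repick_from_pool(&v);
        printf("resumed:   placed on CPU %d (inside its pool)\n", v.processor);
        return 0;
    }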


* Re: [PATCH 4/5] xen: sched: impove use of cpumask scratch space in Credit1.
  2017-01-17 17:27 ` [PATCH 4/5] xen: sched: impove use of cpumask scratch space in Credit1 Dario Faggioli
  2017-01-18  9:45   ` Jan Beulich
@ 2017-01-23 15:47   ` George Dunlap
  1 sibling, 0 replies; 28+ messages in thread
From: George Dunlap @ 2017-01-23 15:47 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: xen-devel, Jan Beulich

On Tue, Jan 17, 2017 at 5:27 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> It is ok to use just cpumask_scratch in csched_runq_steal().
> In fact, the cpu parameter comes from the cpu local variable
> in csched_load_balance(), which in turn comes from cpu in
> csched_schedule(), which is smp_processor_id().
>
> While there, also:
>  - move the comment about cpumask_scratch into the header
>    where the scratch space is declared;
>  - spell out more clearly (in that same comment) what the
>    serialization rules are.
>
> No functional change intended.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Acked-by: George Dunlap <george.dunlap@citrix.com>

And...

> +/*
> + * Scratch space, for avoiding having too many cpumask_var_t on the stack.

I can fix this up on check-in.

 -George
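
As an aside, the serialization rule the changelog refers to can be shown with
a toy program (pthreads standing in for pCPUs; names and sizes are made up,
and this is a simplification of the actual rule, not Xen code): a per-CPU
scratch slot needs no locking as long as each CPU only ever touches its own
entry.

    #include <pthread.h>
    #include <stdio.h>

    #define NR_CPUS 4

    /* One scratch word per "CPU": race-free without locks, provided each
     * CPU only uses its own slot. */
    static unsigned long scratch[NR_CPUS];

    static void *cpu_work(void *arg)
    {
        int cpu = (int)(long)arg;   /* models smp_processor_id() */

        for ( int i = 0; i < 100000; i++ )
            scratch[cpu]++;         /* only our own slot, never another CPU's */

        return NULL;
    }

    int main(void)
    {
        pthread_t t[NR_CPUS];

        for ( int i = 0; i < NR_CPUS; i++ )
            pthread_create(&t[i], NULL, cpu_work, (void *)(long)i);
        for ( int i = 0; i < NR_CPUS; i++ )
            pthread_join(t[i], NULL);

        for ( int i = 0; i < NR_CPUS; i++ )
            printf("cpu %d scratch = %lu\n", i, scratch[i]);
        return 0;
    }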


* Re: [PATCH 5/5] xen: sched: simplify ACPI S3 resume path.
  2017-01-17 17:27 ` [PATCH 5/5] xen: sched: simplify ACPI S3 resume path Dario Faggioli
@ 2017-01-23 15:52   ` George Dunlap
  0 siblings, 0 replies; 28+ messages in thread
From: George Dunlap @ 2017-01-23 15:52 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: xen-devel

On Tue, Jan 17, 2017 at 5:27 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> In fact, when domains are being unpaused:
>  - it's not necessary to put the vcpus to sleep, as
>    they are all already paused;
>  - it is not necessary to call vcpu_migrate(), as
>    the vcpus are still paused, and therefore won't
>    wakeup anyway.
>
> Basically, the only important thing is to call
> pick_cpu, to let the scheduler run and figure out
> what would be the best initial placement (i.e., the
> value stored in v->processor), for the vcpus, as
> they come back up, one after another.
>
> Note that this is consistent with what was happening
> before this change, as vcpu_migrate() calls pick_cpu,
> but it is much simpler and quicker.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> ---
>  xen/common/schedule.c |   22 ++++++++++------------
>  1 file changed, 10 insertions(+), 12 deletions(-)
>
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index bee5d1f..43b5b99 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -635,7 +635,11 @@ void restore_vcpu_affinity(struct domain *d)
>
>      for_each_vcpu ( d, v )
>      {
> -        spinlock_t *lock = vcpu_schedule_lock_irq(v);
> +        spinlock_t *lock;
> +
> +        ASSERT(!vcpu_runnable(v));
> +
> +        lock = vcpu_schedule_lock_irq(v);
>
>          if ( v->affinity_broken )
>          {
> @@ -659,17 +663,11 @@ void restore_vcpu_affinity(struct domain *d)
>                      cpupool_domain_cpumask(v->domain));
>          v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
>
> -        if ( v->processor == cpu )
> -        {
> -            set_bit(_VPF_migrating, &v->pause_flags);
> -            spin_unlock_irq(lock);;
> -            vcpu_sleep_nosync(v);
> -            vcpu_migrate(v);
> -        }
> -        else
> -        {
> -            spin_unlock_irq(lock);
> -        }
> +        spin_unlock_irq(lock);;
> +
> +        lock = vcpu_schedule_lock_irq(v);
> +        v->processor = SCHED_OP(VCPU2OP(v), pick_cpu, v);
> +        spin_unlock_irq(lock);
>      }
>
>      domain_update_node_affinity(d);
>
>

* Re: [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool.
  2017-01-23 14:40     ` George Dunlap
@ 2017-01-24 12:35       ` Juergen Gross
  2017-01-24 12:49         ` Dario Faggioli
  0 siblings, 1 reply; 28+ messages in thread
From: Juergen Gross @ 2017-01-24 12:35 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Dario Faggioli, Jan Beulich

On 23/01/17 15:40, George Dunlap wrote:
> On Thu, Jan 19, 2017 at 8:08 AM, Juergen Gross <jgross@suse.com> wrote:
>> On 17/01/17 18:26, Dario Faggioli wrote:
>>> In fact, relying on the mask of what pCPUs belong to
>>> which Credit2 runqueue is not enough. If we only do that,
>>> when Credit2 is the boot scheduler, we may ASSERT() or
>>> panic when moving a pCPU from Pool-0 to another cpupool.
>>>
>>> This is because pCPUs outside of any pool are considered
>>> part of cpupool0. This puts us at risk of crash when those
>>> same pCPUs are added to another pool and something
>>> different than the idle domain is found to be running
>>> on them.
>>>
>>> Note that, even if we prevent the above from happening (which
>>> is the purpose of this patch), this is still pretty bad,
>>> in fact, when we remove a pCPU from Pool-0:
>>> - in Credit1, as we do *not* update prv->ncpus and
>>>   prv->credit, which means we're considering the wrong
>>>   total credits when doing accounting;
>>> - in Credit2, the pCPU remains part of one runqueue,
>>>   and is hence at least considered during load balancing,
>>>   even if no vCPU should really run there.
>>>
>>> In Credit1, this "only" causes skewed accounting and
>>> no crashes because there is a lot of `cpumask_and`ing
>>> going on with the cpumask of the domains' cpupool
>>> (which, BTW, comes at a price).
>>>
>>> A quick and not too involved (and easily backportable)
>>> solution for Credit2 is to do exactly the same.
>>>
>>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>>> ---
>>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>>> Cc: Juergen Gross <jgross@suse.com>
>>> Cc: Jan Beulich <jbeulich@suse.com>
>>> ---
>>> This is a bugfix, and should be backported to 4.8.
>>> ---
>>> The proper solution would mean calling deinit_pdata() when removing a pCPU from
>>> cpupool0, as well as a bit more code reshuffling.
>>>
>>> And, although worth doing, it certainly will take more work, more time, and
>>> will probably be hard (well, surely harder than this) to backport.
>>>
>>> Therefore, I'd argue in favor of both taking and backporting this change, which
>>> at least enables using Credit2 as default scheduler without risking a crash
>>> when creating a second cpupool.
>>>
>>> Afterwards, a proper solution would be proposed for Xen 4.9.
>>>
>>> Finally, given the large number of issues similar to this that I've found and
>>> fixed in the last release cycle, I think it would be good to take a stab at
>>> seeing whether the interface between cpupools and the schedulers could be
>>> simplified. :-O
>>
>> The first patch version for support of cpupools I sent used its own
>> scheduler instance for the free cpus. Keir merged this instance with
>> the one for Pool-0.
> 
> You mean he just made the change unilaterally without posting it to
> the list?  I just went back and skimmed through the old cpupools
> threads and didn't see this discussed.

I'm not sure I remember everything correctly. Unfortunately I don't
have a copy of all emails back from then as this work was done for
another employer.

Looking into the archives it seems the switch was done already in the
final version I sent to the list. I believe Keir did some testing based
on my first series and suggested some modifications which I happily
accepted.

> Having a "cpupool-remove" operation that doesn't actually remove the
> cpu from the pool is a bit mad...

Logically it does remove the cpu from Pool-0. It is just that there is
no other scheduler entity involved for doing so.


Juergen



* Re: [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool.
  2017-01-24 12:35       ` Juergen Gross
@ 2017-01-24 12:49         ` Dario Faggioli
  2017-01-24 16:37           ` George Dunlap
  0 siblings, 1 reply; 28+ messages in thread
From: Dario Faggioli @ 2017-01-24 12:49 UTC (permalink / raw)
  To: Juergen Gross, George Dunlap; +Cc: xen-devel, anshul.makkar, Jan Beulich



On Tue, 2017-01-24 at 13:35 +0100, Juergen Gross wrote:
> On 23/01/17 15:40, George Dunlap wrote:
> > 
> > Having a "cpupool-remove" operation that doesn't actually remove
> > the
> > cpu from the pool is a bit mad...
> 
> Logically it does remove the cpu from Pool-0. It is just that there
> is
> no other scheduler entity involved for doing so.
> 
Yes, it's removed from the pool, *but* it basically remains part of the
pool's scheduler; that's the issue.

As you say, there's no other scheduler to which we can attach it...
So, as of now, I see two options:
 1) create such a (dummy) scheduler;
 2) go all the way down toward deallocating the scheduler-related data
    of the cpu (and reallocating it when the cpu is re-added).

BTW, Anshul also hit this problem while doing some work on
Credit2, and told me he'd give it some thought and try to
figure out some ideas (as soon as he's free of a couple of other
burdens :-D).

Let's see what he'll come up with. :-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool.
  2017-01-24 12:49         ` Dario Faggioli
@ 2017-01-24 16:37           ` George Dunlap
  0 siblings, 0 replies; 28+ messages in thread
From: George Dunlap @ 2017-01-24 16:37 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Juergen Gross, xen-devel, Anshul Makkar, Jan Beulich


> On Jan 24, 2017, at 12:49 PM, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> 
> On Tue, 2017-01-24 at 13:35 +0100, Juergen Gross wrote:
>> On 23/01/17 15:40, George Dunlap wrote:
>>>  
>>> Having a "cpupool-remove" operation that doesn't actually remove
>>> the
>>> cpu from the pool is a bit mad...
>> 
>> Logically it does remove the cpu from Pool-0. It is just that there
>> is
>> no other scheduler entity involved for doing so.
>> 
> Yes, it's removed from the pool, *but* it basically remains part of the
> pool's scheduler, that's the issue.

And in particular for Credit2, it remains in the runqueue cpu mask (which is what this patch is mostly about).

 -George



* Re: [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool.
  2017-01-17 17:26 ` [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool Dario Faggioli
  2017-01-19  8:08   ` Juergen Gross
  2017-01-23 15:20   ` George Dunlap
@ 2017-02-03  8:41   ` Jan Beulich
  2017-02-03 15:27     ` Dario Faggioli
  2 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2017-02-03  8:41 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Juergen Gross

>>> On 17.01.17 at 18:26, <dario.faggioli@citrix.com> wrote:
> In fact, relying on the mask of what pCPUs belong to
> which Credit2 runqueue is not enough. If we only do that,
> when Credit2 is the boot scheduler, we may ASSERT() or
> panic when moving a pCPU from Pool-0 to another cpupool.
> 
> This is because pCPUs outside of any pool are considered
> part of cpupool0. This puts us at risk of crash when those
> same pCPUs are added to another pool and something
> different than the idle domain is found to be running
> on them.
> 
> Note that, even if we prevent the above from happening (which
> is the purpose of this patch), this is still pretty bad,
> in fact, when we remove a pCPU from Pool-0:
> - in Credit1, as we do *not* update prv->ncpus and
>   prv->credit, which means we're considering the wrong
>   total credits when doing accounting;
> - in Credit2, the pCPU remains part of one runqueue,
>   and is hence at least considered during load balancing,
>   even if no vCPU should really run there.
> 
> In Credit1, this "only" causes skewed accounting and
> no crashes because there is a lot of `cpumask_and`ing
> going on with the cpumask of the domains' cpupool
> (which, BTW, comes at a price).
> 
> A quick and not too involved (and easily backportable)
> solution for Credit2 is to do exactly the same.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> ---
> This is a bugfix, and should be backported to 4.8.

While I did manage to backport "xen: credit2: use the correct scratch
cpumask" and "xen: credit2: fix shutdown/suspend when playing with
cpupools" also to 4.7, this one is even worse for me to deal with
(purely by its subject I'd assume it's applicable): Its first hunk applies
fine, but then everything breaks.

Jan



* Re: [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool.
  2017-02-03  8:41   ` Jan Beulich
@ 2017-02-03 15:27     ` Dario Faggioli
  2017-02-03 15:40       ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Dario Faggioli @ 2017-02-03 15:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, xen-devel, Juergen Gross



On Fri, 2017-02-03 at 01:41 -0700, Jan Beulich wrote:
> > > > On 17.01.17 at 18:26, <dario.faggioli@citrix.com> wrote:
> > ---
> > This is a bugfix, and should be backported to 4.8.
> 
> While I did manage to backport "xen: credit2: use the correct scratch
> cpumask" and "xen: credit2: fix shutdown/suspend when playing with
> cpupools" also to 4.7, this one is even worse for me to deal with
> (purely by its subject I'd assume it's applicable): Its first hunk
> applies
> fine, but then everything breaks.
> 
I see. Sorry for that.

Is it ok if I give it a try myself and, if I manage, send you the
backported patch?

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool.
  2017-02-03 15:27     ` Dario Faggioli
@ 2017-02-03 15:40       ` Jan Beulich
  2017-02-08 16:48         ` Dario Faggioli
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2017-02-03 15:40 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Juergen Gross

>>> On 03.02.17 at 16:27, <dario.faggioli@citrix.com> wrote:
> On Fri, 2017-02-03 at 01:41 -0700, Jan Beulich wrote:
>> > > > On 17.01.17 at 18:26, <dario.faggioli@citrix.com> wrote:
>> > ---
>> > This is a bugfix, and should be backported to 4.8.
>> 
>> While I did manage to backport "xen: credit2: use the correct scratch
>> cpumask" and "xen: credit2: fix shutdown/suspend when playing with
>> cpupools" also to 4.7, this one is even worse for me to deal with
>> (purely by its subject I'd assume it's applicable): Its first hunk
>> applies
>> fine, but then everything breaks.
>> 
> I see. Sorry for that.
> 
> Is it ok if I give it a try myself and, if I manage, send you the
> backported patch?

Yes, I should probably have phrased my earlier reply that way.

Jan



* Re: [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool.
  2017-02-03 15:40       ` Jan Beulich
@ 2017-02-08 16:48         ` Dario Faggioli
  2017-02-08 17:02           ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Dario Faggioli @ 2017-02-08 16:48 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, xen-devel, Juergen Gross



On Fri, 2017-02-03 at 08:40 -0700, Jan Beulich wrote:
> > > > > > On 17.01.17 at 18:26, <dario.faggioli@citrix.com> wrote:
> > > While I did manage to backport "xen: credit2: use the correct
> > > scratch
> > > cpumask" and "xen: credit2: fix shutdown/suspend when playing
> > > with
> > > cpupools" also to 4.7, this one is even worse for me to deal with
> > > (purely by its subject I'd assume it's applicable): Its first
> > > hunk
> > > applies
> > > fine, but then everything breaks.
> > > 
> > I see. Sorry for that.
> > 
> > Is it ok if I give it a try myself and, if I manage, send you the
> > backported patch?
> 
> Yes, I should probably have phrased my earlier reply that way.
> 
Well, I only wanted to double check whether you meant "please
provide a backport" or "let's give up". :-)

But now that I'm looking into this, I need to double check something
again. So, as far as I've understood, you want the following 3 patches
to be backported to both 4.8 and 4.7:

 xen: credit2: use the correct scratch cpumask
 xen: credit2: fix shutdown/suspend when playing with cpupools
 xen: credit2: never consider CPUs outside of our cpupool

And you're saying you already have done some of them, but I don't seem
to be able to find _any_ of them in any of the following branches:

  staging-4.8 http://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=refs/heads/staging-4.8
  stable-4.8 http://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=refs/heads/stable-4.8
  staging-4.7 http://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=refs/heads/staging-4.7
  stable-4.7 http://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=refs/heads/stable-4.7

Am I looking in the wrong places, or have I misunderstood something?

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool.
  2017-02-08 16:48         ` Dario Faggioli
@ 2017-02-08 17:02           ` Jan Beulich
  2017-02-08 18:55             ` Dario Faggioli
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2017-02-08 17:02 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Juergen Gross

>>> On 08.02.17 at 17:48, <dario.faggioli@citrix.com> wrote:
> On Fri, 2017-02-03 at 08:40 -0700, Jan Beulich wrote:
>> > > > > > On 17.01.17 at 18:26, <dario.faggioli@citrix.com> wrote:
>> > > While I did manage to backport "xen: credit2: use the correct
>> > > scratch
>> > > cpumask" and "xen: credit2: fix shutdown/suspend when playing
>> > > with
>> > > cpupools" also to 4.7, this one is even worse for me to deal with
>> > > (purely by its subject I'd assume it's applicable): Its first
>> > > hunk
>> > > applies
>> > > fine, but then everything breaks.
>> > > 
>> > I see. Sorry for that.
>> > 
>> > Is it ok if I give it a try myself and, if I manage, send you the
>> > backported patch?
>> 
>> Yes, I should probably have phrased my earlier reply that way.
>> 
> Well, I only wanted to double check whether you were meaning "please
> provide backport" or "let's give up". :-)
> 
> But now that I'm looking into this, I need to double check something
> again. So, as far as I've understood, you want the following 3 patches
> to be backported to both 4.8 and 4.7:
> 
>  xen: credit2: use the correct scratch cpumask
>  xen: credit2: fix shutdown/suspend when playing with cpupools
>  xen: credit2: never consider CPUs outside of our cpupool
> 
> And you're saying you already have done some of them, but I don't seem
> to be able to find _any_ of them in any of the following branches:
> 
>   staging-4.8 
> http://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=refs/heads/staging-4.8 
>   stable-4.8 
> http://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=refs/heads/stable-4.8 
>   staging-4.7 
> http://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=refs/heads/staging-4.7 
>   stable-4.7 
> http://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=refs/heads/stable-4.7 
> 
> Am I looking in the wrong places / misunderstood anything?

Well, me having done the backports didn't imply me committing them
right away; I wanted them to first undergo some testing. I hope to
get to that tomorrow. If you want to wait for what I have to appear
there, that's of course fine.

Jan



* Re: [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool.
  2017-02-08 17:02           ` Jan Beulich
@ 2017-02-08 18:55             ` Dario Faggioli
  2017-02-09  9:17               ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Dario Faggioli @ 2017-02-08 18:55 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, xen-devel, Juergen Gross



On Wed, 2017-02-08 at 10:02 -0700, Jan Beulich wrote:
> > > > On 08.02.17 at 17:48, <dario.faggioli@citrix.com> wrote:
> > Am I looking in the wrong places / misunderstood anything?
> 
> Well, me having done the backports didn't imply me committing them
> right away; I wanted them to first undergo some testing. 
>
Ok, right, I see it now. Thanks for clarifying (again).

> I hope to
> get to that tomorrow. If you want to wait for what I have to appear
> there, that's of course fine.
> 
So, the patches apply cleanly to 4.8, as you've probably seen already.

As per 4.7, I've prepared a branch for you:

 git://xenbits.xen.org/people/dariof/xen.git for-jan/staging-4.7/credit2-cpupool-fixes
 http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/for-jan/staging-4.7/credit2-cpupool-fixes
 https://travis-ci.org/fdario/xen/builds/199709757

I've only quickly tested it so far, and I have to leave now. I'll play
with it a bit more tomorrow morning and let you know how it goes.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool.
  2017-02-08 18:55             ` Dario Faggioli
@ 2017-02-09  9:17               ` Jan Beulich
  2017-02-09  9:25                 ` Dario Faggioli
  2017-02-09 10:32                 ` Dario Faggioli
  0 siblings, 2 replies; 28+ messages in thread
From: Jan Beulich @ 2017-02-09  9:17 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Juergen Gross

>>> On 08.02.17 at 19:55, <dario.faggioli@citrix.com> wrote:
> As per 4.7, I've prepared a branch for you:

Thanks.

>  git://xenbits.xen.org/people/dariof/xen.git 
> for-jan/staging-4.7/credit2-cpupool-fixes
>  
> http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/head 
> s/for-jan/staging-4.7/credit2-cpupool-fixes
>  https://travis-ci.org/fdario/xen/builds/199709757 
> 
> I've only quickly tested it so far, and I have to leave now. I'll play
> with it a bit more tomorrow morning and let you know how it goes.

Well, looking at the topmost commit, you've clearly missed the
follow-up fix (ad5808d905), which I did fold into that backport
right away. I'm going to commit what I have later today, and
I'll pull in the one extra backport next time round.

Jan



* Re: [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool.
  2017-02-09  9:17               ` Jan Beulich
@ 2017-02-09  9:25                 ` Dario Faggioli
  2017-02-09 10:32                 ` Dario Faggioli
  1 sibling, 0 replies; 28+ messages in thread
From: Dario Faggioli @ 2017-02-09  9:25 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, xen-devel, Juergen Gross



On Thu, 2017-02-09 at 02:17 -0700, Jan Beulich wrote:
> > > > On 08.02.17 at 19:55, <dario.faggioli@citrix.com> wrote:
> > I've only quickly tested it so far, and I have to leave now. I'll
> > play
> > with it a bit more tomorrow morning and let you know how it goes.
> 
> Well, looking at the topmost commit, you've clearly missed the
> follow-up fix (ad5808d905), which I did fold into that backport
> right away. 
>
Yes, in fact, I've just added it now (will re-push to the branch after
testing).

> I'm going to commit what I have later today, and
> I'll pull in the one extra backport next time round.
> 
Ok, sure.

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool.
  2017-02-09  9:17               ` Jan Beulich
  2017-02-09  9:25                 ` Dario Faggioli
@ 2017-02-09 10:32                 ` Dario Faggioli
  1 sibling, 0 replies; 28+ messages in thread
From: Dario Faggioli @ 2017-02-09 10:32 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, xen-devel, Juergen Gross



On Thu, 2017-02-09 at 02:17 -0700, Jan Beulich wrote:
> > > > On 08.02.17 at 19:55, <dario.faggioli@citrix.com> wrote:
>  I'm going to commit what I have later today, and
> I'll pull in the one extra backport next time round.
> 
Ok, patch attached.

I've tested it on top of current tip of staging-4.7.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.1.2: xen-credit2-never-consider-cpus-outside-cpupool-4.7.patch --]
[-- Type: text/x-patch, Size: 7797 bytes --]

commit 09725f8af37415c30a1a53d4a34e67fabcba105d
Author: Dario Faggioli <dario.faggioli@citrix.com>
Date:   Wed Feb 8 19:01:53 2017 +0100

    xen: credit2: never consider CPUs outside of our cpupool.
    
    In fact, relying on the mask of what pCPUs belong to
    which Credit2 runqueue is not enough. If we only do that,
    when Credit2 is the boot scheduler, we may ASSERT() or
    panic when moving a pCPU from Pool-0 to another cpupool.
    
    This is because pCPUs outside of any pool are considered
    part of cpupool0. This puts us at risk of crash when those
    same pCPUs are added to another pool and something
    different than the idle domain is found to be running
    on them.
    
    Note that, even if we prevent the above from happening (which
    is the purpose of this patch), this is still pretty bad,
    in fact, when we remove a pCPU from Pool-0:
    - in Credit1, as we do *not* update prv->ncpus and
      prv->credit, which means we're considering the wrong
      total credits when doing accounting;
    - in Credit2, the pCPU remains part of one runqueue,
      and is hence at least considered during load balancing,
      even if no vCPU should really run there.
    
    In Credit1, this "only" causes skewed accounting and
    no crashes because there is a lot of `cpumask_and`ing
    going on with the cpumask of the domains' cpupool
    (which, BTW, comes at a price).
    
    A quick and not too involved (and easily backportable)
    solution for Credit2 is to do exactly the same.
    
    Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
    ---
    Cc: George Dunlap <george.dunlap@eu.citrix.com>
    Cc: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 25b4c91..35dad15 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -331,19 +331,22 @@ static int csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc);
  */
 static int get_fallback_cpu(struct csched2_vcpu *svc)
 {
-    int fallback_cpu, cpu = svc->vcpu->processor;
+    struct vcpu *v = svc->vcpu;
+    int cpu = v->processor;
 
-    if ( likely(cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity)) )
-        return cpu;
+    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                cpupool_domain_cpumask(v->domain));
 
-    cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
-                &svc->rqd->active);
-    fallback_cpu = cpumask_first(cpumask_scratch_cpu(cpu));
-    if ( likely(fallback_cpu < nr_cpu_ids) )
-        return fallback_cpu;
+    if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
+        return cpu;
 
-    cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
-                cpupool_domain_cpumask(svc->vcpu->domain));
+    if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
+                                   &svc->rqd->active)) )
+    {
+        cpumask_and(cpumask_scratch_cpu(cpu), &svc->rqd->active,
+                    cpumask_scratch_cpu(cpu));
+        return cpumask_first(cpumask_scratch_cpu(cpu));
+    }
 
     ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
 
@@ -582,9 +585,12 @@ runq_tickle(const struct scheduler *ops, unsigned int cpu, struct csched2_vcpu *
         goto tickle;
     }
     
+    cpumask_and(cpumask_scratch_cpu(cpu), new->vcpu->cpu_hard_affinity,
+                cpupool_domain_cpumask(new->vcpu->domain));
+
     /* Get a mask of idle, but not tickled, that new is allowed to run on. */
     cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
-    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
+    cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
     
     /* If it's not empty, choose one */
     i = cpumask_cycle(cpu, &mask);
@@ -599,7 +605,7 @@ runq_tickle(const struct scheduler *ops, unsigned int cpu, struct csched2_vcpu *
      * that new is allowed to run on. */
     cpumask_andnot(&mask, &rqd->active, &rqd->idle);
     cpumask_andnot(&mask, &mask, &rqd->tickled);
-    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
+    cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
 
     for_each_cpu(i, &mask)
     {
@@ -1160,6 +1166,9 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
         return get_fallback_cpu(svc);
     }
 
+    cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
+                cpupool_domain_cpumask(vc->domain));
+
     /* First check to see if we're here because someone else suggested a place
      * for us to move. */
     if ( test_and_clear_bit(__CSFLAG_runq_migrate_request, &svc->flags) )
@@ -1169,16 +1178,14 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
             printk("%s: Runqueue migrate aborted because target runqueue disappeared!\n",
                    __func__);
         }
-        else
+        else if ( cpumask_intersects(cpumask_scratch_cpu(cpu),
+                                     &svc->migrate_rqd->active) )
         {
-            cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
+            cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                         &svc->migrate_rqd->active);
             new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
-            if ( new_cpu < nr_cpu_ids )
-            {
-                d2printk("%pv +\n", svc->vcpu);
-                goto out_up;
-            }
+            d2printk("%pv +\n", svc->vcpu);
+            goto out_up;
         }
         /* Fall-through to normal cpu pick */
     }
@@ -1208,12 +1215,12 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
          */
         if ( rqd == svc->rqd )
         {
-            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
+            if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
                 rqd_avgload = rqd->b_avgload - svc->avgload;
         }
         else if ( spin_trylock(&rqd->lock) )
         {
-            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
+            if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
                 rqd_avgload = rqd->b_avgload;
 
             spin_unlock(&rqd->lock);
@@ -1231,7 +1238,7 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
         new_cpu = get_fallback_cpu(svc);
     else
     {
-        cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
+        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                     &prv->rqd[min_rqi].active);
         new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
         BUG_ON(new_cpu >= nr_cpu_ids);
@@ -1318,6 +1325,8 @@ static void migrate(const struct scheduler *ops,
         __runq_deassign(svc);
 
         cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
+                    cpupool_domain_cpumask(svc->vcpu->domain));
+        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                     &trqd->active);
         svc->vcpu->processor = cpumask_any(cpumask_scratch_cpu(cpu));
         BUG_ON(svc->vcpu->processor >= nr_cpu_ids);
@@ -1343,8 +1352,14 @@ static void migrate(const struct scheduler *ops,
 static bool_t vcpu_is_migrateable(struct csched2_vcpu *svc,
                                   struct csched2_runqueue_data *rqd)
 {
+    struct vcpu *v = svc->vcpu;
+    int cpu = svc->vcpu->processor;
+
+    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                cpupool_domain_cpumask(v->domain));
+
     return !(svc->flags & CSFLAG_runq_migrate_request) &&
-           cpumask_intersects(svc->vcpu->cpu_hard_affinity, &rqd->active);
+           cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active);
 }
 
 static void balance_load(const struct scheduler *ops, int cpu, s_time_t now)


Thread overview: 28+ messages
2017-01-17 17:26 [PATCH 0/5] xen: sched: scheduling (mostly, Credit2) and cpupool fixes and improvements Dario Faggioli
2017-01-17 17:26 ` [PATCH 1/5] xen: credit2: use the correct scratch cpumask Dario Faggioli
2017-01-19 12:22   ` George Dunlap
2017-01-17 17:26 ` [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool Dario Faggioli
2017-01-19  8:08   ` Juergen Gross
2017-01-19  8:22     ` Dario Faggioli
2017-01-23 14:40     ` George Dunlap
2017-01-24 12:35       ` Juergen Gross
2017-01-24 12:49         ` Dario Faggioli
2017-01-24 16:37           ` George Dunlap
2017-01-23 15:20   ` George Dunlap
2017-02-03  8:41   ` Jan Beulich
2017-02-03 15:27     ` Dario Faggioli
2017-02-03 15:40       ` Jan Beulich
2017-02-08 16:48         ` Dario Faggioli
2017-02-08 17:02           ` Jan Beulich
2017-02-08 18:55             ` Dario Faggioli
2017-02-09  9:17               ` Jan Beulich
2017-02-09  9:25                 ` Dario Faggioli
2017-02-09 10:32                 ` Dario Faggioli
2017-01-17 17:26 ` [PATCH 3/5] xen: credit2: fix shutdown/suspend when playing with cpupools Dario Faggioli
2017-01-23 15:42   ` George Dunlap
2017-01-17 17:27 ` [PATCH 4/5] xen: sched: impove use of cpumask scratch space in Credit1 Dario Faggioli
2017-01-18  9:45   ` Jan Beulich
2017-01-18  9:54     ` Dario Faggioli
2017-01-23 15:47   ` George Dunlap
2017-01-17 17:27 ` [PATCH 5/5] xen: sched: simplify ACPI S3 resume path Dario Faggioli
2017-01-23 15:52   ` George Dunlap
