xen-devel.lists.xenproject.org archive mirror
* [PATCH 0/4] xen: sched / cpupool: fixes and improvements, mostly for when suspend/resume is involved
@ 2015-06-25 12:15 Dario Faggioli
  2015-06-25 12:15 ` [PATCH 1/4] xen: sched: avoid dumping duplicate information Dario Faggioli
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Dario Faggioli @ 2015-06-25 12:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Sisu Xi, George Dunlap, Andrew Cooper,
	Robert VanVossen, Josh Whitehead, Meng Xu, Jan Beulich,
	Ben Guthro

This is mostly about fixing bugs that show up during suspend/resume with "non
default" configurations, such as pCPUs free from any cpupool, more than one
cpupool in the system, etc.

I tried a few different approaches for dealing with these cases. For instance,
I tried creating an 'idle cpupool' and putting the free pCPUs there, instead
of sort-of parking them in cpupool0 (although in a special condition), like
we're doing now, but that introduces other issues. I think this series is the
least invasive, and yet correct, way of dealing with the situation.

In some more detail:
 * patch 1 is just refactoring/beautifying dump output;
 * patch 2 is the fix for a bug showing up during resume, when two or more
   cpupools exist;
 * patch 3 fixes a bug (in the suspend/resume path again) and also improves
   Credit1 behavior, i.e., stops it from considering pCPUs that are outside
   of any pool as potential candidates on which to run vCPUs;
 * patch 4 is refactoring again, with the intent of making the situation that
   made patch 3 necessary less likely to happen! :-)

Thanks and Regards,
Dario
---
Dario Faggioli (4):
      xen: sched: avoid dumping duplicate information
      xen: x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown
      xen: credit1: properly deal with pCPUs not in any cpupool
      xen: sched: get rid of cpupool_scheduler_cpumask()

 xen/arch/x86/smpboot.c      |    1 -
 xen/common/cpupool.c        |    8 +++++---
 xen/common/domain.c         |    5 +++--
 xen/common/domctl.c         |    4 ++--
 xen/common/sched_arinc653.c |    2 +-
 xen/common/sched_credit.c   |   27 ++++++++++++++++++---------
 xen/common/sched_rt.c       |   12 ++++++------
 xen/common/sched_sedf.c     |    5 +++--
 xen/common/schedule.c       |   20 ++++++++++++++------
 xen/include/xen/sched-if.h  |   12 ++++++++++--
 10 files changed, 62 insertions(+), 34 deletions(-)

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/4] xen: sched: avoid dumping duplicate information
  2015-06-25 12:15 [PATCH 0/4] xen: sched / cpupool: fixes and improvements, mostly for when suspend/resume is involved Dario Faggioli
@ 2015-06-25 12:15 ` Dario Faggioli
  2015-06-26 13:43   ` Juergen Gross
  2015-07-02 11:18   ` George Dunlap
  2015-06-25 12:15 ` [PATCH 2/4] xen: x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown Dario Faggioli
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 22+ messages in thread
From: Dario Faggioli @ 2015-06-25 12:15 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap

When dumping scheduling information (debug key 'r'), what
we print as 'Idle cpupool' is pretty much the same as what
we print immediately after as 'Cpupool0'. In fact, if there
are no pCPUs outside of any cpupool, it is exactly the
same.

If there are free pCPUs, there is some valuable information,
but still a lot of duplication:

 (XEN) Online Cpus: 0-15
 (XEN) Free Cpus: 8
 (XEN) Idle cpupool:
 (XEN) Scheduler: SMP Credit Scheduler (credit)
 (XEN) info:
 (XEN)   ncpus              = 13
 (XEN)   master             = 0
 (XEN)   credit             = 3900
 (XEN)   credit balance     = 45
 (XEN)   weight             = 1280
 (XEN)   runq_sort          = 11820
 (XEN)   default-weight     = 256
 (XEN)   tslice             = 30ms
 (XEN)   ratelimit          = 1000us
 (XEN)   credits per msec   = 10
 (XEN)   ticks per tslice   = 3
 (XEN)   migration delay    = 0us
 (XEN) idlers: 00000000,00006d29
 (XEN) active vcpus:
 (XEN)     1: [1.7] pri=-1 flags=0 cpu=15 credit=-116 [w=256,cap=0] (84+300) {a/i=22/21 m=18+5 (k=0)}
 (XEN)     2: [1.3] pri=0 flags=0 cpu=1 credit=-113 [w=256,cap=0] (87+300) {a/i=37/36 m=11+544 (k=0)}
 (XEN)     3: [0.15] pri=-1 flags=0 cpu=4 credit=95 [w=256,cap=0] (210+300) {a/i=127/126 m=108+9 (k=0)}
 (XEN)     4: [0.10] pri=-2 flags=0 cpu=12 credit=-287 [w=256,cap=0] (-84+300) {a/i=163/162 m=36+568 (k=0)}
 (XEN)     5: [0.7] pri=-2 flags=0 cpu=2 credit=-242 [w=256,cap=0] (-42+300) {a/i=129/128 m=16+50 (k=0)}
 (XEN) CPU[08]  sort=5791, sibling=00000000,00000300, core=00000000,0000ff00
 (XEN)   run: [32767.8] pri=-64 flags=0 cpu=8
 (XEN) Cpupool 0:
 (XEN) Cpus: 0-5,10-15
 (XEN) Scheduler: SMP Credit Scheduler (credit)
 (XEN) info:
 (XEN)   ncpus              = 13
 (XEN)   master             = 0
 (XEN)   credit             = 3900
 (XEN)   credit balance     = 45
 (XEN)   weight             = 1280
 (XEN)   runq_sort          = 11820
 (XEN)   default-weight     = 256
 (XEN)   tslice             = 30ms
 (XEN)   ratelimit          = 1000us
 (XEN)   credits per msec   = 10
 (XEN)   ticks per tslice   = 3
 (XEN)   migration delay    = 0us
 (XEN) idlers: 00000000,00006d29
 (XEN) active vcpus:
 (XEN)     1: [1.7] pri=-1 flags=0 cpu=15 credit=-116 [w=256,cap=0] (84+300) {a/i=22/21 m=18+5 (k=0)}
 (XEN)     2: [1.3] pri=0 flags=0 cpu=1 credit=-113 [w=256,cap=0] (87+300) {a/i=37/36 m=11+544 (k=0)}
 (XEN)     3: [0.15] pri=-1 flags=0 cpu=4 credit=95 [w=256,cap=0] (210+300) {a/i=127/126 m=108+9 (k=0)}
 (XEN)     4: [0.10] pri=-2 flags=0 cpu=12 credit=-287 [w=256,cap=0] (-84+300) {a/i=163/162 m=36+568 (k=0)}
 (XEN)     5: [0.7] pri=-2 flags=0 cpu=2 credit=-242 [w=256,cap=0] (-42+300) {a/i=129/128 m=16+50 (k=0)}
 (XEN) CPU[00]  sort=11801, sibling=00000000,00000003, core=00000000,000000ff
 (XEN)   run: [32767.0] pri=-64 flags=0 cpu=0
 ... ... ...
 (XEN) CPU[15]  sort=11820, sibling=00000000,0000c000, core=00000000,0000ff00
 (XEN)   run: [1.7] pri=-1 flags=0 cpu=15 credit=-116 [w=256,cap=0] (84+300) {a/i=22/21 m=18+5 (k=0)}
 (XEN)     1: [32767.15] pri=-64 flags=0 cpu=15
 (XEN) Cpupool 1:
 (XEN) Cpus: 6-7,9
 (XEN) Scheduler: SMP RTDS Scheduler (rtds)
 (XEN) CPU[06]
 (XEN) CPU[07]
 (XEN) CPU[09]

With this change, we get rid of the redundancy, and retain
only the information about the free pCPUs.

(While there, turn a loop index variable from `int' to
`unsigned int' in schedule_dump().)

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: Juergen Gross <jgross@suse.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
---
 xen/common/cpupool.c  |    6 +++---
 xen/common/schedule.c |   18 +++++++++++++-----
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index 563864d..5471f93 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -728,10 +728,10 @@ void dump_runq(unsigned char key)
 
     print_cpumap("Online Cpus", &cpu_online_map);
     if ( !cpumask_empty(&cpupool_free_cpus) )
+    {
         print_cpumap("Free Cpus", &cpupool_free_cpus);
-
-    printk("Idle cpupool:\n");
-    schedule_dump(NULL);
+        schedule_dump(NULL);
+    }
 
     for_each_cpupool(c)
     {
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index ecf1545..4ffcd98 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1473,16 +1473,24 @@ void scheduler_free(struct scheduler *sched)
 
 void schedule_dump(struct cpupool *c)
 {
-    int               i;
+    unsigned int      i;
     struct scheduler *sched;
     cpumask_t        *cpus;
 
     /* Locking, if necessary, must be handled withing each scheduler */
 
-    sched = (c == NULL) ? &ops : c->sched;
-    cpus = cpupool_scheduler_cpumask(c);
-    printk("Scheduler: %s (%s)\n", sched->name, sched->opt_name);
-    SCHED_OP(sched, dump_settings);
+    if ( c != NULL )
+    {
+        sched = c->sched;
+        cpus = c->cpu_valid;
+        printk("Scheduler: %s (%s)\n", sched->name, sched->opt_name);
+        SCHED_OP(sched, dump_settings);
+    }
+    else
+    {
+        sched = &ops;
+        cpus = &cpupool_free_cpus;
+    }
 
     for_each_cpu (i, cpus)
     {

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/4] xen: x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown
  2015-06-25 12:15 [PATCH 0/4] xen: sched / cpupool: fixes and improvements, mostly for when suspend/resume is involved Dario Faggioli
  2015-06-25 12:15 ` [PATCH 1/4] xen: sched: avoid dumping duplicate information Dario Faggioli
@ 2015-06-25 12:15 ` Dario Faggioli
  2015-06-25 14:20   ` Andrew Cooper
  2015-06-26 13:54   ` Juergen Gross
  2015-06-25 12:15 ` [PATCH 3/4] xen: credit1: properly deal with pCPUs not in any cpupool Dario Faggioli
  2015-06-25 12:15 ` [PATCH 4/4] xen: sched: get rid of cpupool_scheduler_cpumask() Dario Faggioli
  3 siblings, 2 replies; 22+ messages in thread
From: Dario Faggioli @ 2015-06-25 12:15 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Andrew Cooper, Jan Beulich

In fact, if a pCPU belonging to some other pool than
cpupool0 goes down, we want to clear the relevant bit
from its actual pool, rather than always from cpupool0.

Before this commit, all the pCPUs in the non-default
pool(s) will be considered immediately valid, during
system resume, even the ones that have not been brought
up yet. As a result, the (Credit1) scheduler will attempt
to run its load balancing logic on them, causing the
following Oops:

# xl cpupool-cpu-remove Pool-0 8-15
# xl cpupool-create name=\"Pool-1\"
# xl cpupool-cpu-add Pool-1 8-15
--> suspend
--> resume
(XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    8
(XEN) RIP:    e008:[<ffff82d080123078>] csched_schedule+0x4be/0xb97
(XEN) RFLAGS: 0000000000010087   CONTEXT: hypervisor
(XEN) rax: 80007d2f7fccb780   rbx: 0000000000000009   rcx: 0000000000000000
(XEN) rdx: ffff82d08031ed40   rsi: ffff82d080334980   rdi: 0000000000000000
(XEN) rbp: ffff83010000fe20   rsp: ffff83010000fd40   r8:  0000000000000004
(XEN) r9:  0000ffff0000ffff   r10: 00ff00ff00ff00ff   r11: 0f0f0f0f0f0f0f0f
(XEN) r12: ffff8303191ea870   r13: ffff8303226aadf0   r14: 0000000000000009
(XEN) r15: 0000000000000008   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 00000000dba9d000   cr2: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) ... ... ...
(XEN) Xen call trace:
(XEN)    [<ffff82d080123078>] csched_schedule+0x4be/0xb97
(XEN)    [<ffff82d08012c732>] schedule+0x12a/0x63c
(XEN)    [<ffff82d08012f8c8>] __do_softirq+0x82/0x8d
(XEN)    [<ffff82d08012f920>] do_softirq+0x13/0x15
(XEN)    [<ffff82d080164791>] idle_loop+0x5b/0x6b
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 8:
(XEN) GENERAL PROTECTION FAULT
(XEN) [error_code=0000]
(XEN) ****************************************

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: Juergen Gross <jgross@suse.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/smpboot.c |    1 -
 xen/common/cpupool.c   |    2 ++
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 2289284..a4ec396 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -887,7 +887,6 @@ void __cpu_disable(void)
     remove_siblinginfo(cpu);
 
     /* It's now safe to remove this processor from the online map */
-    cpumask_clear_cpu(cpu, cpupool0->cpu_valid);
     cpumask_clear_cpu(cpu, &cpu_online_map);
     fixup_irqs();
 
diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index 5471f93..b48ae17 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -530,6 +530,7 @@ static int cpupool_cpu_remove(unsigned int cpu)
             if ( cpumask_test_cpu(cpu, (*c)->cpu_valid ) )
             {
                 cpumask_set_cpu(cpu, (*c)->cpu_suspended);
+                cpumask_clear_cpu(cpu, (*c)->cpu_valid);
                 break;
             }
         }
@@ -552,6 +553,7 @@ static int cpupool_cpu_remove(unsigned int cpu)
          * If we are not suspending, we are hot-unplugging cpu, and that is
          * allowed only for CPUs in pool0.
          */
+        cpumask_clear_cpu(cpu, cpupool0->cpu_valid);
         ret = 0;
     }

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/4] xen: credit1: properly deal with pCPUs not in any cpupool
  2015-06-25 12:15 [PATCH 0/4] xen: sched / cpupool: fixes and improvements, mostly for when suspend/resume is involved Dario Faggioli
  2015-06-25 12:15 ` [PATCH 1/4] xen: sched: avoid dumping duplicate information Dario Faggioli
  2015-06-25 12:15 ` [PATCH 2/4] xen: x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown Dario Faggioli
@ 2015-06-25 12:15 ` Dario Faggioli
  2015-06-26 14:05   ` Juergen Gross
  2015-07-02 15:24   ` George Dunlap
  2015-06-25 12:15 ` [PATCH 4/4] xen: sched: get rid of cpupool_scheduler_cpumask() Dario Faggioli
  3 siblings, 2 replies; 22+ messages in thread
From: Dario Faggioli @ 2015-06-25 12:15 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Juergen Gross

Ideally, the pCPUs that are 'free', i.e., not assigned
to any cpupool, should not be considered by the scheduler
for load balancing or anything else. In Credit1, we fail at
this because of how we use cpupool_scheduler_cpumask().
In fact, for a free pCPU, cpupool_scheduler_cpumask()
returns a pointer to cpupool_free_cpus, and hence, near
the top of csched_load_balance():

 if ( unlikely(!cpumask_test_cpu(cpu, online)) )
     goto out;

is false (the pCPU _is_ free!), and we therefore do not
jump to the end right away, as we should. This causes
the following splat when resuming from ACPI S3 with
pCPUs not assigned to any pool:

(XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) ... ... ...
(XEN) Xen call trace:
(XEN)    [<ffff82d080122eaa>] csched_load_balance+0x213/0x794
(XEN)    [<ffff82d08012374c>] csched_schedule+0x321/0x452
(XEN)    [<ffff82d08012c85e>] schedule+0x12a/0x63c
(XEN)    [<ffff82d08012fa09>] __do_softirq+0x82/0x8d
(XEN)    [<ffff82d08012fa61>] do_softirq+0x13/0x15
(XEN)    [<ffff82d080164780>] idle_loop+0x5b/0x6b
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 8:
(XEN) GENERAL PROTECTION FAULT
(XEN) [error_code=0000]
(XEN) ****************************************

The cure is:
 * use cpupool_online_cpumask(), as a better guard for the
   case when the cpu is being offlined;
 * explicitly check whether the cpu is free.

SEDF is in a similar situation, so fix it too.

Still in Credit1, we must make sure that free (or offline)
CPUs are not considered "ticklable". Not doing so would impair
the load balancing algorithm, making the scheduler think that
it is possible to 'ask' the pCPU to pick up some work, while
in reality, that will never happen! Evidence of such behavior
is shown in this trace:

 Name               CPU list
 Pool-0             0,1,2,3,4,5,6,7,8,9,10,11,12,13,14

    0.112998198 | ||.|| -|x||-|- d0v0 runstate_change d0v4 offline->runnable
 ]  0.112998198 | ||.|| -|x||-|- d0v0   22006(2:2:6) 1 [ f ]
 ]  0.112999612 | ||.|| -|x||-|- d0v0   28004(2:8:4) 2 [ 0 4 ]
    0.113003387 | ||.|| -||||-|x d32767v15 runstate_continue d32767v15 running->running

where "22006(2:2:6) 1 [ f ]" means that pCPU 15, which is
free from any pool, is tickled.

The cure, in this case, is to filter out the free pCPUs
within __runq_tickle().

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_credit.c |   23 ++++++++++++++++-------
 xen/common/sched_sedf.c   |    3 ++-
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 953ecb0..a1945ac 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -366,12 +366,17 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
 {
     struct csched_vcpu * const cur = CSCHED_VCPU(curr_on_cpu(cpu));
     struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
-    cpumask_t mask, idle_mask;
+    cpumask_t mask, idle_mask, *online;
     int balance_step, idlers_empty;
 
     ASSERT(cur);
     cpumask_clear(&mask);
-    idlers_empty = cpumask_empty(prv->idlers);
+
+    /* cpu is vc->processor, so it must be in a cpupool. */
+    ASSERT(per_cpu(cpupool, cpu) != NULL);
+    online = cpupool_online_cpumask(per_cpu(cpupool, cpu));
+    cpumask_and(&idle_mask, prv->idlers, online);
+    idlers_empty = cpumask_empty(&idle_mask);
 
 
     /*
@@ -408,8 +413,8 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
             /* Are there idlers suitable for new (for this balance step)? */
             csched_balance_cpumask(new->vcpu, balance_step,
                                    csched_balance_mask);
-            cpumask_and(&idle_mask, prv->idlers, csched_balance_mask);
-            new_idlers_empty = cpumask_empty(&idle_mask);
+            cpumask_and(csched_balance_mask, csched_balance_mask, &idle_mask);
+            new_idlers_empty = cpumask_empty(csched_balance_mask);
 
             /*
              * Let's not be too harsh! If there aren't idlers suitable
@@ -1510,6 +1515,7 @@ static struct csched_vcpu *
 csched_load_balance(struct csched_private *prv, int cpu,
     struct csched_vcpu *snext, bool_t *stolen)
 {
+    struct cpupool *c = per_cpu(cpupool, cpu);
     struct csched_vcpu *speer;
     cpumask_t workers;
     cpumask_t *online;
@@ -1517,10 +1523,13 @@ csched_load_balance(struct csched_private *prv, int cpu,
     int node = cpu_to_node(cpu);
 
     BUG_ON( cpu != snext->vcpu->processor );
-    online = cpupool_scheduler_cpumask(per_cpu(cpupool, cpu));
+    online = cpupool_online_cpumask(c);
 
-    /* If this CPU is going offline we shouldn't steal work. */
-    if ( unlikely(!cpumask_test_cpu(cpu, online)) )
+    /*
+     * If this CPU is going offline, or is not (yet) part of any cpupool
+     * (as it happens, e.g., during cpu bringup), we shouldn't steal work.
+     */
+    if ( unlikely(!cpumask_test_cpu(cpu, online) || c == NULL) )
         goto out;
 
     if ( snext->pri == CSCHED_PRI_IDLE )
diff --git a/xen/common/sched_sedf.c b/xen/common/sched_sedf.c
index a1a4cb7..cad5b39 100644
--- a/xen/common/sched_sedf.c
+++ b/xen/common/sched_sedf.c
@@ -790,7 +790,8 @@ static struct task_slice sedf_do_schedule(
     if ( tasklet_work_scheduled ||
          (list_empty(runq) && list_empty(waitq)) ||
          unlikely(!cpumask_test_cpu(cpu,
-                   cpupool_scheduler_cpumask(per_cpu(cpupool, cpu)))) )
+                   cpupool_online_cpumask(per_cpu(cpupool, cpu))) ||
+                  per_cpu(cpupool, cpu) == NULL) )
     {
         ret.task = IDLETASK(cpu);
         ret.time = SECONDS(1);

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/4] xen: sched: get rid of cpupool_scheduler_cpumask()
  2015-06-25 12:15 [PATCH 0/4] xen: sched / cpupool: fixes and improvements, mostly for when suspend/resume is involved Dario Faggioli
                   ` (2 preceding siblings ...)
  2015-06-25 12:15 ` [PATCH 3/4] xen: credit1: properly deal with pCPUs not in any cpupool Dario Faggioli
@ 2015-06-25 12:15 ` Dario Faggioli
  2015-06-26 14:08   ` Juergen Gross
                     ` (2 more replies)
  3 siblings, 3 replies; 22+ messages in thread
From: Dario Faggioli @ 2015-06-25 12:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Sisu Xi, George Dunlap, Robert VanVossen,
	Josh Whitehead, Meng Xu

and of (almost every) direct use of cpupool_online_cpumask().

In fact, what we really want, most of the time, is the set
of valid pCPUs of the cpupool a certain domain is part of.
Furthermore, if it is called with a NULL pool as argument,
cpupool_scheduler_cpumask() does more harm than good, as it
returns the bitmask of free pCPUs!
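
For reference, this is the NULL-pool behavior in question (the
macro as it currently reads in xen/include/xen/sched-if.h, also
visible in the hunk below):

  #define cpupool_scheduler_cpumask(_pool) \
      (((_pool) == NULL) ? &cpupool_free_cpus : (_pool)->cpu_valid)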

This commit, therefore:
 * gets rid of cpupool_scheduler_cpumask(), in favour of
   cpupool_domain_cpumask(), which makes it more evident
   what we are after, and accommodates some sanity checking;
 * replaces some of the calls to cpupool_online_cpumask()
   with calls to the new function too.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Robert VanVossen <robert.vanvossen@dornerworks.com>
Cc: Josh Whitehead <josh.whitehead@dornerworks.com>
Cc: Meng Xu <mengxu@cis.upenn.edu>
Cc: Sisu Xi <xisisu@gmail.com>
---
 xen/common/domain.c         |    5 +++--
 xen/common/domctl.c         |    4 ++--
 xen/common/sched_arinc653.c |    2 +-
 xen/common/sched_credit.c   |    6 +++---
 xen/common/sched_rt.c       |   12 ++++++------
 xen/common/sched_sedf.c     |    2 +-
 xen/common/schedule.c       |    2 +-
 xen/include/xen/sched-if.h  |   12 ++++++++++--
 8 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 3bc52e6..c20accb 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -184,7 +184,8 @@ struct vcpu *alloc_vcpu(
     /* Must be called after making new vcpu visible to for_each_vcpu(). */
     vcpu_check_shutdown(v);
 
-    domain_update_node_affinity(d);
+    if ( !is_idle_domain(d) )
+        domain_update_node_affinity(d);
 
     return v;
 }
@@ -437,7 +438,7 @@ void domain_update_node_affinity(struct domain *d)
         return;
     }
 
-    online = cpupool_online_cpumask(d->cpupool);
+    online = cpupool_domain_cpumask(d);
 
     spin_lock(&d->node_affinity_lock);
 
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 2a2d203..a399aa6 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -664,7 +664,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
             goto maxvcpu_out;
 
         ret = -ENOMEM;
-        online = cpupool_online_cpumask(d->cpupool);
+        online = cpupool_domain_cpumask(d);
         if ( max > d->max_vcpus )
         {
             struct vcpu **vcpus;
@@ -748,7 +748,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         if ( op->cmd == XEN_DOMCTL_setvcpuaffinity )
         {
             cpumask_var_t new_affinity, old_affinity;
-            cpumask_t *online = cpupool_online_cpumask(v->domain->cpupool);;
+            cpumask_t *online = cpupool_domain_cpumask(v->domain);;
 
             /*
              * We want to be able to restore hard affinity if we are trying
diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index cff5da9..dbe02ed 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -667,7 +667,7 @@ a653sched_pick_cpu(const struct scheduler *ops, struct vcpu *vc)
      * If present, prefer vc's current processor, else
      * just find the first valid vcpu .
      */
-    online = cpupool_scheduler_cpumask(vc->domain->cpupool);
+    online = cpupool_domain_cpumask(vc->domain);
 
     cpu = cpumask_first(online);
 
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index a1945ac..8c36635 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -309,7 +309,7 @@ __runq_remove(struct csched_vcpu *svc)
 static inline int __vcpu_has_soft_affinity(const struct vcpu *vc,
                                            const cpumask_t *mask)
 {
-    return !cpumask_subset(cpupool_online_cpumask(vc->domain->cpupool),
+    return !cpumask_subset(cpupool_domain_cpumask(vc->domain),
                            vc->cpu_soft_affinity) &&
            !cpumask_subset(vc->cpu_hard_affinity, vc->cpu_soft_affinity) &&
            cpumask_intersects(vc->cpu_soft_affinity, mask);
@@ -374,7 +374,7 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
 
     /* cpu is vc->processor, so it must be in a cpupool. */
     ASSERT(per_cpu(cpupool, cpu) != NULL);
-    online = cpupool_online_cpumask(per_cpu(cpupool, cpu));
+    online = cpupool_domain_cpumask(new->sdom->dom);
     cpumask_and(&idle_mask, prv->idlers, online);
     idlers_empty = cpumask_empty(&idle_mask);
 
@@ -641,7 +641,7 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
     int balance_step;
 
     /* Store in cpus the mask of online cpus on which the domain can run */
-    online = cpupool_scheduler_cpumask(vc->domain->cpupool);
+    online = cpupool_domain_cpumask(vc->domain);
     cpumask_and(&cpus, vc->cpu_hard_affinity, online);
 
     for_each_csched_balance_step( balance_step )
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 4372486..08611c8 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -256,7 +256,7 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
      */
     mask = _cpumask_scratch[svc->vcpu->processor];
 
-    cpupool_mask = cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool);
+    cpupool_mask = cpupool_domain_cpumask(svc->vcpu->domain);
     cpumask_and(mask, cpupool_mask, svc->vcpu->cpu_hard_affinity);
     cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch), mask);
     printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime"),"
@@ -673,7 +673,7 @@ rt_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
     cpumask_t *online;
     int cpu;
 
-    online = cpupool_scheduler_cpumask(vc->domain->cpupool);
+    online = cpupool_domain_cpumask(vc->domain);
     cpumask_and(&cpus, online, vc->cpu_hard_affinity);
 
     cpu = cpumask_test_cpu(vc->processor, &cpus)
@@ -753,7 +753,7 @@ __runq_pick(const struct scheduler *ops, const cpumask_t *mask)
         iter_svc = __q_elem(iter);
 
         /* mask cpu_hard_affinity & cpupool & mask */
-        online = cpupool_scheduler_cpumask(iter_svc->vcpu->domain->cpupool);
+        online = cpupool_domain_cpumask(iter_svc->vcpu->domain);
         cpumask_and(&cpu_common, online, iter_svc->vcpu->cpu_hard_affinity);
         cpumask_and(&cpu_common, mask, &cpu_common);
         if ( cpumask_empty(&cpu_common) )
@@ -965,7 +965,7 @@ runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
     if ( new == NULL || is_idle_vcpu(new->vcpu) )
         return;
 
-    online = cpupool_scheduler_cpumask(new->vcpu->domain->cpupool);
+    online = cpupool_domain_cpumask(new->vcpu->domain);
     cpumask_and(&not_tickled, online, new->vcpu->cpu_hard_affinity);
     cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
 
@@ -1078,7 +1078,7 @@ rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
 
     ASSERT(!list_empty(&prv->sdom));
     sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
-    online = cpupool_scheduler_cpumask(sdom->dom->cpupool);
+    online = cpupool_domain_cpumask(sdom->dom);
     snext = __runq_pick(ops, online); /* pick snext from ALL valid cpus */
 
     runq_tickle(ops, snext);
@@ -1113,7 +1113,7 @@ rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
 
         ASSERT(!list_empty(&prv->sdom));
         sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
-        online = cpupool_scheduler_cpumask(sdom->dom->cpupool);
+        online = cpupool_domain_cpumask(sdom->dom);
         snext = __runq_pick(ops, online); /* pick snext from ALL cpus */
 
         runq_tickle(ops, snext);
diff --git a/xen/common/sched_sedf.c b/xen/common/sched_sedf.c
index cad5b39..396b651 100644
--- a/xen/common/sched_sedf.c
+++ b/xen/common/sched_sedf.c
@@ -383,7 +383,7 @@ static int sedf_pick_cpu(const struct scheduler *ops, struct vcpu *v)
     cpumask_t online_affinity;
     cpumask_t *online;
 
-    online = cpupool_scheduler_cpumask(v->domain->cpupool);
+    online = cpupool_domain_cpumask(v->domain);
     cpumask_and(&online_affinity, v->cpu_hard_affinity, online);
     return cpumask_cycle(v->vcpu_id % cpumask_weight(&online_affinity) - 1,
                          &online_affinity);
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 4ffcd98..7ad298a 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -80,7 +80,7 @@ static struct scheduler __read_mostly ops;
 
 #define DOM2OP(_d)    (((_d)->cpupool == NULL) ? &ops : ((_d)->cpupool->sched))
 #define VCPU2OP(_v)   (DOM2OP((_v)->domain))
-#define VCPU2ONLINE(_v) cpupool_online_cpumask((_v)->domain->cpupool)
+#define VCPU2ONLINE(_v) cpupool_domain_cpumask((_v)->domain)
 
 static inline void trace_runstate_change(struct vcpu *v, int new_state)
 {
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 7cc25c6..20af791 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -183,9 +183,17 @@ struct cpupool
     atomic_t         refcnt;
 };
 
-#define cpupool_scheduler_cpumask(_pool) \
-    (((_pool) == NULL) ? &cpupool_free_cpus : (_pool)->cpu_valid)
 #define cpupool_online_cpumask(_pool) \
     (((_pool) == NULL) ? &cpu_online_map : (_pool)->cpu_valid)
 
+static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
+{
+    /*
+     * d->cpupool == NULL only for the idle domain, which should
+     * have no interest in calling into this.
+     */
+    ASSERT(d->cpupool != NULL);
+    return cpupool_online_cpumask(d->cpupool);
+}
+
 #endif /* __XEN_SCHED_IF_H__ */

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] xen: x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown
  2015-06-25 12:15 ` [PATCH 2/4] xen: x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown Dario Faggioli
@ 2015-06-25 14:20   ` Andrew Cooper
  2015-06-25 15:04     ` Dario Faggioli
  2015-06-26 13:54   ` Juergen Gross
  1 sibling, 1 reply; 22+ messages in thread
From: Andrew Cooper @ 2015-06-25 14:20 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Juergen Gross, Jan Beulich

On 25/06/15 13:15, Dario Faggioli wrote:
> In fact, if a pCPU belonging to some other pool than
> cpupool0 goes down, we want to clear the relevant bit
> from its actual pool, rather than always from cpupool0.

This sentence is a little hard to parse.

I presume you mean "use the correct cpupool's valid mask, rather than
cpupool0's".

For the change itself, I definitely agree that it needs fixing.

>
> Before this commit, all the pCPUs in the non-default
> pool(s) will be considered immediately valid, during
> system resume, even the ones that have not been brought
> up yet. As a result, the (Credit1) scheduler will attempt
> to run its load balancing logic on them, causing the
> following Oops:
>
> # xl cpupool-cpu-remove Pool-0 8-15
> # xl cpupool-create name=\"Pool-1\"
> # xl cpupool-cpu-add Pool-1 8-15
> --> suspend
> --> resume
> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
> (XEN) CPU:    8
> (XEN) RIP:    e008:[<ffff82d080123078>] csched_schedule+0x4be/0xb97
> (XEN) RFLAGS: 0000000000010087   CONTEXT: hypervisor
> (XEN) rax: 80007d2f7fccb780   rbx: 0000000000000009   rcx: 0000000000000000
> (XEN) rdx: ffff82d08031ed40   rsi: ffff82d080334980   rdi: 0000000000000000
> (XEN) rbp: ffff83010000fe20   rsp: ffff83010000fd40   r8:  0000000000000004
> (XEN) r9:  0000ffff0000ffff   r10: 00ff00ff00ff00ff   r11: 0f0f0f0f0f0f0f0f
> (XEN) r12: ffff8303191ea870   r13: ffff8303226aadf0   r14: 0000000000000009
> (XEN) r15: 0000000000000008   cr0: 000000008005003b   cr4: 00000000000026f0
> (XEN) cr3: 00000000dba9d000   cr2: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) ... ... ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d080123078>] csched_schedule+0x4be/0xb97
> (XEN)    [<ffff82d08012c732>] schedule+0x12a/0x63c
> (XEN)    [<ffff82d08012f8c8>] __do_softirq+0x82/0x8d
> (XEN)    [<ffff82d08012f920>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d080164791>] idle_loop+0x5b/0x6b
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 8:
> (XEN) GENERAL PROTECTION FAULT
> (XEN) [error_code=0000]
> (XEN) ****************************************

What is the actual cause of the #GP fault?  There are no obviously
poisoned registers.  Is it something we should modify to be a BUG or ASSERT?

>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  xen/arch/x86/smpboot.c |    1 -
>  xen/common/cpupool.c   |    2 ++
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
> index 2289284..a4ec396 100644
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -887,7 +887,6 @@ void __cpu_disable(void)
>      remove_siblinginfo(cpu);
>  
>      /* It's now safe to remove this processor from the online map */
> -    cpumask_clear_cpu(cpu, cpupool0->cpu_valid);
>      cpumask_clear_cpu(cpu, &cpu_online_map);
>      fixup_irqs();
>  
> diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
> index 5471f93..b48ae17 100644
> --- a/xen/common/cpupool.c
> +++ b/xen/common/cpupool.c
> @@ -530,6 +530,7 @@ static int cpupool_cpu_remove(unsigned int cpu)
>              if ( cpumask_test_cpu(cpu, (*c)->cpu_valid ) )
>              {
>                  cpumask_set_cpu(cpu, (*c)->cpu_suspended);
> +                cpumask_clear_cpu(cpu, (*c)->cpu_valid);
>                  break;
>              }
>          }
> @@ -552,6 +553,7 @@ static int cpupool_cpu_remove(unsigned int cpu)
>           * If we are not suspending, we are hot-unplugging cpu, and that is
>           * allowed only for CPUs in pool0.
>           */
> +        cpumask_clear_cpu(cpu, cpupool0->cpu_valid);
>          ret = 0;
>      }
>  
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] xen: x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown
  2015-06-25 14:20   ` Andrew Cooper
@ 2015-06-25 15:04     ` Dario Faggioli
  2015-06-25 15:52       ` Andrew Cooper
  0 siblings, 1 reply; 22+ messages in thread
From: Dario Faggioli @ 2015-06-25 15:04 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Juergen Gross, xen-devel, Jan Beulich



On Thu, 2015-06-25 at 15:20 +0100, Andrew Cooper wrote:
> On 25/06/15 13:15, Dario Faggioli wrote:
> > In fact, if a pCPU belonging to some other pool than
> > cpupool0 goes down, we want to clear the relevant bit
> > from its actual pool, rather than always from cpupool0.
> 
> This sentence is a little hard to parse.
> 
> I presume you mean "use the correct cpupool's valid mask, rather than
> cpupool0's".
> 
Yes, that's a better way to say what I meant.

> > # xl cpupool-cpu-remove Pool-0 8-15
> > # xl cpupool-create name=\"Pool-1\"
> > # xl cpupool-cpu-add Pool-1 8-15
> > --> suspend
> > --> resume
> > (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
> > (XEN) CPU:    8
> > (XEN) RIP:    e008:[<ffff82d080123078>] csched_schedule+0x4be/0xb97
> > (XEN) RFLAGS: 0000000000010087   CONTEXT: hypervisor
> > (XEN) rax: 80007d2f7fccb780   rbx: 0000000000000009   rcx: 0000000000000000
> > (XEN) rdx: ffff82d08031ed40   rsi: ffff82d080334980   rdi: 0000000000000000
> > (XEN) rbp: ffff83010000fe20   rsp: ffff83010000fd40   r8:  0000000000000004
> > (XEN) r9:  0000ffff0000ffff   r10: 00ff00ff00ff00ff   r11: 0f0f0f0f0f0f0f0f
> > (XEN) r12: ffff8303191ea870   r13: ffff8303226aadf0   r14: 0000000000000009
> > (XEN) r15: 0000000000000008   cr0: 000000008005003b   cr4: 00000000000026f0
> > (XEN) cr3: 00000000dba9d000   cr2: 0000000000000000
> > (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> > (XEN) ... ... ...
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82d080123078>] csched_schedule+0x4be/0xb97
> > (XEN)    [<ffff82d08012c732>] schedule+0x12a/0x63c
> > (XEN)    [<ffff82d08012f8c8>] __do_softirq+0x82/0x8d
> > (XEN)    [<ffff82d08012f920>] do_softirq+0x13/0x15
> > (XEN)    [<ffff82d080164791>] idle_loop+0x5b/0x6b
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 8:
> > (XEN) GENERAL PROTECTION FAULT
> > (XEN) [error_code=0000]
> > (XEN) ****************************************
> 
> What is the actual cause of the #GP fault?  There are no obviously
> poisoned registers.  
>
IIRC, CPU 8 has just been brought up and is scheduling. No other
CPU from Pool-1 is online yet. We are on CPU 8, in
csched_load_balance(), more specifically here:

    ...
    BUG_ON( cpu != snext->vcpu->processor );
    online = cpupool_scheduler_cpumask(per_cpu(cpupool, cpu));
    ...
    for_each_csched_balance_step( bstep )
    {
        /*
         * We peek at the non-idling CPUs in a node-wise fashion. In fact,
         * it is more likely that we find some affine work on our same
         * node, not to mention that migrating vcpus within the same node
         * could well expected to be cheaper than across-nodes (memory
         * stays local, there might be some node-wide cache[s], etc.).
         */
        peer_node = node;
        do
        {
            /* Find out what the !idle are in this node */
            cpumask_andnot(&workers, online, prv->idlers);
            cpumask_and(&workers, &workers, &node_to_cpumask(peer_node));
            __cpumask_clear_cpu(cpu, &workers);

            peer_cpu = cpumask_first(&workers);
            if ( peer_cpu >= nr_cpu_ids )
                goto next_node;
            do
            {
                /*
                 * Get ahold of the scheduler lock for this peer CPU.
                 *
                 * Note: We don't spin on this lock but simply try it. Spinning
                 * could cause a deadlock if the peer CPU is also load
                 * balancing and trying to lock this CPU.
                 */
                spinlock_t *lock = pcpu_schedule_trylock(peer_cpu);

Because we did not clear Pool-1->cpu_valid, online is 8-15. Also,
since we _did_ clear bits 8-15 in prv->idlers when tearing them down
during suspend, they're all (or all but 8) workers, as far as
the code above can tell.

We therefore enter the inner do{}while with, for instance (that's what
I've seen in my debugging), peer_cpu=9, but we've not yet done
cpu_schedule_up()-->alloc_pdata()-->etc. for that CPU, so we die at (or
shortly after) the end of the code snippet shown above.
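
(If I'm reading the locking macros right, the access that actually
blows up is the per-cpu one hidden behind the trylock, i.e.,
something like:

     spinlock_t *lock = per_cpu(schedule_data, peer_cpu).schedule_lock;

with peer_cpu's per-cpu area not having been initialized yet.)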

> Is it something we should modify to be a BUG or ASSERT?
> 
Not sure how/where. Note that some more fixing of similar situations
happens in other patches in the series, and that also includes adding
ASSERT-s (although, no, they probably won't cover this case).

I can try to think about it and to come up with something if you think it's
important...

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] xen: x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown
  2015-06-25 15:04     ` Dario Faggioli
@ 2015-06-25 15:52       ` Andrew Cooper
  2015-06-25 16:13         ` Dario Faggioli
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Cooper @ 2015-06-25 15:52 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Juergen Gross, xen-devel, Jan Beulich

On 25/06/15 16:04, Dario Faggioli wrote:
> On Thu, 2015-06-25 at 15:20 +0100, Andrew Cooper wrote:
>> On 25/06/15 13:15, Dario Faggioli wrote:
>>> In fact, if a pCPU belonging to some other pool than
>>> cpupool0 goes down, we want to clear the relevant bit
>>> from its actual pool, rather than always from cpupool0.
>> This sentence is a little hard to parse.
>>
>> I presume you mean "use the correct cpupool's valid mask, rather than
>> cpupool0's".
>>
> Yes, that's a better way to say what I meant.
>
>>> # xl cpupool-cpu-remove Pool-0 8-15
>>> # xl cpupool-create name=\"Pool-1\"
>>> # xl cpupool-cpu-add Pool-1 8-15
>>> --> suspend
>>> --> resume
>>> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
>>> (XEN) CPU:    8
>>> (XEN) RIP:    e008:[<ffff82d080123078>] csched_schedule+0x4be/0xb97
>>> (XEN) RFLAGS: 0000000000010087   CONTEXT: hypervisor
>>> (XEN) rax: 80007d2f7fccb780   rbx: 0000000000000009   rcx: 0000000000000000
>>> (XEN) rdx: ffff82d08031ed40   rsi: ffff82d080334980   rdi: 0000000000000000
>>> (XEN) rbp: ffff83010000fe20   rsp: ffff83010000fd40   r8:  0000000000000004
>>> (XEN) r9:  0000ffff0000ffff   r10: 00ff00ff00ff00ff   r11: 0f0f0f0f0f0f0f0f
>>> (XEN) r12: ffff8303191ea870   r13: ffff8303226aadf0   r14: 0000000000000009
>>> (XEN) r15: 0000000000000008   cr0: 000000008005003b   cr4: 00000000000026f0
>>> (XEN) cr3: 00000000dba9d000   cr2: 0000000000000000
>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>> (XEN) ... ... ...
>>> (XEN) Xen call trace:
>>> (XEN)    [<ffff82d080123078>] csched_schedule+0x4be/0xb97
>>> (XEN)    [<ffff82d08012c732>] schedule+0x12a/0x63c
>>> (XEN)    [<ffff82d08012f8c8>] __do_softirq+0x82/0x8d
>>> (XEN)    [<ffff82d08012f920>] do_softirq+0x13/0x15
>>> (XEN)    [<ffff82d080164791>] idle_loop+0x5b/0x6b
>>> (XEN)
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 8:
>>> (XEN) GENERAL PROTECTION FAULT
>>> (XEN) [error_code=0000]
>>> (XEN) ****************************************
>> What is the actual cause of the #GP fault?  There are no obviously
>> poisoned registers.  
>>
> IIRC, CPU 8 has just been brought up and is scheduling. No other
> CPU from Pool-1 is online yet. We are on CPU 8, in
> csched_load_balance(), more specifically here:
>
>     ...
>     BUG_ON( cpu != snext->vcpu->processor );
>     online = cpupool_scheduler_cpumask(per_cpu(cpupool, cpu));
>     ...
>     for_each_csched_balance_step( bstep )
>     {
>         /*
>          * We peek at the non-idling CPUs in a node-wise fashion. In fact,
>          * it is more likely that we find some affine work on our same
>          * node, not to mention that migrating vcpus within the same node
>          * could well expected to be cheaper than across-nodes (memory
>          * stays local, there might be some node-wide cache[s], etc.).
>          */
>         peer_node = node;
>         do
>         {
>             /* Find out what the !idle are in this node */
>             cpumask_andnot(&workers, online, prv->idlers);
>             cpumask_and(&workers, &workers, &node_to_cpumask(peer_node));
>             __cpumask_clear_cpu(cpu, &workers);
>
>             peer_cpu = cpumask_first(&workers);
>             if ( peer_cpu >= nr_cpu_ids )
>                 goto next_node;
>             do
>             {
>                 /*
>                  * Get ahold of the scheduler lock for this peer CPU.
>                  *
>                  * Note: We don't spin on this lock but simply try it. Spinning
>                  * could cause a deadlock if the peer CPU is also load
>                  * balancing and trying to lock this CPU.
>                  */
>                 spinlock_t *lock = pcpu_schedule_trylock(peer_cpu);
>
> Because we did not clear Pool-1->cpu_valid, online is 8-15. Also,
> since we _did_ clear bits 8-15 in prv->idlers when tearing them down
> during suspend, they're all (or all but 8) workers, as far as
> the code above can tell.
>
> We therefore enter the inner do{}while with, for instance (that's what
> I've seen in my debugging), peer_cpu=9, but we've not yet done
> cpu_schedule_up()-->alloc_pdata()-->etc. for that CPU, so we die at (or
> shortly after) the end of the code snippet shown above.

Aah - it is a dereference with %rax as a pointer, which is

#define INVALID_PERCPU_AREA (0x8000000000000000L - (long)__per_cpu_start)

That explains the #GP fault which is due to a non-canonical address.

It might be better to use 0xDEAD000000000000L as the constant to make it
slightly easier to spot as a poisoned pointer.
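
(Spelling the arithmetic out, and assuming the usual per_cpu()
expansion, i.e. the variable's address plus __per_cpu_offset[cpu]:
for a CPU whose per-cpu area has not been set up yet, the access
computes roughly

    &per_cpu_var + (0x8000000000000000L - __per_cpu_start)
        == 0x8000000000000000L + (&per_cpu_var - __per_cpu_start)

which lands just above 0x8000000000000000, in the non-canonical
hole, hence the #GP instead of a page fault; it also matches the
rax value in the dump above.)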

>
>> Is it something we should modify to be a BUG or ASSERT?
>>
> Not sure how/where. Note that some more fixing of similar situations
> happens in other patches in the series, and that also includes adding
> ASSERT-s (although, no, they probably won't cover this case).
>
> I can try to think about it and to come up with something if you think it's
> important...

Not to worry. I was more concerned about working out why it was dying
with an otherwise unqualified #GP fault.

~Andrew

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] xen: x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown
  2015-06-25 15:52       ` Andrew Cooper
@ 2015-06-25 16:13         ` Dario Faggioli
  2015-06-25 16:39           ` Andrew Cooper
  0 siblings, 1 reply; 22+ messages in thread
From: Dario Faggioli @ 2015-06-25 16:13 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Juergen Gross, xen-devel, Jan Beulich



On Thu, 2015-06-25 at 16:52 +0100, Andrew Cooper wrote:
> On 25/06/15 16:04, Dario Faggioli wrote:
> > On Thu, 2015-06-25 at 15:20 +0100, Andrew Cooper wrote:
> >> On 25/06/15 13:15, Dario Faggioli wrote:

> >>> # xl cpupool-cpu-remove Pool-0 8-15
> >>> # xl cpupool-create name=\"Pool-1\"
> >>> # xl cpupool-cpu-add Pool-1 8-15
> >>> --> suspend
> >>> --> resume
> >>> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
> >>> (XEN) CPU:    8
> >>> (XEN) RIP:    e008:[<ffff82d080123078>] csched_schedule+0x4be/0xb97
> >>> (XEN) RFLAGS: 0000000000010087   CONTEXT: hypervisor
> >>> (XEN) rax: 80007d2f7fccb780   rbx: 0000000000000009   rcx: 0000000000000000
> >>> (XEN) rdx: ffff82d08031ed40   rsi: ffff82d080334980   rdi: 0000000000000000
> >>> (XEN) rbp: ffff83010000fe20   rsp: ffff83010000fd40   r8:  0000000000000004
> >>> (XEN) r9:  0000ffff0000ffff   r10: 00ff00ff00ff00ff   r11: 0f0f0f0f0f0f0f0f
> >>> (XEN) r12: ffff8303191ea870   r13: ffff8303226aadf0   r14: 0000000000000009
> >>> (XEN) r15: 0000000000000008   cr0: 000000008005003b   cr4: 00000000000026f0
> >>> (XEN) cr3: 00000000dba9d000   cr2: 0000000000000000
> >>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> >>> (XEN) ... ... ...
> >>> (XEN) Xen call trace:
> >>> (XEN)    [<ffff82d080123078>] csched_schedule+0x4be/0xb97
> >>> (XEN)    [<ffff82d08012c732>] schedule+0x12a/0x63c
> >>> (XEN)    [<ffff82d08012f8c8>] __do_softirq+0x82/0x8d
> >>> (XEN)    [<ffff82d08012f920>] do_softirq+0x13/0x15
> >>> (XEN)    [<ffff82d080164791>] idle_loop+0x5b/0x6b
> >>> (XEN)
> >>> (XEN) ****************************************
> >>> (XEN) Panic on CPU 8:
> >>> (XEN) GENERAL PROTECTION FAULT
> >>> (XEN) [error_code=0000]
> >>> (XEN) ****************************************
> >> What is the actual cause of the #GP fault?  There are no obviously
> >> poisoned registers.  
> >>

> >             do
> >             {
> >                 /*
> >                  * Get ahold of the scheduler lock for this peer CPU.
> >                  *
> >                  * Note: We don't spin on this lock but simply try it. Spinning
> >                  * could cause a deadlock if the peer CPU is also load
> >                  * balancing and trying to lock this CPU.
> >                  */
> >                 spinlock_t *lock = pcpu_schedule_trylock(peer_cpu);
> >
> > We therefore enter the inner do{}while with, for instance (that's what
> > I've seen in my debugging), peer_cpu=9, but we've not yet done
> > cpu_schedule_up()-->alloc_pdata()-->etc. for that CPU, so we die at (or
> > shortly after) the end of the code snippet shown above.
> 
> Aah - it is a dereference with %rax as a pointer, which is
> 
> #define INVALID_PERCPU_AREA (0x8000000000000000L - (long)__per_cpu_start)
> 
Exactly!

> That explains the #GP fault which is due to a non-canonical address.
> 
> It might be better to use 0xDEAD000000000000L as the constant to make it
> slightly easier to spot as a poisoned pointer.
> 
Indeed. :-)

> > I can try to think about it and to come up with something if you think it's
> > important...
> 
> Not to worry. I was more concerned about working out why it was dying
> with an otherwise unqualified #GP fault.
> 
Ok, thanks. So, just to clarify things to me, from your side, this patch
needs "just" a better changelog, right?

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] xen: x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown
  2015-06-25 16:13         ` Dario Faggioli
@ 2015-06-25 16:39           ` Andrew Cooper
  0 siblings, 0 replies; 22+ messages in thread
From: Andrew Cooper @ 2015-06-25 16:39 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Juergen Gross, xen-devel, Jan Beulich

On 25/06/15 17:13, Dario Faggioli wrote:
> On Thu, 2015-06-25 at 16:52 +0100, Andrew Cooper wrote:
>> On 25/06/15 16:04, Dario Faggioli wrote:
>>> On Thu, 2015-06-25 at 15:20 +0100, Andrew Cooper wrote:
>>>> On 25/06/15 13:15, Dario Faggioli wrote:
>>>>> # xl cpupool-cpu-remove Pool-0 8-15
>>>>> # xl cpupool-create name=\"Pool-1\"
>>>>> # xl cpupool-cpu-add Pool-1 8-15
>>>>> --> suspend
>>>>> --> resume
>>>>> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
>>>>> (XEN) CPU:    8
>>>>> (XEN) RIP:    e008:[<ffff82d080123078>] csched_schedule+0x4be/0xb97
>>>>> (XEN) RFLAGS: 0000000000010087   CONTEXT: hypervisor
>>>>> (XEN) rax: 80007d2f7fccb780   rbx: 0000000000000009   rcx: 0000000000000000
>>>>> (XEN) rdx: ffff82d08031ed40   rsi: ffff82d080334980   rdi: 0000000000000000
>>>>> (XEN) rbp: ffff83010000fe20   rsp: ffff83010000fd40   r8:  0000000000000004
>>>>> (XEN) r9:  0000ffff0000ffff   r10: 00ff00ff00ff00ff   r11: 0f0f0f0f0f0f0f0f
>>>>> (XEN) r12: ffff8303191ea870   r13: ffff8303226aadf0   r14: 0000000000000009
>>>>> (XEN) r15: 0000000000000008   cr0: 000000008005003b   cr4: 00000000000026f0
>>>>> (XEN) cr3: 00000000dba9d000   cr2: 0000000000000000
>>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>>>> (XEN) ... ... ...
>>>>> (XEN) Xen call trace:
>>>>> (XEN)    [<ffff82d080123078>] csched_schedule+0x4be/0xb97
>>>>> (XEN)    [<ffff82d08012c732>] schedule+0x12a/0x63c
>>>>> (XEN)    [<ffff82d08012f8c8>] __do_softirq+0x82/0x8d
>>>>> (XEN)    [<ffff82d08012f920>] do_softirq+0x13/0x15
>>>>> (XEN)    [<ffff82d080164791>] idle_loop+0x5b/0x6b
>>>>> (XEN)
>>>>> (XEN) ****************************************
>>>>> (XEN) Panic on CPU 8:
>>>>> (XEN) GENERAL PROTECTION FAULT
>>>>> (XEN) [error_code=0000]
>>>>> (XEN) ****************************************
>>>> What is the actual cause of the #GP fault?  There are no obviously
>>>> poisoned registers.  
>>>>
>>>             do
>>>             {
>>>                 /*
>>>                  * Get ahold of the scheduler lock for this peer CPU.
>>>                  *
>>>                  * Note: We don't spin on this lock but simply try it. Spinning
>>>                  * could cause a deadlock if the peer CPU is also load
>>>                  * balancing and trying to lock this CPU.
>>>                  */
>>>                 spinlock_t *lock = pcpu_schedule_trylock(peer_cpu);
>>>
>>> We therefore enter the inner do{}while with, for instance (that's what
>>> I've seen in my debugging), peer_cpu=9, but we've not yet done
>>> cpu_schedule_up()-->alloc_pdata()-->etc. for that CPU, so we die at (or
>>> shortly after) the end of the code snippet shown above.
>> Aah - it is a dereference with %rax as a pointer, which is
>>
>> #define INVALID_PERCPU_AREA (0x8000000000000000L - (long)__per_cpu_start)
>>
> Exactly!
>
>> That explains the #GP fault which is due to a non-canonical address.
>>
>> It might be better to use 0xDEAD000000000000L as the constant to make it
>> slightly easier to spot as a poisoned pointer.
>>
> Indeed. :-)
>
>>> I can try to think about it and to come up with something if you think it's
>>> important...
>> Not to worry. I was more concerned about working out why it was dying
>> with an otherwise unqualified #GP fault.
>>
> Ok, thanks. So, just to clarify things to me, from your side, this patch
> needs "just" a better changelog, right?

Yes - Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> for the x86
nature, but I am not very familiar with the cpupool code, so would
prefer review from a knowledgeable 3rd party.

>
> Regards,
> Dario
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/4] xen: sched: avoid dumping duplicate information
  2015-06-25 12:15 ` [PATCH 1/4] xen: sched: avoid dumping duplicate information Dario Faggioli
@ 2015-06-26 13:43   ` Juergen Gross
  2015-07-02 11:18   ` George Dunlap
  1 sibling, 0 replies; 22+ messages in thread
From: Juergen Gross @ 2015-06-26 13:43 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap

On 06/25/2015 02:15 PM, Dario Faggioli wrote:
> When dumping scheduling information (debug key 'r'), what
> we print as 'Idle cpupool' is pretty much the same as what
> we print immediately after as 'Cpupool0'. In fact, if there
> are no pCPUs outside of any cpupool, it is exactly the
> same.
>
> If there are free pCPUs, there is some valuable information,
> but still a lot of duplication:
>
>   (XEN) Online Cpus: 0-15
>   (XEN) Free Cpus: 8
>   (XEN) Idle cpupool:
>   (XEN) Scheduler: SMP Credit Scheduler (credit)
>   (XEN) info:
>   (XEN)   ncpus              = 13
>   (XEN)   master             = 0
>   (XEN)   credit             = 3900
>   (XEN)   credit balance     = 45
>   (XEN)   weight             = 1280
>   (XEN)   runq_sort          = 11820
>   (XEN)   default-weight     = 256
>   (XEN)   tslice             = 30ms
>   (XEN)   ratelimit          = 1000us
>   (XEN)   credits per msec   = 10
>   (XEN)   ticks per tslice   = 3
>   (XEN)   migration delay    = 0us
>   (XEN) idlers: 00000000,00006d29
>   (XEN) active vcpus:
>   (XEN)     1: [1.7] pri=-1 flags=0 cpu=15 credit=-116 [w=256,cap=0] (84+300) {a/i=22/21 m=18+5 (k=0)}
>   (XEN)     2: [1.3] pri=0 flags=0 cpu=1 credit=-113 [w=256,cap=0] (87+300) {a/i=37/36 m=11+544 (k=0)}
>   (XEN)     3: [0.15] pri=-1 flags=0 cpu=4 credit=95 [w=256,cap=0] (210+300) {a/i=127/126 m=108+9 (k=0)}
>   (XEN)     4: [0.10] pri=-2 flags=0 cpu=12 credit=-287 [w=256,cap=0] (-84+300) {a/i=163/162 m=36+568 (k=0)}
>   (XEN)     5: [0.7] pri=-2 flags=0 cpu=2 credit=-242 [w=256,cap=0] (-42+300) {a/i=129/128 m=16+50 (k=0)}
>   (XEN) CPU[08]  sort=5791, sibling=00000000,00000300, core=00000000,0000ff00
>   (XEN)   run: [32767.8] pri=-64 flags=0 cpu=8
>   (XEN) Cpupool 0:
>   (XEN) Cpus: 0-5,10-15
>   (XEN) Scheduler: SMP Credit Scheduler (credit)
>   (XEN) info:
>   (XEN)   ncpus              = 13
>   (XEN)   master             = 0
>   (XEN)   credit             = 3900
>   (XEN)   credit balance     = 45
>   (XEN)   weight             = 1280
>   (XEN)   runq_sort          = 11820
>   (XEN)   default-weight     = 256
>   (XEN)   tslice             = 30ms
>   (XEN)   ratelimit          = 1000us
>   (XEN)   credits per msec   = 10
>   (XEN)   ticks per tslice   = 3
>   (XEN)   migration delay    = 0us
>   (XEN) idlers: 00000000,00006d29
>   (XEN) active vcpus:
>   (XEN)     1: [1.7] pri=-1 flags=0 cpu=15 credit=-116 [w=256,cap=0] (84+300) {a/i=22/21 m=18+5 (k=0)}
>   (XEN)     2: [1.3] pri=0 flags=0 cpu=1 credit=-113 [w=256,cap=0] (87+300) {a/i=37/36 m=11+544 (k=0)}
>   (XEN)     3: [0.15] pri=-1 flags=0 cpu=4 credit=95 [w=256,cap=0] (210+300) {a/i=127/126 m=108+9 (k=0)}
>   (XEN)     4: [0.10] pri=-2 flags=0 cpu=12 credit=-287 [w=256,cap=0] (-84+300) {a/i=163/162 m=36+568 (k=0)}
>   (XEN)     5: [0.7] pri=-2 flags=0 cpu=2 credit=-242 [w=256,cap=0] (-42+300) {a/i=129/128 m=16+50 (k=0)}
>   (XEN) CPU[00]  sort=11801, sibling=00000000,00000003, core=00000000,000000ff
>   (XEN)   run: [32767.0] pri=-64 flags=0 cpu=0
>   ... ... ...
>   (XEN) CPU[15]  sort=11820, sibling=00000000,0000c000, core=00000000,0000ff00
>   (XEN)   run: [1.7] pri=-1 flags=0 cpu=15 credit=-116 [w=256,cap=0] (84+300) {a/i=22/21 m=18+5 (k=0)}
>   (XEN)     1: [32767.15] pri=-64 flags=0 cpu=15
>   (XEN) Cpupool 1:
>   (XEN) Cpus: 6-7,9
>   (XEN) Scheduler: SMP RTDS Scheduler (rtds)
>   (XEN) CPU[06]
>   (XEN) CPU[07]
>   (XEN) CPU[09]
>
> With this change, we get rid of the redundancy, and retain
> only the information about the free pCPUs.
>
> (While there, turn a loop index variable from `int' to
> `unsigned int' in schedule_dump().)
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Acked-by: Juergen Gross <jgross@suse.com>

> ---
> Cc: Juergen Gross <jgross@suse.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> ---
>   xen/common/cpupool.c  |    6 +++---
>   xen/common/schedule.c |   18 +++++++++++++-----
>   2 files changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
> index 563864d..5471f93 100644
> --- a/xen/common/cpupool.c
> +++ b/xen/common/cpupool.c
> @@ -728,10 +728,10 @@ void dump_runq(unsigned char key)
>
>       print_cpumap("Online Cpus", &cpu_online_map);
>       if ( !cpumask_empty(&cpupool_free_cpus) )
> +    {
>           print_cpumap("Free Cpus", &cpupool_free_cpus);
> -
> -    printk("Idle cpupool:\n");
> -    schedule_dump(NULL);
> +        schedule_dump(NULL);
> +    }
>
>       for_each_cpupool(c)
>       {
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index ecf1545..4ffcd98 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -1473,16 +1473,24 @@ void scheduler_free(struct scheduler *sched)
>
>   void schedule_dump(struct cpupool *c)
>   {
> -    int               i;
> +    unsigned int      i;
>       struct scheduler *sched;
>       cpumask_t        *cpus;
>
>       /* Locking, if necessary, must be handled withing each scheduler */
>
> -    sched = (c == NULL) ? &ops : c->sched;
> -    cpus = cpupool_scheduler_cpumask(c);
> -    printk("Scheduler: %s (%s)\n", sched->name, sched->opt_name);
> -    SCHED_OP(sched, dump_settings);
> +    if ( c != NULL )
> +    {
> +        sched = c->sched;
> +        cpus = c->cpu_valid;
> +        printk("Scheduler: %s (%s)\n", sched->name, sched->opt_name);
> +        SCHED_OP(sched, dump_settings);
> +    }
> +    else
> +    {
> +        sched = &ops;
> +        cpus = &cpupool_free_cpus;
> +    }
>
>       for_each_cpu (i, cpus)
>       {
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] xen: x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown
  2015-06-25 12:15 ` [PATCH 2/4] xen: x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown Dario Faggioli
  2015-06-25 14:20   ` Andrew Cooper
@ 2015-06-26 13:54   ` Juergen Gross
  1 sibling, 0 replies; 22+ messages in thread
From: Juergen Gross @ 2015-06-26 13:54 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Andrew Cooper, Jan Beulich

On 06/25/2015 02:15 PM, Dario Faggioli wrote:
> In fact, if a pCPU belonging to some other pool than
> cpupool0 goes down, we want to clear the relevant bit
> from its actual pool, rather than always from cpupool0.
>
> Before this commit, all the pCPUs in the non-default
> pool(s) will be considered immediately valid, during
> system resume, even the ones that have not been brought
> up yet. As a result, the (Credit1) scheduler will attempt
> to run its load balancing logic on them, causing the
> following Oops:
>
> # xl cpupool-cpu-remove Pool-0 8-15
> # xl cpupool-create name=\"Pool-1\"
> # xl cpupool-cpu-add Pool-1 8-15
> --> suspend
> --> resume
> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
> (XEN) CPU:    8
> (XEN) RIP:    e008:[<ffff82d080123078>] csched_schedule+0x4be/0xb97
> (XEN) RFLAGS: 0000000000010087   CONTEXT: hypervisor
> (XEN) rax: 80007d2f7fccb780   rbx: 0000000000000009   rcx: 0000000000000000
> (XEN) rdx: ffff82d08031ed40   rsi: ffff82d080334980   rdi: 0000000000000000
> (XEN) rbp: ffff83010000fe20   rsp: ffff83010000fd40   r8:  0000000000000004
> (XEN) r9:  0000ffff0000ffff   r10: 00ff00ff00ff00ff   r11: 0f0f0f0f0f0f0f0f
> (XEN) r12: ffff8303191ea870   r13: ffff8303226aadf0   r14: 0000000000000009
> (XEN) r15: 0000000000000008   cr0: 000000008005003b   cr4: 00000000000026f0
> (XEN) cr3: 00000000dba9d000   cr2: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) ... ... ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d080123078>] csched_schedule+0x4be/0xb97
> (XEN)    [<ffff82d08012c732>] schedule+0x12a/0x63c
> (XEN)    [<ffff82d08012f8c8>] __do_softirq+0x82/0x8d
> (XEN)    [<ffff82d08012f920>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d080164791>] idle_loop+0x5b/0x6b
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 8:
> (XEN) GENERAL PROTECTION FAULT
> (XEN) [error_code=0000]
> (XEN) ****************************************
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Acked-by: Juergen Gross <jgross@suse.com>

> ---
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>   xen/arch/x86/smpboot.c |    1 -
>   xen/common/cpupool.c   |    2 ++
>   2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
> index 2289284..a4ec396 100644
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -887,7 +887,6 @@ void __cpu_disable(void)
>       remove_siblinginfo(cpu);
>
>       /* It's now safe to remove this processor from the online map */
> -    cpumask_clear_cpu(cpu, cpupool0->cpu_valid);
>       cpumask_clear_cpu(cpu, &cpu_online_map);
>       fixup_irqs();
>
> diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
> index 5471f93..b48ae17 100644
> --- a/xen/common/cpupool.c
> +++ b/xen/common/cpupool.c
> @@ -530,6 +530,7 @@ static int cpupool_cpu_remove(unsigned int cpu)
>               if ( cpumask_test_cpu(cpu, (*c)->cpu_valid ) )
>               {
>                   cpumask_set_cpu(cpu, (*c)->cpu_suspended);
> +                cpumask_clear_cpu(cpu, (*c)->cpu_valid);
>                   break;
>               }
>           }
> @@ -552,6 +553,7 @@ static int cpupool_cpu_remove(unsigned int cpu)
>            * If we are not suspending, we are hot-unplugging cpu, and that is
>            * allowed only for CPUs in pool0.
>            */
> +        cpumask_clear_cpu(cpu, cpupool0->cpu_valid);
>           ret = 0;
>       }
>
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] xen: credit1: properly deal with pCPUs not in any cpupool
  2015-06-25 12:15 ` [PATCH 3/4] xen: credit1: properly deal with pCPUs not in any cpupool Dario Faggioli
@ 2015-06-26 14:05   ` Juergen Gross
  2015-07-02 15:24   ` George Dunlap
  1 sibling, 0 replies; 22+ messages in thread
From: Juergen Gross @ 2015-06-26 14:05 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap

On 06/25/2015 02:15 PM, Dario Faggioli wrote:
> Ideally, the pCPUs that are 'free', i.e., not assigned
> to any cpupool, should not be considered by the scheduler
> for load balancing or anything. In Credit1, we fail at
> this, because of how we use cpupool_scheduler_cpumask().
> In fact, for a free pCPU, cpupool_scheduler_cpumask()
> returns a pointer to cpupool_free_cpus, and hence, near
> the top of csched_load_balance():
>
>   if ( unlikely(!cpumask_test_cpu(cpu, online)) )
>       goto out;
>
> is false (the pCPU _is_ free!), and we therefore do not
> jump to the end right away, as we should. This causes
> the following splat when resuming from ACPI S3 with
> pCPUs not assigned to any pool:
>
> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
> (XEN) ... ... ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d080122eaa>] csched_load_balance+0x213/0x794
> (XEN)    [<ffff82d08012374c>] csched_schedule+0x321/0x452
> (XEN)    [<ffff82d08012c85e>] schedule+0x12a/0x63c
> (XEN)    [<ffff82d08012fa09>] __do_softirq+0x82/0x8d
> (XEN)    [<ffff82d08012fa61>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d080164780>] idle_loop+0x5b/0x6b
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 8:
> (XEN) GENERAL PROTECTION FAULT
> (XEN) [error_code=0000]
> (XEN) ****************************************
>
> The cure is:
>   * use cpupool_online_cpumask(), as a better guard to the
>     case when the cpu is being offlined;
>   * explicitly check whether the cpu is free.
>
> SEDF is in a similar situation, so fix it too.
>
> Still in Credit1, we must make sure that free (or offline)
> CPUs are not considered "ticklable". Not doing so would impair
> the load balancing algorithm, making the scheduler think that
> it is possible to 'ask' the pCPU to pick up some work, while
> in reality, that will never happen! Evidence of such behavior
> is shown in this trace:
>
>   Name               CPU list
>   Pool-0             0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
>
>      0.112998198 | ||.|| -|x||-|- d0v0 runstate_change d0v4 offline->runnable
>   ]  0.112998198 | ||.|| -|x||-|- d0v0   22006(2:2:6) 1 [ f ]
>   ]  0.112999612 | ||.|| -|x||-|- d0v0   28004(2:8:4) 2 [ 0 4 ]
>      0.113003387 | ||.|| -||||-|x d32767v15 runstate_continue d32767v15 running->running
>
> where "22006(2:2:6) 1 [ f ]" means that pCPU 15, which is
> free from any pool, is tickled.
>
> The cure, in this case, is to filter out the free pCPUs,
> within __runq_tickle().
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Acked-by: Juergen Gross <jgross@suse.com>

> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Juergen Gross <jgross@suse.com>
> ---
>   xen/common/sched_credit.c |   23 ++++++++++++++++-------
>   xen/common/sched_sedf.c   |    3 ++-
>   2 files changed, 18 insertions(+), 8 deletions(-)
>
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> index 953ecb0..a1945ac 100644
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -366,12 +366,17 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
>   {
>       struct csched_vcpu * const cur = CSCHED_VCPU(curr_on_cpu(cpu));
>       struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
> -    cpumask_t mask, idle_mask;
> +    cpumask_t mask, idle_mask, *online;
>       int balance_step, idlers_empty;
>
>       ASSERT(cur);
>       cpumask_clear(&mask);
> -    idlers_empty = cpumask_empty(prv->idlers);
> +
> +    /* cpu is vc->processor, so it must be in a cpupool. */
> +    ASSERT(per_cpu(cpupool, cpu) != NULL);
> +    online = cpupool_online_cpumask(per_cpu(cpupool, cpu));
> +    cpumask_and(&idle_mask, prv->idlers, online);
> +    idlers_empty = cpumask_empty(&idle_mask);
>
>
>       /*
> @@ -408,8 +413,8 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
>               /* Are there idlers suitable for new (for this balance step)? */
>               csched_balance_cpumask(new->vcpu, balance_step,
>                                      csched_balance_mask);
> -            cpumask_and(&idle_mask, prv->idlers, csched_balance_mask);
> -            new_idlers_empty = cpumask_empty(&idle_mask);
> +            cpumask_and(csched_balance_mask, csched_balance_mask, &idle_mask);
> +            new_idlers_empty = cpumask_empty(csched_balance_mask);
>
>               /*
>                * Let's not be too harsh! If there aren't idlers suitable
> @@ -1510,6 +1515,7 @@ static struct csched_vcpu *
>   csched_load_balance(struct csched_private *prv, int cpu,
>       struct csched_vcpu *snext, bool_t *stolen)
>   {
> +    struct cpupool *c = per_cpu(cpupool, cpu);
>       struct csched_vcpu *speer;
>       cpumask_t workers;
>       cpumask_t *online;
> @@ -1517,10 +1523,13 @@ csched_load_balance(struct csched_private *prv, int cpu,
>       int node = cpu_to_node(cpu);
>
>       BUG_ON( cpu != snext->vcpu->processor );
> -    online = cpupool_scheduler_cpumask(per_cpu(cpupool, cpu));
> +    online = cpupool_online_cpumask(c);
>
> -    /* If this CPU is going offline we shouldn't steal work. */
> -    if ( unlikely(!cpumask_test_cpu(cpu, online)) )
> +    /*
> +     * If this CPU is going offline, or is not (yet) part of any cpupool
> +     * (as it happens, e.g., during cpu bringup), we shouldn't steal work.
> +     */
> +    if ( unlikely(!cpumask_test_cpu(cpu, online) || c == NULL) )
>           goto out;
>
>       if ( snext->pri == CSCHED_PRI_IDLE )
> diff --git a/xen/common/sched_sedf.c b/xen/common/sched_sedf.c
> index a1a4cb7..cad5b39 100644
> --- a/xen/common/sched_sedf.c
> +++ b/xen/common/sched_sedf.c
> @@ -790,7 +790,8 @@ static struct task_slice sedf_do_schedule(
>       if ( tasklet_work_scheduled ||
>            (list_empty(runq) && list_empty(waitq)) ||
>            unlikely(!cpumask_test_cpu(cpu,
> -                   cpupool_scheduler_cpumask(per_cpu(cpupool, cpu)))) )
> +                   cpupool_online_cpumask(per_cpu(cpupool, cpu))) ||
> +                  per_cpu(cpupool, cpu) == NULL) )
>       {
>           ret.task = IDLETASK(cpu);
>           ret.time = SECONDS(1);
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] xen: sched: get rid of cpupool_scheduler_cpumask()
  2015-06-25 12:15 ` [PATCH 4/4] xen: sched: get rid of cpupool_scheduler_cpumask() Dario Faggioli
@ 2015-06-26 14:08   ` Juergen Gross
  2015-06-26 14:57     ` Joshua Whitehead
  2015-06-27 19:21   ` Meng Xu
  2015-07-02 15:39   ` George Dunlap
  2 siblings, 1 reply; 22+ messages in thread
From: Juergen Gross @ 2015-06-26 14:08 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: George Dunlap, Josh Whitehead, Robert VanVossen, Sisu Xi, Meng Xu

On 06/25/2015 02:15 PM, Dario Faggioli wrote:
> and of (almost every) direct use of cpupool_online_cpumask().
>
> In fact, what we really want, most of the time,
> is the set of valid pCPUs of the cpupool a certain domain
> is part of. Furthermore, in case it's called with a NULL
> pool as argument, cpupool_scheduler_cpumask() does more
> harm than good, by returning the bitmask of free pCPUs!
>
> This commit, therefore:
>   * gets rid of cpupool_scheduler_cpumask(), in favour of
>     cpupool_domain_cpumask(), which makes it more evident
>     what we are after, and accommodates some sanity checking;
>   * replaces some of the calls to cpupool_online_cpumask()
>     with calls to the new functions too.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Acked-by: Juergen Gross <jgross@suse.com>

> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Robert VanVossen <robert.vanvossen@dornerworks.com>
> Cc: Josh Whitehead <josh.whitehead@dornerworks.com>
> Cc: Meng Xu <mengxu@cis.upenn.edu>
> Cc: Sisu Xi <xisisu@gmail.com>
> ---
>   xen/common/domain.c         |    5 +++--
>   xen/common/domctl.c         |    4 ++--
>   xen/common/sched_arinc653.c |    2 +-
>   xen/common/sched_credit.c   |    6 +++---
>   xen/common/sched_rt.c       |   12 ++++++------
>   xen/common/sched_sedf.c     |    2 +-
>   xen/common/schedule.c       |    2 +-
>   xen/include/xen/sched-if.h  |   12 ++++++++++--
>   8 files changed, 27 insertions(+), 18 deletions(-)
>
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 3bc52e6..c20accb 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -184,7 +184,8 @@ struct vcpu *alloc_vcpu(
>       /* Must be called after making new vcpu visible to for_each_vcpu(). */
>       vcpu_check_shutdown(v);
>
> -    domain_update_node_affinity(d);
> +    if ( !is_idle_domain(d) )
> +        domain_update_node_affinity(d);
>
>       return v;
>   }
> @@ -437,7 +438,7 @@ void domain_update_node_affinity(struct domain *d)
>           return;
>       }
>
> -    online = cpupool_online_cpumask(d->cpupool);
> +    online = cpupool_domain_cpumask(d);
>
>       spin_lock(&d->node_affinity_lock);
>
> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index 2a2d203..a399aa6 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -664,7 +664,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>               goto maxvcpu_out;
>
>           ret = -ENOMEM;
> -        online = cpupool_online_cpumask(d->cpupool);
> +        online = cpupool_domain_cpumask(d);
>           if ( max > d->max_vcpus )
>           {
>               struct vcpu **vcpus;
> @@ -748,7 +748,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>           if ( op->cmd == XEN_DOMCTL_setvcpuaffinity )
>           {
>               cpumask_var_t new_affinity, old_affinity;
> -            cpumask_t *online = cpupool_online_cpumask(v->domain->cpupool);;
> +            cpumask_t *online = cpupool_domain_cpumask(v->domain);;
>
>               /*
>                * We want to be able to restore hard affinity if we are trying
> diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
> index cff5da9..dbe02ed 100644
> --- a/xen/common/sched_arinc653.c
> +++ b/xen/common/sched_arinc653.c
> @@ -667,7 +667,7 @@ a653sched_pick_cpu(const struct scheduler *ops, struct vcpu *vc)
>        * If present, prefer vc's current processor, else
>        * just find the first valid vcpu .
>        */
> -    online = cpupool_scheduler_cpumask(vc->domain->cpupool);
> +    online = cpupool_domain_cpumask(vc->domain);
>
>       cpu = cpumask_first(online);
>
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> index a1945ac..8c36635 100644
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -309,7 +309,7 @@ __runq_remove(struct csched_vcpu *svc)
>   static inline int __vcpu_has_soft_affinity(const struct vcpu *vc,
>                                              const cpumask_t *mask)
>   {
> -    return !cpumask_subset(cpupool_online_cpumask(vc->domain->cpupool),
> +    return !cpumask_subset(cpupool_domain_cpumask(vc->domain),
>                              vc->cpu_soft_affinity) &&
>              !cpumask_subset(vc->cpu_hard_affinity, vc->cpu_soft_affinity) &&
>              cpumask_intersects(vc->cpu_soft_affinity, mask);
> @@ -374,7 +374,7 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
>
>       /* cpu is vc->processor, so it must be in a cpupool. */
>       ASSERT(per_cpu(cpupool, cpu) != NULL);
> -    online = cpupool_online_cpumask(per_cpu(cpupool, cpu));
> +    online = cpupool_domain_cpumask(new->sdom->dom);
>       cpumask_and(&idle_mask, prv->idlers, online);
>       idlers_empty = cpumask_empty(&idle_mask);
>
> @@ -641,7 +641,7 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
>       int balance_step;
>
>       /* Store in cpus the mask of online cpus on which the domain can run */
> -    online = cpupool_scheduler_cpumask(vc->domain->cpupool);
> +    online = cpupool_domain_cpumask(vc->domain);
>       cpumask_and(&cpus, vc->cpu_hard_affinity, online);
>
>       for_each_csched_balance_step( balance_step )
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 4372486..08611c8 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -256,7 +256,7 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
>        */
>       mask = _cpumask_scratch[svc->vcpu->processor];
>
> -    cpupool_mask = cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool);
> +    cpupool_mask = cpupool_domain_cpumask(svc->vcpu->domain);
>       cpumask_and(mask, cpupool_mask, svc->vcpu->cpu_hard_affinity);
>       cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch), mask);
>       printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime"),"
> @@ -673,7 +673,7 @@ rt_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
>       cpumask_t *online;
>       int cpu;
>
> -    online = cpupool_scheduler_cpumask(vc->domain->cpupool);
> +    online = cpupool_domain_cpumask(vc->domain);
>       cpumask_and(&cpus, online, vc->cpu_hard_affinity);
>
>       cpu = cpumask_test_cpu(vc->processor, &cpus)
> @@ -753,7 +753,7 @@ __runq_pick(const struct scheduler *ops, const cpumask_t *mask)
>           iter_svc = __q_elem(iter);
>
>           /* mask cpu_hard_affinity & cpupool & mask */
> -        online = cpupool_scheduler_cpumask(iter_svc->vcpu->domain->cpupool);
> +        online = cpupool_domain_cpumask(iter_svc->vcpu->domain);
>           cpumask_and(&cpu_common, online, iter_svc->vcpu->cpu_hard_affinity);
>           cpumask_and(&cpu_common, mask, &cpu_common);
>           if ( cpumask_empty(&cpu_common) )
> @@ -965,7 +965,7 @@ runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
>       if ( new == NULL || is_idle_vcpu(new->vcpu) )
>           return;
>
> -    online = cpupool_scheduler_cpumask(new->vcpu->domain->cpupool);
> +    online = cpupool_domain_cpumask(new->vcpu->domain);
>       cpumask_and(&not_tickled, online, new->vcpu->cpu_hard_affinity);
>       cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
>
> @@ -1078,7 +1078,7 @@ rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
>
>       ASSERT(!list_empty(&prv->sdom));
>       sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
> -    online = cpupool_scheduler_cpumask(sdom->dom->cpupool);
> +    online = cpupool_domain_cpumask(sdom->dom);
>       snext = __runq_pick(ops, online); /* pick snext from ALL valid cpus */
>
>       runq_tickle(ops, snext);
> @@ -1113,7 +1113,7 @@ rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
>
>           ASSERT(!list_empty(&prv->sdom));
>           sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
> -        online = cpupool_scheduler_cpumask(sdom->dom->cpupool);
> +        online = cpupool_domain_cpumask(sdom->dom);
>           snext = __runq_pick(ops, online); /* pick snext from ALL cpus */
>
>           runq_tickle(ops, snext);
> diff --git a/xen/common/sched_sedf.c b/xen/common/sched_sedf.c
> index cad5b39..396b651 100644
> --- a/xen/common/sched_sedf.c
> +++ b/xen/common/sched_sedf.c
> @@ -383,7 +383,7 @@ static int sedf_pick_cpu(const struct scheduler *ops, struct vcpu *v)
>       cpumask_t online_affinity;
>       cpumask_t *online;
>
> -    online = cpupool_scheduler_cpumask(v->domain->cpupool);
> +    online = cpupool_domain_cpumask(v->domain);
>       cpumask_and(&online_affinity, v->cpu_hard_affinity, online);
>       return cpumask_cycle(v->vcpu_id % cpumask_weight(&online_affinity) - 1,
>                            &online_affinity);
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 4ffcd98..7ad298a 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -80,7 +80,7 @@ static struct scheduler __read_mostly ops;
>
>   #define DOM2OP(_d)    (((_d)->cpupool == NULL) ? &ops : ((_d)->cpupool->sched))
>   #define VCPU2OP(_v)   (DOM2OP((_v)->domain))
> -#define VCPU2ONLINE(_v) cpupool_online_cpumask((_v)->domain->cpupool)
> +#define VCPU2ONLINE(_v) cpupool_domain_cpumask((_v)->domain)
>
>   static inline void trace_runstate_change(struct vcpu *v, int new_state)
>   {
> diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
> index 7cc25c6..20af791 100644
> --- a/xen/include/xen/sched-if.h
> +++ b/xen/include/xen/sched-if.h
> @@ -183,9 +183,17 @@ struct cpupool
>       atomic_t         refcnt;
>   };
>
> -#define cpupool_scheduler_cpumask(_pool) \
> -    (((_pool) == NULL) ? &cpupool_free_cpus : (_pool)->cpu_valid)
>   #define cpupool_online_cpumask(_pool) \
>       (((_pool) == NULL) ? &cpu_online_map : (_pool)->cpu_valid)
>
> +static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
> +{
> +    /*
> +     * d->cpupool == NULL only for the idle domain, which should
> +     * have no interest in calling into this.
> +     */
> +    ASSERT(d->cpupool != NULL);
> +    return cpupool_online_cpumask(d->cpupool);
> +}
> +
>   #endif /* __XEN_SCHED_IF_H__ */
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] xen: sched: get rid of cpupool_scheduler_cpumask()
  2015-06-26 14:08   ` Juergen Gross
@ 2015-06-26 14:57     ` Joshua Whitehead
  0 siblings, 0 replies; 22+ messages in thread
From: Joshua Whitehead @ 2015-06-26 14:57 UTC (permalink / raw)
  To: Juergen Gross, Dario Faggioli, xen-devel
  Cc: George Dunlap, Josh Whitehead, Robert VanVossen, Sisu Xi, Meng Xu

On 6/26/2015 10:08 AM, Juergen Gross wrote:
> On 06/25/2015 02:15 PM, Dario Faggioli wrote:
>> and of (almost every) direct use of cpupool_online_cpumask().
>>
>> In fact, what we really want, most of the time,
>> is the set of valid pCPUs of the cpupool a certain domain
>> is part of. Furthermore, in case it's called with a NULL
>> pool as argument, cpupool_scheduler_cpumask() does more
>> harm than good, by returning the bitmask of free pCPUs!
>>
>> This commit, therefore:
>>   * gets rid of cpupool_scheduler_cpumask(), in favour of
>>     cpupool_domain_cpumask(), which makes it more evident
>>     what we are after, and accommodates some sanity checking;
>>   * replaces some of the calls to cpupool_online_cpumask()
>>     with calls to the new functions too.
>>
>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>
> Acked-by: Juergen Gross <jgross@suse.com>
>

Acked-by: Joshua Whitehead <josh.whitehead@dornerworks.com>

>> ---
>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>> Cc: Juergen Gross <jgross@suse.com>
>> Cc: Robert VanVossen <robert.vanvossen@dornerworks.com>
>> Cc: Josh Whitehead <josh.whitehead@dornerworks.com>
>> Cc: Meng Xu <mengxu@cis.upenn.edu>
>> Cc: Sisu Xi <xisisu@gmail.com>
>> ---
>>   xen/common/domain.c         |    5 +++--
>>   xen/common/domctl.c         |    4 ++--
>>   xen/common/sched_arinc653.c |    2 +-
>>   xen/common/sched_credit.c   |    6 +++---
>>   xen/common/sched_rt.c       |   12 ++++++------
>>   xen/common/sched_sedf.c     |    2 +-
>>   xen/common/schedule.c       |    2 +-
>>   xen/include/xen/sched-if.h  |   12 ++++++++++--
>>   8 files changed, 27 insertions(+), 18 deletions(-)
>>
>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>> index 3bc52e6..c20accb 100644
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -184,7 +184,8 @@ struct vcpu *alloc_vcpu(
>>       /* Must be called after making new vcpu visible to
>> for_each_vcpu(). */
>>       vcpu_check_shutdown(v);
>>
>> -    domain_update_node_affinity(d);
>> +    if ( !is_idle_domain(d) )
>> +        domain_update_node_affinity(d);
>>
>>       return v;
>>   }
>> @@ -437,7 +438,7 @@ void domain_update_node_affinity(struct domain *d)
>>           return;
>>       }
>>
>> -    online = cpupool_online_cpumask(d->cpupool);
>> +    online = cpupool_domain_cpumask(d);
>>
>>       spin_lock(&d->node_affinity_lock);
>>
>> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
>> index 2a2d203..a399aa6 100644
>> --- a/xen/common/domctl.c
>> +++ b/xen/common/domctl.c
>> @@ -664,7 +664,7 @@ long
>> do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>>               goto maxvcpu_out;
>>
>>           ret = -ENOMEM;
>> -        online = cpupool_online_cpumask(d->cpupool);
>> +        online = cpupool_domain_cpumask(d);
>>           if ( max > d->max_vcpus )
>>           {
>>               struct vcpu **vcpus;
>> @@ -748,7 +748,7 @@ long
>> do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>>           if ( op->cmd == XEN_DOMCTL_setvcpuaffinity )
>>           {
>>               cpumask_var_t new_affinity, old_affinity;
>> -            cpumask_t *online =
>> cpupool_online_cpumask(v->domain->cpupool);;
>> +            cpumask_t *online = cpupool_domain_cpumask(v->domain);;
>>
>>               /*
>>                * We want to be able to restore hard affinity if we are
>> trying
>> diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
>> index cff5da9..dbe02ed 100644
>> --- a/xen/common/sched_arinc653.c
>> +++ b/xen/common/sched_arinc653.c
>> @@ -667,7 +667,7 @@ a653sched_pick_cpu(const struct scheduler *ops,
>> struct vcpu *vc)
>>        * If present, prefer vc's current processor, else
>>        * just find the first valid vcpu .
>>        */
>> -    online = cpupool_scheduler_cpumask(vc->domain->cpupool);
>> +    online = cpupool_domain_cpumask(vc->domain);
>>
>>       cpu = cpumask_first(online);
>>
>> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
>> index a1945ac..8c36635 100644
>> --- a/xen/common/sched_credit.c
>> +++ b/xen/common/sched_credit.c
>> @@ -309,7 +309,7 @@ __runq_remove(struct csched_vcpu *svc)
>>   static inline int __vcpu_has_soft_affinity(const struct vcpu *vc,
>>                                              const cpumask_t *mask)
>>   {
>> -    return !cpumask_subset(cpupool_online_cpumask(vc->domain->cpupool),
>> +    return !cpumask_subset(cpupool_domain_cpumask(vc->domain),
>>                              vc->cpu_soft_affinity) &&
>>              !cpumask_subset(vc->cpu_hard_affinity,
>> vc->cpu_soft_affinity) &&
>>              cpumask_intersects(vc->cpu_soft_affinity, mask);
>> @@ -374,7 +374,7 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu
>> *new)
>>
>>       /* cpu is vc->processor, so it must be in a cpupool. */
>>       ASSERT(per_cpu(cpupool, cpu) != NULL);
>> -    online = cpupool_online_cpumask(per_cpu(cpupool, cpu));
>> +    online = cpupool_domain_cpumask(new->sdom->dom);
>>       cpumask_and(&idle_mask, prv->idlers, online);
>>       idlers_empty = cpumask_empty(&idle_mask);
>>
>> @@ -641,7 +641,7 @@ _csched_cpu_pick(const struct scheduler *ops,
>> struct vcpu *vc, bool_t commit)
>>       int balance_step;
>>
>>       /* Store in cpus the mask of online cpus on which the domain can
>> run */
>> -    online = cpupool_scheduler_cpumask(vc->domain->cpupool);
>> +    online = cpupool_domain_cpumask(vc->domain);
>>       cpumask_and(&cpus, vc->cpu_hard_affinity, online);
>>
>>       for_each_csched_balance_step( balance_step )
>> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
>> index 4372486..08611c8 100644
>> --- a/xen/common/sched_rt.c
>> +++ b/xen/common/sched_rt.c
>> @@ -256,7 +256,7 @@ rt_dump_vcpu(const struct scheduler *ops, const
>> struct rt_vcpu *svc)
>>        */
>>       mask = _cpumask_scratch[svc->vcpu->processor];
>>
>> -    cpupool_mask =
>> cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool);
>> +    cpupool_mask = cpupool_domain_cpumask(svc->vcpu->domain);
>>       cpumask_and(mask, cpupool_mask, svc->vcpu->cpu_hard_affinity);
>>       cpulist_scnprintf(keyhandler_scratch,
>> sizeof(keyhandler_scratch), mask);
>>       printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime"),"
>> @@ -673,7 +673,7 @@ rt_cpu_pick(const struct scheduler *ops, struct
>> vcpu *vc)
>>       cpumask_t *online;
>>       int cpu;
>>
>> -    online = cpupool_scheduler_cpumask(vc->domain->cpupool);
>> +    online = cpupool_domain_cpumask(vc->domain);
>>       cpumask_and(&cpus, online, vc->cpu_hard_affinity);
>>
>>       cpu = cpumask_test_cpu(vc->processor, &cpus)
>> @@ -753,7 +753,7 @@ __runq_pick(const struct scheduler *ops, const
>> cpumask_t *mask)
>>           iter_svc = __q_elem(iter);
>>
>>           /* mask cpu_hard_affinity & cpupool & mask */
>> -        online =
>> cpupool_scheduler_cpumask(iter_svc->vcpu->domain->cpupool);
>> +        online = cpupool_domain_cpumask(iter_svc->vcpu->domain);
>>           cpumask_and(&cpu_common, online,
>> iter_svc->vcpu->cpu_hard_affinity);
>>           cpumask_and(&cpu_common, mask, &cpu_common);
>>           if ( cpumask_empty(&cpu_common) )
>> @@ -965,7 +965,7 @@ runq_tickle(const struct scheduler *ops, struct
>> rt_vcpu *new)
>>       if ( new == NULL || is_idle_vcpu(new->vcpu) )
>>           return;
>>
>> -    online = cpupool_scheduler_cpumask(new->vcpu->domain->cpupool);
>> +    online = cpupool_domain_cpumask(new->vcpu->domain);
>>       cpumask_and(&not_tickled, online, new->vcpu->cpu_hard_affinity);
>>       cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
>>
>> @@ -1078,7 +1078,7 @@ rt_vcpu_wake(const struct scheduler *ops, struct
>> vcpu *vc)
>>
>>       ASSERT(!list_empty(&prv->sdom));
>>       sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
>> -    online = cpupool_scheduler_cpumask(sdom->dom->cpupool);
>> +    online = cpupool_domain_cpumask(sdom->dom);
>>       snext = __runq_pick(ops, online); /* pick snext from ALL valid
>> cpus */
>>
>>       runq_tickle(ops, snext);
>> @@ -1113,7 +1113,7 @@ rt_context_saved(const struct scheduler *ops,
>> struct vcpu *vc)
>>
>>           ASSERT(!list_empty(&prv->sdom));
>>           sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
>> -        online = cpupool_scheduler_cpumask(sdom->dom->cpupool);
>> +        online = cpupool_domain_cpumask(sdom->dom);
>>           snext = __runq_pick(ops, online); /* pick snext from ALL
>> cpus */
>>
>>           runq_tickle(ops, snext);
>> diff --git a/xen/common/sched_sedf.c b/xen/common/sched_sedf.c
>> index cad5b39..396b651 100644
>> --- a/xen/common/sched_sedf.c
>> +++ b/xen/common/sched_sedf.c
>> @@ -383,7 +383,7 @@ static int sedf_pick_cpu(const struct scheduler
>> *ops, struct vcpu *v)
>>       cpumask_t online_affinity;
>>       cpumask_t *online;
>>
>> -    online = cpupool_scheduler_cpumask(v->domain->cpupool);
>> +    online = cpupool_domain_cpumask(v->domain);
>>       cpumask_and(&online_affinity, v->cpu_hard_affinity, online);
>>       return cpumask_cycle(v->vcpu_id %
>> cpumask_weight(&online_affinity) - 1,
>>                            &online_affinity);
>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>> index 4ffcd98..7ad298a 100644
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -80,7 +80,7 @@ static struct scheduler __read_mostly ops;
>>
>>   #define DOM2OP(_d)    (((_d)->cpupool == NULL) ? &ops :
>> ((_d)->cpupool->sched))
>>   #define VCPU2OP(_v)   (DOM2OP((_v)->domain))
>> -#define VCPU2ONLINE(_v) cpupool_online_cpumask((_v)->domain->cpupool)
>> +#define VCPU2ONLINE(_v) cpupool_domain_cpumask((_v)->domain)
>>
>>   static inline void trace_runstate_change(struct vcpu *v, int new_state)
>>   {
>> diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
>> index 7cc25c6..20af791 100644
>> --- a/xen/include/xen/sched-if.h
>> +++ b/xen/include/xen/sched-if.h
>> @@ -183,9 +183,17 @@ struct cpupool
>>       atomic_t         refcnt;
>>   };
>>
>> -#define cpupool_scheduler_cpumask(_pool) \
>> -    (((_pool) == NULL) ? &cpupool_free_cpus : (_pool)->cpu_valid)
>>   #define cpupool_online_cpumask(_pool) \
>>       (((_pool) == NULL) ? &cpu_online_map : (_pool)->cpu_valid)
>>
>> +static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
>> +{
>> +    /*
>> +     * d->cpupool == NULL only for the idle domain, which should
>> +     * have no interest in calling into this.
>> +     */
>> +    ASSERT(d->cpupool != NULL);
>> +    return cpupool_online_cpumask(d->cpupool);
>> +}
>> +
>>   #endif /* __XEN_SCHED_IF_H__ */
>>
>>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] xen: sched: get rid of cpupool_scheduler_cpumask()
  2015-06-25 12:15 ` [PATCH 4/4] xen: sched: get rid of cpupool_scheduler_cpumask() Dario Faggioli
  2015-06-26 14:08   ` Juergen Gross
@ 2015-06-27 19:21   ` Meng Xu
  2015-07-02 15:39   ` George Dunlap
  2 siblings, 0 replies; 22+ messages in thread
From: Meng Xu @ 2015-06-27 19:21 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Juergen Gross, Sisu Xi, George Dunlap, Robert VanVossen,
	Josh Whitehead, Meng Xu, xen-devel

2015-06-25 5:15 GMT-07:00 Dario Faggioli <dario.faggioli@citrix.com>:
> and of (almost every) direct use of cpupool_online_cpumask().
>
> In fact, what we really want, most of the time,
> is the set of valid pCPUs of the cpupool a certain domain
> is part of. Furthermore, in case it's called with a NULL
> pool as argument, cpupool_scheduler_cpumask() does more
> harm than good, by returning the bitmask of free pCPUs!
>
> This commit, therefore:
>  * gets rid of cpupool_scheduler_cpumask(), in favour of
>    cpupool_domain_cpumask(), which makes it more evident
>    what we are after, and accommodates some sanity checking;
>  * replaces some of the calls to cpupool_online_cpumask()
>    with calls to the new functions too.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Robert VanVossen <robert.vanvossen@dornerworks.com>
> Cc: Josh Whitehead <josh.whitehead@dornerworks.com>
> Cc: Meng Xu <mengxu@cis.upenn.edu>
> Cc: Sisu Xi <xisisu@gmail.com>
> ---
>  xen/common/domain.c         |    5 +++--
>  xen/common/domctl.c         |    4 ++--
>  xen/common/sched_arinc653.c |    2 +-
>  xen/common/sched_credit.c   |    6 +++---
>  xen/common/sched_rt.c       |   12 ++++++------
>  xen/common/sched_sedf.c     |    2 +-
>  xen/common/schedule.c       |    2 +-
>  xen/include/xen/sched-if.h  |   12 ++++++++++--
>  8 files changed, 27 insertions(+), 18 deletions(-)

As to xen/common/sched_rt.c:

Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>

Thanks,

Meng


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/4] xen: sched: avoid dumping duplicate information
  2015-06-25 12:15 ` [PATCH 1/4] xen: sched: avoid dumping duplicate information Dario Faggioli
  2015-06-26 13:43   ` Juergen Gross
@ 2015-07-02 11:18   ` George Dunlap
  1 sibling, 0 replies; 22+ messages in thread
From: George Dunlap @ 2015-07-02 11:18 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Juergen Gross, xen-devel

On Thu, Jun 25, 2015 at 1:15 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> When dumping scheduling information (debug key 'r'), what
> we print as 'Idle cpupool' is pretty much the same of what
> we print immediately after as 'Cpupool0'. In fact, if there
> are no pCPUs outside of any cpupools, it is exactly the
> same.
>
> If there are free pCPUs, there is some valuable information,
> but still a lot of duplication:
>
>  (XEN) Online Cpus: 0-15
>  (XEN) Free Cpus: 8
>  (XEN) Idle cpupool:
>  (XEN) Scheduler: SMP Credit Scheduler (credit)
>  (XEN) info:
>  (XEN)   ncpus              = 13
>  (XEN)   master             = 0
>  (XEN)   credit             = 3900
>  (XEN)   credit balance     = 45
>  (XEN)   weight             = 1280
>  (XEN)   runq_sort          = 11820
>  (XEN)   default-weight     = 256
>  (XEN)   tslice             = 30ms
>  (XEN)   ratelimit          = 1000us
>  (XEN)   credits per msec   = 10
>  (XEN)   ticks per tslice   = 3
>  (XEN)   migration delay    = 0us
>  (XEN) idlers: 00000000,00006d29
>  (XEN) active vcpus:
>  (XEN)     1: [1.7] pri=-1 flags=0 cpu=15 credit=-116 [w=256,cap=0] (84+300) {a/i=22/21 m=18+5 (k=0)}
>  (XEN)     2: [1.3] pri=0 flags=0 cpu=1 credit=-113 [w=256,cap=0] (87+300) {a/i=37/36 m=11+544 (k=0)}
>  (XEN)     3: [0.15] pri=-1 flags=0 cpu=4 credit=95 [w=256,cap=0] (210+300) {a/i=127/126 m=108+9 (k=0)}
>  (XEN)     4: [0.10] pri=-2 flags=0 cpu=12 credit=-287 [w=256,cap=0] (-84+300) {a/i=163/162 m=36+568 (k=0)}
>  (XEN)     5: [0.7] pri=-2 flags=0 cpu=2 credit=-242 [w=256,cap=0] (-42+300) {a/i=129/128 m=16+50 (k=0)}
>  (XEN) CPU[08]  sort=5791, sibling=00000000,00000300, core=00000000,0000ff00
>  (XEN)   run: [32767.8] pri=-64 flags=0 cpu=8
>  (XEN) Cpupool 0:
>  (XEN) Cpus: 0-5,10-15
>  (XEN) Scheduler: SMP Credit Scheduler (credit)
>  (XEN) info:
>  (XEN)   ncpus              = 13
>  (XEN)   master             = 0
>  (XEN)   credit             = 3900
>  (XEN)   credit balance     = 45
>  (XEN)   weight             = 1280
>  (XEN)   runq_sort          = 11820
>  (XEN)   default-weight     = 256
>  (XEN)   tslice             = 30ms
>  (XEN)   ratelimit          = 1000us
>  (XEN)   credits per msec   = 10
>  (XEN)   ticks per tslice   = 3
>  (XEN)   migration delay    = 0us
>  (XEN) idlers: 00000000,00006d29
>  (XEN) active vcpus:
>  (XEN)     1: [1.7] pri=-1 flags=0 cpu=15 credit=-116 [w=256,cap=0] (84+300) {a/i=22/21 m=18+5 (k=0)}
>  (XEN)     2: [1.3] pri=0 flags=0 cpu=1 credit=-113 [w=256,cap=0] (87+300) {a/i=37/36 m=11+544 (k=0)}
>  (XEN)     3: [0.15] pri=-1 flags=0 cpu=4 credit=95 [w=256,cap=0] (210+300) {a/i=127/126 m=108+9 (k=0)}
>  (XEN)     4: [0.10] pri=-2 flags=0 cpu=12 credit=-287 [w=256,cap=0] (-84+300) {a/i=163/162 m=36+568 (k=0)}
>  (XEN)     5: [0.7] pri=-2 flags=0 cpu=2 credit=-242 [w=256,cap=0] (-42+300) {a/i=129/128 m=16+50 (k=0)}
>  (XEN) CPU[00]  sort=11801, sibling=00000000,00000003, core=00000000,000000ff
>  (XEN)   run: [32767.0] pri=-64 flags=0 cpu=0
>  ... ... ...
>  (XEN) CPU[15]  sort=11820, sibling=00000000,0000c000, core=00000000,0000ff00
>  (XEN)   run: [1.7] pri=-1 flags=0 cpu=15 credit=-116 [w=256,cap=0] (84+300) {a/i=22/21 m=18+5 (k=0)}
>  (XEN)     1: [32767.15] pri=-64 flags=0 cpu=15
>  (XEN) Cpupool 1:
>  (XEN) Cpus: 6-7,9
>  (XEN) Scheduler: SMP RTDS Scheduler (rtds)
>  (XEN) CPU[06]
>  (XEN) CPU[07]
>  (XEN) CPU[09]
>
> With this change, we get rid of the redundancy, and retain
> only the information about the free pCPUs.
>
> (While there, turn a loop index variable from `int' to
> `unsigned int' in schedule_dump().)
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

> ---
> Cc: Juergen Gross <jgross@suse.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> ---
>  xen/common/cpupool.c  |    6 +++---
>  xen/common/schedule.c |   18 +++++++++++++-----
>  2 files changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
> index 563864d..5471f93 100644
> --- a/xen/common/cpupool.c
> +++ b/xen/common/cpupool.c
> @@ -728,10 +728,10 @@ void dump_runq(unsigned char key)
>
>      print_cpumap("Online Cpus", &cpu_online_map);
>      if ( !cpumask_empty(&cpupool_free_cpus) )
> +    {
>          print_cpumap("Free Cpus", &cpupool_free_cpus);
> -
> -    printk("Idle cpupool:\n");
> -    schedule_dump(NULL);
> +        schedule_dump(NULL);
> +    }
>
>      for_each_cpupool(c)
>      {
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index ecf1545..4ffcd98 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -1473,16 +1473,24 @@ void scheduler_free(struct scheduler *sched)
>
>  void schedule_dump(struct cpupool *c)
>  {
> -    int               i;
> +    unsigned int      i;
>      struct scheduler *sched;
>      cpumask_t        *cpus;
>
>      /* Locking, if necessary, must be handled withing each scheduler */
>
> -    sched = (c == NULL) ? &ops : c->sched;
> -    cpus = cpupool_scheduler_cpumask(c);
> -    printk("Scheduler: %s (%s)\n", sched->name, sched->opt_name);
> -    SCHED_OP(sched, dump_settings);
> +    if ( c != NULL )
> +    {
> +        sched = c->sched;
> +        cpus = c->cpu_valid;
> +        printk("Scheduler: %s (%s)\n", sched->name, sched->opt_name);
> +        SCHED_OP(sched, dump_settings);
> +    }
> +    else
> +    {
> +        sched = &ops;
> +        cpus = &cpupool_free_cpus;
> +    }
>
>      for_each_cpu (i, cpus)
>      {
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] xen: credit1: properly deal with pCPUs not in any cpupool
  2015-06-25 12:15 ` [PATCH 3/4] xen: credit1: properly deal with pCPUs not in any cpupool Dario Faggioli
  2015-06-26 14:05   ` Juergen Gross
@ 2015-07-02 15:24   ` George Dunlap
  2015-07-02 16:01     ` Dario Faggioli
  1 sibling, 1 reply; 22+ messages in thread
From: George Dunlap @ 2015-07-02 15:24 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Juergen Gross, xen-devel

On Thu, Jun 25, 2015 at 1:15 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> Ideally, the pCPUs that are 'free', i.e., not assigned
> to any cpupool, should not be considered by the scheduler
> for load balancing or anything. In Credit1, we fail at
> this, because of how we use cpupool_scheduler_cpumask().
> In fact, for a free pCPU, cpupool_scheduler_cpumask()
> returns a pointer to cpupool_free_cpus, and hence, near
> the top of csched_load_balance():
>
>  if ( unlikely(!cpumask_test_cpu(cpu, online)) )
>      goto out;
>
> is false (the pCPU _is_ free!), and we therefore do not
> jump to the end right away, as we should. This causes
> the following splat when resuming from ACPI S3 with
> pCPUs not assigned to any pool:

What I haven't quite twigged yet with regard to this specific bug is
why csched_load_balance() is being run on this cpu at all if it hasn't
been added to the cpupool yet.  AFAICT it's only called from
csched_schedule() -- why is that being called on a cpu that isn't
online yet?  If we can't fix that before the code freeze (or if we
wouldn't want to avoid it anyway), would it make more sense to check
for that condition in csched_schedule() and just return the idle
domain?

(Or schedule() even?)
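
To make the idea concrete, a rough and untested sketch of such a guard
(only per_cpu(cpupool, cpu) comes from the patches in this series; the
idle_vcpu[] array and the task_slice fields are my assumptions about the
surrounding csched_schedule() code):

    /* Hypothetical early-out near the top of csched_schedule(). */
    if ( unlikely(per_cpu(cpupool, cpu) == NULL) )
    {
        /* cpu is not in any pool (yet): just run idle for a while. */
        ret.task = idle_vcpu[cpu];
        ret.time = MILLISECS(10);
        ret.migrated = 0;
        return ret;
    }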

 -George

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] xen: sched: get rid of cpupool_scheduler_cpumask()
  2015-06-25 12:15 ` [PATCH 4/4] xen: sched: get rid of cpupool_scheduler_cpumask() Dario Faggioli
  2015-06-26 14:08   ` Juergen Gross
  2015-06-27 19:21   ` Meng Xu
@ 2015-07-02 15:39   ` George Dunlap
  2015-07-03  7:48     ` Dario Faggioli
  2 siblings, 1 reply; 22+ messages in thread
From: George Dunlap @ 2015-07-02 15:39 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Juergen Gross, Sisu Xi, Robert VanVossen, Josh Whitehead,
	Meng Xu, xen-devel

On Thu, Jun 25, 2015 at 1:15 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> and of (almost every) direct use of cpupool_online_cpumask().
>
> In fact, what we really want, most of the time,
> is the set of valid pCPUs of the cpupool a certain domain
> is part of. Furthermore, in case it's called with a NULL
> pool as argument, cpupool_scheduler_cpumask() does more
> harm than good, by returning the bitmask of free pCPUs!
>
> This commit, therefore:
>  * gets rid of cpupool_scheduler_cpumask(), in favour of
>    cpupool_domain_cpumask(), which makes it more evident
>    what we are after, and accommodates some sanity checking;
>  * replaces some of the calls to cpupool_online_cpumask()
>    with calls to the new functions too.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Robert VanVossen <robert.vanvossen@dornerworks.com>
> Cc: Josh Whitehead <josh.whitehead@dornerworks.com>
> Cc: Meng Xu <mengxu@cis.upenn.edu>
> Cc: Sisu Xi <xisisu@gmail.com>
> ---
>  xen/common/domain.c         |    5 +++--
>  xen/common/domctl.c         |    4 ++--
>  xen/common/sched_arinc653.c |    2 +-
>  xen/common/sched_credit.c   |    6 +++---
>  xen/common/sched_rt.c       |   12 ++++++------
>  xen/common/sched_sedf.c     |    2 +-
>  xen/common/schedule.c       |    2 +-
>  xen/include/xen/sched-if.h  |   12 ++++++++++--
>  8 files changed, 27 insertions(+), 18 deletions(-)
>

> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> index a1945ac..8c36635 100644
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -309,7 +309,7 @@ __runq_remove(struct csched_vcpu *svc)
>  static inline int __vcpu_has_soft_affinity(const struct vcpu *vc,
>                                             const cpumask_t *mask)
>  {
> -    return !cpumask_subset(cpupool_online_cpumask(vc->domain->cpupool),
> +    return !cpumask_subset(cpupool_domain_cpumask(vc->domain),
>                             vc->cpu_soft_affinity) &&
>             !cpumask_subset(vc->cpu_hard_affinity, vc->cpu_soft_affinity) &&
>             cpumask_intersects(vc->cpu_soft_affinity, mask);
> @@ -374,7 +374,7 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
>
>      /* cpu is vc->processor, so it must be in a cpupool. */
>      ASSERT(per_cpu(cpupool, cpu) != NULL);
> -    online = cpupool_online_cpumask(per_cpu(cpupool, cpu));
> +    online = cpupool_domain_cpumask(new->sdom->dom);

Two comments in passing here (no action required):

1. Looks like passing both cpu and svc to this function is a bit
redundant, as the function can just look up new->vcpu->processor
itself

2. After this patch, the ASSERT there will be redundant, as there's
already an identical assert in cpupool_domain_cpumask()

Those won't hurt, but if you end up respinning you might think about
at least taking the ASSERT out.
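
For reference, a minimal sketch of what that cleanup might look like,
using only names visible in the quoted hunks (the tail of the function
is elided, and this is untested):

    static inline void
    __runq_tickle(struct csched_vcpu *new)
    {
        /* Derive the pCPU instead of passing it in redundantly. */
        unsigned int cpu = new->vcpu->processor;
        struct csched_vcpu * const cur = CSCHED_VCPU(curr_on_cpu(cpu));
        struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
        cpumask_t mask, idle_mask, *online;

        ASSERT(cur);
        cpumask_clear(&mask);

        /* cpupool_domain_cpumask() already asserts d->cpupool != NULL. */
        online = cpupool_domain_cpumask(new->sdom->dom);
        cpumask_and(&idle_mask, prv->idlers, online);
        /* ... rest of the tickling logic as in the patch ... */
    }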

> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 4ffcd98..7ad298a 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -80,7 +80,7 @@ static struct scheduler __read_mostly ops;
>
>  #define DOM2OP(_d)    (((_d)->cpupool == NULL) ? &ops : ((_d)->cpupool->sched))
>  #define VCPU2OP(_v)   (DOM2OP((_v)->domain))
> -#define VCPU2ONLINE(_v) cpupool_online_cpumask((_v)->domain->cpupool)
> +#define VCPU2ONLINE(_v) cpupool_domain_cpumask((_v)->domain)
>
>  static inline void trace_runstate_change(struct vcpu *v, int new_state)
>  {
> diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
> index 7cc25c6..20af791 100644
> --- a/xen/include/xen/sched-if.h
> +++ b/xen/include/xen/sched-if.h
> @@ -183,9 +183,17 @@ struct cpupool
>      atomic_t         refcnt;
>  };
>
> -#define cpupool_scheduler_cpumask(_pool) \
> -    (((_pool) == NULL) ? &cpupool_free_cpus : (_pool)->cpu_valid)
>  #define cpupool_online_cpumask(_pool) \
>      (((_pool) == NULL) ? &cpu_online_map : (_pool)->cpu_valid)
>
> +static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
> +{
> +    /*
> +     * d->cpupool == NULL only for the idle domain, which should
> +     * have no interest in calling into this.
> +     */
> +    ASSERT(d->cpupool != NULL);
> +    return cpupool_online_cpumask(d->cpupool);
> +}

It's a bit strange to assert that d->cpupool != NULL, and then call a
macro whose return value only depends on whether d->cpupool is NULL or
not.  Why not just return d->cpupool->cpu_valid?
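
i.e., roughly (a sketch only, keeping the comment and ASSERT from the
patch):

    static inline cpumask_t *cpupool_domain_cpumask(struct domain *d)
    {
        /*
         * d->cpupool == NULL only for the idle domain, which should
         * have no interest in calling into this.
         */
        ASSERT(d->cpupool != NULL);
        return d->cpupool->cpu_valid;
    }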

Thanks for tracking these down, BTW!

 -George

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] xen: credit1: properly deal with pCPUs not in any cpupool
  2015-07-02 15:24   ` George Dunlap
@ 2015-07-02 16:01     ` Dario Faggioli
  2015-07-02 16:14       ` George Dunlap
  0 siblings, 1 reply; 22+ messages in thread
From: Dario Faggioli @ 2015-07-02 16:01 UTC (permalink / raw)
  To: George Dunlap; +Cc: Juergen Gross, xen-devel



On Thu, 2015-07-02 at 16:24 +0100, George Dunlap wrote:
> On Thu, Jun 25, 2015 at 1:15 PM, Dario Faggioli
> <dario.faggioli@citrix.com> wrote:
> > Ideally, the pCPUs that are 'free', i.e., not assigned
> > to any cpupool, should not be considered by the scheduler
> > for load balancing or anything. In Credit1, we fail at
> > this, because of how we use cpupool_scheduler_cpumask().
> > In fact, for a free pCPU, cpupool_scheduler_cpumask()
> > returns a pointer to cpupool_free_cpus, and hence, near
> > the top of csched_load_balance():
> >
> >  if ( unlikely(!cpumask_test_cpu(cpu, online)) )
> >      goto out;
> >
> > is false (the pCPU _is_ free!), and we therefore do not
> > jump to the end right away, as we should. This causes
> > the following splat when resuming from ACPI S3 with
> > pCPUs not assigned to any pool:
> 
> What I haven't quite twigged yet with regard to this specific bug is
> why csched_load_balance() is being run on this cpu at all if it hasn't
> been added to the cpupool yet.  
>
Because the cpu is online already. That happens in start_secondary(),
right after having notified the CPU_STARTING phase of CPU bringup.

OTOH, adding a cpu to a pool only happens during the CPU_ONLINE phase,
which is after that.

> AFAICT it's only called from
> csched_schedule() -- why is that being called on a cpu that isn't
> online yet?  
>
Because, sadly, it is online. :-/

> If we can't fix that before the code freeze (or if we
> wouldn't want to avoid it anyway), would it make more sense to check
> for that condition in csched_schedule() and just return the idle
> domain?
>
I tried to move things around (i.e., moving the assignment to a cpupool
ahead in the bringup process), but that introduces irq-safe vs. non
irq-safe locking issues (ISTR that was on the spin locks protecting the
memory being allocated from inside schedule_cpu_switch(), which is
called by the cpu to cpupool assignment routine).

So, yeah, it's something I also don't like much, but it's hard to fix
properly, even without considering 4.6's freeze. :-/

> (Or schedule() even?)
> 
Perhaps, but:
 - this is (I think) pretty orthogonal wrt this patch,
 - I tried something like that as well, but then I did not like having a
   check for "is this cpu outside of any cpupool?" in such code, and
   going through it all the time (and this is a hot path!), for the
   sake of a corner case (as I consider having cpus outside of any
   cpupool a transient situation that only really exists at system start).

The approach implemented seemed the best one in terms of both how the
code looks and how it performs, for the vast majority of use cases and
of the time.

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] xen: credit1: properly deal with pCPUs not in any cpupool
  2015-07-02 16:01     ` Dario Faggioli
@ 2015-07-02 16:14       ` George Dunlap
  0 siblings, 0 replies; 22+ messages in thread
From: George Dunlap @ 2015-07-02 16:14 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Juergen Gross, xen-devel

On Thu, Jul 2, 2015 at 5:01 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On Thu, 2015-07-02 at 16:24 +0100, George Dunlap wrote:
>> On Thu, Jun 25, 2015 at 1:15 PM, Dario Faggioli
>> <dario.faggioli@citrix.com> wrote:
>> > Ideally, the pCPUs that are 'free', i.e., not assigned
>> > to any cpupool, should not be considered by the scheduler
>> > for load balancing or anything. In Credit1, we fail at
>> > this, because of how we use cpupool_scheduler_cpumask().
>> > In fact, for a free pCPU, cpupool_scheduler_cpumask()
>> > returns a pointer to cpupool_free_cpus, and hence, near
>> > the top of csched_load_balance():
>> >
>> >  if ( unlikely(!cpumask_test_cpu(cpu, online)) )
>> >      goto out;
>> >
>> > is false (the pCPU _is_ free!), and we therefore do not
>> > jump to the end right away, as we should. This causes
>> > the following splat when resuming from ACPI S3 with
>> > pCPUs not assigned to any pool:
>>
>> What I haven't quite twigged yet with regard to this specific bug is
>> why csched_load_balance() is being run on this cpu at all if it hasn't
>> been added to the cpupool yet.
>>
> Because the cpu is online already. That happens in start_secondary(),
> right after having notified the CPU_STARTING phase of CPU bringup.
>
> OTOH, adding a cpu to a pool only happens during the CPU_ONLINE phase,
> which is after that.
>
>> AFAICT it's only called from
>> csched_schedule() -- why is that being called on a cpu that isn't
>> online yet?
>>
> Because, sadly, it is online. :-/
>
>> If we can't fix that before the code freeze (or if we
>> wouldn't want to avoid it anyway), would it make more sense to check
>> for that condition in csched_schedule() and just return the idle
>> domain?
>>
> I tried to move things around (i.e., moving the assignment to a cpupool
> ahead in the bringup process), but that introduces irq-safe vs. non
> irq-safe locking issues (ISTR that was on the spin locks protecting the
> memory being allocated from inside schedule_cpu_switch(), which is
> called by the cpu to cpupool assignment routine).
>
> So, yeah, it's something I also don't like much, but it's hard to fix
> properly, even without considering 4.6's freeze. :-/
>
>> (Or schedule() even?)
>>
> Perhaps, but:
>  - this is (I think) pretty orthogonal wrt this patch,
>  - I tried something like that as well, but then I did not like having a
>    check for "is this cpu outside of any cpupool?" in such code, and
>    going through it all the time (and this is a hot path!), for the
>    sake of a corner case (as I consider having cpus outside of any
>    cpupool a transient situation that only really exists at system start).
>
> The approach implemented seemed the best one in terms of both how the
> code looks and performs for the vast majority of use cases and of the
> time.

Fair enough:

Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
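
A minimal sketch of the early bail-out being discussed, assuming only
what is stated above: a pCPU comes online from start_secondary() at the
CPU_STARTING phase, but is only assigned to a cpupool (and hence
per_cpu(cpupool, cpu) becomes non-NULL) at the CPU_ONLINE phase.  All
the names used below appear elsewhere in this thread; the combination
is an illustration of the idea, not the actual diff of patch 3/4:

    /*
     * Illustrative only: near the top of csched_load_balance(), make a
     * pCPU that is not in any cpupool give up load balancing right
     * away, instead of relying on the 'online' mask alone (which, for
     * a free pCPU, points at cpupool_free_cpus and so does contain the
     * pCPU).
     */
    if ( unlikely(per_cpu(cpupool, cpu) == NULL ||
                  !cpumask_test_cpu(cpu, online)) )
        goto out;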

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] xen: sched: get rid of cpupool_scheduler_cpumask()
  2015-07-02 15:39   ` George Dunlap
@ 2015-07-03  7:48     ` Dario Faggioli
  0 siblings, 0 replies; 22+ messages in thread
From: Dario Faggioli @ 2015-07-03  7:48 UTC (permalink / raw)
  To: George Dunlap
  Cc: Juergen Gross, Sisu Xi, Robert VanVossen, Josh Whitehead,
	Meng Xu, xen-devel


On Thu, 2015-07-02 at 16:39 +0100, George Dunlap wrote:
> On Thu, Jun 25, 2015 at 1:15 PM, Dario Faggioli

> > diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> > index a1945ac..8c36635 100644
> > --- a/xen/common/sched_credit.c
> > +++ b/xen/common/sched_credit.c

> > @@ -374,7 +374,7 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
> >
> >      /* cpu is vc->processor, so it must be in a cpupool. */
> >      ASSERT(per_cpu(cpupool, cpu) != NULL);
> > -    online = cpupool_online_cpumask(per_cpu(cpupool, cpu));
> > +    online = cpupool_domain_cpumask(new->sdom->dom);
> 
> Two comments in passing here (no action required):
> 
> 1. Looks like passing both cpu and svc to this function is a bit
> redundant, as the function can just look up new->vcpu->processor
> itself
> 
Indeed! It's not only redundant, it's disturbing, IMO. In fact, every
time I look at it, I wonder what cpu is, because, if it were just
->processor, it wouldn't need to be a parameter... then I
smack my head and remember that, yes, it's _that_ function!

So, yes, I'm all for killing it!
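
To make the suggested cleanup concrete, here is a sketch of how the
redundant parameter could be dropped.  The real body of __runq_tickle()
is long and is deliberately elided; only the shape of the change is
shown, this is not the actual code:

    /*
     * Sketch: derive the pCPU from the vCPU itself, rather than taking
     * it as a separate (and redundant) parameter.
     */
    static inline void
    __runq_tickle(struct csched_vcpu *new)
    {
        unsigned int cpu = new->vcpu->processor;
        cpumask_t *online = cpupool_domain_cpumask(new->sdom->dom);

        /* ... the tickling logic itself stays as it is ... */
    }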

> 2. After this patch, the ASSERT there will be redundant, as there's
> already an identical assert in cpupool_domain_cpumask()
> 
> Those won't hurt, but if you end up respinning you might think about
> at least taking the ASSERT out.
> 
Right, I'll take it out.

> > diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> > index 4ffcd98..7ad298a 100644
> > --- a/xen/common/schedule.c
> > +++ b/xen/common/schedule.c
> > @@ -80,7 +80,7 @@ static struct scheduler __read_mostly ops;
> >
> >  #define DOM2OP(_d)    (((_d)->cpupool == NULL) ? &ops : ((_d)->cpupool->sched))
> >  #define VCPU2OP(_v)   (DOM2OP((_v)->domain))
> > -#define VCPU2ONLINE(_v) cpupool_online_cpumask((_v)->domain->cpupool)
> > +#define VCPU2ONLINE(_v) cpupool_domain_cpumask((_v)->domain)
> >
> >  static inline void trace_runstate_change(struct vcpu *v, int new_state)
> >  {
> > diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
> > index 7cc25c6..20af791 100644
> > --- a/xen/include/xen/sched-if.h
> > +++ b/xen/include/xen/sched-if.h
> > @@ -183,9 +183,17 @@ struct cpupool
> >      atomic_t         refcnt;
> >  };
> >
> > -#define cpupool_scheduler_cpumask(_pool) \
> > -    (((_pool) == NULL) ? &cpupool_free_cpus : (_pool)->cpu_valid)
> >  #define cpupool_online_cpumask(_pool) \
> >      (((_pool) == NULL) ? &cpu_online_map : (_pool)->cpu_valid)
> >
> > +static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
> > +{
> > +    /*
> > +     * d->cpupool == NULL only for the idle domain, which should
> > +     * have no interest in calling into this.
> > +     */
> > +    ASSERT(d->cpupool != NULL);
> > +    return cpupool_online_cpumask(d->cpupool);
> > +}
> 
> It's a bit strange to assert that d->cpupool != NULL, and then call a
> macro whose return value only depends on whether d->cpupool is NULL or
> not.  Why not just return d->cpupool->cpu_valid?
> 
Yes, you're right. I think this is because my original intent was to
replace cpupool_scheduler_cpumask() with *something* built on top of
cpupool_online_cpumask(), and I retained this design, even when it
became something that could just be 'return d->cpupool->cpu_valid'! :-)

I'll change this.
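
For completeness, after the simplification agreed on here the helper
would look roughly like this (a sketch of the follow-up change, not
necessarily the committed code):

    static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
    {
        /*
         * d->cpupool == NULL only for the idle domain, which should
         * have no interest in calling into this.
         */
        ASSERT(d->cpupool != NULL);
        return d->cpupool->cpu_valid;
    }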

> Thanks for tracking these down, BTW!
> 
It was a bit of a nightmare, actually, to (1) figure out what was going
on, and (2) come up with a satisfying fix... But, yes, it was well
worth it. :-)

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2015-07-03  7:48 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-25 12:15 [PATCH 0/4] xen: sched / cpupool: fixes and improvements, mostly for when suspend/resume is involved Dario Faggioli
2015-06-25 12:15 ` [PATCH 1/4] xen: sched: avoid dumping duplicate information Dario Faggioli
2015-06-26 13:43   ` Juergen Gross
2015-07-02 11:18   ` George Dunlap
2015-06-25 12:15 ` [PATCH 2/4] xen: x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown Dario Faggioli
2015-06-25 14:20   ` Andrew Cooper
2015-06-25 15:04     ` Dario Faggioli
2015-06-25 15:52       ` Andrew Cooper
2015-06-25 16:13         ` Dario Faggioli
2015-06-25 16:39           ` Andrew Cooper
2015-06-26 13:54   ` Juergen Gross
2015-06-25 12:15 ` [PATCH 3/4] xen: credit1: properly deal with pCPUs not in any cpupool Dario Faggioli
2015-06-26 14:05   ` Juergen Gross
2015-07-02 15:24   ` George Dunlap
2015-07-02 16:01     ` Dario Faggioli
2015-07-02 16:14       ` George Dunlap
2015-06-25 12:15 ` [PATCH 4/4] xen: sched: get rid of cpupool_scheduler_cpumask() Dario Faggioli
2015-06-26 14:08   ` Juergen Gross
2015-06-26 14:57     ` Joshua Whitehead
2015-06-27 19:21   ` Meng Xu
2015-07-02 15:39   ` George Dunlap
2015-07-03  7:48     ` Dario Faggioli
